1. Operational Business Intelligence
The Next Business Challenge. A business executive was telling me about the next problem she faces in operating her chain of specialty prepared food stores: it takes too long to service a customer, but she cannot afford to increase labor costs further. She knows the solution is to have store managers make more precise staffing decisions, reacting promptly to changes in customer traffic and purchase patterns.
But store managers lack the information they need to do this. They get weekly reports on sales by product. What they need is fully detailed purchase history by 10-minute periods – and they need it within a few minutes of real time. Only then can store managers relate their observations about how long customers are waiting to what they are buying and therefore to specific, timely actions with respect to staffing. The store manager needs to understand promptly what is causing the delay; what is required to correct matters as soon as possible; and how to take the best actions for the next shift, the next day and the next week.
A New Decision Frontier. What the store managers need is operational business intelligence (BI): day-by-day and hour-by-hour decisions made by hundreds of store managers to deal with the seemingly small matters. Each decision may, by itself, have a relatively small impact on a large enterprise. But, taken together, the thousands of decisions each day by store managers have a large impact. In many enterprises, operational BI is the next major frontier in business performance.
In general, operational BI is characterized by business needs to:
• react rapidly to events with actions by people who are part of the line business operation; and,
• make such actions more specific, accurate, appropriate and timely than in the past.
Such capabilities have been shown to make a significant cumulative difference in business performance in many industries. Examples range from better stocking of retail stores; to more efficient use of labor in shipping companies; to better treatment of the most profitable customers; to reducing manufacturing cost and delays via better management of the supply chain.
Formidable Requirements. While there is rising interest in the business community in operational BI, there is a rising concern among IT professionals: the requirements are formidable and many data warehouse infrastructures are already overloaded. An operational BI solution must:
• Scale. That is, it must handle increases in data volume and workload without disproportionate increases in cost or response time.
• Be economical. Total cost of ownership is what is important here. Users need good return on their investment to be able to implement operational BI.
• Deliver rapid response to queries. People on the front lines of any organization typically work at a fast pace. They need answers while the customer or supplier is on the phone; when the delivery is at the dock; or at the moment that something goes awry in the factory.
• Support many concurrent users. Operational BI is not about servicing 20 analysts in the back office; it is about servicing hundreds or thousands of people doing everyday jobs throughout the enterprise.
• Keep data up to date, even if it is updated nearly continuously throughout the day. Operational BI is often about taking action right now – on the basis of what happened in the last few hours or last few minutes.
• Remain available even as components fail; disks and nodes may fail, but many elements of the modern enterprise must just keep going; and,
• Often, operate around the clock (seven by twenty-four), which leaves little room for scheduled downtime.
For many users who would like to implement operational BI, both the technical requirements and the cost are significant barriers. These users are searching for a solution that can do the job at a reasonable price.
2. The ParAccel Analytic Database™
A New Architecture. While there are products on the market today that address these requirements, ParAccel (www.paraccel.com) believes it can do that better and more economically with its ParAccel Analytic Database (PADB). And, ParAccel has developed a new architecture to back up that claim.
How is that possible? ParAccel has taken the approach of combining column storage, compression, full parallelism, in-memory databases and a few other innovations in its database engine. This architecture has been tested in the field and is now commercially available. As a result, ParAccel aims to deliver a solution for operational BI that has higher performance and lower cost than those presently on the market.
Typical Configuration and Pricing. Two typical ParAccel configurations are shown in Figures 1 and 2.
Figure 1 shows a disk-based data warehouse holding 10 TB of user data. Here, PADB runs on a 10-node system, where each node is a dual-processor, dual-core Opteron server with 16 or 32 GB of memory and twelve 146 GB disks. The nodes are connected via dual gigabit Ethernet links and a high-speed switch.
Pricing for the system in Figure 1 would be comparable to data warehouse appliances – WinterCorp estimates that with software, the list price would be approximately $650,000 or $65k/TB. Note that the established data warehouse products such as Oracle and Teradata typically charge five or more times as much to warehouse a TB of data.

Figure 1: Typical ParAccel Configuration for a Disk Based Data Warehouse
While the existing data warehouse appliances feature a similar price, ParAccel ups the ante on value with a product that has been designed to handle a broad and challenging set of data warehouse requirements – and with eye-opening performance.
Figure 2 shows an example of another interesting configuration option: one for in-memory data warehousing. In Figure 2, the node configuration is the same, except that fewer and smaller disks are used. Here, the ten nodes are employed to hold a total of 500 GB (0.5 TB) of user data in memory at all times. ParAccel says that the 500 GB of raw data can be expected to compress down to about 125 GB, leaving 35 GB of main memory for software, workspace, etc.; these figures imply 16 GB of memory per node, or 160 GB in all. WinterCorp estimates that this configuration will list for about $550k. It should, of course, provide yet higher performance than the configuration in Figure 1, since no disk I/O is required and database operations will be conducted at processor speed.

Figure 2: Typical ParAccel Configuration for Memory Based Data Warehouse
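The memory arithmetic behind the Figure 2 configuration can be checked with a short sketch. It assumes 16 GB of memory per node (the smaller of the two sizes quoted for these servers) and the 4-to-1 compression implied by the 500 GB and 125 GB figures; the function name and its parameters are illustrative, not part of any ParAccel tool.

```python
def memory_headroom_gb(nodes, gb_per_node, raw_gb, compression_ratio):
    """Return (total memory, compressed data size, memory left over), all in GB."""
    total_gb = nodes * gb_per_node
    compressed_gb = raw_gb / compression_ratio
    return total_gb, compressed_gb, total_gb - compressed_gb


# Assumed inputs: 10 nodes, 16 GB each, 500 GB of user data, 4-to-1 compression.
total, compressed, headroom = memory_headroom_gb(
    nodes=10, gb_per_node=16, raw_gb=500, compression_ratio=4
)
print(f"total={total} GB, compressed data={compressed} GB, headroom={headroom} GB")
```

Changing `gb_per_node` to 32 shows how much extra workspace the larger memory option would leave for the same data.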
These configurations are examples. ParAccel is available in configurations of five nodes and up. There is flexibility concerning type of processor (e.g., Opteron or Xeon), number of cores, amount of memory, and how much and what type of storage can be attached to a node (e.g., direct-attached, storage modules, or enterprise storage).
Column Storage and Compression. Column storage, coupled with compression, is a powerful feature for data warehouse applications. Sybase IQ proved this in the 90s, when it began delivering column storage solutions with very high efficiency for data mart applications.
| Sidebar: ParAccel gives the example of computing average age by state on a 300 GB table containing census data. A conventional, row storage engine is going to read the entire 300 GB table. Making the reasonable assumption that the age column occupies no more than a hundredth of each row, a column storage engine will read just 3 GB of data to process the same query. But, if the column is compressed, then it is likely to read only 1 GB of data. |
The argument goes that most data warehouse tables are wide and most queries retrieve less than 10% of the columns in each row.
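The I/O argument in the census example reduces to simple arithmetic, sketched below. The one-hundredth column fraction comes from the example itself; the 3-to-1 compression ratio is an assumption chosen only because it reproduces the "about 1 GB" figure, and the function name is hypothetical.

```python
def gb_read(table_gb, column_fraction=1.0, compression_ratio=1.0):
    """Estimate the data scanned for a query touching a fraction of each row."""
    return table_gb * column_fraction / compression_ratio


row_store = gb_read(300)                                   # whole table is scanned
column_store = gb_read(300, column_fraction=0.01)          # only the age column
compressed = gb_read(300, column_fraction=0.01,
                     compression_ratio=3.0)                # assumed 3-to-1 ratio

print(f"row store: {row_store:.0f} GB")
print(f"column store: {column_store:.0f} GB")
print(f"compressed column store: {compressed:.0f} GB")
```

The same function can be used to test the "less than 10% of the columns" rule of thumb by setting `column_fraction=0.1`.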
Data is far more compressible along columns than along rows, because of the similarity of the values within a column. ParAccel pursues compression aggressively, a good strategy because processor speeds are growing much faster than disk I/O rates. The database automatically chooses from 8 different compression approaches, heuristically selecting the best for each column. While compression ratios as high as 20-to-1 have been observed thus far in field tests, ParAccel suggests that 4-to-1 is a good ratio to use for general planning.
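To make the idea of choosing an encoding per column concrete, here is a minimal sketch. It is not ParAccel's algorithm: the source does not name the eight schemes or the heuristic. The three encodings, the byte widths and every function name below are assumptions made for illustration; the sketch simply estimates the encoded size under each scheme and keeps the smallest.

```python
from itertools import groupby


def raw_size(values, width=8):
    """Bytes needed to store every value at a fixed width."""
    return len(values) * width


def rle_size(values, width=8, count_width=4):
    """Run-length encoding: one (value, run length) pair per run of equal values."""
    runs = sum(1 for _ in groupby(values))
    return runs * (width + count_width)


def dictionary_size(values, width=8):
    """Dictionary encoding: distinct values stored once, plus one small code per row."""
    distinct = len(set(values))
    code_width = max(1, (max(distinct - 1, 1).bit_length() + 7) // 8)
    return distinct * width + len(values) * code_width


def choose_encoding(values):
    """Return the name of the smallest encoding and the estimated sizes of all three."""
    sizes = {
        "raw": raw_size(values),
        "rle": rle_size(values),
        "dictionary": dictionary_size(values),
    }
    return min(sizes, key=sizes.get), sizes


sorted_states = ["CA"] * 1000 + ["NY"] * 1000   # long runs of identical values
shuffled_states = ["CA", "NY"] * 1000           # few distinct values, no runs
unique_ids = list(range(2000))                  # every value different

for name, column in [("sorted", sorted_states),
                     ("shuffled", shuffled_states),
                     ("unique", unique_ids)]:
    print(name, choose_encoding(column))
```

The three sample columns show why a per-column choice matters: similar values within a column are what make run-length or dictionary encoding pay off, while a column of unique values gains nothing from either.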
Column storage will work well in many data warehouse applications. But, as indicated by the history of Sybase IQ, column storage isn’t new.
Fully Parallel Architecture. One thing that does seem to be new in PADB is the combination of column storage and a fully parallel architecture. PADB has been engineered from the ground up for highly parallel operation. This means that all fundamental database operations are executed in parallel and that the cost-based query optimizer has been built with parallel operation in mind.
The shared-nothing database is partitioned into “slices”; presently, systems are configured with one slice per core. So, the systems in Figures 1 and 2 would be performing queries 40 ways in parallel (10 nodes x 2 processors per node x 2 cores per processor). In the disk-based system in Figure 1, there would be about 250 GB of user data per slice (10 TB across 40 slices). In the memory-based system in Figure 2, there would be about 12.5 GB of user data per slice (500 GB across 40 slices), or roughly 3 GB per slice after the expected 4-to-1 compression.
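The slice model can be illustrated with a small sketch of a partitioned aggregate, using the average-age-by-state query from the census example. Rows are spread across 40 slices, each slice computes a partial sum and count over its own rows, and the partial results are merged. The round-robin distribution and all names here are assumptions; the source does not describe how PADB assigns rows to slices, and this sketch runs the slices one after another rather than concurrently.

```python
from collections import defaultdict

NODES, PROCESSORS_PER_NODE, CORES_PER_PROCESSOR = 10, 2, 2
SLICES = NODES * PROCESSORS_PER_NODE * CORES_PER_PROCESSOR  # one slice per core


def assign_slice(row_id, slices=SLICES):
    """Assumed round-robin placement of rows onto slices."""
    return row_id % slices


def partial_aggregate(rows):
    """Work done independently on one slice: per-state [sum of ages, row count]."""
    acc = defaultdict(lambda: [0, 0])
    for state, age in rows:
        acc[state][0] += age
        acc[state][1] += 1
    return acc


def average_age_by_state(rows, slices=SLICES):
    partitions = [[] for _ in range(slices)]
    for row_id, row in enumerate(rows):
        partitions[assign_slice(row_id, slices)].append(row)

    # In a parallel engine each partition would be processed on its own core.
    partials = [partial_aggregate(partition) for partition in partitions]

    # Final step: merge the small partial results into one answer.
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for state, (total, count) in partial.items():
            merged[state][0] += total
            merged[state][1] += count
    return {state: total / count for state, (total, count) in merged.items()}


census = [("CA", 30), ("CA", 40), ("NY", 50), ("NY", 70), ("TX", 25)]
print(SLICES, average_age_by_state(census))
```

Only the per-state sums and counts cross slice boundaries, which is why this kind of aggregate parallelizes well in a shared-nothing design.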
It is significant that highly parallel operation is designed into the system from the beginning. ParAccel has built its own query execution engine, featuring a parallel-aware optimizer and sophisticated strategies for highly parallel execution of complex (e.g., multi-join and aggregation) queries, including queries that must move a lot of data over the interconnect. The result, on the basis of early customer experiences, looks very promising in terms of performance.
In Memory Databases. A database that resides entirely in memory (and, therefore requires no disk I/O for standard database operations) has already been shown to deliver substantial performance advantages for certain applications. Several such products have been available on the market for the last few years.
However, ParAccel is distinctive in combining in-memory operation with its other features. Parallel operation is important, even with in-memory databases, because data warehouse tables now routinely contain billions of rows. Even at memory speeds, searching a billion items serially takes time (and, yes, this matters even when you have the data organized by columns).
Now, consider a system that would commonly be described as a “two terabyte” data warehouse – because it occupies two terabytes of disk. WinterCorp research shows that such a system is likely to contain no more than 500 GB of user data. This “two terabyte” data warehouse could thus be managed in memory by ParAccel with a configuration like that in Figure 2.
Concurrency control via snapshot isolation. A key requirement in operational BI is that the data warehouse efficiently and promptly process updates, even as queries continue to run. Using a snapshot isolation technique, PADB is able to perform updates upon receipt without blocking queries. Similarly, queries do not prevent updates from running.
Roughly speaking, the scheme works as follows. Each database transaction is assigned a unique numeric ID, one larger than the last ID assigned. When a record is inserted, the ID of the transaction making the insertion is stored in the record. When a record is deleted, it is left in place, marked “deleted”, and the ID of the transaction deleting it is stored in the record. Changes are implemented as an insert with the new values, followed by a delete of the older version of the record. All transactions ignore records that have a transaction ID higher than their own. If desired, old versions of records are vacuumed out of the system by a background routine, once the transaction that operated on them has committed.
The effect is that each query operates on a snapshot of the database, as it existed at the moment the query began to run. If an update was in progress as the query began, its changes are not visible to that query. However, the changes do become visible later to all queries that start after the update commits.
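The versioning scheme described above can be sketched in a few lines. This is a toy model, not PADB's implementation: the class and method names are hypothetical, and it assumes every transaction commits before any later transaction starts, so it does not track in-progress or aborted transactions as a real engine must.

```python
import itertools


class VersionedTable:
    """Toy multi-version table: records are never overwritten in place."""

    def __init__(self):
        self._ids = itertools.count(1)   # each new ID is one larger than the last
        self._records = []

    def begin(self):
        """Start a transaction (or query) and return its numeric ID."""
        return next(self._ids)

    def _visible(self, rec, txn_id):
        # Ignore versions inserted by later transactions; a deleted version
        # stays visible to transactions older than the one that deleted it.
        inserted = rec["inserted_by"] <= txn_id
        deleted = rec["deleted_by"] is not None and rec["deleted_by"] <= txn_id
        return inserted and not deleted

    def insert(self, txn_id, key, value):
        self._records.append(
            {"key": key, "value": value, "inserted_by": txn_id, "deleted_by": None}
        )

    def update(self, txn_id, key, value):
        # A change is an insert of the new version followed by a delete
        # (marking, not removal) of the older version.
        old = [r for r in self._records if r["key"] == key and self._visible(r, txn_id)]
        self.insert(txn_id, key, value)
        for rec in old:
            rec["deleted_by"] = txn_id

    def read(self, txn_id):
        """Return the snapshot seen by the transaction with this ID."""
        return {r["key"]: r["value"] for r in self._records if self._visible(r, txn_id)}

    def vacuum(self, oldest_active_txn):
        """Drop versions that no active transaction can still see."""
        self._records = [
            r for r in self._records
            if r["deleted_by"] is None or r["deleted_by"] > oldest_active_txn
        ]


table = VersionedTable()
loader = table.begin()
table.insert(loader, "store_42_sales", 100)

query = table.begin()                       # query starts before the update
updater = table.begin()
table.update(updater, "store_42_sales", 250)

print(table.read(query))                    # the query still sees its snapshot
print(table.read(table.begin()))            # a later query sees the new value
```

Because the update never touches the version the running query is reading, neither side has to wait for the other, which is the property the text relies on.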
Operational BI applications will typically run their updates against PADB in small batches. For example, if the business requirement is that data be visible in the database within 5 minutes of receipt, updates might be run in 4-minute batches. This technique has been demonstrated successfully with PADB with latency requirements as low as 5 seconds.
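The reasoning behind choosing a batch interval shorter than the latency requirement can be written down directly. In this sketch, a record that arrives just after a batch closes waits one full interval and then the time it takes to load the next batch. The 45-second load time is a made-up figure for illustration, and the function names are hypothetical.

```python
def worst_case_latency_s(batch_interval_s, load_duration_s):
    """Longest time from receipt of a record until it is visible to queries."""
    return batch_interval_s + load_duration_s


def meets_requirement(batch_interval_s, load_duration_s, required_latency_s):
    return worst_case_latency_s(batch_interval_s, load_duration_s) <= required_latency_s


# 4-minute batches against a 5-minute requirement, assuming a 45-second load.
print(worst_case_latency_s(4 * 60, 45))
print(meets_requirement(4 * 60, 45, 5 * 60))

# The same load time with 5-minute batches would miss the requirement.
print(meets_requirement(5 * 60, 45, 5 * 60))
```

Under this model, the gap between the batch interval and the requirement (one minute in the example) is the budget available for loading each batch.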
“AMIGO” Capability. ParAccel also offers an implementation feature, called ParAccel AMIGO™, to accelerate the performance and extend the scalability of an existing database instance (now available on Microsoft SQL Server and planned for Oracle). While leaving existing applications, reports and queries unchanged, users can experience better performance, as shown in Figure 3.

Figure 3: ParAccel AMIGO Architecture
Source: ParAccel
With AMIGO, PADB manages replicas of designated tables in the system of record (e.g., Microsoft SQL Server). A query dispatcher (Q-Router) provided by ParAccel intercepts SQL calls directed to the system of record. Analytical queries, for which PADB can deliver a substantial query advantage, are passed to PADB. All other calls are sent through to the system of record.
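The dispatching idea can be sketched as a simple routing function. This is not the Q-Router's actual logic, which the source does not describe; the rule used here (send a statement to PADB only if it is a SELECT that touches replicated tables exclusively) is an assumed simplification, and the table names, return labels and regular expression are all hypothetical. A real dispatcher would parse the SQL properly and judge whether PADB offers an advantage for the query.

```python
import re

# Hypothetical set of tables that PADB holds replicas of.
REPLICATED_TABLES = {"sales", "customers"}

# Naive extraction of table names following FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def route(sql, replicated=REPLICATED_TABLES):
    """Return 'padb' or 'system_of_record' for a single SQL statement."""
    statement = sql.strip()
    if not statement.lower().startswith("select"):
        return "system_of_record"          # writes and DDL always pass through
    tables = {name.lower() for name in _TABLE_REF.findall(statement)}
    if tables and tables <= replicated:
        return "padb"                      # every referenced table has a replica
    return "system_of_record"


print(route("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(route("UPDATE sales SET amount = 0 WHERE id = 7"))
print(route("SELECT * FROM orders JOIN sales ON orders.id = sales.order_id"))
```

Falling back to the system of record whenever the rule is unsure keeps existing applications working unchanged, which is the stated goal of the AMIGO approach.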
The ParAccel component of the data warehouse is designed to scale out in a highly parallel, cost effective fashion as data and usage volumes rise.
3. Customer Experience
LatiNode is a rapidly growing, next-generation IP solutions provider servicing the VoIP industry. LatiNode presently has over $100 million in annual sales. It has been using Microsoft SQL Server and Microsoft Analysis Services to support 10-15 concurrent users accessing a 500 GB data warehouse. This customer was encountering bottlenecks and resource contention, resulting in unacceptably long query response times. After a successful trial of the AMIGO capability, in which dramatic improvements in query performance were realized, LatiNode decided to implement its data warehouse directly on PADB and to have the data updated there. Because PADB supports Panorama NovaView, LatiNode’s BI tool of choice, the company can simply use PADB as its standalone data warehouse. ParAccel refers to this implementation mode as ParAccel MAVERICK™. Implementation is underway, and LatiNode expects soon to be in production with about a terabyte of data on PADB, supporting near-real-time operational analysis and reporting with data latency of 15 minutes or less.
Another interesting case involves a second company, which we will call “LegalData, Inc.”, a major provider of case information to lawyers and courts. It has very large-scale requirements for complex online query. LegalData has about 25,000 users and provides them with services to search some 10-15 TB of online legal information.
The legal information is managed with Microsoft SQL Server and accessed with custom applications written for it. Currently the information resides in databases that also support OLTP. LegalData conducted a trial of the ParAccel AMIGO product, employing PADB to handle analytic queries transparently. The VP of Technical Operations for the Litigation Services Division of LegalData reports that he observed a 100-fold improvement in response time for the queries handled by PADB.
This is a seven by twenty-four, mission-critical application. LegalData plans to go into production with ParAccel on this application in the next quarter or two. As intended with the AMIGO architecture, PADB will manage replicas of the tables needed for analytic query. They will be updated continuously as new information is added to the SQL Server based system of record. The AMIGO product will read the SQL Server logs, replicate database updates to PADB, and make new data available to users for online inquiry within three seconds.
The VP of Technical Operations says that his customers want “Google-like speed” and he has selected PADB as his vehicle to provide that. He believes that PADB will give him the rapid query performance needed at a far lower cost than any other platform in the industry. He also avoids the migration risk and the project resource implications of designing and building a separate data warehouse just to offer complex query capability.
In all large-scale data management implementations, the ultimate proof comes from sustained success in production. These cases do not yet provide that proof. But they are dramatic illustrations of successful proofs-of-concept, indicating that, at least for these two customers, the advantages ParAccel claims held up when tested with real data and real queries.
4. Industry Benchmark Results
ParAccel has posted results for the TPC-H benchmark at www.tpc.org.
The TPC-H benchmark has been developed by the Transaction Processing Performance Council (TPC) as a standard measure of performance and scalability for decision support (e.g., data warehousing) data management and query. The TPC benchmarks are different from most others in the industry because:
- Specifications for the benchmark are rigorous, widely reviewed and openly published;
- Prior to publication, including posting at the TPC website:
o Benchmark runs must be audited by a certified TPC auditor; and,
o Benchmark results must be published in a standard format, and similarly audited.
These provisions make TPC benchmark results, including the TPC-H, among the best documented test results ever published concerning any product.
ParAccel has taken the relatively unusual step of completing and posting TPC-H benchmark results upon the release of its new ParAccel Analytic Database. In fact, the company has posted such results at three different scale factors, where the scale factor represents the volume of data loaded into the database for the test. Scale factors published to date are 100 GB, 300 GB and 1 TB.
Most newly introduced data warehousing products do not have published TPC-H results. These results are expensive and time-consuming to obtain. And, it is unusual for new products to have all of the capabilities required, as well as the performance and price/performance, to be competitive with mature products on these tests.
ParAccel’s results are not only competitive – they are the best results ever posted at their respective scale factors – and, by large margins. ParAccel’s performance is roughly four times as good as the next best result. ParAccel’s price/performance is roughly twice as good as the next best result. This is a genuinely impressive accomplishment and suggests that ParAccel really has developed an exceptional product.
One note of caution: though TPC-H has much to recommend it as a benchmark, it is not nearly as complex and demanding as most actual data warehouse applications. Prospective users should regard TPC-H as one indicator of which products may merit evaluation. Especially with a new product, it is extremely important for users to evaluate and test data warehouse solutions against realistic representations of their own data and requirements.
5. Conclusions
ParAccel has developed a promising new architecture and is taking aim squarely at an important, rapidly growing market area within data warehousing. It is a new company and has much to prove, but this is a company to keep an eye on. ParAccel appears to have key capabilities for operational BI as well as mainstream data warehousing. It promises to put operational decision making in the hands of its customers, and of those previously mentioned store managers, at an advantageous price.
As with any new product, users will need to evaluate the ParAccel Analytic Database carefully, employing realistic, full scale tests of performance, scalability, data availability and manageability. They will need to assess the risks and rewards of deploying a new platform. But, data warehouse users concerned with rising costs and lackluster performance would do well to take a look. And, for those who have a rapidly growing Microsoft SQL Server or Oracle data warehouse, ParAccel appears to offer an especially interesting growth path.