To V or Not To V
Business Intelligence Gets Virtual!

submitted by Claudia Imhoff, President and Founder, www.intelsols.comFriday, May 23, 2008

Article written by Claudia Imhoff, Ph.D., Intelligent Solutions.

Summary

In today’s dynamic environment, organizations are faced with new and increasingly critical decisions which can affect their very survival. Decision makers, in turn, have placed more and more demands on their Business Intelligence (BI) environments. Their demands are increasing the pressure on IT to deliver the right data at the right time, and faster than ever before.

“The right data” means well-documented, reliable, and consistent data.

“The right time” means delivering the appropriate data in a timeframe appropriate for the particular decision. That is, delivery with enough lead time to allow the effect(s) of the decision(s) to make a difference.

“The appropriate data” means the combination of historical data found in data warehouses and data marts, low latency data found in an operational data store, and real time data from operational systems. Bringing the appropriate data together requires sophisticated data
integration infrastructures that go way beyond simple data combinations and displays. So, data integration requires a mixture of techniques and technologies – each specific to the type of decisionmaking being performed.

In this paper, the two most popular techniques for data integration (virtual data federation and physical data consolidation) are described, and the data integration technologies that support each technique (EII and ETL) are reviewed. Through eight BI examples, advice is offered to help implementers determine when to use each technique and technology.

A Strategic Architecture for BI

Today’s enterprises must be fueled by good decisions – about their customers, products, employees, partners, and more. Business Intelligence (BI) has long helped the company’s decision makers by supplying them with reliable versions of historical data – snapshots used to determine trends, patterns, comparisons over time, etc.

Strategic decisions are certainly critical to the ongoing well being of the organization, but are not the only type of decisions being made. Today’s BI architecture must also support a myriad of operational decisions, driven by today’s dynamic business environment.

The public domain, logical architecture that I advocate for BI is the Corporate Information Factory1 (CIF). Over the years, the CIF has earned its stripes as the most reliable and most implemented architecture for sustainable BI.


Figure 1: The Corporate Information Factory

Several CIF components – the data warehouse, data marts, and operational data store – have remained fairly stable over the years. The data warehouse is still the main source of generic, integrated, historical snapshots of data. The operational data store is used to store low latency or near-real time integrated data. The specialized analytical data marts are dependent on the data warehouse or operational data store for their supply of data and contain the analytical applications. Each serves a specific purpose in handling planned and unplanned queries and reports.

Although these components have endured, the way we create them has changed dramatically. While there is no question of the value of historical snapshots, the time latency involved in the preparation of the data makes them unsuitable for intraday or operational decision
making. So, the CIF evolved to support not only traditional, strategic BI, but also tactical and operational BI.

Data Integration Techniques & Technologies

Today, BI implementers are fortunate to have two data integration techniques available to help them create world class environments – CIF or otherwise. These are Physical Data Consolidation and Virtual Data Federation:

  • Physical Data Consolidation – This technique uses processes that capture, cleanse, integrate, transform, and load data into a target data store. Typically data is consolidated using extraction, transformation, and load (ETL) technologies, which obtain data
    from operational data sources, transform it to the corporate standard, and load it into physical data stores. Informatica’s PowerCenter is an example of an ETL product.
  • Virtual Data Federation – This technique uses processes that provide a real-time integrated view of disparate data types from multiple sources, providing a universal data access layer. Data federation uses enterprise information integration (EII) technologies to create virtual stores of data from data warehouses, marts, operational data stores, and operational systems. Composite Software’s Information Server is an example of an EII product.

The biggest question for implementers is not whether to use these techniques, but when to use them. Each has its place within a data integration infrastructure but it is not always clear when to use one or the other. To help implementers make the “when” decision, eight examples are presented below, including the rationale, the benefits, and the drawbacks of choosing one over the other.

Data Warehouse Integration Example

Example 1: Implementing a Data Warehouse

A data warehouse is the cornerstone component of any BI environment. It is the repository of non-volatile, historical snapshots of enterprise data used for all sorts of analytical and reporting purposes. The data warehouse does not store all data – only the data needed for analytical purposes. Therefore it should not be thought of as an operational data archiving mechanism.

Due to performance and other reasons, the operational systems generally do not hold more than a year’s worth of data in their databases. The rest is generally archived. Unlike operational systems, the data warehouse has no time limit on the data it stores. Many organizations store 2, 5, even 10 or more years of snapshot data in their data warehouses.

For all data warehouse implementations, data must be physically extracted from the operational environment. Once extracted, the heavy lifting of transforming the data into consistent, standardized, and cleansed snapshots is performed by physical data consolidation techniques using the ETL technologies. These snapshots are then loaded into the data warehouse schema and made available for all forms of analytics and times-series reporting.

Data Mart Integration Examples

Data marts can be implemented virtually or physically, with each approach having distinct advantages and disadvantages. It is recommended that BI implementers first determine if a virtual mart can satisfy their users’ BI needs before building a physical mart.

Example 2: When to Use Virtual Data Marts

As mentioned earlier, the CIF is a logical or conceptual architecture. However, many people go about implementing it as if it were a physical architecture. That is, they create only physical marts by extracting data from the data warehouse source, reformatting it, and loading it into its own data mart environment. While this can satisfy BI needs, it is often unnecessary and inefficient.

A virtual mart may be created in one of two ways. The first method is to use a BI tool to create required views of the data. There are many BI tools for which a view of data is a suitable substitute for a physical data mart. For example, Business Objects is long known for its utilization of “universes” of data.

But this form of virtual data mart can also be problematic – first, it can be cumbersome when multiple tiers of views are created. Most BI tools do not have sufficient view documentation and business users can make erroneous assumptions about what data is actually being analyzed. Second, these views generally work only with the BI tool creating them. That is, they are proprietary to the BI tool and are not available for use by other BI tools.

The second and often better way of creating needed views is via virtual data federation techniques using EII technology. EII technology allows enterprises to aggregate and present data warehouse data to BI applications via real-time access.

There are several advantages to virtual data mart creation:

  • No movement of data – The problems with bulk movement of data into physical data marts simply do not exist with virtual marts. Data can’t be lost or its delivery delayed.
  • Real-time access to the data – Virtual data marts allow realtime access to the historical data as long as the source data warehouse is accessible. This is particularly important for data warehouses that are updated frequently during the day.
  • Reduced costs – Other than the EII technology, there is no need for additional hardware or software which, in turn, means a reduction in overall costs.
  • Easy to maintain – Tearing down a view and rebuilding it is far easier to do than unloading a physical data mart, reconstructing its schema, and then reloading the data. Therefore maintenance is simpler.
  • Security reasons – For some organizations, there may be privacy issues or compliance regulations that prevent proliferation of sensitive data.

In addition to these advantages for virtual data marts, EII technology brings its own specific advantages:

  • Rapid and iterative development – For situations in which the business users are unsure of their requirements or their requirements are volatile, a virtual data mart using EII technology makes sense. The ability to quickly implement the virtual mart, get the users’ input, and then swiftly recreate the view is an inexpensive and beneficial way of gaining insight into the necessary requirements. It ensures that the application will satisfy the user community.
  • Reduced maintenance – Just as the time for development is reduced, so is the time for maintenance. In many cases, IT may find that the EII technology is easy enough to use that they no longer need to perform the maintenance; it can be performed by the users themselves.
  • Masked data warehouse complexity – The data schema for the data warehouse is usually quite complex and rather mystifying to the average business user. EII technology eliminates the need for users to understand this data schema; instead, a data view is customized into the way users want to consume data.

However, there are several concerns of significance when considering virtual data marts.

  • Performance must be monitored – Virtual data marts may appear to be separate entities to the business user but they are indeed all using the same source of data – the data warehouse. With multiple BI applications running and multiple users accessing the data, the data warehouse must be constantly monitored for performance. If a group of users launches long-running, very complex queries that require massive amounts of data, the performance of the entire environment could be compromised. A physical data mart may be needed for these users.
  • Support for proprietary BI technologies – If the EII technology cannot support a proprietary BI technology, then the IT implementers have no choice but to create a separate, physical data mart for that technology. Standards such as SOAP, ODBC, and JDBC mitigate this problem however.

Example 3: When to use Physical Data Marts

There are some matters to consider when determining whether a dependent physical data mart is appropriate.

  • A change in technology – The data warehouse is best stored in technology that can handle massive amounts of data with fast load and unload times. These are usually relational databases or some form of data warehouse appliance. Some BI technologies may use their own form of proprietary technology or require a technology different from that used for the data warehouse. If this is the case, then you have no choice but to extract a subset of data warehouse data for the application and load it into a separate data mart for
    usage by the BI technology.
  • Performance reasons – Performance of the BI application can be better if a physically distinct data mart is set up for it. For example, heavy data mining analytics can impact the lighter reporting requests if both are performed within the data warehouse environment. It makes sense then to replicate the data mining data into its own environment where performance can be tailored to the data miners’ specific needs.
  • Security reasons – For some organizations, there may be a need to create a separate data mart or warehouse that contains sensitive data or algorithms. Privacy issues, security worries, or in some cases, even compliance regulations mandate a separate and protected environment.

There are disadvantages to building separate physical data marts as well:

  • Redundancy of data – Building a separate physical data mart means more storage, more ETL processing, redundancy of efforts, redundancy of the data, and higher costs. Also, anytime you have to re-extract data from the data warehouse, there is always the chance that the data mart data will get out of sync with the data warehouse data. There should be an established reason or ROI before this process is performed.
  • More IT maintenance and costs – Obviously, creation of a physical data mart means extra work for the BI team. Any time that the data mart design changes (which it certainly will), the data has to be unloaded, the schema changed, and the data reloaded – not a trivial task. In addition, IT must maintain backup/recovery schemes and archiving media for both the data warehouse and physical data marts. Most physical marts require additional hardware and software licenses, which add to the overall costs of the BI environment.
  • Potential inconsistency – Many times, the dependent data mart is created by a line-of-business IT group or by technologically savvy business users themselves. These groups are generally not under the same constraints as the data warehouse implementers are, and therefore, they may not use the same nomenclature, algorithms, or even formatting as was used in the data warehouse. This often causes inconsistency between their analyses and those performed by data warehouse analysts.
  • Bulk movement of data – Because the data mart is physically distinct from the data warehouse, the data must be transferred via the network to the mart. Any time data must be physically moved, there exists the possibility of problems, e.g., the delivery can be delayed or data may be only partially delivered. Secondly, if the data warehouse is updated more than once a day, it may not be possible to reload the physical data marts with each update.

Data Warehouse / Data Mart Archiving and Extension Integration Examples

Example 4: Combine Data Warehouse (or Data Mart) Data with Archived Data Warehouse Data

Many enterprises now archive older, infrequently used snapshots of data warehouse data to other media. Because the data is not contiguous with the data warehouse, it may be difficult to satisfy one-off queries that span the two data forms. The easiest and fastest way to combine these sets of data is to use virtual data federation technology. Rather than physically (re)loading archived data into the data warehouse, the BI team can quickly create a virtual environment with EII technology. When the query completes, the virtual combination can be just as quickly dismantled.

The advantages and drawbacks to this approach are the same as for a virtual data mart (Example 2). Fortunately there is one other advantage for this virtual scenario: usually very few users require the ability to span long timeframes of data warehouse data so their queries will usually not affect the performance of other users.

Example 5: Combine Data from Multiple Data Warehouses

Another scenario all too common in enterprises today is the existence of multiple data warehouses. The reasons for this are varied – mergers and acquisitions, political or cultural constraints, and security or privacy concerns to name just a few. Inevitably, a request will come up that requires data from these warehouses. Without having to overhaul the entire BI environment, such requests can often be satisfied by using virtual data federation techniques to create a virtual connection between the physical stores.

The advantages are similar to those for a virtual data mart – speed, low cost, no movement of data, and ease of maintenance. However, there is one significant concern to resolve – it is doubtful that the various data warehouses were built using an overarching standard data model. Therefore, they most likely will not have used consistent formats, names, data quality activities, calculations, etc., during their data consolidation processes. If a virtual connection is to be made between these repositories, then these disparities must
be abstracted or easily transformed.

Operational Data Store Integration Examples

The operational data store (ODS) is a sister to the data warehouse, similar in some aspects, but also similar to operational systems in others. Like the data warehouse, the ODS contains integrated, cleansed data for usage in reporting and analytics. But there the similarities end. The ODS contains current – or as current as the technology will allow – data that is updated rather than stored as historical snapshots. Little history is stored in the ODS which, along with the update status of its data, makes it similar to operational systems.

The ODS is often used to create management reports on operations or to perform analyses on current events like campaigns or other sales events, channel activities, inventory statuses, daily sales, etc.

Example 6: When to use a Virtual ODS

A real-time, virtual ODS is possible through the use of virtual data federation. For this, EII technology is used to directly access data in the operational systems. There are certain conditions that should be met before a virtual ODS makes sense. These are:

  • Operational need for real-time, not near real-time, data – The business users have a mandatory and demonstrable requirement for integrated, current data. Compliance, customer satisfaction, and just-in-time inventory are just a few of the drivers behind this need for real-time data.
  • Fairly well-integrated operational data – The amount of integration and quality processing that can be done “on the fly” is limited so source data for the virtual ODS should be in good shape to begin with. An example of this characteristic would be data from multiple instances of an ERP application such as SAP’s R3. The data exists in the same format, meaning, and usage – it just resides in multiple places.
  • Narrow queries – EII’s high performance query processing works best when the federated queries are specific in nature. They should focus on a small or limited amount of data returning an equally limited amount of data. Generally the ODS is used to examine a few records – e.g., a customer’s current transactions or a product’s worldwide inventory level.
  • Unpredictable queries – Operational BI needs and data are quite varied – e.g. a customer’s last five shipments are needed, next their last ten invoices, and then, their last three returns. A virtual ODS can easily meet these diverse and unpredictable operational requirements.

The advantages of a virtual ODS can be significant:

  • Short time to solution - A virtual ODS is quite useful when there is a short fuse on the need to access of operational data. Access can be implemented in a matter of hours or, at most a day or two.
  • Frequent change – If changes in requirements are constantly requested, the ability to quickly rework the data presentation is a great way to ensure user satisfaction. The agility of the enterprise places a high premium upon this form of ODS.
  • Security or privacy issues prevent replication of data – The need to access integrated data may be mandatory but the ability to create a physical ODS may be prohibited due to security, regulatory, or other issues.

Example 7: When to use a Physical ODS

A virtual ODS not make sense when the data volumes requested are too high, the integration and quality issues cannot be resolved at run time, and the performance of the underlying operational systems becomes compromised. A physical ODS is mandatory under these circumstances:

  • Complex data integration processing – A physical ODS may be the only place where a current “360 degree view” of important enterprise entities such as customer or product can exist. Many CRM projects used the ODS as the cornerstone for understanding an enterprise’s customers. If this data requires considerable integration processing, data consolidation using traditional ETL technology is required.
  • Extensive data quality processing – Similar to the data integration situation, many corporations must perform considerable name and address matching and hygiene along with house-holding processes before the customer data is usable. If this is the case, then a permanent, physical ODS is recommended.
  • Access by high numbers of users – If the ODS is to be made available to every frontline employee in an organization, its performance must be the principal driver and this access must not impact the operational systems performing their daily tasks. In this case, IT should construct a physical ODS.

Of course, there are some drawbacks to a physical ODS as well:

  • Data latency – A physical ODS usually does not contain realtime data; the data has some latency due to the data consolidation processes performed. This latency can be from a few seconds in duration up to 24 hours depending on the state of the operational environment, business users’ requirements, and the maturity of the data integration and quality infrastructure.
  • Potential performance problems – If the ODS is used for management reporting, operational look-ups, and BI analytics, it must be closely monitored to detect potential and real performance issues. Such a mixed workload can impact response times unless the ODS is implemented with appropriate indexing schemes, partitioning, etc.

Operational BI Integration Examples

Operational BI is the newest form of BI adopted by enterprises. It is meant to support operational decision-making by combining the results from strategic analyses with real-time and low-latency sources of data. A formal definition is:

Operational BI is a set of services, applications and technologies for monitoring, reporting on, analyzing and managing the business performance of an organization’s daily business operations.

Operational BI is used by line of business managers, front line employees and even operational processes to manage and optimize daily business operations. Its goal is to compress the time latency between knowing when an important business event has occurred and taking action in response to it. To be optimal, operational BI relies on the continuous availability of operational data and analytical results.

Example 8: The Need for Multiple Data Sources for Operational BI

The need to combine multiple, disparate sources of data in support of operational BI makes a great case for virtual data federation. The sources of data for operational BI include the operational systems, operational data store, and traditional BI analytics from the
data warehouse and/or data marts. From the point of view of the applications and business users, the data should appear as if it were in a single, seamless store of data. Of course, in reality, the data hasn’t been moved at all – it still resides in its original locations.

Wrap Up

This white paper has covered the most common data integration scenarios, differentiating when to use physical data consolidation techniques versus virtual data federation ones in support of today’s diverse BI strategies and architectures. As a final piece of advice, here
are some overarching considerations:

  • Understand the business problem – The best way of discerning which technology and technique will work best in your IT environment comes from a careful study of the business requirements. Do the business users need real-time, low latency data or historical data or a combination of all three? For the business processes, what types of data and interconnectivity is needed, who will be using the environment (analysts, executives or front-line operations personnel), what is the timeframe for decision making – immediate, tactical, or strategic in nature?
  • Understand the pros and cons of each technique and technology – Throughout this white paper, the benefits and drawbacks of each form of implementation – physical versus virtual – have been given. Flexibility is important so use it to your advantage. Study these to help you determine what will work best for your specific situation.
  • Keep a focus on future needs – No matter where you are today, the business will move rapidly into a new and different direction. Keep your options open regarding physical data consolidation and virtual data federation. If you are heavily using data consolidation today, you may find that data federation makes more sense in the near future. Reduction of costs, time to implementation, ease of maintenance, etc., are strong forces that may change the technological directions of your future BI implementations.

1 For more on the CIF, see CORPORATE INFORMATION FACTORY, 2nd Edition ISBN 0-471-39961-2

Copyright © 2008 – Intelligent Solutions, Inc. – All Rights Reserved

Republished with permission from:

Composite Software Inc.
2655 Campus Drive, Suite 200 | San Mateo, CA  94403
Phone: 650.227.8240 | Mobile: 510.304.3793 | Fax: 650.227.8199
reve@compositesw.com | www.compositesw.com

 

    Other articles by this author

Discussion:

No comments have been posted yet.

Site Map | Contribute | Privacy Policy | Contact Us | Dashboard Insight © 2008