Data Warehouse Integration Example
Example 1: Implementing a Data Warehouse
A data warehouse is the cornerstone component of any BI
environment. It is the repository of non-volatile, historical snapshots
of enterprise data used for all sorts of analytical and reporting
purposes. The data warehouse does not store all data – only the
data needed for analytical purposes. Therefore it should not be
thought of as an operational data archiving mechanism.
Due to performance and other reasons, the operational systems
generally do not hold more than a year’s worth of data in their
databases. The rest is generally archived. Unlike operational
systems, the data warehouse has no time limit on the data it stores.
Many organizations store 2, 5, even 10 or more years of snapshot
data in their data warehouses.
For all data warehouse implementations, data must be physically
extracted from the operational environment. Once extracted, the
heavy lifting of transforming the data into consistent, standardized,
and cleansed snapshots is performed by physical data
consolidation techniques using the ETL technologies. These
snapshots are then loaded into the data warehouse schema and
made available for all forms of analytics and times-series reporting.
Data Mart Integration Examples
Data marts can be implemented virtually or physically, with each
approach having distinct advantages and disadvantages. It is
recommended that BI implementers first determine if a virtual mart
can satisfy their users’ BI needs before building a physical mart.
Example 2: When to Use Virtual Data Marts
As mentioned earlier, the CIF is a logical or conceptual
architecture. However, many people go about implementing it as
if it were a physical architecture. That is, they create only physical
marts by extracting data from the data warehouse source,
reformatting it, and loading it into its own data mart environment.
While this can satisfy BI needs, it is often unnecessary and inefficient.
A virtual mart may be created in one of two ways. The first method
is to use a BI tool to create required views of the data. There are
many BI tools for which a view of data is a suitable substitute for a physical data mart. For example, Business Objects is long known for
its utilization of “universes” of data.
But this form of virtual data mart can also be problematic – first, it
can be cumbersome when multiple tiers of views are created. Most
BI tools do not have sufficient view documentation and business
users can make erroneous assumptions about what data is actually
being analyzed. Second, these views generally work only with the
BI tool creating them. That is, they are proprietary to the BI tool and
are not available for use by other BI tools.
The second and often better way of creating needed views is via
virtual data federation techniques using EII technology. EII
technology allows enterprises to aggregate and present data
warehouse data to BI applications via real-time access.
There are several advantages to virtual data mart creation:
- No movement of data – The problems with bulk movement of
data into physical data marts simply do not exist with virtual
marts. Data can’t be lost or its delivery delayed.
- Real-time access to the data – Virtual data marts allow realtime
access to the historical data as long as the source data
warehouse is accessible. This is particularly important for data
warehouses that are updated frequently during the day.
- Reduced costs – Other than the EII technology, there is no
need for additional hardware or software which, in turn,
means a reduction in overall costs.
- Easy to maintain – Tearing down a view and rebuilding it is far
easier to do than unloading a physical data mart,
reconstructing its schema, and then reloading the data.
Therefore maintenance is simpler.
- Security reasons – For some organizations, there may be
privacy issues or compliance regulations that prevent
proliferation of sensitive data.
In addition to these advantages for virtual data marts, EII
technology brings its own specific advantages:
- Rapid and iterative development – For situations in which the
business users are unsure of their requirements or their
requirements are volatile, a virtual data mart using EII
technology makes sense. The ability to quickly implement the
virtual mart, get the users’ input, and then swiftly recreate the view is an inexpensive and beneficial way of gaining insight
into the necessary requirements. It ensures that the
application will satisfy the user community.
- Reduced maintenance – Just as the time for development is
reduced, so is the time for maintenance. In many cases, IT
may find that the EII technology is easy enough to use that
they no longer need to perform the maintenance; it can be
performed by the users themselves.
- Masked data warehouse complexity – The data schema for
the data warehouse is usually quite complex and rather
mystifying to the average business user. EII technology
eliminates the need for users to understand this data schema;
instead, a data view is customized into the way users want to
consume data.
However, there are several concerns of significance when
considering virtual data marts.
- Performance must be monitored – Virtual data marts may
appear to be separate entities to the business user but they
are indeed all using the same source of data – the data
warehouse. With multiple BI applications running and
multiple users accessing the data, the data warehouse must
be constantly monitored for performance. If a group of users
launches long-running, very complex queries that require
massive amounts of data, the performance of the entire
environment could be compromised. A physical data mart
may be needed for these users.
- Support for proprietary BI technologies – If the EII technology
cannot support a proprietary BI technology, then the IT
implementers have no choice but to create a separate,
physical data mart for that technology. Standards such as
SOAP, ODBC, and JDBC mitigate this problem however.