Example 3: When to use Physical Data Marts
There are some matters to consider when determining whether a
dependent physical data mart is appropriate.
- A change in technology – The data warehouse is best stored
in technology that can handle massive amounts of data with
fast load and unload times. These are usually relational
databases or some form of data warehouse appliance.
Some BI technologies may use their own form of proprietary technology or require a technology different from that used
for the data warehouse. If this is the case, then you have no
choice but to extract a subset of data warehouse data for
the application and load it into a separate data mart for
usage by the BI technology.
- Performance reasons – Performance of the BI application
can be better if a physically distinct data mart is set up for it.
For example, heavy data mining analytics can impact the
lighter reporting requests if both are performed within the
data warehouse environment. It makes sense then to
replicate the data mining data into its own environment
where performance can be tailored to the data miners’
specific needs.
- Security reasons – For some organizations, there may be a
need to create a separate data mart or warehouse that
contains sensitive data or algorithms. Privacy issues, security
worries, or in some cases, even compliance regulations
mandate a separate and protected environment.
There are disadvantages to building separate physical data marts
as well:
- Redundancy of data – Building a separate physical data
mart means more storage, more ETL processing, redundancy
of efforts, redundancy of the data, and higher costs. Also,
anytime you have to re-extract data from the data
warehouse, there is always the chance that the data mart
data will get out of sync with the data warehouse data.
There should be an established reason or ROI before this
process is performed.
- More IT maintenance and costs – Obviously, creation of a
physical data mart means extra work for the BI team. Any
time that the data mart design changes (which it certainly
will), the data has to be unloaded, the schema changed,
and the data reloaded – not a trivial task. In addition, IT must
maintain backup/recovery schemes and archiving media for
both the data warehouse and physical data marts. Most
physical marts require additional hardware and software
licenses, which add to the overall costs of the BI environment.
- Potential inconsistency – Many times, the dependent data
mart is created by a line-of-business IT group or by
technologically savvy business users themselves. These groups are generally not under the same constraints as the
data warehouse implementers are, and therefore, they may
not use the same nomenclature, algorithms, or even
formatting as was used in the data warehouse. This often
causes inconsistency between their analyses and those
performed by data warehouse analysts.
- Bulk movement of data – Because the data mart is physically
distinct from the data warehouse, the data must be
transferred via the network to the mart. Any time data must
be physically moved, there exists the possibility of problems,
e.g., the delivery can be delayed or data may be only
partially delivered. Secondly, if the data warehouse is
updated more than once a day, it may not be possible to
reload the physical data marts with each update.
Data Warehouse / Data Mart Archiving and
Extension Integration Examples
Example 4: Combine Data Warehouse (or Data Mart)
Data with Archived Data Warehouse Data
Many enterprises now archive older, infrequently used snapshots of
data warehouse data to other media. Because the data is not
contiguous with the data warehouse, it may be difficult to satisfy
one-off queries that span the two data forms. The easiest and
fastest way to combine these sets of data is to use virtual data
federation technology. Rather than physically (re)loading archived
data into the data warehouse, the BI team can quickly create a
virtual environment with EII technology. When the query completes,
the virtual combination can be just as quickly dismantled.
The advantages and drawbacks to this approach are the same as
for a virtual data mart (Example 2). Fortunately there is one other
advantage for this virtual scenario: usually very few users require the
ability to span long timeframes of data warehouse data so their
queries will usually not affect the performance of other users.
Example 5: Combine Data from Multiple Data
Warehouses
Another scenario all too common in enterprises today is the
existence of multiple data warehouses. The reasons for this are varied – mergers and acquisitions, political or cultural constraints,
and security or privacy concerns to name just a few. Inevitably, a
request will come up that requires data from these warehouses.
Without having to overhaul the entire BI environment, such requests
can often be satisfied by using virtual data federation techniques to
create a virtual connection between the physical stores.
The advantages are similar to those for a virtual data mart – speed,
low cost, no movement of data, and ease of maintenance.
However, there is one significant concern to resolve – it is doubtful
that the various data warehouses were built using an overarching
standard data model. Therefore, they most likely will not have used
consistent formats, names, data quality activities, calculations, etc.,
during their data consolidation processes. If a virtual connection is
to be made between these repositories, then these disparities must
be abstracted or easily transformed.
Operational Data Store Integration Examples
The operational data store (ODS) is a sister to the data warehouse,
similar in some aspects, but also similar to operational systems in others.
Like the data warehouse, the ODS contains integrated, cleansed data
for usage in reporting and analytics. But there the similarities end. The
ODS contains current – or as current as the technology will allow – data
that is updated rather than stored as historical snapshots. Little history is
stored in the ODS which, along with the update status of its data, makes
it similar to operational systems.
The ODS is often used to create management reports on operations or
to perform analyses on current events like campaigns or other sales
events, channel activities, inventory statuses, daily sales, etc.