Nearly every CIO confronts the same frustrating challenge: the organization holds highly useful data that users know exists and are clamoring to access – but they can't get to it. Making this data available has proven nearly impossible because the information comes from multiple systems, and combining it is time consuming, expensive, and fraught with peril. Recent discussions bear this out:
- One CIO faced with a large analytics backlog and users demanding better insight found his ERP provider unable to satisfy the demands through its Business Intelligence (BI) tools.
- To beat the problem, another CIO estimated that he would need 11 data marts and 18 months to build and deploy a new solution – at an estimated cost of almost $3 million.
Whether the economy is going up or down, companies know they need better information to succeed. The executives who are responding to this are challenged with finding the insightful data that they know exists and that will give them a competitive or cost advantage. They’re asking for:
- Better insight into customers and buying patterns from data in numerous order management systems.
- Identifying the root cause of warranty claims and relating it back to manufacturing processes or supplier-based issues.
- Optimizing designs in engineering to leverage supplier relationships and reduce part costs.
- Understanding the cost of quality in manufacturing.
Even many successful enterprise data warehouse (EDW) or data mart implementations have proven too limited as a result of the nature of the underlying source system deployments. The causes:
- Too many data sources
- Disparate data types
- Lack of integration
- Limited analytics
- Messy data and missing metadata
Why make this point? Organizations repeatedly speak of performing brute-force data acquisition and integration – locating data and working with spreadsheets to get some visibility into the business. The costs of basic reporting, roll-ups, and business reporting are staggering, and while they are hidden from the IT budget, they are visible on the P&L.
A more comprehensive strategy often means an extensive data modeling and integration exercise – and the costs can once again be staggering. As much as 70% of data integration involves the synchronization of information across disparate applications. Here the challenge is managing quality, standardization, and semantic meaning as the information flows between applications and up through the layers of the organization. Additionally, experience shows that in many cases up to 30% of project costs can be associated with data prep and cleanup.
For many CIOs, this means hard work, frustration, unplanned costs, and too often limited success. It doesn’t have to be that way.
Accessing All Types of Data
One of the underlying challenges of the data modeling and integration exercise mentioned above is that information access has traditionally required a unified data model. This notion is grounded in the experience of many organizations with data warehouses and business intelligence, where information consolidation into a well-defined data model must occur before any reports can be created.
In a company without a firm grasp on information management, applications and file servers are often littered with outdated information on obsolete topics, left by employees – current and former – who each used a different classification scheme when storing data. This is true of almost any organization, regardless of size. What may surprise many of these organizations is the richness of the data they already have.
Today’s enterprise data is a mix of structured, unstructured, and semi-structured content – and users need access to it all. For example, consider a warranty analyst who wants to find the root cause of a product failure. To make an informed decision, she needs to query:
- Structured content: Procurement history and bills of material, as well as sales transactions stored in data marts, data warehouses, and relational databases that sit beneath packaged applications such as ERP, CRM, and PLM.
- Unstructured content: Call logs, PDFs of project debriefing notes, and field notes from packaged systems, stored in file servers, Content Management Systems, and Knowledge Management Systems.
- Semi-structured content: Documents, emails, web pages, wikis, and live information services – marked up as XML, crawled from the web and internal file systems, or generated by alerts. In fact, enterprise content is often semi-structured, falling on a continuum between structured and unstructured. Emails, for example, are semi-structured: they include unstructured text as well as irregular structure from fields, tags, and even send/receive histories.
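The email example above can be made concrete with a minimal sketch (in Python, with hypothetical field names and values chosen for illustration): a single record in which structured header fields, irregular tags, and free text coexist.

```python
# A hypothetical email record: structured headers, irregular tags,
# and unstructured body text coexist in one self-contained record.
email_record = {
    "message_id": "msg-001",             # structured: unique key
    "from": "analyst@example.com",       # structured: fixed field
    "to": ["qa@example.com"],            # structured: repeating field
    "tags": ["warranty", "field-note"],  # semi-structured: irregular labels
    "body": "Unit 4471 failed after thermal cycling; see attached log.",
}

# Structured fields support exact lookups...
assert email_record["from"] == "analyst@example.com"
# ...while the unstructured body supports text search.
assert "thermal cycling" in email_record["body"]
```

A warranty analyst's query can thus combine exact matches on the structured fields with text search over the body, without forcing the record into a fixed table.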
The Advent of a Flexible Data Model
What’s required for complete information access is a flexible data model, based on the premise that each record is effectively self-describing, allowing structure and hierarchy to be easily preserved. As with XML, metadata immediately accompanies the data it describes in a flexible data model, rather than being captive in the column headers of a table. The performance implications of this table-free approach are mitigated by a powerful indexing approach where every field, including long-form text, and every value it contains, including all indexed terms, becomes a dimension that slices across the data.
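The indexing approach described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's actual engine: every (field, value) pair – including individual terms from text fields – becomes a dimension that slices across the record set.

```python
from collections import defaultdict

def build_index(records):
    """Index every (field, value) pair, including text terms, as a dimension."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for field, value in rec.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                index[(field, v)].add(rec_id)        # exact-value dimension
                if isinstance(v, str):
                    for term in v.lower().split():   # each indexed term too
                        index[(field, term)].add(rec_id)
    return index

# Hypothetical records mixing structured fields and free text.
records = {
    "r1": {"type": "claim", "text": "compressor failure in cold weather"},
    "r2": {"type": "claim", "text": "compressor noise reported"},
    "r3": {"type": "order", "text": "compressor unit shipped"},
}
index = build_index(records)

# Slice by a structured dimension, then intersect with a text term.
claims = index[("type", "claim")]
mentions = index[("text", "compressor")]
assert claims & mentions == {"r1", "r2"}
```

Because the text terms and the structured values live in the same index, a single query can intersect them, which is what makes the table-free approach performant in practice.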
This approach eliminates the need for an overarching schema in the data model, which allows records to change at will. Since each record is just a collection of attribute-value pairs, each record can gain and lose attributes and values without disturbing any of the other records or violating any higher organization. This flexibility allows developers to unify heterogeneous, changing data and content from multiple sources without the headaches and expense of traditional data modeling.
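The attribute-value flexibility described here can be shown in a short sketch (Python, with made-up record contents): two records of entirely different shapes live in one collection, and each changes shape independently.

```python
# Minimal sketch: records are plain attribute-value collections with no
# shared schema, so each record can change shape independently.
records = [
    {"id": 1, "type": "claim", "product": "pump-a"},
    {"id": 2, "type": "email", "subject": "RMA 4471",
     "body": "Unit failed under load."},
]

# One record gains an attribute...
records[0]["root_cause"] = "seal fatigue"
# ...and another loses one -- no schema migration, and no effect
# on its neighbors.
del records[1]["body"]

assert "root_cause" in records[0]
assert "body" not in records[1]
assert records[1]["subject"] == "RMA 4471"
```

Contrast this with a relational table, where adding the `root_cause` column would require altering the schema for every row, populated or not.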