Just when you thought BI was a maturing market, along came a wave of new technologies that have re-energized it by tapping into unmet user demand. One of the most impactful areas of innovation is data discovery: a new analytic approach (think greater data exploration, greater ease of use, and no-fuss integration) that helps business users answer “why” and “what if” questions through self-service analytic applications. Beyond the agility it brings to BI, data discovery has great potential to improve IT processes around application development, data quality management, and the calculation of corporate metrics. Working out your own data discovery strategy and planning for these changes will let you take advantage of its powerful new capabilities.
The Benefits of Prototype Development – Identifying the True Business Requirements
Business users are flocking to data discovery solutions largely because of the speed and flexibility they provide. These tools let power users (or IT) quickly unify data from disparate information sources and explore it freely through easy-to-use interactive visualizations, search interfaces, or both. Because drill paths are not pre-defined, users have greater analytical flexibility and can often answer unanticipated questions. That’s a tremendous benefit: a single integration effort can support many of the questions users have today and will have in the foreseeable future.
In addition to helping users, data discovery platforms also offer speed advantages to IT and the application development process. By minimizing the up-front data modeling that traditional BI systems require, data discovery systems can reduce application development time by 50% or more. Empowered by this speed advantage, some organizations are using a data discovery platform to “prototype”, at minimal cost and effort, analytic applications that would otherwise have been built with traditional development tools on top of the data warehouse. The prototype gives users fast access to the data they need to be productive; at the same time, IT and the business learn together whether the application meets the actual business goals.
At times, the applications that result from this technique are vastly different from what was expected at the start of the project, but they are a far better fit for the business. That eliminates much of the costly rework that traditional modeling and development techniques would have incurred. The organization then has the option of leaving the analytic application on the data discovery platform or, if it proves to have standard drill paths with wide appeal, modeling the data in the data warehouse and delivering the information through traditional reports and dashboards.
Exposing Data Quality Issues
Data discovery platforms are also affecting data quality procedures and changing how organizations view “dirty” data. A powerful aspect of some data discovery systems is their ability to show what data is actually contained in corporate data stores and to reveal hidden relationships between data elements. When integrating data, some data discovery platforms can automatically extract and organize the dimensions, metadata categories, and terms found in unstructured content such as emails and PDFs. This information is often presented in simple hierarchical lists (or search facets) and graphical elements such as tag clouds, which give analysts a quick way to size up a situation, grasp the scope of the data at their fingertips, and ultimately make valuable discoveries. Data quality teams are using these same views to improve data quality in the following ways:
- Evaluating scope – Organizations can assess whether the right data is being stored in the right system. For example, one company recently ingested its data into a data discovery tool and unexpectedly found highly sensitive personal health information mixed in with unrelated information. The data discovery platform revealed its presence and helped the organization resolve a major privacy issue.
- Identifying data inconsistencies – Data discovery platforms also reveal quality issues with specific data items. For instance, a quick scan of the system’s search facets may reveal that references to “Cincinnati,” “Cincinnatti” and “Sincinnati” are all in the database. Seeing this, the data quality team can initiate a process to clean the source data.
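To make the facet idea concrete, here is a minimal Python sketch, assuming a hypothetical list of city values loaded from a source system. It counts distinct values the way a facet list would display them, then flags less common spellings that closely resemble a more common one. The similarity test uses the standard library’s `difflib.SequenceMatcher` with an assumed 0.85 threshold; this is a stand-in technique for illustration, not how any particular data discovery product detects variants.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical values of a "city" field as loaded from a source system.
cities = [
    "Cincinnati", "Cincinnati", "Cincinnatti", "Sincinnati",
    "Cleveland", "Columbus", "Cincinnati", "Cleveland",
]


def facet_counts(values):
    """Count each distinct value, roughly as a search facet list would show it."""
    return Counter(values)


def likely_variants(counts, threshold=0.85):
    """Map each value to the first higher-ranked value it closely resembles.

    Values are ranked by frequency. Similarity is difflib's ratio() in [0, 1];
    the 0.85 threshold is an assumption, not a recommended setting.
    """
    ranked = [value for value, _ in counts.most_common()]
    suggestions = {}
    for i, candidate in enumerate(ranked):
        for more_common in ranked[:i]:
            score = SequenceMatcher(None, candidate.lower(), more_common.lower()).ratio()
            if score >= threshold:
                suggestions[candidate] = more_common
                break
    return suggestions


counts = facet_counts(cities)
print(counts.most_common())
print(likely_variants(counts))  # {'Cincinnatti': 'Cincinnati', 'Sincinnati': 'Cincinnati'}
```

Suggestions like these are a starting point for the data quality team’s review rather than automatic corrections: two legitimately different values (for example, similarly named towns) could also score above the threshold.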
Enabling the Business to Work with Dirty Data
Data discovery platforms don’t just expose dirty data to IT; they also help business users get greater value from less-than-perfect data. In some departments, such as sales, marketing, and customer service, working with good information today is better than working with perfect information next week. To help business users react quickly, IT can use the data discovery platform to unify the data and present it “as is.” Because the platform exposes data irregularities in its interface, analysts can use the graphical and search interfaces to work around those inconsistencies (e.g., by selecting all variations of “Cincinnati”) and still complete their analysis.
Calculating Corporate Metrics
Another area where data discovery platforms are changing IT procedures is the calculation of company-wide metrics. In companies with diverse product lines, metrics such as on-time delivery are sometimes calculated inconsistently across business units, and imposing a standard across the company has been too difficult or too costly. With their added agility and ease of use, however, data discovery platforms make it practical to consolidate information from different parts of an organization and impose a corporate standard for various corporate metrics.
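One way to picture a corporate standard for a metric like on-time delivery is a single shared definition applied to every business unit’s consolidated records. The Python sketch below assumes a hypothetical `Shipment` record with promised and delivered dates, and an assumed rule (delivered no later than the promised date plus a grace period, zero days by default). A real standard would be whatever definition the organization agrees on.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Shipment:
    """Hypothetical shipment record consolidated from any business unit."""
    business_unit: str
    promised: date
    delivered: date


def is_on_time(shipment: Shipment, grace_days: int = 0) -> bool:
    """Assumed corporate rule: on time if delivered within grace_days of the promise."""
    return (shipment.delivered - shipment.promised).days <= grace_days


def on_time_rates(shipments, grace_days: int = 0):
    """Apply the same rule to every unit; return per-unit rates and the overall rate."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for s in shipments:
        totals[s.business_unit] += 1
        if is_on_time(s, grace_days):
            on_time[s.business_unit] += 1
    per_unit = {bu: on_time[bu] / totals[bu] for bu in totals}
    grand_total = sum(totals.values())
    # None signals "no data" rather than dividing by zero.
    overall = sum(on_time.values()) / grand_total if grand_total else None
    return per_unit, overall


# Hypothetical sample data from two business units.
shipments = [
    Shipment("Industrial", date(2024, 3, 1), date(2024, 3, 1)),  # on the promised day
    Shipment("Industrial", date(2024, 3, 2), date(2024, 3, 4)),  # two days late
    Shipment("Consumer", date(2024, 3, 5), date(2024, 3, 3)),    # early
    Shipment("Consumer", date(2024, 3, 6), date(2024, 3, 6)),    # on the promised day
]

per_unit, overall = on_time_rates(shipments)
print(per_unit)  # {'Industrial': 0.5, 'Consumer': 1.0}
print(overall)   # 0.75
```

Because every unit’s rate comes from the same `is_on_time` rule, differences between units reflect delivery performance rather than differing definitions of “on time.”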
Recommendations: Engage the Business to Benefit from Data Discovery
Data discovery is here to stay and is poised to take on a greater role in corporate decision making in the coming years. Up until this point, data discovery solutions have largely been brought into the enterprise by business users demanding greater speed, ease of use, and visibility. It’s clear, however, that these same characteristics are also enabling IT organizations to enhance many established BI and IT processes. Since data discovery benefits both IT and the business, it’s important that each group has a voice in the evaluation and selection of a data discovery platform.
A spectrum of platforms is available, from simple tools suited to an individual (think Excel add-on) to more robust tools suited to Fortune 500 enterprises. This range has given business users the option to break from IT and go their own way, or to work with IT to select an enterprise standard that meets the needs of both the enterprise and the individual. Of course, a proliferation of tools leads to more support issues, higher software costs, and other challenges. IT organizations would be wise to get in front of this wave and begin collaborating more closely with business users on technology selection, or they risk having as many data discovery tools within the organization as smartphone models. Only close collaboration between the two groups will ensure that both the business and IT benefits are realized.
John Joseph is director of product marketing at Endeca Technologies, Inc. jjoseph@endeca.com, www.endeca.com