Data mining and business intelligence (BI) are like our universe: constantly expanding by the minute. New models and techniques in BI keep evolving and maturing on a seemingly endless continuum, as advances in technology and business practice drive innovation and better business alignment. Within this universe of recent knowledge-management innovations, predictive analytics promises some of the most attractive returns currently associated with BI.
Leaders in both the scientific and business communities are becoming heavily dependent on technology architectures that perform tasks such as complex pattern discovery, stock selection, credit scoring, text mining, biometrics, loyalty and rewards distribution/marketing, and fraud detection. For example, medical, pharmaceutical, and health insurance organizations have been supercharging their analytics to predict which types of patients pose the greatest health risks - integrating health records, lab results, demographic data, and prescription histories to develop better models of preventive medicine. They understand exactly how past trends determine future risks, and they tune their interventional techniques and proactive care accordingly.
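At its core, this kind of risk scoring reduces to applying a trained statistical model to patient attributes. The sketch below shows a minimal logistic scoring function in Python; the features, weights, and bias are entirely hypothetical stand-ins for what a real trained model would supply:

```python
import math

def risk_score(features, weights, bias):
    """Toy logistic model: maps feature values to a 0-1 risk probability.
    The coefficients are illustrative only, not from any trained model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical patient record: [age / 100, abnormal lab results, active prescriptions]
patient = [0.67, 3.0, 5.0]
weights = [1.2, 0.8, 0.15]   # made-up coefficients for illustration
score = risk_score(patient, weights, bias=-2.5)
print(round(score, 3))
```

In a production system the weights would be learned from historical records and validated against held-out data rather than hard-coded; the value of the surrounding BI infrastructure is precisely that it supplies the clean, integrated history such training requires.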
Although large corporations in all industries have been turning their attention toward predictive modeling to gain competitive advantage and better assess corporate risk, there is another motivation: global enterprises have invested enormous amounts of capital in state-of-the-art BI platforms, and they are seeking new ways to leverage that existing infrastructure and glean additional value from their technology investments. Business intelligence platforms become much more dynamic and valuable when they enable new business processes that revolve around prognostication and prediction.
The global corporate landscape is now littered with enormous data warehouses, clusters of data marts, operational data stores, OLAP/ROLAP cubes, and terabyte upon terabyte of unstructured data. While much of this information is poised for generic decision-support tasks, there still exist enormous opportunities to leverage current BI infrastructure and align it with predictive modeling and forward-looking statistical analysis. Your company’s previous victories in data quality, data optimization, and data integration become even more important when you realize that these successes will enable areas of BI that have, until recently, been off the radar of even the most seasoned IT managers.
In most real-life cases, an enterprise needs a wealth of quality analytical data before it can hope to achieve a good degree of success in comparative analytics, predictive analytics, or decision automation. Although I know a handful of industry commentators who disagree with this statement to varying degrees, I am sticking to my guns. (To be fair, there are certain analytical situations - gauging performance in specialized business verticals, for instance - where relative statistical variance and sample size mean that a full-blown data warehousing or BI solution is not a precursor to predictive analysis. PECOTA scoring [see the second part of this article] is one such case.)
While an existing operational BI platform and data infrastructure are critical to the future success of predictive BI, many major BI dashboards do not lend themselves to proactive analysis as well as one might surmise. Ensuring that BI dashboards will scale to support various forward-looking and forecasting methodologies must be on the implementation team’s critical path. Other important considerations include:
- Technical architecture must lend itself to “quick time to market.” Before deploying, look for areas of risk where predictive intelligence may lag too far behind current BI processes or reporting windows. Data cubes or reports built from predictive models may take more or less time to create (depending on their complexity) than existing DSS information stores or data marts.
- Decide, far in advance, how you will secure and control the distribution of highly sensitive results from your most critical modeling efforts.
- A robust BI system should give analysts the ability to adjust their models (at least along a date/time dimension) based on real-time changes in the marketplace.
- Your dashboard and platform vendors should have a clear and concise strategy for how their tools will support future generations of knowledge mining, especially mining of voice, email, and other generally unstructured data sources.
- Support for various model exchange formats is critical. Commercially supported formats must include the Predictive Model Markup Language (PMML), and tools should also interoperate with open-source languages like R (the de facto standard language among statisticians). With cloud computing making more headway as part of both cost-savings efforts and the “green IT revolution,” more XML-based markup standards will come to fruition.
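To give a sense of what PMML interchange looks like in practice, the sketch below uses Python’s standard library to emit a bare-bones PMML-style skeleton. The field names are hypothetical, and a real document would carry a full model definition conforming to the schema published by the Data Mining Group:

```python
import xml.etree.ElementTree as ET

# Build a minimal PMML-style document skeleton. Element names follow the
# general PMML 4.x layout (Header, DataDictionary, DataField); the fields
# themselves are made up for illustration.
pmml = ET.Element("PMML", version="4.4")
ET.SubElement(pmml, "Header", description="Toy credit-scoring model")
data_dict = ET.SubElement(pmml, "DataDictionary", numberOfFields="2")
ET.SubElement(data_dict, "DataField",
              name="income", optype="continuous", dataType="double")
ET.SubElement(data_dict, "DataField",
              name="risk", optype="continuous", dataType="double")

xml_text = ET.tostring(pmml, encoding="unicode")
print(xml_text)
```

Because PMML is plain XML, a model trained in one vendor’s tool can, in principle, be exported this way and scored in another - which is exactly the kind of reuse of existing BI investment this article argues for.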