Unstructured data 101:
Because unstructured data solutions are so broad, whether text or search based, organizations are hard pressed to identify which ones offer the functionality they need to address their business information pains. However, with a general understanding of the options available, organizations can develop solutions tied to ROI and can advance their strategic initiatives.
The first several parts of this series discussed unstructured data, text mining, how they are used within the organization, and the business and technological factors management should consider when looking to implement an unstructured data solution. The next step for organizations is to identify what options are available and how these options best meet business requirements.
Three options for organizations
Unstructured data use within business intelligence applications will continue to expand until it becomes a regular feature within BI. Although this seems to be one of the key discussions within the business intelligence world, the use of unstructured data is actually broader. Aside from BI search, BI solutions can now leverage existing text mining or text analytics features or embed best of breed solution functionality into their front end data visualization tools. The questions organizations should be asking are: Which solution will best meet the needs of the organization? What issue is the organization trying to address? And what currently exists (e.g. technical architecture or scorecards)? An organization with a strong BI infrastructure might want end users to have more access to the right information, or may want to broaden access to published reports across the entire organization. Another organization may seek to develop predictors for customer behavior. Once management has identified the driving factors behind the initiative, it can begin building a solution that includes one of the options identified.
The following three areas provide a general breakdown of the types of solutions businesses should consider, as well as the general benefits and challenges associated with these options. Although in-depth technical challenges will not be discussed, the items listed will give organizations a general overview of the key issues they should explore further when considering these solutions. Additionally, the use of unstructured data within BI based analytics, although available, is still early in its adoption. This means that organizations can explore these options, but because current uses within BI are so varied, there are few benchmarks for organizations that wish to reference them.
BI search
BI search tools allow organizations to embed search into their current BI applications. In many cases, BI is relegated to super users, with in-depth knowledge of where key information resides left in the hands of IT professionals. The problem with this approach is that the wealth of information that could be given to decision makers throughout the organization very rarely finds its way beyond the defined end user community. With BI search, Google type searches let end users find the information they are looking for in a familiar way, based on how they already use the Internet.
Currently, many BI vendors have embedded search within their applications by partnering with search vendors such as FAST or Google. This offers organizations an easy way to use search within current business intelligence or performance management applications. However, search has been a key component of other solutions, such as enterprise content management, for many years. If organizations want to bridge the gap between BI and other solutions, expanding those existing systems to include BI or BPM based information might be a better way to give end users access to a wider range of data that may not be accessible otherwise.
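To make the idea of keyword search over BI content concrete, the following is a minimal sketch, not a description of how any vendor's product works. It assumes a small, invented catalog of report descriptions and builds a simple inverted index so that a plain keyword query returns the matching reports. The report ids, descriptions, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical report catalog; ids and descriptions are invented for illustration.
REPORTS = {
    "r1": "Quarterly sales by region and store",
    "r2": "Customer satisfaction survey results",
    "r3": "Sales staff performance by store",
}


def build_index(reports):
    """Map each lowercase word to the set of report ids whose description contains it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for word in text.lower().split():
            index[word].add(report_id)
    return index


def search(index, query):
    """Return the sorted ids of reports that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the id sets so that only reports matching all words remain.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


INDEX = build_index(REPORTS)
print(search(INDEX, "sales store"))  # ['r1', 'r3']
```

A production search engine would add ranking, stemming, and security filtering; the sketch only shows why a familiar keyword query can surface reports an end user did not know existed.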
Business intelligence enhancements
Accessing unstructured data within current applications includes embedding BI search. But moving beyond search involves using text mining and analytics to identify patterns and to perform predictive analytics based on unstructured data, specifically text. Organizations should consider whether to access or embed this functionality within their current BI applications. For instance, if an organization uses a dashboard to identify sales patterns by geographical location, by store, or by sales staff, and links those patterns to customer satisfaction, the next step could be to identify public perception of the products being monitored. In addition, the organization may choose to leverage internal and external information found in text forms, on the Internet, and on other company sites to identify trends and to develop marketing campaigns based on the information gathered. These types of initiatives will be the difference between organizations that have a comfortable market share and those that move beyond traditional marketing campaigns to provide enhanced customer experiences and to increase market penetration. Product differentiation is becoming more difficult. Organizations create and sustain their competitive advantage and develop key differentiators by using technology to give customers a unique experience.
Text mining
Text mining applications identify patterns and perform analyses to surface information that would otherwise not be accessible to the organization. Text mining can also identify patterns before they become business problems. Fraud is a good example: unusual patterns within the data sets flag suspicious behavior, enabling organizations to save money and to lower bad debt costs. Another area where this can be applied is customer satisfaction. Organizations can compare sales and customer retention figures against what customers are actually calling the call center about. This includes what customers are actually complaining about, which can help explain why sales are dropping, why customer churn is increasing, etc.
Text mining can be embedded within BI applications, or organizations can use it separately to apply more advanced forms of unstructured data analysis. SPSS provides an example of a vendor offering advanced predictive text analytics to its users. Also, in relation to business intelligence solutions, Information Builders embeds SPSS’s technology to provide advanced text mining capabilities to organizations. Businesses deploying other applications can use this as a benchmark for how embedded text analytics can be used; in this case, the data visualization portion (e.g. reports) pulls analytics based information into a centralized portal that gives users quick, friendly access.
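As a rough illustration of the call center example above, the sketch below tags free-text notes with complaint topics and counts how often each topic appears. It is a deliberately simple keyword approach under stated assumptions, not the method used by SPSS or any other vendor; the topics, trigger words, and sample notes are all invented.

```python
from collections import Counter

# Hypothetical topics and trigger words; a real deployment would define its own.
TOPICS = {
    "billing": {"invoice", "charge", "refund"},
    "delivery": {"late", "shipping", "delivery"},
}


def tag_topics(note):
    """Return the sorted list of topics whose trigger words appear in the note."""
    cleaned = note.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return sorted(topic for topic, keywords in TOPICS.items() if words & keywords)


def count_topics(notes):
    """Count how many notes mention each topic."""
    counts = Counter()
    for note in notes:
        counts.update(tag_topics(note))
    return counts


# Invented call center notes for illustration only.
NOTES = [
    "Customer wants a refund for a duplicate charge.",
    "Delivery was late, customer upset.",
    "Asked about shipping times.",
]

TOPIC_COUNTS = count_topics(NOTES)
print(TOPIC_COUNTS.most_common())  # [('delivery', 2), ('billing', 1)]
```

Counts like these could then be placed beside sales or churn figures on a dashboard, which is the kind of link between complaints and performance described above.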
General advantages
Generally, organizations that have a business intelligence or performance management application understand the advantages associated with its deployment. Enhancing these solutions to include search and/or text analysis capabilities may be seen as only an extension of the current solution or infrastructure; however, some advantages go beyond the current applications and can extend towards the whole organization. Two key advantages include corporate wide information access and the ability to run a business proactively.
The concept of bringing BI “to the masses” is constantly discussed by vendors, but as yet rarely executed. Search can change this. Using search to get to desired information quickly moves BI from being an application for the few to a widely used information access point that allows end users to find information they don’t know exists. Aside from saving time, having information readily available lets decisions be made more efficiently and effectively.
Analytics allows organizations to discover patterns that are not obvious to users. Organizations can identify sales trends, the effects of marketing campaigns, supply chain distribution success, etc. by using these applications. This creates a forward looking approach, as potential successes and failures are identified before or as they occur, as opposed to the traditional approach of reacting to changes in performance after failures have occurred.
General challenges
Organizations may arm themselves with information about why they want to implement a solution and even about how to do it, but they may not understand or analyze the potential challenges that exist. Since BI search and BI embedded applications have been discussed at length, the focus here will remain mostly on text mining applications. A crossover exists in many cases, and the challenges mentioned here can be applied to other areas within BI as well. Some challenges of implementing text mining solutions include identifying the appropriate information (what is actually useful), capturing context, and planning for the future collection of information.
Because unstructured data is so diverse and expansive, it may be difficult to define what data is required. How do organizations identify what potential patterns they are looking for? On the one hand, they may be looking for information to help solve a known business problem. On the other hand, unknown patterns may be lurking that could help an organization identify potential fraud, find ways to increase sales, identify new business opportunities, and develop successful customer experience programs. The issue becomes: how does an organization identify which opportunities to pursue and which ones it is potentially overlooking? How do organizations use text analytics to identify unknown patterns? Doing so means configuring text analytics intelligently enough to surface instances the organization does not yet know about. Solutions do not provide these answers automatically. Although industry solutions exist, addressing these issues is not intuitive unless an organization defines the specific business rules that are required to access the necessary data.
Even when the proper information is collected, the next challenge involves identifying the context that surrounds the data. Although words or concepts might be identified, making sure that the proper contextual information is collected can be a challenge. Collecting matching text strings captures what is requested but may not capture related information. This runs the risk of returning unrelated results that can lead to wrong assumptions being made.
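The context problem can be shown with a small sketch. It compares a naive string match with a slightly more careful check that looks at the words immediately before the term. The negation list, window size, and sample sentences are assumptions made for illustration; real text analytics tools handle context in far more sophisticated ways.

```python
# Hypothetical list of negation words; real tools use much richer linguistic rules.
NEGATIONS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,!?") for word in text.lower().split()]


def mentions(text, term):
    """Naive match: True if the term appears anywhere in the text."""
    return term in tokenize(text)


def mentions_affirmed(text, term, window=2):
    """True only if the term appears with no negation in the preceding `window` words."""
    words = tokenize(text)
    for i, word in enumerate(words):
        if word == term:
            preceding = words[max(0, i - window):i]
            if not NEGATIONS & set(preceding):
                return True
    return False


COMPLAINT = "I am not satisfied with the service."
PRAISE = "I am satisfied with the service."

print(mentions(COMPLAINT, "satisfied"))           # True: the string matches
print(mentions_affirmed(COMPLAINT, "satisfied"))  # False: the match is negated
print(mentions_affirmed(PRAISE, "satisfied"))     # True
```

A string match alone would count the complaint as a satisfied customer, which is the kind of wrong assumption described above.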
Text mining initiatives may be shortsighted when organizations look only for ways to answer immediate business questions. Longer term initiatives may require looking at larger data sets or more sources than originally thought. The key challenge for organizations is to identify potential future uses so that expanding a text mining application becomes a natural extension of the current one.
Conclusion
Organizations looking to implement an unstructured data solution within their BI or BPM environment should evaluate which solution best meets their current and future business requirements. Whether the answer involves BI search, analytics embedded within BI or BPM, or a best of breed text mining or analytics application, management should consider both the short term and long term requirements that align the use of technology with their business issues.
(Copyright 2007 - Dashboard Insight - All rights reserved.)
About the Author
Lyndsay Wise is a senior research analyst for the business intelligence and business performance management space. For more than seven years, she has assisted clients in business systems analysis, software selection and implementation of enterprise applications. She is a monthly columnist for DMReview and writes reviews of leading technologies, products and vendors in business intelligence, data integration, business performance management and customer data integration.