When the correlation between poor data quality and poor business performance isn’t measured in a tangible way, data quality can be misperceived as a technical activity performed for the sake of the data, instead of an enterprise-wide initiative performed to provide data-driven solutions for business problems.
The common mistake is taking a data-myopic approach to data quality metrics, i.e., creating metrics that reflect the quality of the data in isolation. Without understanding how the organization is using its data, and how data quality affects business results, data cannot be called a corporate asset. Data is an asset only if the organization can qualify and quantify its value by connecting its usage to business objectives.
This whitepaper will examine these historical challenges and how the relationship between data quality and data governance can overcome them. It is structured into the following three sections:
- Defining Data Quality – Examines the two most prevalent perspectives on defining data quality, since how data quality is defined has a significant impact on how data quality is measured.
- The Role of Data Governance in Data Quality – Examines how data governance provides the framework for a proactive data quality program, ensuring that data is of sufficient quality to meet the current and evolving business needs of the organization.
- The Role of Data Quality Monitoring in Data Governance – Examines how compliance metrics associated with data governance policies align data quality with business insight, providing the historically missing link between data quality and business performance.
Defining Data Quality
Historically, there have been two perspectives on defining data quality:
- Real-world alignment – Reflects the perspective of the data provider
- Fitness for the purpose of use – Reflects the perspective of the data consumer
How data quality is defined has a significant impact on how data quality is measured. Therefore, in the sections below, we will examine these two perspectives and how they relate to data quality metrics.
Real-World Alignment: The Danger of Data Myopia
Whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality.
The creation and maintenance of these abstract descriptions shapes the organization’s perception of the real world. However, these abstract descriptions can never be perfected because there is always a digital distance between data and the constantly changing real world that data attempts to describe.
The inconvenient truth is that the real world is not the same thing as the digital worlds captured within the organization's databases. And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality. When the organization's data quality efforts are focused on minimizing the digital distance between data and the real world, the result can be a hyper-focus on the data in isolation, otherwise known as data myopia.
With a data-myopic focus, data quality can be misperceived as an activity performed for the sake of data, when, in fact, it is an activity performed for the sake of implementing data-driven solutions for business problems, enabling better business decisions, and driving optimal corporate performance.
However, even if we could create and maintain perfect real-world alignment, what value does high quality data possess independent of its use? Real-world alignment reflects the perspective of the data provider, and its advocates argue that a trusted source of data, once provided to the organization, will be able to satisfy any and all business requirements. In other words, high quality data should be fit to serve as the basis for every possible use. Therefore, in theory, real-world alignment provides an objective data foundation independent of the subjective uses defined by the organization's many data consumers.
Providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM). Although these initiatives can provide significant business value to the organization, it is usually poor data quality that undermines the long-term success and sustainability of EDW and MDM implementations.
A significant challenge for the data provider perspective on data quality is that it is difficult to make a compelling business case on the basis of trusted data without direct connections to the business needs of data consumers, whose business, data, and technical requirements are often in conflict with one another.
In other words, real-world alignment does not necessarily guarantee business-world alignment.
So if using real-world alignment as the definition of data quality has inherent dangers, we might be tempted to conclude that the fitness for the purpose of use definition of data quality is the better choice. However, as we examine in the next section, fitness for the purpose of use has its challenges as well.
Fitness for the Purpose of Use: The Challenge of Business Relativity
M.C. Escher's famous 1953 lithograph Relativity depicts multiple conflicting perspectives of reality. However, from the individual perspective of each person within the lithograph, everything appears normal, since they are all casually going about their daily activities.
This is an apt analogy for the multiple business perspectives on data quality within most organizations.
Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined as fitness for the purpose of use—the eyes of the user. However, most data has both multiple uses and multiple users. Data of sufficient quality for one use or user may not be of sufficient quality for other uses and users. These multiple, and often conflicting, perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data to support their own business activities.
The user (i.e., data consumer) perspective establishes a relative business context for data quality.
However, whereas the real-world alignment definition of data quality can cause a data-myopic focus, the business-world alignment goal of the fitness for the purpose of use definition must contend with the daunting challenge of business relativity. Most data has multiple data consumers, each with their own relative business context for data quality, making it difficult to balance the diverse data needs and divergent data quality perspectives within the conflicting Escher-like reality of the organization.
This challenge inherent in the data consumer perspective on data quality often contributes to the data silo problem, the bane of successful enterprise data management prevalent in most organizations, where each data consumer maintains their own data silo, customized to be fit for the purpose of their own use. Organizational culture and politics also play a significant role, since data consumers legitimately fear that losing their data silos would revert the organization to a data provider perspective on data quality.
Data Quality Metrics
As the preceding sections have shown, how data quality is defined has a significant impact on how data quality is measured. And when the correlation between poor data quality and poor business performance isn't measured in a tangible way, data quality can be misperceived as a technical activity performed for the sake of the data, instead of an enterprise-wide initiative performed to provide data-driven solutions for specific business problems.
Business-relevant metrics align data quality with business objectives and measurable outcomes. There are many data quality metrics—alternatively referred to as data quality dimensions. Some data quality metrics are more closely associated with real-world alignment and others are more closely associated with fitness for the purpose of use. However, most metrics can be applied to both data quality definitions.
In her great book Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Danette McGilvray provides a comprehensive list of data quality metrics, which include the following:
- Timeliness and Availability – A measure of the degree to which data are current and available for use as specified and in the time frame in which they are expected.
- Data Coverage – A measure of the availability and comprehensiveness of data compared to the total data universe or population of interest.
- Duplication – A measure of unwanted duplication existing within or across systems for a particular field, record, or data set.
- Presentation Quality – A measure of how information is presented to and collected from those who utilize it. Format and appearance support appropriate use of the information.
- Perception, Relevance, and Trust – A measure of the perception of and confidence in the quality of data, i.e., the importance, value, and relevance of the data to business needs.
Although there are many additional data quality metrics (as well as alternative definitions for them), perhaps the two most common data quality metrics are Completeness and Accuracy.
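To make these dimensions concrete, the following is a minimal sketch, in Python, of how a few of the metrics listed above (completeness, duplication, and timeliness) might be profiled against a data set. The records and field names are hypothetical, chosen purely for illustration; a production data quality tool would, of course, apply thresholds and rules defined by the organization's governance policies rather than simple ratios.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None,              "updated": date(2023, 1, 15)},
    {"id": 3, "email": "ann@example.com", "updated": date(2024, 4, 20)},
]

def completeness(rows, field):
    """Fraction of rows where the field is populated."""
    return sum(1 for r in rows if r.get(field) is not None) / len(rows)

def duplication(rows, field):
    """Fraction of populated values that are unwanted repeats."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return 1 - len(set(values)) / len(values)

def timeliness(rows, field, cutoff):
    """Fraction of rows updated on or after the cutoff date."""
    return sum(1 for r in rows if r[field] >= cutoff) / len(rows)

print(f"email completeness: {completeness(records, 'email'):.2f}")
print(f"email duplication:  {duplication(records, 'email'):.2f}")
print(f"timeliness:         {timeliness(records, 'updated', date(2024, 1, 1)):.2f}")
```

Raw measures like these only become business-relevant metrics when they are tied to specific business objectives, which is precisely the connection that data governance policies provide.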