A Contextual Data Quality Analysis for Credit Risk Management in Financial Institutions
PhD thesis by: Helen Tadesse Moges (2014), KU LEUVEN
1.1 Importance of data quality
The Space Shuttle Challenger broke apart 73 seconds into its flight on January 28, 1986, killing the seven astronauts on board. On July 3, 1988, an Iranian commercial passenger jet was shot down by the U.S. Navy cruiser USS Vincennes, killing all 290 people on board. On February 1, 2003, 17 years after the Challenger explosion, the Space Shuttle Columbia broke apart during re-entry into Earth's atmosphere, killing everyone on board. Similarly, on September 11, 2001, nineteen terrorists passed through airport security unnoticed, hijacked four commercial passenger jets, and killed nearly 3,000 people. At a high level, these four disasters are alike: each caused a tragic loss of life, and each happened against the intent of the organizations involved. However, in-depth analyses of these disasters revealed that information or data quality (DQ) was among the factors that contributed to them [67, 11, 37, 16].
Incomplete and misleading information was found to be one of the causes of the Challenger accident [16]. Fisher and Kingma [37], who conducted a thorough analysis of the Vincennes incident, indicated that data quality, or information quality, was a major factor in the accident. Similarly, the Columbia Accident Investigation Board [16] concluded that the available data about the foam impact were sufficient to act upon, but they were dismissed as irrelevant. Finally, the 9/11 Commission [67] found that relevant information held by the National Security Agency and the CIA was not judged relevant enough to be passed on to criminal investigators. Although data or information quality was not the only factor responsible for these disasters, it is impossible to make perfect decisions with flawed data [37].
More everyday examples include the death of a pediatric patient caused by a misplaced decimal point in a medication prescription [7], and a health care organization that overpaid $4 million per year in claims for patients who were no longer eligible [147]. Similarly, an eyewear company incurred costs of one million dollars annually for lens-grinding rework caused by data errors [144]. Although estimates of the losses from poor DQ vary, they run into billions of dollars, in addition to costs measured in lives lost and in employee and customer dissatisfaction [37, 88, 105, 122]. Individual corporations, in particular, are losing millions of dollars due to poor DQ [37, 120, 123]. Davenport states, “no one can deny that decisions made based on useless information have cost companies billions of dollars” [23]. Moreover, the magnitude of DQ problems keeps growing along with the exponential increase in the size of databases [87, 99]. This makes DQ management one of the most important business challenges in today's information-based economy.
Unless specified otherwise, this PhD thesis uses the terms data and information interchangeably. Hence, throughout the text, DQ (data quality) and IQ (information quality) are used synonymously, as are DP (data product) and IP (information product).
Section 1.2 describes the research context in more detail. Section 1.3 presents the research goal and the research questions addressed in this PhD thesis. Section 1.4 presents the research methodology used. Section 1.5 outlines the structure of the entire thesis. Finally, Section 1.6 concludes the chapter by listing the articles included in the thesis.