Objective review and data quality goals of data models
Have you ever asked yourself what score your data model would achieve? Could you imagine 90%, 95% or even 100% across 10 categories of objective criteria?
No?
Yes?
Either way, whether you answered “no” or “yes”, I recommend using something to test the quality of your data model(s). For years there have been standards and frameworks to test and ensure quality in software development and IT, such as ISTQB, IEEE, RUP, ITIL, COBIT and many more. In data warehouse projects I have observed test methods covering everything: loading processes (ETL), data quality, organizational processes, security, …
But data models? Never! But why?
Unfortunately, it seems that the task, or rather the profession, of data modeling still hasn’t reached the level of importance it deserves. When I look around, I hear people saying: “Everyone can model data; it’s just drawing some tables in PowerPoint or Visio.” Or even worse: that data modeling is abstract, time-consuming or even unnecessary.
But it’s not. A data model describes your data landscape and is a tool for communicating about facts with domain experts! It’s an art to fit all the parts together: facts, value chain, requirements, source systems, reports. The data model is at the heart of your data warehouse. Together with the data architecture, it is central to the success or failure of the long-term project: the data warehouse.
In my last blog post, The Data Doctrine, I mentioned the doctrine “Value Stable Data Structures Preceding Stable Code”. If every part of data warehouse development is tested so thoroughly, as mentioned above, while data models are not tested at all, the doctrine is violated. In my recent projects, I focused on testing data structures and keeping them stable. Guess what: testing loading processes, virtual data marts and the data itself became much easier than before.
But how do you determine the quality of a data model? And how do you test it? One tool to increase the quality of data models is Steve Hoberman’s Data Model Scorecard® [2]. I use it in my own projects and as a tool for objective and independent data model reviews. What I like about the Scorecard is that it is divided into 10 sections for validating a data model. Each and every one of these sections helps me build or review a better data model.
The 10 sections of the Scorecard [1] are:
- Correctness - Does the model meet the requirements?
- Completeness - Is the model complete, yet without gold plating?
- Scheme - Does the model match its scheme (relational, dimensional, ensemble or NoSQL - conceptual, logical or physical)?
- Structure - Does the data model have consistency and integrity, and does it follow basic data modeling rules?
- Abstraction - Does the data model have the right balance between generalization and specialization?
- Standards - Does the data model follow (available) naming standards?
- Readability - Is the data model mapped in a readable layout?
- Definitions - Are correct, complete and clear definitions for entities and attributes available?
- Consistency - Does the data model use the same structure and terminology in the enterprise (data model)?
- Data - Has data profiling been done, and do attributes and rules match reality?
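To make the idea of a percentage score across these ten sections concrete, here is a minimal sketch in Python. It is only an illustration: the equal weight of 10 points per section, the earned points, and the names `SectionScore`, `overall_percent` and `below_threshold` are all assumptions made up for this example. They are not the official Scorecard weights and not the numbers from any real review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionScore:
    name: str
    max_points: int  # hypothetical weight of the section
    earned: int      # hypothetical points awarded by the reviewer


def overall_percent(scores):
    """Return the total score as a percentage of all available points."""
    total_max = sum(s.max_points for s in scores)
    total_earned = sum(s.earned for s in scores)
    return 100 * total_earned / total_max


def below_threshold(scores, threshold_percent):
    """Return the sections scoring strictly below the given percentage."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return [s for s in scores
            if s.earned * 100 < threshold_percent * s.max_points]


# Made-up example review: every section is weighted equally with 10 points.
review = [
    SectionScore("Correctness", 10, 9),
    SectionScore("Completeness", 10, 8),
    SectionScore("Scheme", 10, 10),
    SectionScore("Structure", 10, 7),
    SectionScore("Abstraction", 10, 8),
    SectionScore("Standards", 10, 6),
    SectionScore("Readability", 10, 9),
    SectionScore("Definitions", 10, 5),
    SectionScore("Consistency", 10, 8),
    SectionScore("Data", 10, 7),
]

print(f"Overall score: {overall_percent(review):.0f}%")
for section in below_threshold(review, 70):
    print(f"Needs work: {section.name} ({section.earned}/{section.max_points})")
```

With these invented numbers the model lands at 77%, and the sections below 70% (Standards and Definitions) show where the work towards 90% would have to start.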
With the Data Model Scorecard® you can answer these questions from the outset. The following picture shows a real scorecard I’ve done for a customer’s data model review. It was a good result for the first review. To achieve 90% or higher, there’s still a lot of work to do. If you have any questions about the Data Model Scorecard®, feel free to drop a comment below or send me a message.
So long,
Dirk
[1] S. Hoberman, Data Model Scorecard, Technics Publications, LLC, 2015.
[2] Data Model Scorecard® reference: Steve Hoberman & Associates, LLC hereby grants to companies a non-exclusive royalty free limited use license to use the Data Model Scorecard solely for internal data modeling improvement purposes. The name “Steve Hoberman & Associates, LLC” and the website www.stevehoberman.com must appear on every document referencing the Data Model Scorecard. Companies have no right to sublicense the Data Model Scorecard and no right to use the Data Model Scorecard for any purpose outside of company’s business.