And there it was again, the lively discussion about the use of a Link Satellite. The arguments for and against went round in circles.

This time, the place of the debate was the DVEE Consortium Summit in Rotterdam, the annual meeting of Data Vault & Ensemble Enthusiasts (DVEE) and a worldwide event for exchanging ideas with experts in the field.

The discussion, as is often the case, included many and varied arguments for and against assigning a Satellite to a Link in Data Vault.

It would be helpful to place these arguments in the context of a more formal decision-making guide, so that modelers can clearly see when a certain decision makes the most sense.

In the first part of this two-part article, I look at the background of the discussion from Diego Pasión's point of view and present his approach to tackling this modeling decision and to capturing the various perspectives in a decision-making guide. The second part deals with the bitemporal aspects related to these decisions.

What's the background?

Data Vault is a physical data modeling method that aims to store data in an efficient, flexible, and technically complete historized way. In many cases, the focus is very much on the physical implementation, which can make it difficult to discuss the business causes of problems that surface while modeling a Data Vault, let alone find a suitable solution.

Satellites on Links are technically possible and have traditionally been described again and again in the Data Vault literature, e.g. to include the context of a Unit of Work (UOW).

A solution and an aid to argumentation

As is the case for many areas in the world of data, not only do ideas and methods evolve over time, but approaches do so as well.

This is also true of Diego Pasión, the well-known coach of the DMCE team at FastChangeCo. He is now convinced that a much better Data Vault model can be designed with a business/domain data model than without one. After all, modeling the information (the business) is separate from the physical implementation.

This thinking guides him when discussions about the pros and cons of Link Satellites become heated, or when he explains why there should be no Link Satellites ‘per se.’

In situations such as these, Diego likes to use a simple example to show how a Data Vault model can be derived from domain-oriented data modeling. The results of this example provide a guideline on whether Satellites should be modeled on Links or not.

Diego starts with a domain-oriented data model and explains that it shows the following information: “An ‘employee’ has to ‘work in’ a ‘department’. Or the other way around: many ‘employees’ may ‘work in’ a ‘department’.”

“The ‘employee’ and ‘department’ entities have attributes,” he continues, “which describe an instance of each of them. The reason for the relationship between ‘employee’ and ‘department’ is that employees work in a department.”

Figure 1 - Business model, employee - department

“If you start from a domain-oriented data model (like the employee-department data model shown above), which is generally recommended,” says Diego, “then, when a logical data model (LDM) is instantiated as a physical Data Vault model (DV), an entity ‘automatically’ becomes a Hub and a Satellite in the first step, and a relationship becomes a Link without a Satellite.”

The following simple rules therefore apply in the ‘Diego guideline’:

  • Entity attribute(s) that identify (how do we get one and only one record?):

    Become the business key in the Hub
  • Entity attribute(s) that describe or measure (what is important about an entity?):

    Become context attributes in the Satellite
  • Relationship (what is the reason for a relationship between two entities?):

    Becomes a Link
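To make the three rules concrete, here is a minimal Python sketch of how they could be applied mechanically to an LDM. The classes, the `derive_data_vault` function, the `hub_`/`sat_`/`link_` naming scheme and the example attributes (`employee_number`, `department_code`, etc.) are all hypothetical illustrations, not part of Diego's guideline or of any Data Vault tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """A business entity in the logical data model (LDM)."""
    name: str
    identifying: List[str]   # how do we get one and only one record?
    descriptive: List[str]   # what is important about the entity?


@dataclass
class Relationship:
    """A relationship between two entities; it carries no attributes of its own."""
    name: str
    source: str
    target: str


@dataclass
class DataVaultModel:
    hubs: Dict[str, List[str]] = field(default_factory=dict)          # Hub -> business key
    satellites: Dict[str, List[str]] = field(default_factory=dict)    # Satellite -> context
    links: Dict[str, Tuple[str, str]] = field(default_factory=dict)   # Link -> (Hub, Hub)


def derive_data_vault(entities: List[Entity], relationships: List[Relationship]) -> DataVaultModel:
    """Apply the three rules; the naming scheme is a hypothetical convention."""
    dv = DataVaultModel()
    for e in entities:
        # Rule 1: identifying attributes become the business key of a Hub.
        dv.hubs[f"hub_{e.name}"] = list(e.identifying)
        # Rule 2: describing/measuring attributes become context in a Satellite.
        dv.satellites[f"sat_{e.name}"] = list(e.descriptive)
    for r in relationships:
        # Rule 3: a relationship becomes a Link between two Hubs, with no Satellite.
        dv.links[f"link_{r.source}_{r.name}_{r.target}"] = (f"hub_{r.source}", f"hub_{r.target}")
    return dv


employee = Entity("employee", ["employee_number"], ["first_name", "last_name"])
department = Entity("department", ["department_code"], ["department_name"])
works_in = Relationship("works_in", "employee", "department")

dv = derive_data_vault([employee, department], [works_in])
print(sorted(dv.hubs))  # ['hub_department', 'hub_employee']
print(dv.links)         # {'link_employee_works_in_department': ('hub_employee', 'hub_department')}
```

Note that the sketch has no place to put a Link Satellite at all: because a relationship in the LDM has no attributes, rule 3 produces only the Link.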

“From a business point of view, in terms of an LDM,” Diego continues, “there can never be a Link Satellite, because relationships themselves never have descriptive attributes. If an employee changes department, the relationship between ‘employee’ and ‘department’ would simply be updated in the ‘employee’ entity.”

Diego looks around as he presents this example to the audience.

“If a relationship in the LDM had descriptive context, then the data modeler would model an associative entity in the LDM. This associative entity would incorporate the attributes that describe the relationship, but, according to the guideline, it would simply become a Hub and a Satellite in the physical Data Vault model.”

“Diego, can you show us an example of this?”

Following his guideline, Diego derives the following DV model from the LDM shown above and says: “A Hub and a Satellite are physically modeled from each of the entities ‘employee’ and ‘department.’ The relationship ‘works in’ becomes the Link between the two Hubs.”

Figure 2 - Data Vault data model, employee - link - department

Separation of concerns

“But what happens if the employee changes department?” Diego is asked.

Diego considers this for a moment and then explains: “An LDM does not ‘take care’ of technical historization, although that is of course still of interest. This is why, at the implementation level, Data Vault supports technical historization of the descriptive context in Satellites as a fundamental tenet. Otherwise, a physical data model derived ‘one-to-one’ from an LDM would have no historization, only updates. Whether this is acceptable depends on the application that builds on the physical data model.”

Diego says that he is generally in favour of looking at business and temporal aspects separately: “The moment technical historization comes into the discussion, things look different, because in a one-to-many relationship a change must also be documented from a technical perspective, for example when employee Sophie moves from department A to department B.”

Diego thinks about this for a moment, then says: “This is best resolved by applying standard patterns in the physical model and removing the discussion about technical historization from the logical level.”

“In general, a ‘Load Date Timestamp’ column (LDTS, or the Inscription Timestamp in our data models) is used in Data Vault Satellites purely for the technical historization, or versioning, of the data,” Diego explains.

“This column is part of the physical data model, the instantiation of the LDM, because this is where the data must be versioned (instead of updated) to ‘preserve’ their previous states.”
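The following Python sketch illustrates what versioning instead of updating means for a Satellite. It assumes an in-memory list standing in for the Satellite table, a hypothetical employee number `E001` as the Hub key, and a plain comparison of attribute tuples as change detection (real loads often compare a hash of the attributes instead). It is an illustration of the insert-only idea, not a reference load pattern.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str        # business key of the parent Hub (hypothetical employee number)
    ldts: datetime      # Load Date Timestamp: when this version was loaded
    attributes: tuple   # descriptive context, e.g. (first_name, last_name)


def latest_version(sat: List[SatelliteRow], hub_key: str) -> Optional[SatelliteRow]:
    """Return the most recently loaded version for a key, or None if there is none."""
    versions = [row for row in sat if row.hub_key == hub_key]
    return max(versions, key=lambda row: row.ldts, default=None)


def load_satellite(sat: List[SatelliteRow], hub_key: str, attributes: tuple, ldts: datetime) -> None:
    """Insert-only load: append a new version when the context changed, never update in place."""
    latest = latest_version(sat, hub_key)
    if latest is None or latest.attributes != attributes:
        sat.append(SatelliteRow(hub_key, ldts, attributes))


sat: List[SatelliteRow] = []
load_satellite(sat, "E001", ("Sophie", "Smith"), datetime(2024, 1, 1))
load_satellite(sat, "E001", ("Sophie", "Smith"), datetime(2024, 2, 1))  # unchanged: no new row
load_satellite(sat, "E001", ("Sophie", "Jones"), datetime(2024, 3, 1))  # changed: new version
print(len(sat))                                # 2
print(latest_version(sat, "E001").attributes)  # ('Sophie', 'Jones')
print(sat[0].attributes)                       # ('Sophie', 'Smith'): previous state preserved
```

Because rows are only ever appended, the LDTS alone is enough to tell the current version from the earlier ones.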

“To stay with the example,” Diego continues, “if employee Sophie now moves from department A to department B, this event adds another record to the Link. In the data model shown above, the relationship would then no longer be unique, because the data alone do not show which department Sophie currently works in.”

The solution Diego prefers for this scenario (a one-to-many relationship) is to add an (end-dating) Satellite to the Link, whose only task is to take care of the technical historization of the Link.

With the help of a so-called driving key in the Link (here the employee, who works in only one department at a time), the Satellite can record the history of the one-to-many changes in the Link correctly.

Figure 3 - Data Vault data model, employees - ‘one-to-many’-Link - department
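The driving-key mechanism described above can be sketched in Python as follows. The assumptions are simplifications chosen for illustration: in-memory collections stand in for tables, the class and field names are hypothetical, the employee is the driving key, and the end date is simply the load timestamp, so only the technical timeline is tracked. Many implementations store a start and an end date per Satellite version instead. This is one possible shape of an end-dating (effectivity) Satellite, not the definitive one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple

LinkKey = Tuple[str, str]  # (employee, department); the employee is the driving key


@dataclass(frozen=True)
class EffectivityRow:
    link: LinkKey
    ldts: datetime                 # Load Date Timestamp of this Satellite version
    end_date: Optional[datetime]   # None means the relationship is still open


class EmployeeDepartmentLink:
    """Hypothetical Link with an end-dating (effectivity) Satellite, insert-only."""

    def __init__(self) -> None:
        self.links: Dict[LinkKey, datetime] = {}  # Link row -> first load date
        self.sat: List[EffectivityRow] = []

    def _latest(self, link: LinkKey) -> Optional[EffectivityRow]:
        versions = [row for row in self.sat if row.link == link]
        return max(versions, key=lambda row: row.ldts, default=None)

    def _open_links(self, employee: str) -> List[LinkKey]:
        """Link rows of this driving key whose latest Satellite version is not end-dated."""
        result = []
        for link in self.links:
            latest = self._latest(link)
            if link[0] == employee and latest is not None and latest.end_date is None:
                result.append(link)
        return result

    def load(self, employee: str, department: str, ldts: datetime) -> None:
        new_link = (employee, department)
        # End-date every open relationship of the same driving key that points elsewhere.
        for link in self._open_links(employee):
            if link != new_link:
                self.sat.append(EffectivityRow(link, ldts, end_date=ldts))
        # The Link itself only ever receives new rows; it is never updated.
        self.links.setdefault(new_link, ldts)
        latest = self._latest(new_link)
        if latest is None or latest.end_date is not None:
            self.sat.append(EffectivityRow(new_link, ldts, end_date=None))

    def current_department(self, employee: str) -> Optional[str]:
        open_links = self._open_links(employee)
        return open_links[0][1] if open_links else None


link = EmployeeDepartmentLink()
link.load("Sophie", "A", datetime(2024, 1, 1))
link.load("Sophie", "B", datetime(2024, 6, 1))
print(len(link.links))                    # 2: the Link alone cannot tell which is current
print(link.current_department("Sophie"))  # B
```

The Link keeps both rows for Sophie; only the Satellite, keyed on the driving key, says that the A relationship was closed when the B relationship was loaded.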

And descriptive attributes on the relationship?

“The physical Data Vault data model would change again if the information model, the LDM, changes,” notes Diego.

“As already mentioned, such a change could be the addition of descriptive attributes to the relationship. An example would be if the relationship between ‘employee’ and ‘department’ in the above data model contained the following descriptive attributes:

“Sophie's start date in the new department B is three months in the future, and she will take on the role of data modeler there for three months.”

How Diego deals with this, and how he extends his guideline to cover it, is the subject of the second part of this blog article.

So long,
Dirk


2 comments

    • Hi anonymous,

      the DVEE actually has nothing to do with the Data Vault 2.0 Standard Committee. Nevertheless, all forms and variants of Data Vault are discussed there. In the article itself, I refer to Data Vault in general, regardless of a specific variant. The procedure described applies to all Data Vault types.

      I hope this clears up any possible confusion. 

      Kind regards,
      Dirk
