Does the Load Date Timestamp (LDTS) data element in the hub, link, or satellite capture the timestamp of the batch, or rather the transaction timestamp at which the data originated in the operational system?
In many data solutions projects implemented by the Data Management Center of Excellence (DMCE) team at FastChangeCo™, this question comes up again and again.The first post in the Temporal Data series is a revision of an old blog post of mine. The reasons for digging out the old post again were your numerous requests for a translation and refinement of individual sections. The original is still available for reference (German only): Welche Zeit für Data Vault Timelines?
Or should the DMCE team better use the timestamp of the extraction of the data from the particular source? Or the timestamp at which the data solution writes the records to the particular target data objects (e.g. tables)? Not an easy question to answer, is it?
To find a decision, the DMCE team looks at this question from the perspective of the business department. What does the department want to look at the data in terms of time?
Accuracy, when did I get my data
Achieve consistently, contradiction-free reports that are reproducible at any time.
This means it is necessary to know: When was the data in the data solution registered? More precisely, when was the data written to the data solution. It does not matter when the data was created.
Thus, from the perspective of the data solution, the question can be answered consistently: Which data from data object A fit/belong/correspond in time to which data in data object B to show a consistent picture of the business at FastChangeCo™?
Consistency, according to source systems
As accurate, precise information as possible about when data originated in the source.
This means that the data in the data solution should reflect as closely as possible the origin of the data in the original source system, but may not enable reports that are consistent over time.
Thus, from the perspective of the data solution, the question can be answered accurately: When did an event occur in reality? The data solution's data provides the most accurate picture possible of when the FastChangeCo™ transaction took place.
Accuracy or consistency decision
To make a decision, the DMCE team must answer the following questions for themselves:
- Is it absolutely necessary that reports are 100% reproducible? For example, so-called late arriving data can confuse the chronological order and thus subsequently change a report.
- Are all systems (source systems, databases, applications) synchronized with the same time (e.g. via a time server)? Especially applications do not always "lovingly" take care of the correct time when saving data. This can also lead to inconsistent time lines.
- Do I get information about the time zones in which data originated or only UTC?
- Are daylight saving and standard time taken into account in the operational systems?
- Is the transaction ( creation ) timestamp of the source not a transaction ( last modified ) timestamp in reality?
The DMCE team is aware that it cannot always trust timestamps that are not under its own control, or more precisely, under the control of the data solution!
Therefore, it is important to answer these questions first when implementing FastChangeCo™ data solutions with unitemporal data objects, i.e. with a single timeline. They are crucial for the choice of the time to be used in the Data Vault data model of the data solution.
By deciding on accuracy or consistency, the DMCE team has a significant influence on how data and information are presented in the query layer over time as the project progresses.
In addition, the DMCE team considers which point in time, e.g., Transaction Time (Source), Extraction Time, Load Cycle Date, Load Transaction Time, etc., is the most appropriate to store as an additional data element in the target data object or as additional descriptive information in an appropriate metadata model.
A well-designed metadata model is able to remove the boundaries between accuracy or consistency. By documenting the different times in the metadata model, a query layer can switch between accuracy and consistency.
Another variant is that accuracy and consistency are solved in the bitemporal context and both views of the data are possible. Thus accuracy still corresponds to the approach 'When did the data solution know something about the data' and consistency means in the for the data solution 'When was the data valid in reality!'
But that's another story. More about that in one of the next articles in the series. You should definitely stop by again.
So long,
Euer Dirk