We can all agree that we don’t want to revert to the era where ETL developers (these days going by the fancy name of Data Engineers) started sweating profusely when asked to add a new source to the EDW and, eventually, a dashboard. The fear of breaking everything by trying to integrate something new was palpable. At the other end of the pendulum’s swing is the practice of simply dumping data into a data swamp, I mean lake, and asking end users to query it directly using in-memory technologies. That has not ended well either. Both approaches to data warehousing fortified the silos of tech vs business … us vs them.
Data Vault 2.0 is the happy medium between these two extremes. It offers many benefits, but let’s focus on adaptability, especially when it comes to new sources and new technologies.
New Sources
The Data Vault 2.0 architecture promotes decoupling and keeps changes low-impact. Because data pivots around business keys, and the relationships between those business keys, adding new sources is easy.
Because the standardised load patterns are insert-only (and thus non-destructive in nature), adding more sources does not risk the structural integrity of the existing objects (i.e. no heart attacks for your engineers). At the same time, you benefit from passive integration around the business keys from the start. This allows you to store data in a lake-like manner without ending up with an unintegrated data dump.
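To make the insert-only idea concrete, here is a minimal Python sketch of a hub load keyed on a hashed business key. The names (hub_customer, customer_id, the two record sources) are hypothetical, and a real implementation would run as set-based SQL on the target platform rather than in-memory Python, but the pattern is the same: new business keys are appended, existing rows are never updated or deleted, and a second source integrates passively onto the same hub.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    """Derive a deterministic hash key from the business key (a common DV2.0 practice)."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

def load_hub(hub: dict, source_rows: list, record_source: str) -> None:
    """Insert-only hub load: add unseen business keys, never update or delete.

    `hub` is a simple in-memory stand-in for a hub table, keyed on the hash key.
    """
    load_ts = datetime.now(timezone.utc)
    for row in source_rows:
        hk = hash_key(row["customer_id"])      # business key supplied by the source
        if hk not in hub:                      # only genuinely new keys are inserted
            hub[hk] = {
                "customer_id": row["customer_id"],
                "load_date": load_ts,
                "record_source": record_source,
            }

# A second source integrates passively: shared business keys land on the same hub rows.
hub_customer: dict = {}
load_hub(hub_customer, [{"customer_id": "C-001"}, {"customer_id": "C-002"}], "crm_system")
load_hub(hub_customer, [{"customer_id": "C-002"}, {"customer_id": "C-003"}], "billing_system")
print(len(hub_customer))  # 3 distinct business keys, existing rows untouched
```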
The standardised query patterns for consumption out of the Data Vault mean that any additional data added to the Raw Vault can be accessed via the same patterns. All the queries can therefore be auto-generated to include the additional data once the metadata is available. Again, this can be done in a purely additive manner, either by only adding columns or by generating new queries containing only the desired datasets.
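As an illustration of that metadata-driven approach, here is a simplified Python sketch that generates a hub-plus-satellites consumption query from a metadata dictionary. The table names and the metadata shape are assumptions for this example, and a real Data Vault consumption query would typically also handle satellite effectivity or use PIT tables; the point is that onboarding a new source can reduce to appending one metadata entry and regenerating.

```python
# Hypothetical metadata describing a hub and the satellites hanging off it.
# Adding a new source usually means appending a satellite entry here, nothing more.
metadata = {
    "hub": {"table": "hub_customer", "hash_key": "hk_customer", "business_key": "customer_id"},
    "satellites": [
        {"table": "sat_customer_crm", "columns": ["name", "email"]},
        {"table": "sat_customer_billing", "columns": ["credit_limit"]},  # newly added source
    ],
}

def generate_consumption_query(meta: dict) -> str:
    """Generate the standard hub-plus-satellites SELECT from metadata (purely additive)."""
    hub = meta["hub"]
    select_cols = [f'h.{hub["business_key"]}']
    joins = []
    for i, sat in enumerate(meta["satellites"]):
        alias = f"s{i}"
        select_cols += [f"{alias}.{c}" for c in sat["columns"]]
        joins.append(
            f'LEFT JOIN {sat["table"]} {alias} ON {alias}.{hub["hash_key"]} = h.{hub["hash_key"]}'
        )
    return (
        f'SELECT {", ".join(select_cols)}\n'
        f'FROM {hub["table"]} h\n' + "\n".join(joins)
    )

print(generate_consumption_query(metadata))
```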
Because the full history is stored in the Data Vault (Raw and Business), you do not need to regenerate history in your Mart layer; you only need to incorporate any additional data that might change the grain. Sure, the business logic still takes some elbow grease, but repeatable patterns exist in all layers, so automation can still be leveraged.
New Technologies
Change is the only constant. Organisations will change, leadership will change and technologies will change. Whether your team has decided to move technology platforms for a good reason or a not-so-good reason doesn’t really matter. It will happen. We can accept this reality. Organisations cannot linger forever on a decision, because technologies enter the market faster than most corporate decision-making cycles. So the ideal is to prepare for this inevitable outcome.
A proven way to do this is to abstract your Data Warehouse data model and logic from the actual target technology. Data Vault helps with this. How, you ask? Because it is built on repeatable patterns, it lays the foundation for adapting easily to new targets by leveraging automation.
With repeatable patterns for every layer of the Data Vault architecture, you can, and should, leverage automation. By using a template-driven approach to generate the target-specific code, you avoid entrenching yourself in the nuances of a specific target. Anyone who has ever built something meaningful on any platform knows that no platform is perfect, and if you hand-code you will eventually end up writing code around platform shortcomings. Automation forces your hand to fix the problem conceptually first.
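As a rough sketch of what template-driven, target-aware generation can look like, the Python below renders the same hub-load pattern for two hypothetical target dialects. The dialect fragments and table names are assumptions, not any particular tool’s templates; the point is that the model and the load pattern stay constant while only the target-specific fragments change.

```python
from string import Template

# Hypothetical per-target dialect fragments; the model and the load pattern stay the same,
# only these fragments change when the target platform changes.
DIALECTS = {
    "snowflake": {"current_ts": "CURRENT_TIMESTAMP()"},
    "sqlserver": {"current_ts": "SYSUTCDATETIME()"},
}

# One generic hub-load pattern, expressed once, independent of the target.
HUB_LOAD_TEMPLATE = Template(
    "INSERT INTO $hub_table ($hash_key, $business_key, load_date, record_source)\n"
    "SELECT stg.$hash_key, stg.$business_key, $current_ts, stg.record_source\n"
    "FROM $staging_table stg\n"
    "LEFT JOIN $hub_table h ON h.$hash_key = stg.$hash_key\n"
    "WHERE h.$hash_key IS NULL"
)

def render_hub_load(target: str, model: dict) -> str:
    """Render the same hub-load pattern as target-specific code for the chosen platform."""
    return HUB_LOAD_TEMPLATE.substitute(**model, **DIALECTS[target])

model = {
    "hub_table": "hub_customer",
    "staging_table": "stg_crm_customer",
    "hash_key": "hk_customer",
    "business_key": "customer_id",
}
print(render_hub_load("snowflake", model))
print(render_hub_load("sqlserver", model))
```

Under this kind of approach, moving to a new target is largely a matter of adding a new dialect entry and regenerating, rather than rewriting hand-coded logic.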
Migration from one target to another will still require careful thought and planning, but having the lever to generate target-specific code will vastly reduce the turnaround time.
Once your organisation has abstracted the metadata from the target, you can adapt much more quickly to the changing technology landscape. So whether your boss decides to change the target platform after careful consideration or simply to make a statement, it doesn’t really matter: you will have built the foundation for this inevitable scenario.
By being adaptable, DV2.0 helps break down the walls between tech and business and enables the collaboration that is truly needed to deliver valuable insights from our data.
Corné Potgieter is a data enthusiast and somewhat of a purist, though he knows that practicality remains king.
His roles have spanned data analytics, data modeling and BI engineering, including designing solutions using different data modeling patterns such as Kimball and Data Vault.
Corné is a Solutions Architect at WhereScape.