Data Vault Automation

Data Vault 2.0 and Data Warehousing

Venturing into Data Warehouse methodologies and automation may be an intimidating subject for some  but it is necessary in today’s fast-moving world. There are countless methodologies used in regards to  data warehousing and there is no “wrong” answer. This guide will look at two of the big players in the  data warehousing industry; Data Vault, and WhereScape.  

Data Vault and WhereScape are a matchmade in data warehouse automation heaven. Data Vault is a  methodology for data warehouses and WhereScape is there to assist Data Vault with its automation  capabilities. This guide will detail the benefits of using WhereScape with the Data Vault system. Also,  how it can help your company and team achieve their data goals.

What is Data Vault?

Data Vault is an agile system of business intelligence built to solve inadequacies in the Data Warehouse  models that preceded it. While 3NF and Dimensional modeling address the data, they fail to address the  people and the technology that impact data warehouses in 2020, especially at the enterprise level.  

Today a data warehouse is not a single entity. It is a living fabric of data spread across Cloud and on-premises  databases. It amalgamates sources from multiple geo-locations in a combination of batch, real-time and big  data formats. You shouldn’t be able to point to a single server and say “there lives my data warehouse”, at  least not at enterprise level. New data sources and the unpredictability of people and changing business  needs must be catered for with Agile architectures and methodologies. 

This is where Data Vault excels over legacy modeling techniques. “Collaboration” and “responding to  change” are two of the four main principles of the Agile Manifesto, and these in turn enable the other two:  working software and individuals over processes and tools.

Data Vault 2.0

A 25-year IT Veteran, Dan Linstedt has worked in data warehousing since 1990. Lindstedt has worked for numerous companies, doing everything from ETL to data warehousing build outs. He even solved some of the government integration problems and disparate data sets through the 90’s. At the start of the millennium, Dan was convinced that if the data vault were to  proceed successfully it must be done through a grassroots effort. Thus, began the creation of Data Vault 2.0. Data Vault 2.0 arrived in 2013 and brought to the table Big Data, NoSQL, unstructured, and  semi-structured seamless integration, along with best practices for methodology, architecture, and implementation.

Data Vault 2.0 vs. Data Warehouse

Data Vault 2.0 is a complete system of business intelligence that contains the necessary components  needed to accomplish the enterprise vision of Data Warehousing and Information Delivery. The Data Vault  2.0 approach simplifies data ingestion from multiple sources, makes it easy to add new data sources without  disruption, and increases scalability and consistency. The Data Vault is designed for solving enterprise-level  issues such as scalability, agility, flexibility, and consistency. The methodology standards help you drive teams  in the same direction, at the same speed to meet your company’s enterprise needs.

Data Vault vs. Dimensional Modeling  

In scenarios where you know all your requirements and data sources before the project begins and nothing  will change, Dimensional modeling and Star Schemas on SQL Servers work well. A smaller company whose  business requirements remain relatively static and who do not depend upon real-time data ingestion, big  data, and rapidly changing business rules, for example. But for international and enterprise companies dealing  with macroeconomics, mergers, acquisitions and rapidly changing markets, etc. Dimensional modeling is  too brittle.  

In large companies, if you build a data warehouse using this approach for one department and another  department wants one, and another, your costs spiral. If your team can’t get approval to pay for building data  warehouses for each department, they may use a different team to build one, and then you get data silos  which make it hard to achieve a single version of the truth. Instead, you should create data marts for each  department to avoid siloing. 

Additional problems include:

  • Uncoordinated and unintegrated data staging areas  
  • Challenge to change modeling styles  
  • Complex load patterns for conformed dimensions that make processes run slow  
  • Real-time data ingestion is not supported  
  • High cost of maintenance  
  • Inability to react to changes in business rules (while keeping historical data for audit)  

If a business has one or more of these issues, its data warehouse is unsustainable, it could decide to rip it up  and build a new one with their new requirements and sources – this is known as the greenfield approach. But  these inherent problems will just emerge again and again. With Data Vault, you only have projects that add  functionality. The older projects do not need to be fully tested after every change. Saving you time and costs  when building.

A Data Vault enables companies to:

  • Adapt to change without re-engineering  
  • Scale to petabyte levels  
  • Tackle Master Data Management  

A problem can’t be solved by using exactly the same plan that created it. What’s required is a holistic  framework that addresses technology and people as well as the data.

Wherescape Data Vault Express

Data is all around us and tracked constantly from infinite sources. Unfortunately, bandwidth and speed are  limited. Businesses want the data ASAP, but sometimes the ability to deliver is in the realm of weeks or  months or years. Even after you gather the necessary data, how do you know it’s valid? What’s the confidence  level? What if requirements change?  

The Data Vault 2.0 system combined with WhereScape technologies can help with the authentication and  automation of data analytic projects. This collaborative force is known as WhereScape Data Vault Express.  This data automation suite consists of two products, WhereScape 3D and WhereScape RED.  

3D is where all of the data modeling takes place. It provides complete project documentation, suitable  for review, distribution and governance submission and makes planning, modeling, and designing data  warehouses, vaults, lakes, and marts easy and effective.  

RED is the customized development environment for rapidly building, deploying, managing, and updating  your Data Warehouse. RED does this through a metadata-driven automation approach that generates code  from templates and automates the creation for your Data Vault. This approach cuts project cost and reduces  risk. With simplified and faster development, your data integration and infrastructure is at a lower risk and cost  than other DIY methods.

Data Warehouse Automation

To execute successfully in a Data Vault 2.0 environment you need skills, people, and processes, but you also need automation technology.  

WhereScape Data Vault Express dramatically cuts the time to develop Data Vault-based analysis solutions with built-in automation, wizards, patterns, models and templates. This can lead to delivery of Data Vault based solutions in hours and days as opposed to months and years.  

Data Vault Express reduces the risk of failure by getting your project to production faster with higher quality  and data consistency. WhereScape does this through automating the design, development, deployment and operation of enterprise data vaults. This includes building hubs, satellites, and links, in addition to automatically managing metadata attributes such as load date and record source. All while generating  uniform and optimized code native to your target platform.  

As well as creating the Data Vault tables, Data Vault Express also creates the supporting objects, which often grow to a very large number as you increase the functionality of the Data Vault. This is one of the many reasons this modeling approach cannot be effectively managed with manual processes. For Data Vault, automation tool is not a nice to have - as it was often perceived to be with dimensional modeling - it provides  essential support both during implementation and day-to-day/ongoing management.

The evidence is tangible. Dan Linstedttalks about projects in which he had to ingest data from 125 source  systems in six months with a team of three people and at WhereScape one customer, Micron, reported the  processing of 3.2 trillion records from ten different geo-locations every 24 hours.  

This is the reality of data warehousing today and projects like this just aren’t possible without automation.  

WhereScape automates the repetitive manual processes that once stifled Data Vault projects. There are  phases of Data Vault development that must be automated as much as possible such as ELT from disparate  sources to the database, and the design and creation of Data Vault structures. About 98% of this can be  automated, and all developers need to do is kick the process off.  

That 2% is all about metadata. Here, data acquisition teams profile the data and connect it to the business  keys and business processes. WhereScape will document the metadata around it. We need to understand  how the data set flows and understand at a conceptual level how the business deals with the data sets.  This human element is taught in Data Vault boot camp sessions. Data Vault is not just a modeling style but a  holistic system that comes with training in how to work in an augmented, Agile mindset.  

WhereScape uses templates that follow industry best practices, based on the successfu completion of hundreds of Data Vault projects. Developers will want to use these templates where possible for maximum speed, but they can always customize model conversion rules, and templates before generating the physical model and code. Once these patterns are in place, you can generate thousands of times.

However when it comes to modeling the data into information marts, there is no substitute for human  intelligence and creativity, so with developers taking a more prominent role here as things get interesting,  automation does around 60% of the work. Data teams can set up the design then use automation to  accelerate team processes. You can define and WhereScape will support building information marts with all of  the types of dimensions you need. It’s a templated approach rather than spending precious time starting from scratch every time you need a new factoid dimension. If you can put the definition of the business rules and  the right components into the metadata we can automate 60% of the downstream objects. 

As well as completing development work in a fraction of the time it would take by hand, without human error, WhereScape ensures all actions are standardized and adheres to industry best practice: the methodology, the approach, the design, the naming conventions and so on. There is no room for individual flair here, as if mistakes are made at this point the Data Vault will not scale effectively. When it comes to laying foundations, uniformity and repeatability are key.

Data Vault Automation with WhereScape

WhereScape is a multi-developer environment. It brings version control and enables teams around the world  to leverage the same platform and follow the same methodology, the same sprint cycles and achieve the  same outcomes in terms of cost, maintenance with almost zero errors put into production. If you don’t have  the right standards and rigor, your Data Vault, or any kind of data warehouse for that matter, will fail. We  need to get everybody on the same platform and the same level of understanding, and then we can start to  move faster.  

The major advantage of a Data Automation tool like WhereScape sitting on top of your Data Vault, building  and orchestrating all facets, is that every action you or the tool takes is fully documented in HTML. This means  every action is recorded in a standardized, exportable language which reduces your reliance on individuals  and means new employees can see how structures were produced. This also frees up development time, and  makes sure the documentation is always produced right away, at the click of a button, rather than being done  belatedly or, as can often be the case, not at all.