Building a Data Warehouse
Over this series of four posts, I explore the keys to a successful data warehouse. Last time, I started with design—a reasonable place to begin! The topic of this post is build, with operation and maintenance to follow.
Even with a beautiful design model in your mind’s eye, the question of how to build a data warehouse raises its ugly head! Ugly because no matter how lovely the model, implementation is always hobbled by the less-than-perfect reality of the data source systems. In the words of an old Irish joke in reply to a request for directions: “if I wanted to go there, I wouldn’t start from here.” Since the earliest days, builders of data warehouses have struggled with missing data in source systems, poorly defined data structures, incorrect content, and missing relationships, to name but a few. Implementation, therefore, becomes a delicate balancing act between the vision of the model and the constraints of the sources. In simplistic terms, the process comes down to the following steps.
1. Analyze Data Sources
Often described as data archeology, this step presents major challenges, especially for legacy systems, which—even if originally well documented—have usually been “bent to fit” emerging and urgent requirements. Modern big data sources may be equally challenging as a result of poor or absent documentation.
2. Compare Data
Compare the data available to the data warehouse model and define appropriate transformations to convert the former to the latter.
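In practice, the output of this step is a set of source-to-target mappings. As a minimal sketch, assuming a hypothetical legacy order feed (all table and column names here are illustrative, not from any specific system), such a transformation might be captured as a staging view:

```sql
-- Hypothetical source-to-target transformation for a legacy order feed.
-- All table and column names are illustrative.
CREATE VIEW stg_orders AS
SELECT
    CAST(ord_no AS INTEGER)                        AS order_id,
    TRIM(UPPER(cust_cd))                           AS customer_code,
    CAST(ord_dt AS DATE)                           AS order_date,
    CAST(ord_val AS DECIMAL(12,2))                 AS order_value,
    COALESCE(NULLIF(TRIM(ctry_cd), ''), 'UNKNOWN') AS country_code
FROM legacy_orders;
```

Even this trivial example shows the pattern: type corrections, trimming and case normalization, and defaults for missing values, all driven by what the comparison with the model reveals.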
3. Adjust the Data Warehouse Model
Where transformations are too difficult, modify the data warehouse model to accommodate the reality of the data sources. Changing the data sources—which would be the right answer when they are in error—is usually impossible for reasons of cost, politics, or both.
4. Test Performance
Test the performance of the load/update processes and check the ability of the modified model to deliver the data needed by the business.
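A minimal sketch of the kind of reconciliation checks this step involves, reusing the hypothetical staging view from step 2 and assumed warehouse tables dw_orders and dw_customers:

```sql
-- Did every staged order arrive in the warehouse?
SELECT 'staged' AS stage, COUNT(*) AS row_count FROM stg_orders
UNION ALL
SELECT 'loaded' AS stage, COUNT(*) AS row_count FROM dw_orders;

-- Referential check: orders whose customer is missing from the warehouse.
SELECT o.order_id
FROM dw_orders o
LEFT JOIN dw_customers c ON c.customer_code = o.customer_code
WHERE c.customer_code IS NULL;
```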
5. Iterate Improvements
If successful, declare victory. Otherwise, rinse and repeat.
Data Warehouse Automation
Traditionally, the output of the above process would be encoded in a script or program and run—typically overnight in batch—to populate the warehouse. Any changes in requirements or, more problematically, in the source systems (beyond the control of the data warehouse developers) required a round trip back through steps 1 to 5, followed by a code update. The approach is manual, time-consuming, and error-prone.
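To make the fragility concrete, a hand-coded nightly load often boils down to something like the sketch below (generic SQL MERGE, using the same hypothetical tables as above). Every hard-coded column list is a place where the script silently breaks when a source system changes:

```sql
-- Hypothetical hand-coded nightly batch: upsert staged orders into the warehouse.
-- Any new, renamed, or retyped source column forces a manual edit here.
MERGE INTO dw_orders tgt
USING stg_orders src
    ON (tgt.order_id = src.order_id)
WHEN MATCHED THEN UPDATE SET
    tgt.customer_code = src.customer_code,
    tgt.order_date    = src.order_date,
    tgt.order_value   = src.order_value,
    tgt.country_code  = src.country_code
WHEN NOT MATCHED THEN INSERT
    (order_id, customer_code, order_date, order_value, country_code)
VALUES
    (src.order_id, src.customer_code, src.order_date, src.order_value, src.country_code);
```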
The solution over the years has been to automate the process through a series of approaches: ETL (extract, transform, load) tools, data integration systems, and latterly, data warehouse automation (DWA). In essence, each step on this journey represents an increasing level of automation, with DWA designed to address the entire process of design, build, operation, and maintenance.
WhereScape RED
In the transition from design to build, the combination of a well-structured data model and a DWA tool such as WhereScape® RED offers a particularly powerful approach to automation. This is because the data model provides an integrated starting set of metadata that describes the target tables in both business terms and technical implementation. This is particularly true in the case of the Data Vault model, which has been designed and optimized from the start for data warehousing.

Consider, for example, the business need to analyze orders by value and geographical source. To the business person, an order seems a simple, straightforward concept. In modeling terms, of course, it consists of a rather complex combination of entities, including product and person/customer. The structure to be built is equally intricate in terms of tables and the relationships between them. The Data Vault model provides a database template for that structure, mapping directly from the business entities to a best-practice set of data elements—from tables and columns through relationships to indexes.
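As a rough sketch of what that template implies for the order example, the core Data Vault structures might look like the generic hub, link, and satellite tables below (names, data types, and hash-key conventions are illustrative, not WhereScape output):

```sql
-- Hub: one row per unique business key (the order number).
CREATE TABLE hub_order (
    hub_order_hk  CHAR(32)    NOT NULL PRIMARY KEY, -- hash of the business key
    order_id      INTEGER     NOT NULL,
    load_dts      TIMESTAMP   NOT NULL,
    record_source VARCHAR(50) NOT NULL
);

-- Link: the relationship between an order and a customer.
CREATE TABLE link_order_customer (
    link_order_customer_hk CHAR(32)    NOT NULL PRIMARY KEY,
    hub_order_hk           CHAR(32)    NOT NULL REFERENCES hub_order (hub_order_hk),
    hub_customer_hk        CHAR(32)    NOT NULL, -- references a hub_customer, omitted here
    load_dts               TIMESTAMP   NOT NULL,
    record_source          VARCHAR(50) NOT NULL
);

-- Satellite: descriptive attributes of the order, tracked over time.
CREATE TABLE sat_order (
    hub_order_hk CHAR(32)      NOT NULL REFERENCES hub_order (hub_order_hk),
    load_dts     TIMESTAMP     NOT NULL,
    hash_diff    CHAR(32)      NOT NULL, -- change-detection hash of the attributes
    order_date   DATE,
    order_value  DECIMAL(12,2),
    country_code VARCHAR(10),
    PRIMARY KEY (hub_order_hk, load_dts)
);
```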
WhereScape Data Vault Express
A DWA tool automates the transformation of the data structures of the various sources to the optimized model of the Data Vault and populates the target tables with the appropriate data, creating the necessary indexes and cleansing and combining source data to create the basis for the analysis needed by the business. WhereScape® Data Vault Express™ provides the underlying templates to build all the required structures (tables, indexes, etc.) and processes (ETL code) automatically and quickly, without manual programming, and optimized for the chosen implementation platform, such as Teradata, Oracle, or Microsoft.
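For illustration only, the kind of repeatable load pattern such templates generate for a hub is sketched below; the actual generated code is platform-specific and considerably more complete. The hash function shown (MD5) stands in for whatever the target platform provides:

```sql
-- Sketch of a generated, rerunnable hub load: insert only business keys
-- not already present, so the job is safe to repeat after a failure.
INSERT INTO hub_order (hub_order_hk, order_id, load_dts, record_source)
SELECT DISTINCT
    MD5(CAST(s.order_id AS VARCHAR(20))), -- hash function varies by platform
    s.order_id,
    CURRENT_TIMESTAMP,
    'legacy_orders'
FROM stg_orders s
WHERE NOT EXISTS (
    SELECT 1
    FROM hub_order h
    WHERE h.order_id = s.order_id
);
```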
But it’s about more than automating programming. In the future, Data Vault Express plans to address further build-time elements, including the methodology and best delivery practices defined by the Data Vault community, to avoid design errors and support proper auditing and management of the warehouse environment. That leads us to part three of this series.
You can find the other blog posts in this series here:
- Week 1: Designing a Data Warehouse
- Week 3: Operating a Data Warehouse
- Week 4: Maintaining a Data Warehouse
Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. Barry is founder and principal of 9sight Consulting. A regular blogger, writer and commentator on information and its use, Barry is based in Cape Town, South Africa and operates worldwide.