Operating a Data Warehouse

By Barry Devlin

July 24, 2017

Once you have designed and built your data warehouse, I imagine that you’d like to deliver it successfully to the business and run it smoothly on a daily basis. That’s the topic of today’s article.

As digitalization continues apace across all industries, the role and value of a data warehouse—together with its attendant data marts and associated data lake—becomes ever more central to business success. With such informational systems now becoming as important as traditional operational systems, and often more so, it should be self-evident that highly reliable and efficient operating practices must be adopted.

Historically, however, approaches to operating a data warehouse environment have been somewhat lax. Manual and semi-automated methods for populating and updating the contents of the warehouse have been widespread. Advances made in the data warehouse itself have been offset by the spread of ad hoc approaches in departmentally managed data marts. The data lake has seen the re-emergence of a cottage industry of hand-crafted scripts, often operated by individual data scientists.

The challenges of data lake operations and management have recently attracted widespread comment (centered on the phrase “data swamp”) and increasing focus by vendors. Nonetheless, it is the data warehouse, as the repository of truth about the legally binding history of the business and the basis for reliable exploration and analysis of business challenges and opportunities, where most can be gained by the adoption of advanced management and automation practices. The combination of data warehouse automation (DWA) and Data Vault addresses these needs from two perspectives: deployment of function and ongoing day-to-day operations.

Deployment seldom gets the attention it deserves; it’s not exactly the sexiest part of any IT project! However, for a data warehouse, deployment needs to be treated as a long-term, monogamous relationship. A data warehouse is substantially more complex than most IT projects, given the variety and number of systems involved. It is also significantly more important to get right as we move toward data-driven business and more agile development approaches.

As a data warehouse moves from the development phase (the design and build discussed previously) to test, quality assurance, and on to production, seemingly mundane yet highly important issues, such as packaging and installing the code built in the previous phase, must be addressed. In the case of DWA, where all cleansing and transformation (and loading, of course) occur in the target databases of the data warehouse, mart, and lake, this code consists of both the definitional SQL (DDL) that builds database structures such as tables and indexes and the processing code (DML) that creates the data to populate them.

In the context of a warehouse designed with the Data Vault model and methods, WhereScape® Data Vault Express™ allows a set of objects, such as the related Hub, Satellite, and Link objects of a customer ensemble, to be bundled together, transported, and installed with ease from the development environment to quality assurance and then into production. This bundle includes the selected objects and their related metadata: structure (DDL), processing code (ELT procedures, etc.), and jobs. When the bundle is installed into an environment, the product determines what DDL changes need to be made if an object already exists (for example, added columns or new indexes) and constructs the appropriate DDL statements. New ELT code, such as stored procedures, is then installed and compiled in the database.
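To make the idea of working out the DDL changes concrete, here is a minimal Python sketch of a structure comparison. It is not WhereScape's implementation: the `TableSpec` class, the `plan_ddl` function, and the table and column names are all hypothetical, and the generated SQL is generic rather than tuned to any particular database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableSpec:
    """Hypothetical table description: column -> SQL type, index -> columns."""
    name: str
    columns: dict[str, str]
    indexes: dict[str, tuple[str, ...]] = field(default_factory=dict)


def plan_ddl(desired: TableSpec, existing: TableSpec | None) -> list[str]:
    """Return the DDL needed to bring the target environment up to `desired`."""
    if existing is None:
        # Object is new in the target environment: create it from scratch.
        cols = ", ".join(f"{c} {t}" for c, t in desired.columns.items())
        stmts = [f"CREATE TABLE {desired.name} ({cols})"]
        existing_indexes = {}
    else:
        # Object already exists: only add the columns that are missing.
        stmts = [
            f"ALTER TABLE {desired.name} ADD COLUMN {c} {t}"
            for c, t in desired.columns.items()
            if c not in existing.columns
        ]
        existing_indexes = existing.indexes
    for idx, cols in desired.indexes.items():
        if idx not in existing_indexes:
            stmts.append(
                f"CREATE INDEX {idx} ON {desired.name} ({', '.join(cols)})"
            )
    return stmts
```

Calling `plan_ddl(dev_spec, prod_spec)` with a development Satellite that has gained one column and one index would yield one `ALTER TABLE` and one `CREATE INDEX` statement. A real tool would also have to handle changed types, dropped columns, and database-specific syntax, which this sketch deliberately ignores.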

The clear aim—subject, of course, to internal policies—is to automate the deployment activities in order to speed deployment in agile development approaches and to reduce the chance of human error across the full life cycle.

Having deployed the system to production, the next—and ongoing—task is to schedule, execute, and monitor the continuing process of loading and transforming data into the data warehouse. In this phase, jobs defined by WhereScape consist of a sequence of interdependent tasks. For example, one sequence could be to drop certain indexes on a table, load new data to the table, and rebuild the indexes. The administrator creating the job can specify the tasks and their interdependencies to ensure the objects are processed in the correct order. During execution, the tasks run in parallel (up to a specified threshold, to ensure the system is not overloaded), subject to dependency constraints. This feature is particularly useful in a Data Vault 2.0 environment, where the design supports elevated levels of parallelization of load tasks.
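The scheduling behavior described above, with tasks running in parallel up to a threshold while respecting their dependencies, can be sketched in a few lines of Python using only the standard library. This illustrates the general technique, not the WhereScape scheduler: `run_job`, its parameters, and the task names in the usage note are assumptions made for the example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_job(tasks, depends_on, max_parallel=2):
    """Run each task once its predecessors have finished.

    tasks:        dict of task name -> zero-argument callable
    depends_on:   dict of task name -> names that must finish first
    max_parallel: upper bound on tasks executing at the same time
    Returns the task names in the order they completed.
    """
    sorter = TopologicalSorter({n: depends_on.get(n, ()) for n in tasks})
    sorter.prepare()  # raises CycleError if the dependencies are circular
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all satisfied.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception from the task
                finished.append(name)
                sorter.done(name)  # may unlock dependent tasks
    return finished
```

With `depends_on = {"load_table": ["drop_indexes"], "rebuild_indexes": ["load_table"]}`, the three index-and-load tasks run strictly in sequence, while an unrelated task such as a second table load can run alongside them as long as the `max_parallel` limit allows.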

To ensure that data consistency is maintained, if a task fails during execution, then all downstream dependent tasks are halted. When the problem has been resolved, the job is restarted and will pick up from where it left off and continue through to completion.
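A simplified, sequential Python sketch shows one way this halt-and-resume behavior could work. The `run_with_restart` function and the `completed` checkpoint set are hypothetical; in practice the checkpoint would be persisted in a repository rather than held in memory.

```python
from graphlib import TopologicalSorter


def run_with_restart(tasks, depends_on, completed):
    """Run tasks in dependency order, skipping those already completed.

    tasks:      dict of task name -> zero-argument callable
    depends_on: dict of task name -> names that must succeed first
    completed:  set of task names that succeeded in earlier runs; it is
                updated in place so the caller can save it as a checkpoint
    Returns the set of tasks that failed or were halted in this run.
    """
    graph = {n: depends_on.get(n, ()) for n in tasks}
    blocked = set()
    for name in TopologicalSorter(graph).static_order():
        if name in completed:
            continue  # succeeded in a previous run, do not repeat
        if any(dep in blocked for dep in depends_on.get(name, ())):
            blocked.add(name)  # an upstream task failed, so halt this one
            continue
        try:
            tasks[name]()
        except Exception:
            blocked.add(name)  # the failure itself
        else:
            completed.add(name)
    return blocked
```

If a load task fails on the first run, the function returns that task and everything downstream of it, while `completed` records what succeeded. Calling it again with the same `completed` set after the problem is fixed skips the finished work and carries on from the failed task.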

As mentioned earlier, a modern analytical environment consists of a combination of data warehouse, marts, and data lake. From an operational point of view, given potential interdependencies of data across these systems, it makes sense to manage this ensemble as a single, logical environment. WhereScape supports this aim, both in terms of its job definitions that span multiple systems and in its provision of a centralized monitoring and logging repository. Here, all job execution activities such as processing times (start and finish), rows loaded, errors, exceptions, and so on are logged. This provides historical information that can be used to track load performance over time, allowing administrators to make any necessary adjustments as data volumes grow.
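As an illustration of how such a log can be used to track load performance, the following Python sketch records task executions in a SQLite table and derives a rows-per-second figure for each successful run. The `task_log` table layout and the function names are assumptions made for this example and do not reflect the actual WhereScape repository schema.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a hypothetical central log of task executions."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS task_log (
               job TEXT, task TEXT,
               started REAL, finished REAL,   -- seconds since some epoch
               rows_loaded INTEGER, error TEXT)"""
    )
    return conn


def log_run(conn, job, task, started, finished, rows_loaded, error=None):
    """Record one task execution, successful or not."""
    conn.execute(
        "INSERT INTO task_log VALUES (?, ?, ?, ?, ?, ?)",
        (job, task, started, finished, rows_loaded, error),
    )
    conn.commit()


def load_rates(conn, task):
    """Rows per second for each successful run of `task`, oldest first."""
    cur = conn.execute(
        "SELECT rows_loaded / (finished - started) FROM task_log "
        "WHERE task = ? AND error IS NULL AND finished > started "
        "ORDER BY started",
        (task,),
    )
    return [row[0] for row in cur]
```

A steadily falling rate for the same task across successive runs would be an early sign that data volumes are outgrowing the current load design.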

The smooth, ongoing daily operation of the entire data warehouse environment is a fundamental prerequisite to its acceptance by users and its overall value to the business. Nothing will destroy users’ confidence more quickly than arriving every morning and not knowing whether the warehouse is up and running with the latest data updates.

And yet, there’s more! How do you support the changes in requirements that occur almost continuously when a data warehouse has been successful? Data Vault and Data Warehouse Automation can help here too. That is the topic of the fourth and final post of this series.

Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. Barry is founder and principal of 9sight Consulting. A regular blogger, writer and commentator on information and its use, Barry is based in Cape Town, South Africa and operates worldwide.
