From Data Warehouse Automation to Data Architecture Automation

By Rick van der Lans, Founder of R20/Consultancy BV

June 25, 2021


For a long time, the data warehouse architecture was the sole ruler of data delivery to decision-making processes, but not anymore. It now has to share the stage with other data architectures, such as the data lake, data hub, and data lakehouse. Because the data in these new data architectures is structured, organized and used differently, a new breed of generators is required: the data architecture automation tools.

The benefits of using generators are clear. They accelerate development, ease maintenance, create run-time platform independence, improve performance, and so on.

Not every task is suitable for automation, nor can a suitable generator be developed for every task. The tasks best suited for automation are repetitive by nature and can be expressed as formal algorithms that indicate which steps to perform, what to do in special cases and how to react if something goes wrong. In other words, these tasks can be formalized.

Many of the tasks involved in designing, developing and maintaining data warehouse architectures are repetitive and can be formalized, making them highly suited for automation. For example, when an enterprise data warehouse uses a data vault design technique and the physical data marts use star schemas, both can be generated from a central data model, including the ETL code that copies the data from the warehouse to the data marts.
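To make the idea of generating from a central data model concrete, here is a minimal sketch in Python. It is not taken from any particular automation tool: the `Entity` and `Attribute` classes, the table-naming scheme (`hub_`, `sat_`, `dim_`), and the column choices are all assumptions made for illustration. The data vault and star schema structures are heavily simplified: there is no surrogate key on the dimension, and the load statement assumes one satellite row per hub key.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    sql_type: str


@dataclass(frozen=True)
class Entity:
    """One entity in a hypothetical central data model."""
    name: str
    business_key: Attribute
    attributes: tuple[Attribute, ...]


def create_table(name: str, column_lines: list[str]) -> str:
    # Shared helper: every generator below emits DDL the same way.
    body = ",\n".join(f"  {line}" for line in column_lines)
    return f"CREATE TABLE {name} (\n{body}\n);"


def hub_ddl(e: Entity) -> str:
    # Data vault hub: hash key, business key, and load metadata.
    return create_table(f"hub_{e.name}", [
        f"{e.name}_hk CHAR(32) PRIMARY KEY",
        f"{e.business_key.name} {e.business_key.sql_type} NOT NULL",
        "load_date TIMESTAMP NOT NULL",
        "record_source VARCHAR(50) NOT NULL",
    ])


def satellite_ddl(e: Entity) -> str:
    # Data vault satellite: descriptive attributes, keyed by hub key and load date.
    return create_table(f"sat_{e.name}", [
        f"{e.name}_hk CHAR(32) NOT NULL",
        "load_date TIMESTAMP NOT NULL",
        *[f"{a.name} {a.sql_type}" for a in e.attributes],
        f"PRIMARY KEY ({e.name}_hk, load_date)",
    ])


def dimension_ddl(e: Entity) -> str:
    # Star schema dimension in the data mart (simplified: keyed on the business key).
    return create_table(f"dim_{e.name}", [
        f"{e.business_key.name} {e.business_key.sql_type} PRIMARY KEY",
        *[f"{a.name} {a.sql_type}" for a in e.attributes],
    ])


def mart_load_sql(e: Entity) -> str:
    # ETL statement that copies data from the warehouse tables to the data mart.
    target_cols = [e.business_key.name] + [a.name for a in e.attributes]
    source_cols = [f"h.{e.business_key.name}"] + [f"s.{a.name}" for a in e.attributes]
    return (
        f"INSERT INTO dim_{e.name} ({', '.join(target_cols)})\n"
        f"SELECT {', '.join(source_cols)}\n"
        f"FROM hub_{e.name} h\n"
        f"JOIN sat_{e.name} s ON s.{e.name}_hk = h.{e.name}_hk;"
    )


# The single specification from which everything is generated.
customer = Entity(
    name="customer",
    business_key=Attribute("customer_number", "VARCHAR(20)"),
    attributes=(Attribute("name", "VARCHAR(100)"), Attribute("city", "VARCHAR(100)")),
)

if __name__ == "__main__":
    for generate in (hub_ddl, satellite_ddl, dimension_ddl, mart_load_sql):
        print(generate(customer), end="\n\n")
```

Because all four generators read the same `Entity`, adding an attribute to the model changes the warehouse tables, the data mart table, and the ETL statement together.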

Data architecture automation tools can be referred to as the third generation of generators used to automate the development of data architectures that support decision-making processes. The first generation is formed by ETL, BI and data modeling tools. For example, ETL tools transform high-level specifications into lower-level code that does the actual ETL work; many BI tools can be considered generators because they generate SQL statements that extract data from databases; and some data science tools enable data scientists to work at a high conceptual level from which code is generated.

All these generators help to accelerate development and ease maintenance, but each is limited to generating just one component of an entire data architecture. Therefore, multiple independent generators are required to generate the complete architecture. Since these generators require similar specifications, the same specifications are defined multiple times, or in other words, they are duplicated. It is a challenge to keep all these scattered specifications consistent, to ensure that they work together optimally, and to guarantee that if one specification is changed, all the duplicate specifications are changed accordingly.

The principles that apply to generators of individual platform components can also be applied to generators of entire data architectures. That is why the first generation was succeeded by the second generation of generators: the data warehouse automation tools, which generate entire data warehouse architectures. They do not generate code for one component of the architecture, but for several. Traditional data warehouse automation tools generate, for example, staging areas, enterprise data warehouses, physical data marts, the ETL solutions that copy data from one database to another, and metadata. Several of these tools have been on the market for years and have proven their worth. They store all metadata specifications once and reuse them when generating, for example, the data warehouse tables, the data mart tables, and the ETL logic to copy the data.

The main restriction of several data warehouse automation tools is that they generate only traditional data warehouse architectures, which support only a restricted set of data consumption forms.

Today, organizations also want to deploy data hubs, data lakes, and data lakehouses. These are used to support new forms of data consumption. For example, in these new data architectures, data is copied to a data hub and from there to a data warehouse, or the data architecture consists of a data lake that stores data from a data warehouse, transactional databases and external data sources.
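The two example architectures above can be treated as different specifications fed to the same generator. The sketch below, in Python, describes each architecture as a mapping from a data store to the stores it is loaded from and derives the copy jobs in dependency order using the standard library's `graphlib.TopologicalSorter`. The store names and the `copy_jobs` function are hypothetical, and some flows are assumptions (that the data hub and the data warehouse are fed from a transactional database); real tools will model far more than this.

```python
from graphlib import TopologicalSorter

# Each architecture maps a target data store to the stores it is loaded from.
# Assumption: the data hub is fed from a transactional database.
HUB_FIRST = {
    "data_hub": {"transactional_db"},
    "data_warehouse": {"data_hub"},
}

# Assumption: the data warehouse is fed from a transactional database.
LAKE_CENTRIC = {
    "data_warehouse": {"transactional_db"},
    "data_lake": {"data_warehouse", "transactional_db", "external_source"},
}


def copy_jobs(architecture: dict[str, set[str]]) -> list[str]:
    """Return one copy job per flow, ordered so that sources are loaded first."""
    jobs = []
    # static_order() yields every store after all the stores it depends on.
    for target in TopologicalSorter(architecture).static_order():
        # Stores that appear only as sources have no entry and produce no jobs.
        for source in sorted(architecture.get(target, ())):
            jobs.append(f"copy {source} -> {target}")
    return jobs


if __name__ == "__main__":
    for name, architecture in (("hub first", HUB_FIRST), ("lake centric", LAKE_CENTRIC)):
        print(name)
        for job in copy_jobs(architecture):
            print("  " + job)
```

The generator itself does not change between the two architectures; only the specification does, which is the kind of adaptability that a data architecture automation tool needs.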

Supporting other data architectures requires generators that can be adapted to generate data architectures composed of other types of data stores than those supported by more traditional data warehouse architectures. The term data warehouse automation is probably a misnomer for these tools because it is too restrictive. Data architecture automation tool is more suitable. Organizations increasingly need to become more data-driven, or in other words, to use data more widely, effectively and efficiently. The need for adaptable data architecture automation tools, generators that can generate any kind of data architecture to support any form of data consumption, has increased accordingly.
