Data Warehouse Design
Over this series of four posts, I explore the keys to a successful data warehouse, starting with data warehouse design. The topics for future posts are: build, operation, and maintenance.
Designing a Data Warehouse
In the design of a data warehouse and, indeed, over the entire warehouse journey, the most important principle to keep in mind is that what you are delivering is a cross-functional, long-life foundation for data provision and decision support. Let’s parse that out:
- Cross-functional may, in some cases, extend to cover the entire enterprise; in other cases, it may address only the needs of the different departments in a business unit. In either case, the data warehouse must be able to satisfy the unique needs of multiple decision support projects from diverse groups of business users.
- Long-life means that these projects will arise and mutate over many years; they cannot be known in advance when you first design your initial data warehouse.
- As a foundation for data provision, the data warehouse must provide consistent, reconciled, legally binding data to its various business clients, ensuring that the decisions made on that data are reliable, auditable, and non-contradictory.
These characteristics have long bedeviled the design (and subsequent phases) of the process of delivering a data warehouse. If you have delivered a data mart, for example, you will know that you must meet the specific and (reasonably) well-known data needs of a single department, usually as quickly as possible. If you have delivered input to a data lake, you will understand that flexibility of use, often called schema-on-read, is of paramount importance. Data marts and lakes are projects; a data warehouse is a process consisting of multiple projects, all of which must work together toward the goals outlined above.
Enterprise Data Model
At the heart of this process is a vision of what the data in the warehouse should ultimately look like. This vision is best described by an enterprise data model (EDM), consisting of the high-level entities that a business must track: customers, products, orders, and so on. (In Bill Inmon’s terminology, these are called subject areas.) In the past, EDMs were often built from scratch, a process that provided data modelers with jobs for life but frustrated the business users, who were dragged into drawn-out definitional debates rather than seeing the results they needed.
Today, many EDMs are customized from standard industry models, a much faster and easier process. After all, how different are the characteristics of customers, for example, across various banks? In many other cases, an EDM already exists within the organization as a result of a previous data warehouse or other enterprise-wide undertaking, such as master data management.
Data Warehouse Challenges
The major design challenge for today’s data warehouses is defining and refining the logical (and ultimately physical) structure of the relational tables of the data warehouse. Ultimately, a good design must take into account the limitations of the source systems, the challenges of joining data from multiple sources, and the possibility of changes in both business needs and source system structures over time. These are topics for later posts focusing on build and maintenance concerns. For now, we focus on how to achieve the “ideal structure” to facilitate loading, updating, and using data warehouse tables.
3NF
The traditional design approach recommends mapping high-level entities to “loosely normalized” tables: based initially on third normal form (3NF), but relaxed sufficiently to ease the population and query performance issues that arise from a strict 3NF implementation. The goal of this approach is to create a cross-enterprise, functionally neutral set of data that can be used for a wide variety of query, analysis, and reporting needs.
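To make that concrete, here is a minimal sketch of what a “loosely normalized” warehouse core might look like. It uses SQLite via Python purely so the script is self-contained and runnable; the table and column names are hypothetical illustrations, not drawn from any particular methodology.

```python
# Minimal sketch of a "loosely normalized" warehouse core (illustrative only).
# SQLite stands in for a real warehouse platform; names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per high-level entity, close to 3NF.
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    segment      TEXT  -- deliberate relaxation: a strict 3NF design would
                       -- move this descriptor to its own lookup table
);

CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    category     TEXT  -- same relaxation: kept inline to simplify loading
                       -- and reduce joins at query time
);

CREATE TABLE customer_order (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id   INTEGER NOT NULL REFERENCES product(product_id),
    order_date   TEXT NOT NULL,
    amount       REAL NOT NULL
);
""")
```

The point of the relaxation is visible in the comments: descriptors that strict 3NF would split into lookup tables stay inline, trading a little redundancy for simpler loading and faster queries.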
3NF was originally designed to ensure insert/update/delete consistency in operational databases, and is thus suboptimal for data warehousing. Ralph Kimball proposed an alternative approach in the 1990s, known as dimensional or star-schema modeling. While widely used, mainly because it promises faster delivery of decision support projects, this design is also suboptimal for a data warehouse: it is highly optimized for slice-and-dice analysis and driven by the specific business needs of a particular department. In essence, it is more suitable for a data mart than a data warehouse.
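For contrast, a dimensional (star-schema) version of similar data might look like the sketch below: one central fact table of measures surrounded by flat, denormalized dimension tables. Again, the names are hypothetical and SQLite stands in for a real platform.

```python
# Minimal star-schema sketch (illustrative only; hypothetical names).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions are deliberately denormalized: every descriptor is flattened
-- into one wide table per business concept.
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    name          TEXT,
    segment       TEXT,
    region        TEXT
);

CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    full_date     TEXT,
    month         TEXT,
    year          INTEGER
);

-- The fact table holds the additive measures plus a foreign key per dimension.
CREATE TABLE fact_sales (
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    quantity      INTEGER,
    amount        REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme Ltd', 'Enterprise', 'EMEA');
INSERT INTO dim_date VALUES (20240115, '2024-01-15', '2024-01', 2024);
INSERT INTO fact_sales VALUES (1, 20240115, 10, 2500.0);
""")

# The slice-and-dice query this structure is optimized for:
# revenue by customer segment and month.
for segment, month, revenue in conn.execute("""
    SELECT c.segment, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_date d     ON f.date_key = d.date_key
    GROUP BY c.segment, d.month
"""):
    print(segment, month, revenue)
```

The query at the end shows why departments like this shape: one fact table, simple joins, fast aggregation. It also shows the limitation: the structure is built around one department’s questions.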
Data Vault
In the early 2000s, Dan Linstedt defined the Data Vault Model, a hybrid of the normalized and star-schema forms above that better balances the generality of loosely normalized models with the delivery speed of dimensional ones. The Data Vault is a detail-oriented, history-tracking, specially linked set of normalized tables designed to support multiple functional business areas. The model consists of three specialized types of entities/tables: hubs, based on rarely changing business keys; links, which describe associations or transactions between business keys; and satellites, which hold all the temporal and descriptive attributes of business keys and their associations. Version 2.0, introduced in 2013, consists of a data model, methodology, and systems architecture, and provides a design basis for data warehouses that emphasizes core data quality, consistency, and agility to support enterprise-wide data provision needs.
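A minimal sketch of those three table types, under the same illustrative assumptions as the earlier examples (SQLite, hypothetical names), might look like this. Real Data Vault 2.0 designs also standardize hash-key computation, staging, and loading patterns beyond what is shown here.

```python
# Minimal Data Vault sketch (illustrative only; hypothetical names).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: only the rarely changing business key, plus load metadata.
-- In Data Vault 2.0 the primary key is typically a hash of the business key.
CREATE TABLE hub_customer (
    customer_hk    TEXT PRIMARY KEY,
    customer_bk    TEXT NOT NULL UNIQUE,
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);

CREATE TABLE hub_product (
    product_hk     TEXT PRIMARY KEY,
    product_bk     TEXT NOT NULL UNIQUE,
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);

-- Link: an association or transaction between business keys (here, an order).
CREATE TABLE link_order (
    order_hk       TEXT PRIMARY KEY,
    customer_hk    TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk     TEXT NOT NULL REFERENCES hub_product(product_hk),
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);

-- Satellite: the temporal, descriptive attributes. History is tracked by
-- making the load date part of the primary key, so every change adds a row.
CREATE TABLE sat_customer (
    customer_hk    TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date      TEXT NOT NULL,
    name           TEXT,
    segment        TEXT,
    record_source  TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
""")
```

Note how the split supports the goals above: hubs and links change rarely and load in parallel from any source, while all volatility lands in satellites, where history accumulates without rework.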
With design sorted, it’s time to move to the build phase. That is the topic of the next post.
You can find the other blog posts in this series here:
- Week 2: Building a Data Warehouse
- Week 3: Operating a Data Warehouse
- Week 4: Maintaining a Data Warehouse
Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. Barry is founder and principal of 9sight Consulting. A regular blogger, writer, and commentator on information and its use, Barry is based in Cape Town, South Africa, and operates worldwide.