Tune in for a live WhereScape 3D + DVE virtual...
Maintaining a Data Warehouse
In the final article of this series, I address the topic of maintaining the data warehouse that has been designed, built, and deployed over the previous three articles. But, maintenance is too small a word for what is entailed!
In more traditional IT projects, when a successful system is tested, deployed and in daily operation, its developers can usually sit back and take a well-deserved rest as users come on-board, and leave ongoing maintenance to a small team of bug-fixers and providers of minor enhancements. At least until the start of the next major release cycle. Developers of today’s data warehouses have no such luxury.
The measure of success of a data warehouse is only partly defined by the number and satisfaction level of active users. The nature of creative decision-making support is that users are continuously discovering new business requirements, changing their mind about what data they need, and thus demanding new data elements and structures on a weekly or monthly basis. Indeed, in some cases, the demands may arrive daily!
This need for agility in regularly delivering new and updated data to the business through the data warehouse has long been recognized by vendors and practitioners in the space. Unfortunately, such agility has proven difficult to achieve in the past. Now, ongoing digitalization of business is driving ever higher demands for new and fresh data. Current—and, in my view, short-sighted—market thinking is that a data lake filled with every conceivable sort of raw, loosely managed data will address these needs. That approach may work for non-critical, externally sourced social media and Internet of Things data. However, it really doesn’t help with the legally-binding, historical and (increasingly) real-time internally and externally sourced data currently delivered via the data warehouse.
Fortunately, the agile and automated characteristics of the Data Vault / data warehouse automation (DWA) approach described in the design, build and operate phases discussed in earlier posts apply also to the maintenance phase. In fact, it may be argued that these characteristics are even more important in the maintenance phase than in the earlier ones of data warehouse development.
One explicit design point of the Data Vault data model is agility. A key differentiator between Hub, Link, and Satellite tables is that they have very different usage types and temporal characteristics. Such separation of concerns allows changes in both data requirements (frequent and driven by business needs) and data sources (less frequent, but often requiring deep “data archeology”) to be handled separately and more easily than in traditional designs. In effect, the data warehouse is structured according to good engineering principles, while the data marts flow with user needs. This structuring enables continuous iteration of agile updates to the warehouse, continuing through to the marts, by reducing or eliminating rework of existing tables when addressing new needs. For a high-level explanation of how this works, see Sanjay Pande’s excellent “Agile Data Warehousing Using the Data Vault Architecture” article.
The engineered components and methodology of the Data Vault approach are particularly well-suited to the application of DWA tools, as we saw in the design and build phases. However, it is in the maintain phase that the advantages of DWA become even more apparent. Widespread automation is essential for agility in the maintenance phase, because it increases developer productivity, reduces cycle times, and eliminates many types of coding errors. WhereScape® Data Vault Express™ incorporates key elements of the Data Vault approach within the structures, templates, and methodology it provides to improve a team’s capabilities to make the most of potential automation gains.
Furthermore, WhereScape’s metadata-driven approach means that all the design and development work done in preceding iterations of data warehouse/mart development is always immediately available to the developers of a subsequent iteration. This is provided through the extensive metadata that WhereScape stores in the relational database repository and makes available directly to developers of new tables and/or population procedures. This metadata plays an active role in the development and runtime processes of the data warehouse (and marts) and is thus guaranteed to be far more consistent and up-to-date than typical separate and manually maintained metadata stores such as spreadsheets or text documents.
In addition, WhereScape automatically generates documentation, which is automatically maintained, and related diagrams, including impact analysis, track back/forward, and so on. These artefacts aid in understanding and reducing the risk of future changes to the warehouse, by allowing developers to discover and avoid possible downstream impacts of any changes being considered.
Another key factor in ensuring agility and success in the maintenance phase is the ongoing and committed involvement of business people. WhereScape’s automated, templated approach to the entire design, build and deployment process allows business users to be involved continuously and intimately during every stage of development and maintenance of the warehouse and marts.
With maintenance, we come to the end of our journey through the land of automating warehouses, marts, lakes, and vaults of data. At each step of the way, combining the use of the Data Vault approach with data warehouse automation tools simplifies technical procedures and eases the business path to data-driven decision making. WhereScape Data Vault Express represents a further major stride toward the goal of fully agile data delivery and use throughout the business.
You can find the other blog posts in this series here:
- Week 1: Designing a Data Warehouse
- Week 2: Building a Data Warehouse
- Week 3: Operating a Data Warehouse
Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. Barry is founder and principal of 9sight Consulting. A regular blogger, writer and commentator on information and its use, Barry is based in Cape Town, South Africa and operates worldwide.
Guide to Data Quality: Ensuring Accuracy and Consistency in Your Organization
Why Data Quality Matters Data is only as useful as it is accurate and complete. No matter how many analysis models and data review routines you put into place, your organization can’t truly make data-driven decisions without accurate, relevant, complete, and...
Common Data Quality Challenges and How to Overcome Them
The Importance of Maintaining Data Quality Improving data quality is a top priority for many forward-thinking organizations, and for good reason. Any company making decisions based on data should also invest time and resources into ensuring high data quality. Data...
What is a Cloud Data Warehouse?
As organizations increasingly turn to data-driven decision-making, the demand for cloud data warehouses continues to rise. The cloud data warehouse market is projected to grow significantly, reaching $10.42 billion by 2026 with a compound annual growth rate (CAGR) of...
Developers’ Best Friend: WhereScape Saves Countless Hours
Development teams often struggle with an imbalance between building new features and maintaining existing code. According to studies, up to 75% of a developer's time is spent debugging and fixing code, much of it due to manual processes. This results in 620 million...
Mastering Data Vault Modeling: Architecture, Best Practices, and Essential Tools
What is Data Vault Modeling? To effectively manage large-scale and complex data environments, many data teams turn to Data Vault modeling. This technique provides a highly scalable and flexible architecture that can easily adapt to the growing and changing needs of an...
Scaling Data Warehouses in Education: Strategies for Managing Growing Data Demand
Approximately 74% of educational leaders report that data-driven decision-making enhances institutional performance and helps achieve academic goals. [1] Pinpointing effective data management strategies in education can make a profound impact on learning...
Future-Proofing Manufacturing IT with WhereScape: Driving Efficiency and Innovation
Manufacturing IT strives to conserve resources and add efficiency through the strategic use of data and technology solutions. Toward that end, manufacturing IT teams can drive efficiency and innovation by selecting top tools for data-driven manufacturing and...
The Competitive Advantages of WhereScape
After nearly a quarter-century in the data automation field, WhereScape has established itself as a leader by offering unparalleled capabilities that surpass its competitors. Today we’ll dive into the advantages of WhereScape and highlight why it is the premier data...
Data Management In Healthcare: Streamlining Operations for Improved Care
Appropriate and efficient data management in healthcare plays a large role in staff bandwidth, patient experience, and health outcomes. Healthcare teams require access to patient records and treatment history in order to properly perform their jobs. Operationally,...
WhereScape 3D 9.0.4 Now Available: Integrate with Microsoft Purview
We are excited to announce the release of WhereScape 3D Version 9.0.4, which is packed with new enhancements, highlighted by the integration with Microsoft Purview. Additional features include advanced data profiling for custom connections, Pebble extensions for...
Related Content
Guide to Data Quality: Ensuring Accuracy and Consistency in Your Organization
Why Data Quality Matters Data is only as useful as it is accurate and complete. No matter how many analysis models and data review routines you put into place, your organization can’t truly make data-driven decisions without accurate, relevant, complete, and...
Common Data Quality Challenges and How to Overcome Them
The Importance of Maintaining Data Quality Improving data quality is a top priority for many forward-thinking organizations, and for good reason. Any company making decisions based on data should also invest time and resources into ensuring high data quality. Data...
What is a Cloud Data Warehouse?
A cloud data warehouse is an advanced database service managed and hosted over the internet.
Developers’ Best Friend: WhereScape Saves Countless Hours
Development teams often struggle with an imbalance between building new features and maintaining existing code. According to studies, up to 75% of a developer's time is spent debugging and fixing code, much of it due to manual processes. This results in 620 million...