We at WhereScape were delighted to read Gartner’s technical paper, Assessing the Capabilities of Data Warehouse Automation (DWA), published 8 February 2021 by Gartner analyst Ramke Ramakrishnan, which follows just a year on from Henry Cook’s research guide, Automating Data Warehouse Development. In our opinion, this attention reflects the feeling in the data industry that automation is now necessary to build and manage a data ecosystem that can keep pace with the complexity of modern business.
The content was refreshingly familiar to me, as someone who has written about the benefits of Data Automation software for the last few years. Ramke really understands the benefits we regularly hear about in customer testimonials and proofs of concept, and is clearly singing from the same hymn sheet. In this 39-page document, he goes into great detail about not just what DWA can achieve, but how it can be used to suit the variety of use cases we hear about when talking to prospective customers.
Of the various topics covered, there were five that particularly resonated:
1. An automated DW lifecycle based on templating and metadata
The paper shows how DWA tools orchestrate the data warehousing process end-to-end, rather than being one of many tools that solve niche problems as in the traditional data warehousing lifecycle. This means companies don’t need teams of specialists at each stage of the process with manual handoffs between them, which can often lead to miscommunication and make it harder to get a holistic view of the project.
An automation tool like WhereScape represents a ready-made, templated production line that a company can slot its own data sources into and model the data to suit its needs. Using industry best practices and the experience of many past projects, the software ensures data structures are built quickly by automating all repetitive tasks whilst keeping IT teams in full control.
According to Gartner analyst Ramke Ramakrishnan: “The template-driven approach for data warehouse development reduces operational and compliance risks and is a disciplined process for delivering quality data warehouses incorporating all the best practices. As new best patterns emerge through your implementation experience, they can easily be encapsulated into these templates. The code is regenerated to integrate them effectively, which is harder to implement with traditional and manual methods. The availability and use of templates in the DWA ensure consistency and a standardized process for defining and deploying the data warehouse, thereby enabling continuous improvement.”
WhereScape describes the entire data ecosystem from source to target, including every action taken and object used, in the metadata. This enables full documentation to be produced at the click of a button at any time, along with version control, rapid change management and the ability to switch target databases much faster if required.
According to Gartner analyst Ramke Ramakrishnan: “The metadata-driven approach provides a set of control information about the data warehouse, source systems, and load processes to help drive complete automation. The classification of metadata in the data warehouse can be either back-room metadata or front-room metadata. Back-room metadata is predominantly process-oriented, focusing on the ELT processes. In contrast, front-room metadata is more descriptive and works on querying, reporting, administering and orchestration of data warehouse processes.
“Metadata represents the repository that contains all the artifacts for automating the data warehouse platform. The tools capture the requirements and design definition from the data models defined through the user interface and store them in the repository as metadata. The metadata is the backbone for automatically generating the database schemas, table structures, transformation routines, and workflows for data warehouse operations.”
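To make the idea concrete, here is a minimal sketch of how metadata plus a template can generate a table definition, and how changing the template regenerates the code. It is an illustration only, not WhereScape's implementation: the metadata layout, the table and column names, and the `generate_ddl` helper are all hypothetical.

```python
from string import Template

# Hypothetical metadata record describing one target table.
# A real DWA repository holds far more (sources, loads, lineage).
TABLE_METADATA = {
    "schema": "dw",
    "name": "dim_customer",
    "columns": [
        {"name": "customer_key", "type": "INTEGER", "nullable": False},
        {"name": "customer_name", "type": "VARCHAR(100)", "nullable": False},
        {"name": "country", "type": "VARCHAR(50)", "nullable": True},
    ],
}

# Baseline template: the "best practice" pattern for creating a table.
BASIC_TEMPLATE = Template("CREATE TABLE $schema.$name (\n$columns\n);")

# Revised template: a newly adopted pattern (an audit column) is added
# once here, and every table regenerated from metadata picks it up.
AUDITED_TEMPLATE = Template(
    "CREATE TABLE $schema.$name (\n$columns,\n    load_ts TIMESTAMP NOT NULL\n);"
)


def render_column(col):
    """Render one column definition from its metadata entry."""
    null_clause = "NULL" if col["nullable"] else "NOT NULL"
    return f"    {col['name']} {col['type']} {null_clause}"


def generate_ddl(meta, template=BASIC_TEMPLATE):
    """Fill the template with values taken from the metadata record."""
    columns = ",\n".join(render_column(c) for c in meta["columns"])
    return template.substitute(
        schema=meta["schema"], name=meta["name"], columns=columns
    )


if __name__ == "__main__":
    print(generate_ddl(TABLE_METADATA))
    print(generate_ddl(TABLE_METADATA, AUDITED_TEMPLATE))
```

The point of the sketch is the separation of concerns: the metadata says what exists, the template says how it should be built, and neither has to be edited by hand in every generated script when the other changes.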
2. How DWA enables companies to handle growing complexity
As each new innovation enriches the data landscape, offering new possibilities for quicker access to the right data at the right time, it increases complexity for those tasked with integrating the data ecosystem. Technology has moved on immeasurably since the 1990s, but many data teams still rely on 90s ETL tools and hand-coding to create and control a modern data fabric.
If companies are to build and scale complex data architectures using the latest technologies such as hybrid cloud models, multi-cloud, Data Vault 2.0 and so on, the choices are to hire huge teams of expensive, specialized resources indefinitely, or use a smaller team to operate automation software, which does the grunt work and orchestration.
According to Gartner analyst Ramke Ramakrishnan: “Because it requires handling real-time data; low-latency performance; complex integration of structured, semistructured and unstructured data; ingestion of streaming data; and connecting to sensor data and IoT, the implementation can be tricky on an exponential scale. Automating these elements’ design plays a critical and essential role in data warehouse modernization and agile data warehousing.”
3. The role DWA plays in DevOps, DataOps and other Agile methodologies
With automation handling the complexity, data teams can focus on delivering infrastructure and completing projects to Agile timeframes. The considerable improvement in response times that teams who switch to DWA enjoy can enable them to adopt holistic Agile frameworks such as DevOps or DataOps. Such initiatives can be truly transformative, and completely change the way data is made available and used by a whole organization.
According to Gartner analyst Ramke Ramakrishnan: “Driving automation through agility: DWA has departed from the traditional way of building a data warehouse by adapting its tools to leverage more agile methods to implementing the value and capabilities incrementally and over multiple iterations instead of taking a big bang approach.
“DataOps aims to deliver value faster by creating predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate the design, deployment and management of data delivery with the appropriate governance and metadata levels to improve data use and value in a dynamic environment. The goal of DataOps in the data warehouse systems is to bring rigor, agility, reuse and automation to the development of data pipelines and analytical applications.”
4. Prototyping and how DWA supports various design approaches
The speed at which WhereScape can build data structures, and even populate them with live data if required, means prototypes can be rapidly produced. This supports Agile principles by allowing increased collaboration between IT and the business, with shorter iterations.
WhereScape enables prototyping in both model-driven and data-driven design. WhereScape 3D, the modeling tool, profiles all data sources. This allows architects to choose the most suitable structure and prototype it to check that all business requirements have been understood and catered for, then make changes if the design is not yet fit for purpose. This model-driven design is particularly useful for the increasing number of companies opting for data vault modeling.
Data-driven design, meanwhile, enables developers to build prototypes with actual company data to show stakeholders how their requirements will behave in the final data warehouse. This enables business users to see whether any requirements were submitted incorrectly or any were forgotten altogether.
According to Gartner analyst Ramke Ramakrishnan: “The data-driven approach focuses on organizing the data models to align them closer to source systems. Business users and developers can collectively look at the data to gather inputs and feedback before creating the model. Using an iterative approach, data warehouse developers can rapidly build several prototypes before implementing the solution that meets the business user’s requirements. The method provides flexibility for deployment as well as management of changes to the data with flexible updates.”
Ramke’s paper talks at length about when and why to use various modeling styles, adding detail about how DWA can help each individual style, and can also facilitate a blend of styles if required. Different departments within large organizations can require different design styles, modeling styles and perhaps different target databases depending on their needs or even geography. For example, WhereScape customer Legal & General uses WhereScape to enable teams to use SQL Server, DB2, Exasol or Snowflake.
5. Automation and data vault modeling
DWA also enables flexibility in modeling style. While dimensional modeling is still the structure of choice for many companies with quite fixed business requirements, Ramke also discusses the advantages of data vault modeling and how DWA augments it. Data vault has grown in popularity in the last few years, in part due to the ability to integrate new data sources after the initial architecture is up and running. The acceleration in new technology and the speed of change in business requirements mean the infrastructure that is right for the business now might not be suitable in three years’ time, never mind ten.
However, data vault design and maintenance are complex. We see companies that embark on data vault projects without DWA start off okay, but hit problems in scaling the ecosystem, with a huge amount of manual fire-fighting needed just to keep it working as it should. The integration of new data sources has upstream and downstream implications and so is prone to human error, as one oversight can have knock-on effects that are difficult and time-consuming to untangle with hand coding. Using automation, a complex data vault can work as it should first time, while implementation using best-practice templates flattens the learning curve and shortens time to value.
According to Gartner analyst Ramke Ramakrishnan: “Data vault design requires a greater level of discipline because more physical tables are involved in the data vault design. DWA tools are a great way to manage these models more effectively.”
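As a rough illustration of why the table count grows, the sketch below derives a hub and a satellite from a single source entity, plus a deterministic hash key for the business key. The naming conventions (`hub_`, `sat_`, `_hk`, `hash_diff`) follow common data vault practice but are assumptions here, as are the `SOURCE` layout and the helper functions; MD5 is used purely for illustration, and none of this represents the output of WhereScape or any other DWA tool.

```python
import hashlib

# Hypothetical description of one source entity.
SOURCE = {
    "name": "customer",
    "business_key": "customer_id",
    "attributes": ["customer_name", "country"],
}


def hub_columns(source):
    """Hub: one row per business key, identified by a hash key."""
    return [f"{source['name']}_hk", source["business_key"],
            "load_ts", "record_source"]


def satellite_columns(source):
    """Satellite: descriptive attributes tracked over time for the hub."""
    return [f"{source['name']}_hk", "load_ts", "hash_diff",
            *source["attributes"], "record_source"]


def plan_tables(source):
    """One source entity becomes at least two physical tables."""
    return {
        f"hub_{source['name']}": hub_columns(source),
        f"sat_{source['name']}": satellite_columns(source),
    }


def hash_key(*parts):
    """Deterministic key: normalize each part, join, then hash."""
    normalized = "|".join(str(p).strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for table, columns in plan_tables(SOURCE).items():
        print(table, columns)
    print(hash_key("c-1001"))
```

Every rule in this sketch (normalization, column order, naming) has to be applied identically across every hub, link and satellite, which is exactly the kind of repetitive discipline that is easy to get wrong by hand and straightforward to generate from metadata.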
Gartner members can read the full 39-page document on Gartner.com here: Assessing the Capabilities of Data Warehouse Automation (DWA), published 8 February 2021 by Gartner analyst Ramke Ramakrishnan.