This week, Dr. Barry Devlin published a provocative new paper on data warehouse automation – “BI Built to Order, On-Demand: Automating Data Warehouse Delivery.”
You can grab it here, if you’re curious. And you should be. Because in the paper Devlin does two things: first, he considers a few Inconvenient Truths about how data warehouses are built and managed – or misbuilt and mismanaged – and, second, he makes the case for data warehouse automation as a common-sense fix for today’s often mismanaged data warehouse development.
When Devlin described his vision for the business information system he called a “data warehouse” – back in early 1988 – we just didn’t have the tools to efficiently design, build, and manage warehouse systems. Everything, or almost everything, had to be done by hand: there weren’t any ETL tools, data integration suites, studios, platforms, or workbenches. But even once we got primitive versions of these tools – starting in 1993 or thereabouts – things didn’t magically get better. In fact, by 2003, we were already coming to grips with the empirical fact that data warehouse projects took too long to build, failed to deliver on many of the promises Devlin had outlined in his paper, and, most important, were too hard to change. We know: WhereScape itself grew out of the integration experiences of our founders, who specialized in fixing just these problems.
But a great point that Devlin makes is that most of these problems were byproducts of what might be called an “out of phase” development process. Simply put: building data warehouse systems was – and, to some degree, still is – a disintegrated affair. In larger organizations, it is performed by separate teams or groups of developers, each working with their own set of tools, each using their own methodology, and each building at their own pace. According to Devlin, this is one of the biggest impediments to traditional analytic development.
“Modeling, database design and development of population routines required multiple, disconnected iterations involving business users, modelers, database administrators and ETL programmers at different times, each using different and unconnected tools. These gaps and tool transitions slowed the process and gave rise to design errors and inconsistencies,” Devlin writes.
The upshot is that this model compromises both the consistency of data and the timeliness of application delivery. Devlin sees data warehouse automation software – which centralizes data warehouse and analytical development in a single tool, promotes an iterative, agile development methodology, and implements a shared metadata repository – as the Rx for this problem.
“The common environment and shared metadata repository offered by data warehouse automation overcomes this … by integrating the design and delivery of the data model, database structure, and the population process in one place – whether for a warehouse or mart,” he writes. “All the design and population metadata is stored together in a single repository, allowing development to flow smoothly and iteratively from user requirements, through database design, to creation of population routines. By integrating all the steps of the design and development process, consistent and quality data can be delivered quickly to the business for immediate review and early acceptance.”
Data warehouse automation software isn’t a turnkey fix. Devlin recognizes this. All the same, it’s a way to eliminate out-of-phase development, centralize the development process, and enforce a consistent, delivery-focused development paradigm. It gives you a solid foundation on which to build your data warehouse. And data warehouse automation software has other benefits that aren’t confined to development. As Devlin notes, it promotes collaboration between business and IT, making it possible to produce data-driven – or business-data-driven – apps.
I’ll say more about this in a follow-up post.