Devlin on Data Warehouse Automation

20 January 2015
Mark Budzinski

More than 25 years ago, Barry Devlin co-authored the foundational paper on data warehousing, “An architecture for a business and information system,” published in the IBM Systems Journal in 1988.

I think of Devlin's paper as a seminal cave painting of sorts: it's a conceptual vision for what the data warehouse ought to be. A quarter of a century later, Devlin has gone and done it all over again.

I'm talking about the provocative new white paper Devlin published today: BI Built to Order, On-Demand: Automating Data Warehouse Delivery.

First things first: we’re biased. After all, WhereScape develops tools that help automate data warehouse design, development and delivery. Whenever anyone has anything intelligent to say about automation, we're listening. When that someone is a thinker of Barry's caliber, we're downright spellbound. What's more, we know we're on the right track.

Devlin zeroes in on a few key Inconvenient Truths about the way we (mis)build data warehouses. What's more, he offers several common-sense prescriptive measures for fixing what's wrong. First, there's his claim that data warehouse automation “addresses the old conundrum of delivering consistent, quality data in the timeframe demanded by modern business needs.” This aligns very nicely with our own thoughts on the subject. Devlin conceived of the data warehouse as a system of record that could deliver consistent, quality information on a timely basis. But because of the labor-intensive, kludgey way we go about building data warehouse systems, we tend to give short shrift to the most important components of Barry's vision: consistency (of data) and timeliness (of delivery). Automation as Devlin sees it isn't just a generic prescription for accelerating warehouse development and delivery; it's a lifecycle model for emphasizing the timely, consistent production of analytical applications. When you practice data warehouse automation, you focus on regularly producing – on or ahead of schedule – iterative, measurable units of work.

In the data warehouse automation lifecycle, you're producing deliverables from Day One.

Second, Devlin talks about another advantage of data warehouse automation software: namely, that it provides a consistent interface and environment for analytic development, along with a single repository for metadata. (Think about what happens when you build stuff using separate design tools – each with its own quirks, methodologies, and metadata repositories.) I'm pretty sure Devlin once used the term “out of phase” to describe this development model, and it seems to me that another, similar term – namely, impedance mismatching – could work here, too. In any case, by centralizing data warehouse and analytical development in a single tool and promoting an iterative, agile development methodology, data warehouse automation software can eliminate the mismatching (e.g., separate teams of developers, each working with different tools, using different methodologies, and at their own respective paces) that constitutes one of the biggest impediments in traditional analytic development.

Which brings us to a third critical advantage of data warehouse automation software. According to Devlin, it helps promote a collaborative experience between business and IT stakeholders. Data warehouse automation emphasizes a business-driven, iterative development process. Business stakeholders identify, refine, and define what they want – in small, measurable units of work – and collaborate with IT to design, build, and test it. In this way, automation targets the most pernicious case of impedance mismatching – that between business and IT.

I'm going to write more about all three of these issues in a few follow-up posts. They're worth unpacking and discussing individually, chiefly because Devlin makes so many great points.

Stay tuned.

One final thought: the kind of automation that we at WhereScape champion doesn't aim to diminish the roles of developers, architects, or other imaginative, innovative contributors. Instead, automation aims to make them much more productive. Data warehouse automation software targets the tedious, rote, time-consuming stuff – things like documentation, upstream impact analysis, metadata management, QA testing, and so on – which frees up human actors to do the creative stuff. In other words, automation allows you to do more with less. Inasmuch as you're already doing (much) more with less, data warehouse automation software gives you a lot more breathing room.
