I was talking about agile development with one of our software developers, who also used to be a data warehouse consultant. He made the point that designing and building a data warehouse is harder than designing and building software. The culprit – data. The conclusion we came to was that, rather than adopting the same practices, we should be looking at the best way to adapt the Agile Manifesto to the more complex problem of implementing a data warehouse; otherwise we run the risk of falling into the same traps that the agile movement set out to avoid.
We are not talking about Agile BI. That is the easy part; it just makes total sense. Collaboration and responsiveness to change are prerequisites to success. We can’t imagine a better way of building a report, graph, dashboard, etc. than sitting down with users, with their data, and showing what is possible.
It is when the data isn’t there that the process becomes harder. That’s where we see the challenges for agile in the data world.
Agile in the data world needs to extend past the BI layer and into the data warehouse itself. We have all seen situations where it would have been considerably easier to make changes to the data layer rather than twist the BI tool to perform unnatural acts.
One answer to agile data warehousing is to gather all the data you need and organize it around the business (rather than around source systems or requirements). Once you have this data layer, or enough of it, in place you can start to rapidly build out an agile data layer – we see these called data marts, dependent data marts, reporting tables, presentation layer objects, sandboxes and a myriad of other names. There is nothing wrong with this (in fact we endorse this strategy and actively promote it to organizations with existing data warehouses). If you have a robust, functional, well-designed data warehouse in place, then encouraging usage is not a bad aim. Stephen Brobst, Michael McIntire and Edward Rado talk about the concept of sandboxing in Teradata environments in their 2008 article Agile Data Warehousing with Integrated Sandboxing. They go further than I would on governance, advocating an allowable residency of 90 days for sandbox tables, but in principle we are in agreement.
But what about when you don’t have an existing data warehouse? What if you cannot wait until you have that data layer completed?
We believe Agile is a great fit for the data world – as long as we embrace the differences that exist between the data and software development worlds rather than ignore them. The Agile Manifesto was and is aimed at developing software. We develop data warehouses.
One of the key differences: we start with data from day one; software developers do not.
WhereScape’s answer? Get the data into the equation quicker. In the information world, we can’t deliver working software without data. Our designs are theoretical until they are populated.
Yes, WhereScape RED combines data modelling and data integration with the data to remove silos and reduce handovers. Yes, it integrates metadata into the process for automated documentation and lineage to reduce development time and simplify change. But the practices are more important than the software.
For Agile to have the same impact in the data world as it has in the software development world (and I could argue it should have a greater impact!), we need to make sure we adapt rather than adopt the common practices, while ensuring we adhere to the underlying principles.