I recently had the pleasure of talking at Claudia Imhoff’s Boulder BI Brain Trust (BBBT) in Boulder, Colorado. This is an analyst event like no other. First you get time. Lots of it. Most briefings are 30 minutes, some as much as an hour, with the BBBT you get 3-3.5 hours. But with that time comes responsibility. You can’t just front load a product demonstration with some canned value proposition, time line and customer logo slides.
We took the opportunity to talk about Data Warehouse Automation. The BBBT has members from all over the world. And while we try and catch up with BBBT members as often as we can (they represent the top of the heap in the independent analyst world) you don’t often get the opportunity to talk to them all together, in a forum where ideas can be openly shared and debated in real time. We wanted to make sure the BBBT members were up to date with Data Warehouse Automation, make sure they understood what we did that was special, and get their feedback on what we should do differently.
We started with a slide we often start with – data warehouses are a great for some problems, but they take too long to build and are too hard to change.
This is a slide we have used a lot, but these days we feel we need to elaborate on. Before we even get into the real issue of the design and build, the term “data warehouse” tends to present a conundrum – it has now become a loaded phrase. People sometimes think that when we say “data warehouses” it means we are big data sceptics, or even deniers.
To elaborate our stance on this, we think the concept of a data warehouse has certainly changed. When I first started in the industry it was simple. I worked for Sequent Computer Systems (since bought by IBM), and a data warehouse consisted of hardware running a relational database.
A foundational element of a data warehouse was the relational database. Customers had an Oracle data warehouse, or a Teradata data warehouse (or Red Brick, even Informix, but at that time a SQL Server data warehouse was a novelty).
While the idea behind a data warehouse has not changed, the physical manifestation certainly has. Our customers now routinely have multiple databases/data stores that together make up their “data warehouse”. It can be as simple as SQL Server and cubes in a Microsoft world to Teradata, Asterdata, Hadoop, Netezza, Greenplum etc. We call this new world the data landscape, Claudia Imhoff and Colin White call it the extended data warehouse. Whatever way you want to implement these features, we think it is a “data warehouse”.
Having navigated that tricky semantic, we can continue onto the actual idea behind our first slide. The idea that data warehouses still take too long to build and are too hard to change, despite all the innovation we have seen over the years.
The first question is does it matter? Even if data warehouses take too long to build - who cares?
WhereScape has more than 600 customers now and the vast majority have selected our software because they see ‘time to value’ and ‘time to change’ as problems they want to solve. If time to value is not an issue, then an organization is not a prospective customer!
So if it does matter, what are the responses we see in the market? A common one is what we call “thrash”. This often happens when the status quo is clearly not working. A new CEO/CMO/CIO/ BI Manager comes in, highlights the issue, and declares that the data warehouse at their last job was much better. The current organization should change from Oracle to Teradata, or Informatica to Data Stage, or Cognos to Tableau. Or Teradata to Oracle, or Data Stage to Informatica, or Tableau to Cognos. A lot of money is spent, a vendor is happy, a case study written, and not a huge amount really changes.
We also see the rise of alternatives. These come in various forms – the prebuilt data warehouse from the likes of SAP (BW, Rapid Marts, HANA something-soon-I-am-sure), data virtualization, end user empowerment tools from the likes of Tableau or Qlik, and of course No SQL alternatives.
This was a ‘tread lightly’ part of the discussion, and a point that is easier to make with the extended time made available to us. We do not dismiss any of these products or approaches as somehow misguided or lacking in value. Take a product like Tableau. I will be the first to say it is great software, does what it says it will do and is a fantastic addition to the BI practitioners kitbag. But it is not an alternative to creating a data warehouse. Building a series of reports, may well provide instant graitification and seemingly save time and effort. But if you need to extract data from sources, reformat it, combine it, apply standard buisness rules to it and make it available to a range of users through a range of channels,then build a data warehouse - that is what they are good for!
So if alternatives to data warehouses are not the answer, how do we deal with the problem when data warehousing is just too hard? The most common response is to outsource. Make it someone else’s problem. If we are not getting value fast enough let’s put more people onto it. We can’t afford (or attract) the extra people locally so lets go to India/Eastern Europe/South America/South Asia to get a team. We see variable outomes from this reponse. Not surprisingly, organizations that are outsourcing for the right reasons are not encountering as many issues as companies that are just outsourcing problems.
The other common answer we see, is to utilize point solutions to solve the problem. As an example organizations will purchase a data model to speed up data modelling. If data modelling is the only problem then this is a great solution. If data modelling is not the only problem and is only the first problem in a full hurdle race of problems, then they won’t have the impact we hoped for. Going to the first solution for the first problem, is a great way to take on the first hurdle. But to finish the race, one needs to look ahead and see all the hurdles hurtling their way.
The other answer is automation.
The concept of automation is common across industries. When WhereScape talks about data warehouse automation we are talking about using automation to augment people rather than replace them. Data warehousing is hard. Businesses are complex. Gathering (future) requirements is difficult. We still need smart people, we just want to enable them to do more, faster.
Different sets of people have different views of data warehouse automation. For the business owner, the need for automation can be expressed as “I have too many projects, and not enough are being worked on” or “projects take too long” or “as soon as I get something delivered I think of eveything I need changed”.
From the developer viewpoint, data warehouse automation opportunities abound. Take building a slowly changing dimension (and thanks to Steve Hitchman for this example). The steps are basically:
• Identify attributes
• Identify business key
• Index business key and add a unique constraint
• Create surrogate key with auto sequence generation
• Index surrogate key
• Insert zero surrogate key row with values set for each attribute
• Add a modified timestamp column
• Write the SQL code to Insert new business keys or Update existing business key rows. Maintain the modified timestamp
• Create any other indexes required for querying
• Decide best practice for index maintenance during load. Keep in situ or drop and recreate after load.
• Document procedure
• Etc Etc
What do you actually need a smart developer to know?
• Identify attributes
• Identify business key
The rest we can automate. Let’s not expend valuable resource on the rest.
From the WhereScape perspective, the output from an automated solution is the same as the output from a manually created solution. You just get there faster. This is not a view that is necessarily shared by all data warehouse automation vendors, which is not a problem. There are a number of vendors who are focused on the same problem, who have identified data warehouse automation as the solution, and it is certainly healthy for the market that we come up with different approaches to add value.
Where it gets very interesting is the financial view of data warehouse automation. This is simple. On one side we have the traditional view – how long and how many people. This needs to be compared to the cost of the data warehouse automation and the reduction in time. It can be more sophisticated, and we can add in time savings for subsequent phases and for easier change. But often we don’t need to – for real world projects data warehouse automation can often be justified on the first project alone. This is the one of the most compelling aspects of data warehouse automation – organizations get the same solution, fully documented, full impact analysis AND save time and money. Beautiful.
Welcome to data warehouse automation.