This BeyeNETWORK article features Wayne Eckerson’s interview with Mark Budzinski. Wayne is founder and director of consulting at Eckerson Group, and Mark is the President of WhereScape.

Let’s start by having you explain data warehouse automation.

Mark Budzinski: Data warehouse automation is a really well-kept secret. Think of it as a contemporary approach to building and managing data warehouses of all shapes and sizes. This includes business-facing data marts and data vaults, independent of architecture. The traditional methods of building out that data infrastructure, where we model the data, utilize the ETL tools, and get source-to-target maps – all of that waterfall methodology – have proven to be problematic when it comes to responsiveness to the business. The business is now anxious for data in the form they want and in the time frame they need. Automation enables organizations to dramatically speed up the creation of that infrastructure.

Automation does things natively in the data warehouse target platform. It creates data definition language (DDL) to create tables. It creates data manipulation language (DML) in the form of stored procedures in the native database to populate the tables. And, perhaps most interestingly, automation creates the documentation across the board – something that is rarely done well and sometimes not done at all. Automation technology makes sure that the entire environment is well documented.
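To make the idea concrete, here is a minimal Python sketch of metadata-driven generation: one table description is used to emit the DDL, the load statement, and the documentation, so the three cannot drift apart. It is an illustration only and not WhereScape's implementation. The class names, function names, and the sample table are all hypothetical, and the load step is shown as a plain INSERT ... SELECT rather than a platform-specific stored procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str          # target column name
    sql_type: str      # target column type
    source_expr: str   # expression over the source table (hypothetical)
    description: str   # business meaning, reused in the documentation


@dataclass(frozen=True)
class TableSpec:
    name: str
    source: str
    description: str
    columns: tuple


def generate_ddl(spec: TableSpec) -> str:
    """Emit a CREATE TABLE statement from the metadata."""
    cols = ",\n".join(f"    {c.name} {c.sql_type}" for c in spec.columns)
    return f"CREATE TABLE {spec.name} (\n{cols}\n);"


def generate_dml(spec: TableSpec) -> str:
    """Emit the load statement (a stored-procedure wrapper is omitted here)."""
    targets = ", ".join(c.name for c in spec.columns)
    exprs = ", ".join(c.source_expr for c in spec.columns)
    return (
        f"INSERT INTO {spec.name} ({targets})\n"
        f"SELECT {exprs}\n"
        f"FROM {spec.source};"
    )


def generate_docs(spec: TableSpec) -> str:
    """Emit plain-text documentation from the same metadata."""
    lines = [
        f"Table {spec.name}: {spec.description}",
        f"Loaded from: {spec.source}",
    ]
    for c in spec.columns:
        lines.append(f"  {c.name} ({c.sql_type}) <- {c.source_expr}: {c.description}")
    return "\n".join(lines)


# Hypothetical sample metadata for a small customer dimension.
spec = TableSpec(
    name="dim_customer",
    source="stg_customer",
    description="One row per customer",
    columns=(
        Column("customer_id", "INTEGER", "cust_id", "Customer identifier"),
        Column("full_name", "VARCHAR(200)",
               "TRIM(first_name || ' ' || last_name)", "Display name"),
    ),
)

ddl = generate_ddl(spec)
dml = generate_dml(spec)
docs = generate_docs(spec)
print(ddl, dml, docs, sep="\n\n")
```

Because every artifact is derived from the same metadata, changing a column means changing it in one place and regenerating, which is the property that keeps the documentation current.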

This sounds very powerful. Automation drives agility, so why is it such a well-kept secret?

Mark Budzinski: I think it is such a well-kept secret because the vendors – WhereScape being the one that pioneered this industry ten years ago – are antagonists to the status quo. If you consider the major companies that have multi-million dollar investments in these traditional tools and, more importantly, the people that are conducting the ongoing development and maintenance of the environment, there is a tremendous force to retain the status quo. Once IT managers who are really interested in meeting the needs of business users experience the benefits of automation, they find it to be a superb solution, and off we go.

Data warehouse automation started as a novel, out-of-the-box idea, but now it is becoming mainstream, used by companies including Blue Cross Blue Shield affiliates in three different states, as well as Tesco and Costco – two of the three largest retailers in the world.

So, old habits die hard, and a lot of folks have yet to discover the value of data warehouse automation, but obviously your customers have. What are your customers generally using data warehousing automation for? Are there any primary use cases?

Mark Budzinski: There are quite a few. The obvious one is if you have a new data warehouse to build. Automation is going to help you get it done in less time with fewer resources and less hassle. And it will be fully documented when you are done – it is the perfect case. But this is 2016, not 1990, so these new environments are few and far between, although they are out there. The Delta Community Credit Union – the sixth largest credit union in the country – represented a greenfield opportunity. They were building a new Teradata data warehouse from scratch, and they greatly accelerated the process with data warehouse automation.

But most of our major customers – such as the Blue Cross Blue Shield affiliates – already had an enterprise data warehouse being fed by Informatica or DataStage. The issue for them was whether that data warehouse feeds the specific needs of an analyst or business user in marketing and sales, operations or other areas. There is oftentimes a semantic layer – a business-facing data mart, if you will – put in place to solve immediate and pressing business needs. Such is the case with the Blue Cross Blue Shield use cases. Building an infrastructure for specific use cases where the data is already in the data warehouse gets us only so far. We have a lot of other sources now that we are accessing and integrating. We are seeing data coming from sensors, from data lakes, from the CFO’s spreadsheet. So when IT is asked, “Can you build us the specific data mart that we need for our use?” the typical response is, “Yeah, we’ll try to get back to you in six months.” It will probably take a year and a million dollars using this old-school approach. Data warehouse automation is also ideal for building that environment.

A third use case would be migration – moving, for example, from SQL Server to Teradata. You’re not going to take the SQL Server infrastructure and just somehow magically push a button so it works in Teradata. In fact, even if there were such a button, you’d be foolish to push it because you would not be taking advantage of the Teradata investment that you made. So automation enables you to essentially reverse-engineer what was built – to understand the data models and the basic essence of the SQL Server environment in this example – and rebuild it in Teradata in a way that maximizes the Teradata infrastructure and performance. As a result, everyone is happy and we’ve greatly minimized the time.

The last example I will give is what we call big data extensions. That rolls into an announcement that we made recently whereby we are fully embracing, as most of our customers are, what Gartner calls the logical data warehouse and Teradata calls the Unified Data Architecture (UDA). The world of data management is not homogeneous any more. A company is not a Teradata shop in the way that it used to be, so they have to contemplate Aster for contemporary analytic purposes as well as Hadoop and all the surrounding Hadoop technologies. 

There is a lot of activity in this space. But the thing that continues to be elusive for most customers is metadata management. How do they keep track of this data when it comes from a sensor and moves into a Hadoop environment? There is some processing that is done perhaps in Hive. That processed data then comes over into the mainstream Teradata environment where it can be operationalized. We have to take data that is otherwise analyzed by data scientists – largely in Hadoop – and get that into the greater reporting and analytic environment that is served by Teradata, in this example.

Companies want to extend their existing environments to include – not replace – big data. WhereScape and data warehouse automation are a very nice play there.

There are lots of different use cases. I like the one about driving more value from your warehouse because for so many organizations it is just a repository that is kind of sitting by itself. And if we can drive data marts off of that more rapidly and augment those marts with other data that is not in the warehouse, that’s what companies need today.

You mentioned an announcement that you recently made. Can you tell us about that announcement?

Mark Budzinski: The announcement was about extending big data for data lineage and documentation end-to-end. So you can track where the data started, how it was processed and how it ended up. This is very important, particularly when something breaks. How do you fix the environment?
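As a rough illustration of what end-to-end lineage means in practice, the Python sketch below records each hop a dataset takes and then traces a target back to where its data started. It is a sketch under stated assumptions rather than a description of the product: the `Lineage` class, its methods, and the dataset names are all hypothetical.

```python
from collections import defaultdict


class Lineage:
    """Records which dataset fed which, and by what processing step."""

    def __init__(self):
        # target dataset -> list of (source dataset, step description)
        self._upstream = defaultdict(list)

    def record(self, source: str, target: str, step: str) -> None:
        self._upstream[target].append((source, step))

    def trace(self, dataset: str) -> list:
        """Return (source, step, target) hops leading to `dataset`, earliest first."""
        hops = []
        seen = set()

        def visit(node: str) -> None:
            if node in seen:
                return
            seen.add(node)
            # .get avoids creating empty entries for pure source nodes
            for source, step in self._upstream.get(node, []):
                visit(source)
                hops.append((source, step, node))

        visit(dataset)
        return hops


# Hypothetical flow: sensor feed -> Hadoop -> Hive -> Teradata.
lineage = Lineage()
lineage.record("sensor_feed", "hadoop.raw_events", "ingest")
lineage.record("hadoop.raw_events", "hive.clean_events", "Hive pre-processing")
lineage.record("hive.clean_events", "teradata.fact_events", "load")

hops = lineage.trace("teradata.fact_events")
for source, step, target in hops:
    print(f"{source} --[{step}]--> {target}")
```

When something breaks, a trace like this narrows the search to the specific hops that feed the affected table.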

Is this a point release of your existing product or is it an add-on?

Mark Budzinski: It is a point release. It is the next wave. But the other thing I should mention is sensor data, as a good example. It comes in very rapidly. It has to be processed in Hadoop. The first instance in this announcement is that we automate the Hive environment just as we automated the traditional Teradata environment.

You’re pulling Hive data into Teradata and tracking that metadata lineage?

Mark Budzinski: Yes, and we’re generating the actual Hive code just like we generate the Teradata SQL. We’re generating Hive SQL. We’re automating the process of pre-processing your data as it enters the Hadoop environment. 

So Hive is actually a target, not a source?

Mark Budzinski: Correct. 

That sounds very powerful. Thank you very much for this wonderful discussion.