Big data is now entering such an acceleration phase. Two companies, Qubole and WhereScape, illustrate how creating, working with, and managing big data has become much easier. These offerings go beyond the first generation of instant big data infrastructure, such as Amazon’s Elastic MapReduce, in that they not only create the raw engine but also massively simplify configuration and help manage the flow of work as well.
Qubole performs this feat for Hadoop and Presto, while WhereScape works its magic for SQL databases like Teradata, Netezza, and SQL Server. Both offerings accelerate time to value when analyzing big data. What is only starting to become clear is how this sort of higher-level automation will change the way we work with data.
WhereScape’s Taming of the Data Warehouse
One of the interesting developments in the past couple of years is the way that almost every leader in big data has publicly declared, in one way or another, the need for a data warehouse. For example, Facebook announced in December that it purchased Vertica for internal use after a hotly contested bake-off. Netflix has spoken in many venues about the role its Teradata data warehouse plays in its infrastructure for data management and analysis. The role these data warehouses play at both companies probably differs in important ways from how data warehouses are used in most businesses, but the fact that a data warehouse is crucial is telling.
Both Facebook and Netflix seem to need something that a data warehouse offers that they cannot get anywhere else. In my view, this something is the core value provided by SQL. In most data warehouses, you have hundreds of tables of information. SQL allows you to model the connections between these tables and get the data you want. MicroStrategy and all the big BI vendors have report writers that write SQL automatically. So, in essence, what the data warehouse provides is a way for hundreds or thousands of people to interact with high-value data in exactly the way they want. Data warehouses are not going away, and the engineering required to optimize, process, and balance the load of hundreds of simultaneous queries will not be easy to migrate to other platforms. Once data is distilled into a form that has what Teradata calls “high business value density”, a data warehouse is a powerful way to put it to work for a large group of people.
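To make the point about SQL concrete, here is a minimal sketch of a report writer that generates a join across two tables. It uses Python's built-in sqlite3 module rather than a real data warehouse, and the table names, column names, and the build_report_sql helper are all hypothetical, invented for illustration. It is not how MicroStrategy or any other BI product works internally.

```python
import sqlite3

# Hypothetical two-table schema standing in for a warehouse's hundreds of tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

def build_report_sql(dimension, measure):
    """Toy report writer (illustrative only): turns a business-level
    request into SQL so the user never writes the join by hand."""
    # Whitelists map friendly names to SQL expressions and keep
    # arbitrary user input out of the generated statement.
    dimensions = {"region": "customers.region"}
    measures = {"revenue": "SUM(orders.amount)"}
    dim = dimensions[dimension]
    agg = measures[measure]
    return (
        f"SELECT {dim} AS {dimension}, {agg} AS {measure} "
        "FROM orders JOIN customers ON orders.customer_id = customers.id "
        f"GROUP BY {dim} ORDER BY {dim}"
    )

sql = build_report_sql("region", "revenue")
rows = conn.execute(sql).fetchall()
print(rows)  # [('East', 150.0), ('West', 75.0)]
```

The user asks for "revenue by region"; the join between orders and customers is modeled once and reused for every such request, which is the kind of leverage SQL gives a large group of people working against the same data.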
But the challenge is that data warehouses are fiendishly complex. Putting a data warehouse together is like building a Boeing 777 from parts; operating one is akin to flying a 777. WhereScape helps with both the construction and operation of data warehouses.
Data warehousing technology must be complex to handle all the different demands placed on it in the business world. What we do is put a simplifying model on top to make it far easier and cheaper to build these powerful systems.
WhereScape offers a higher-level model that can be used by mere mortals to build, configure, manage, and operate a data warehouse. WhereScape works with Teradata, Microsoft, IBM, Oracle, and others. It is primarily used to configure and manage on-premises data warehouses, but it can also be used with cloud-based systems.
Given the complexity of constructing and operating large-scale data warehouses, many organizations are reluctant to make changes once they are in production. Using WhereScape, those who run data warehouses lose their fear of change and adaptation. Setting up new data warehouses becomes far faster and easier because WhereScape automates the time-consuming processes involved in building them.