We have been asked about WhereScape 3D and big data. While we are not yet aware of any customer use cases, it does come up in conversations where people want to explore the boundaries of the software.
Our engineers have been investigating various big data scenarios. With WhereScape 3D we will only ever be interested in data that will eventually, in some form, be used in decision making, most often in conjunction with a traditional data warehouse. This gives us some boundaries for our big data investigation and, ultimately, for the functionality that we will support.
One of the key concepts behind WhereScape Data Driven Design (3D) is that we want to include data in the design and planning process. Given this, an obvious place to start our big data journey was with Hive. And yes, we are very much aware that big data does not just mean Hadoop. Hadoop is where we started, not where we intend to finish.
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Since there is a JDBC driver for Hive, we can use standard WhereScape 3D functionality – in theory anyway.
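To give a flavor of what profiling over a SQL interface such as Hive's involves, here is a minimal sketch in Python that builds one basic profiling query per column (row count, non-null count, distinct count, minimum and maximum). It is an illustration only and not how WhereScape 3D is implemented: the function names, the choice of metrics and the example table and column names are all assumptions made for this sketch. It only produces the query text; actually running it would need a Hive connection, which is left out here.

```python
import re

# Hypothetical helper names; nothing here comes from WhereScape 3D itself.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Backtick-quote a simple identifier, rejecting anything unusual."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_profile_query(table, column):
    """Return a query computing basic profile metrics for one column.

    The metrics chosen (row count, non-null count, distinct count,
    min, max) are an assumption about what "profiling" might cover.
    """
    quoted_column = quote_identifier(column)
    # Allow an optional database prefix such as "sales.orders".
    quoted_table = ".".join(quote_identifier(part) for part in table.split("."))
    select_items = [
        "COUNT(*) AS row_count",
        f"COUNT({quoted_column}) AS non_null_count",
        f"COUNT(DISTINCT {quoted_column}) AS distinct_count",
        f"MIN({quoted_column}) AS min_value",
        f"MAX({quoted_column}) AS max_value",
    ]
    return "SELECT " + ", ".join(select_items) + " FROM " + quoted_table


def build_profile_queries(table, columns):
    """Build one profiling query per column, keyed by column name."""
    return {column: build_profile_query(table, column) for column in columns}


if __name__ == "__main__":
    # Example table and column names are made up for illustration.
    queries = build_profile_queries("sales.orders", ["customer_id", "order_total"])
    for column, query in queries.items():
        print(column, "->", query)
```

Building one statement per column keeps each query to a single DISTINCT aggregate, which is a cautious choice for this sketch. The trade-off is one full pass over the data per column, so even a small profiling run can turn into a lot of work on a large table.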
Here is a WhereScape 3D screenshot showing discovery and profiling against Hive:
The good news is that we could get it going. The not so good news is that the JDBC driver was temperamental and slow.
The bottom line: this is not yet ready for prime time. We are continuing on the WhereScape 3D big data journey, and we are looking for input on how people want to analyze big data (not just Hive) – please feel free to contact us if you have a scenario you would like us to investigate.