With the heterogeneous Data Warehouse environment increasingly seen as a pragmatic and realistic approach to Big Data, WhereScape Senior Solutions Architect Martin Norgrove explores the latest thinking on why Big Data is an evolution and not a revolution, and explains why Data Warehouse Automation is key to its success.
Big Data solutions are growing beyond niche installations at Fortune 500 companies. As we approach a tipping point over the next 1-2 years, we will see more and more organisations implement Big Data technologies in order to make smarter use of their data for analytics and future prediction.
A recent announcement from Hortonworks around the Stinger.next initiative, which the Hadoop community will be working on over the next 18 months, further emphasizes the importance of SQL as the language of data and demonstrates the commitment of Big Data vendors to adopt best practice from the well-understood and established Data Warehouse community.
This fits with the industry analyst view that Hadoop and related technologies complement Data Warehousing rather than replace it. In fact, Big Data vendors promote their products in much the same way, with the likes of Hortonworks, Cloudera, and MapR positioning them as an evolution and extension of the existing Business Intelligence and Data Warehousing landscape. New terms such as “Data Lake” and “Enterprise Data Hub” have entered the data lexicon. They sound impressive, but what do they mean?
The problem with traditional RDBMS systems is that they struggle to keep up when faced with truly high-volume, high-velocity data. Combined with the flexible or schema-less nature of Big Data, this presents quite a barrier to including these new datasets in traditional analysis. Although it’s clear Big Data is an evolution rather than a revolution, the genuinely new and exciting concept, the game changer, is flexible schema-on-read. It sounds so simple, and yet it’s the key to the puzzle. We have had cluster hardware and software that can handle the data volume for years; what we have struggled with is the ability to unify the multitude of formats and process the data efficiently. Enter Hadoop.
With Hadoop we have a framework within which to store the data and to process it. What we also have through Hive and related technologies is the ability to create datasets and reference these definitions to underlying files at analysis time rather than processing time. We can store data quickly and easily and we can access it via SQL at analysis time when we need it. The Data Lake is the unification of unstructured data - the consolidation point from which we can begin to enrich traditional analytical datasets and conduct previously difficult or new analysis with Big Data.
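The schema-on-read idea described above can be sketched in a few lines of Python. This is a toy illustration of the concept, not Hive itself: raw records are stored exactly as they arrive, and a schema (here a hypothetical dictionary of column names and types) is applied only at analysis time, so records with missing fields still load cleanly.

```python
import json
import io

# Hypothetical raw event feed: records land as-is, with no schema
# enforced on write (schema-on-write would reject the second record).
raw_lines = io.StringIO(
    '{"user": "alice", "clicks": 3, "page": "/home"}\n'
    '{"user": "bob", "clicks": 7}\n'  # missing "page" -- still stored fine
)

# The schema is declared only when we read, mirroring how a Hive table
# definition is mapped onto underlying files at query time.
schema = {"user": str, "clicks": int, "page": str}

def read_with_schema(lines, schema):
    """Apply the schema at analysis time: missing columns become None."""
    for line in lines:
        record = json.loads(line)
        yield {col: typ(record[col]) if col in record else None
               for col, typ in schema.items()}

rows = list(read_with_schema(raw_lines, schema))
```

The point of the sketch is that nothing about the stored files had to change to accommodate the second, incomplete record; the interpretation lives entirely in the read path, which is what makes ingesting varied Big Data formats so much cheaper.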
Of course, we still have structured data that lives in the EDW. With a Data Lake alongside it, we can now supplement that structured data with analysis based on unstructured data, the result being highly valuable insights for businesses. This could be referred to as the “Next-Gen Data Warehouse”: a next-generation logical data warehouse environment, one that combines the Data Lake and Big Data with the EDW and structured data. Happy days…or so it seems?
There is a rather large challenge looming for us, and that is the complexity that an additional data consolidation point brings in terms of managing consistency of business rules, conducting impact analysis for change, and the sheer volume of coding to support the integration required (remember our data sources rarely play nicely with each other). Something is missing, something very important, and given the number of vendors in the data marketplace, it’s unlikely any one vendor is going to solve this for the community at large.
Enter WhereScape RED
The missing piece is consolidated meta-data for the entire logical data warehouse environment (the Data Lake plus the EDW), automated code generation, and end-to-end control of the data processing steps. This is what we need to ensure we get consistent, trustworthy data when the business needs it. Enter WhereScape RED, which has a unique value proposition both now, with traditional structured data, and into the future as we build support for Big Data.
Big Data brings a set of large technical challenges, not least of which are the scarce IT resources most organisations have at their disposal to meet the coding and design needs of a logical Data Warehouse. A tool like WhereScape RED, which can unify the disparate meta-data of a logical data warehouse architecture, amplify scarce IT resources, and support the building of trusted end-to-end solutions quickly and on budget while meeting or exceeding business expectations, could be the answer...
Why WhereScape RED?
WhereScape RED is a meta-data tool from the ground up. At WhereScape we discovered this before just about anyone else in the industry, and we’ve been perfecting it for over 15 years. WhereScape also gets the value of Data Warehouse Automation: generating code that is both repeatable and standardized, simplifying support and freeing developers to tackle the really big problems that Big Data has brought, and will undoubtedly continue to bring.
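To make the idea of meta-data-driven code generation concrete, here is a minimal sketch in Python. It is illustrative only, not WhereScape RED’s actual engine: the table definition (a hypothetical `stage_customer` staging table) lives in one metadata dictionary, and the DDL is generated from it, so every table built from the metadata follows the same standard automatically.

```python
# Illustrative toy generator: table definitions live in metadata, and the
# DDL is produced from that metadata rather than hand-written, so the
# output is repeatable and standardized across every table.
metadata = {
    "stage_customer": {
        "columns": [("customer_id", "INTEGER"), ("name", "VARCHAR(100)")],
        "source": "crm.customers",  # hypothetical upstream source table
    },
}

def generate_ddl(table_name, meta):
    """Emit standardized CREATE TABLE DDL from a metadata entry."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in meta["columns"])
    return f"CREATE TABLE {table_name} (\n  {cols}\n);"

ddl = generate_ddl("stage_customer", metadata["stage_customer"])
print(ddl)
```

Because the metadata is the single source of truth, changing a column or a naming standard means regenerating the code rather than hunting through hand-written scripts, which is exactly the support burden automation removes.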
In conclusion, WhereScape is looking out for you in the Big Data world, and one of the many aspects of data management that really excites us is the increasing opportunity and value in both traditional and Big Data analytics, now and into the future.