If data is the lifeblood of the business world – a constant stream of information that fuels business decisions – then metadata is the DNA. It is ‘data that describes data’, documenting the source of your data, transformations it has been through, dependencies and so on. When paired with automation, metadata provides the agility to integrate new technologies and tackle disruption with confidence. This article explains why.
Metadata is a byproduct of a data warehouse automation tool, which documents every change it makes and stores those changes in a single document in a universal format. Without automation, documentation is often not written at all, or it is written manually long after the work has been done. This leads to human error and a lack of uniformity, so vital information is often recorded incorrectly or lost forever.
Without metadata, the closest thing data warehousing teams have to DNA is information held by staff as contextualised knowledge that is not easily transferable between people or systems. We cannot truly ‘own’ this information in such a disorganized format. However, automated metadata is people-proof. If your data warehouse and all the developers who built it were to disappear, accurate metadata would enable it to be rebuilt exactly as it was.
Digital transformation requires change throughout IT, and in the data department this most commonly translates as a need for agile data infrastructure. Your data warehouse must be a single source of the facts, accessible to business users, but it must also be future-proofed against new technology and changes from within your organization.
Data infrastructure modernization efforts are about more than looking at your organization’s here-and-now requirements. By developing and implementing a metadata strategy that is fueled by automation, you can ensure your team’s effort and investment today will deliver the agility and flexibility required far into your future. The technology landscape is being disrupted at a ferocious pace, and this advance is only accelerating, so it’s important not to be locked in to any data source, modelling style or target data platform.
Data warehouse automation permeates from people to technology and back again, changing mindsets and methodologies. Teams of developers using tools that write thousands of lines of code in seconds will complete projects in a fraction of the time taken by those who still code by hand. As a result, they will also have more time to onboard new technology and deliver business value from it.
As organizations increasingly choose to move at least some infrastructure into the cloud, retaining ownership and control of metadata is a safeguard against solution lock-in and helps ensure organizational flexibility for the future. Metadata merely describes the data architecture and is not dependent on the data platform that underlies it. This means you can simply lift and shift your data from one system to another as your business’s needs evolve. Data warehouse automation software can use your metadata to generate all the necessary code and documentation for your data on a new cloud platform, eliminating the need for time-intensive and redundant hand-coding.
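To make the platform-independence point concrete, here is a minimal sketch of metadata-driven code generation. It is an illustration only: the `Column` and `Table` structures, the `TYPE_MAP` entries and the `generate_ddl` function are hypothetical names invented for this example and do not represent WhereScape's metadata format or API. The idea is simply that one platform-neutral description of a table can be rendered as DDL for more than one target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    logical_type: str  # platform-neutral type name (hypothetical vocabulary)


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple


# Illustrative mapping from logical types to platform-specific types.
# A real tool would cover far more types and options.
TYPE_MAP = {
    "snowflake": {
        "integer": "INTEGER",
        "string": "VARCHAR",
        "timestamp": "TIMESTAMP_NTZ",
    },
    "sqlserver": {
        "integer": "INT",
        "string": "NVARCHAR(255)",
        "timestamp": "DATETIME2",
    },
}


def generate_ddl(table: Table, platform: str) -> str:
    """Render a CREATE TABLE statement for the given target platform."""
    types = TYPE_MAP[platform]
    column_lines = ",\n".join(
        f"    {col.name} {types[col.logical_type]}" for col in table.columns
    )
    return f"CREATE TABLE {table.name} (\n{column_lines}\n);"


# The same metadata describes the table regardless of where it is deployed.
customer = Table(
    name="dim_customer",
    columns=(
        Column("customer_id", "integer"),
        Column("customer_name", "string"),
        Column("loaded_at", "timestamp"),
    ),
)

if __name__ == "__main__":
    for target in ("snowflake", "sqlserver"):
        print(f"-- {target}")
        print(generate_ddl(customer, target))
```

Because the table definition never mentions a platform, moving to a new target in this sketch means adding one more entry to the type mapping rather than rewriting each table by hand.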
In addition, your data will retain its full documentation, which is invaluable for creating a clear and auditable data trail as data protection legislation increases. For example, when GDPR came into force in Europe in 2018, WhereScape customers had a pre-existing full audit trail to prove where their data came from, meaning they could choose which data to keep or delete in order to comply. If sections of their infrastructure were not yet connected to WhereScape, they could connect them and then retrospectively scope and audit, even back to before they became a WhereScape customer.
How Metadata Works
WhereScape automatically produces metadata while it designs, develops, deploys and operates data infrastructure. The software can read from and write to a set of standard database metadata tables, and keeps vital records, including documentation, diagrams and lineage information, updated in real time as your data warehousing team works. Today, WhereScape supports metadata-driven automation across a variety of popular data platforms including Snowflake, Amazon Redshift, Microsoft SQL Server, Microsoft Azure, Oracle, Teradata and more.
WhereScape’s metadata tables keep track of the upstream and downstream dependencies of all objects in the entire data infrastructure. This means developers can create, manage and document dependent objects, safe in the knowledge that the automation will ensure they remain integrated and are appropriately altered should any changes to the underlying infrastructure affect them. This allows data warehousing teams to fully leverage new technologies such as Snowflake without having to worry about the quality of their code or how it is affected by change elsewhere in their infrastructure.
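As a rough illustration of what tracking upstream and downstream dependencies makes possible, the sketch below stores lineage as simple "source feeds target" pairs and walks them to find everything affected by a change. The object names, the `LINEAGE` list and the helper functions are all assumptions made up for this example; they are not WhereScape's actual metadata tables or schema.

```python
from collections import defaultdict, deque

# Hypothetical lineage records: each pair reads "source feeds target".
LINEAGE = [
    ("src_orders", "stage_orders"),
    ("stage_orders", "fact_sales"),
    ("src_customers", "stage_customers"),
    ("stage_customers", "dim_customer"),
    ("dim_customer", "fact_sales"),
    ("fact_sales", "rpt_monthly_sales"),
]


def build_index(edges):
    """Index the lineage pairs in both directions."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(start, neighbours):
    """Return every object reachable from `start` (breadth-first walk)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_index(LINEAGE)

if __name__ == "__main__":
    # Which objects must be reviewed if stage_customers changes?
    print(sorted(reachable("stage_customers", DOWNSTREAM)))
    # → ['dim_customer', 'fact_sales', 'rpt_monthly_sales']

    # Where does the data in fact_sales originally come from?
    print(sorted(reachable("fact_sales", UPSTREAM)))
    # → ['dim_customer', 'src_customers', 'src_orders', 'stage_customers', 'stage_orders']
```

The downstream walk answers the impact question (what has to be regenerated after a change), while the upstream walk answers the audit question (where a given table's data came from).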
Real World Benefits
So, what convenience does this technology give us, and what does this mean in real-world terms? At WhereScape, we are working with an insurance company that previously needed 10-15 external consultants for up to three months to perform scheduled updates. Now, with automated code production, these updates take one or two days. Meanwhile, WhereScape has fully audited and documented their entire data ecosystem. If this company wanted to switch to a cloud provider, it would take a couple of weeks, as opposed to perhaps a year of work at massive cost.
Automation affects how we use and think about tech. It can significantly transform and evolve the mindset of development teams who may previously have been held back by the outdated patterns and values of the 1980s ETL era. The DNA of metadata can further drive this shift in mindset – configurable yet factual, and providing a snapshot that not only describes where we are now but insures against change and enables an agile future.