Many organizations are faced with the challenge of how to adapt to the changing role of data. Data is increasingly considered a key asset that can improve and optimize business processes and decision-making processes. This changing role leads to new requirements for IT systems. Additionally, innovative new data processing and storage technologies, such as fast analytical SQL database servers, object storage technology, and parallel messaging technology, are creating new business opportunities. Nowadays, systems can be developed that were almost unimaginable or too expensive to develop years ago.
To support all these new requirements and to leverage these new technologies, organizations are rethinking and redeveloping the data architectures of their current data platforms. It is not an option to develop new data platforms with architectures that will be as outdated within a few years as current systems are today. They need data platforms with sustainable data architectures.
Data platforms with sustainable data architectures enable organizations to cope with the changing role of data and to leverage new and future technologies. A sustainable data architecture is not one that only supports current data requirements and all newly identified requirements, but one that is adaptable and extensible enough to support (as yet unknown) future requirements. When data consumption requirements change, the data architecture must be easy to adapt without the need for major reengineering and redevelopment exercises. Sustainable data architectures are as nimble as animals that have survived millions of years by adapting to their changing environment.
Sustainable data architectures must meet a number of key requirements, which are described in detail later in this whitepaper.
Developing data platforms with sustainable data architectures using low-level development tools is technically possible, but time-consuming and error-prone. A more efficient approach to accelerate development and meet the above requirements is to use code generators (generators for short). Most of the development work to create IT systems is repetitive and formalizable, making it perfectly suited for the use of generators.
IT departments have been using generators for years. For example, ETL tools have been used to generate code to copy data from one database to another, BI tools to create SQL statements to access databases, and data modeling tools to generate database structures. The general benefits of generators are: improved flexibility, productivity, maintenance, performance, quality and integrated support for metadata.
Most generators generate only one component of an entire data architecture and not the entire architecture. A class of generators that generate entire data architectures are data warehouse automation tools. The current state is that many of them are focused on developing data platforms that support traditional forms of data consumption, such as a classic data warehouse architecture consisting of a staging area, an enterprise data warehouse, and several physical data marts that support traditional forms of reporting and dashboarding. Some forms of data consumption require other data architectures, such as data lakes and data hubs.
A sustainable data architecture must be able to support a wide range of data consumption forms and must be able to quickly adapt to new forms.
Developing data platforms with sustainable data architectures requires generators capable of creating, adapting, and extending all kinds of data architectures. Such generators are called data architecture automation tools.
This whitepaper describes how the WhereScape data architecture automation tool is able to generate sustainable data architectures. It gives a brief overview of the product and provides a short introduction to how it works with model conversion rules and templates to meet the above requirements.
Almost every organization is trying to adapt to the changing role of data. Data is increasingly seen as a key asset that can improve and optimize business processes and decision-making processes and that allows organizations to expand into new markets. This changed role has led to new requirements for IT systems. Data platforms must process larger volumes of data because more data is being produced, but also the requirements for data consumption are changing and broadening. Providing the organization with simple dashboards and reports that show yesterday’s data is no longer sufficient. New forms of data consumption need to be supported, including data science, real-time dashboards, and self-service BI, and more external parties, such as customers, patients, suppliers, and legislators, must be able to access data. Additionally, requirements for data security and especially data privacy have become more complex.
But it is not only the customer pull that leads to new requirements, it is also the technology push. Many new powerful technologies have become available, such as fast solutions for data storage, processing, and analytics, cloud platforms with virtually unlimited resources and flexible fee structures, and new tools that simplify and accelerate analytics. All these new technologies are not there for technology’s sake, they can make it possible to improve processes or change business models. New technologies create new opportunities.
This section briefly explains two terms frequently used in this whitepaper: data architecture and data platform. Wikipedia defines data architecture as follows: “A data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations.” Most architects will more or less agree with this definition. More practically, it is a document that describes how an organization produces, stores, processes, secures, consumes, exchanges, and deletes data. It describes a data platform. A data platform is the implementation of a data architecture. It is the entire set of IT systems and data stores involved in producing and consuming data. Data warehouse systems and data lake-based systems are examples of data platforms.
To support new data requirements and to benefit from and exploit new technologies, organizations are rethinking and redeveloping the data architectures of their current data platforms. The architectures of new data platforms should not be developed in the traditional way, because architectures developed that way only support existing and already-known future requirements. The resulting data platforms may be outdated even before they are completed and operational.
It is important that new data platforms are developed in such a way that they can be changed and extended quickly and easily when requirements for data production or data consumption change (again). Nothing in their architectures should be fixed and impractical to change. For example, when new data storage technologies can improve performance or reduce storage costs, the platform must be able to adopt them easily; when data must be migrated to a more scalable cloud platform, it should not require a major redevelopment of the entire platform; stricter data privacy rules must be easy to implement; and when customers need live access to zero-latency data, the data platform must be able to provide it with minimal effort.
Note that in real-life situations, most changes are very repetitive by nature. For example, changing the length of a field, replacing one SQL database server with another, and changing a data structure from a snowflake to a star schema pattern are all very repetitive tasks. Such tasks are ideal for automation.
It is time for data platforms with sustainable data architectures. A sustainable data architecture is not one that only supports current data requirements and all newly identified requirements, but one that is adaptable and extensible enough to support (as yet unknown) future requirements. This latter is important to deal with completely unexpected and unplanned changes that lead to new requirements. For example, an organization may acquire another company that operates in a different industry with completely different data requirements, or urgent market developments due to business opportunities, disasters, or new aggressive competitors, can also lead to new data requirements.
A sustainable data architecture is one that can survive for a long time, because it is easy to adapt and extend. A sustainable data architecture enables an organization to quickly implement current, known upcoming and still unknown future data requirements; see Figure 1. When data consumption requirements change, the data architecture adapts accordingly without the need for major reengineering and redevelopment exercises. Sustainable data architectures are nimble.
This chapter describes eight key requirements for sustainable data architectures; see also Figure 2.
The definitions of objects, such as tables, fields and keys, can change over time. For example, the data type, uniqueness, integrity rules, maximum length, and optionality of a field can change; fields may have to be added to or dropped from a table; and tables may need to be decomposed into two tables. If such changes occur, databases, ETL programs, reports, and dashboards must be changed accordingly.
If a field occurs in one table of one database, the amount of work involved may be limited. Some fields, however, occur in numerous tables throughout a data platform. For example, the field customer-id may appear in several tables in the staging area, in the enterprise data warehouse, and in several physical data marts. Changing the characteristics of such a field may lead to changes to many columns throughout all the databases of a data platform and to changes to several ETL programs. Although the change to be made is very similar wherever the field occurs and thus very repetitive, implementing all these changes is time-consuming and, more importantly, error-prone.
The same is true when the definitions of tables change. Adding columns and decomposing tables is also very repetitive work, and just like changing fields, it is time-consuming and error-prone. A sustainable data architecture should simplify the work involved in changing the definitions of fields, tables, and other objects. Ideally, if the definition of an object changes, the change is automatically dispersed across all its implementations. This improves productivity and time-to-market, and prevents incorrect or incomplete changes from being implemented.
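The propagation described above can be sketched in a few lines. This is a hypothetical illustration, not WhereScape's actual mechanism: a field's usage is recorded once in a central model, and a change to its definition is expanded into one ALTER TABLE statement per affected table. All table and field names below are invented, and the exact ALTER syntax varies per SQL dialect.

```python
# Illustrative sketch of change propagation from a central field registry.
# The registry records every table in which a field occurs; a type change
# is then expanded into one statement per occurrence. Names are invented.

FIELD_USAGE = {
    "customer_id": [
        "staging.orders",
        "dwh.dim_customer",
        "mart_sales.fact_orders",
    ],
}

def propagate_type_change(field: str, new_type: str) -> list[str]:
    """Generate one ALTER TABLE statement per table containing the field."""
    # Note: the ALTER COLUMN syntax differs per dialect; this is one form.
    return [
        f"ALTER TABLE {table} ALTER COLUMN {field} SET DATA TYPE {new_type};"
        for table in FIELD_USAGE[field]
    ]

for stmt in propagate_type_change("customer_id", "VARCHAR(64)"):
    print(stmt)
```

The point of the sketch is that the change is specified once and fanned out mechanically, which is exactly the repetitive, error-prone work a generator removes.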
A sustainable data architecture must make a data platform technology independent. This applies to various kinds of technologies, including database servers, messaging busses, and ETL solutions. Technology independence simplifies the adoption of and migration to faster, better, or cheaper new and future technologies, allowing organizations to reap their benefits.
For example, replacing one SQL database server with another in a data platform usually requires many SQL statements to be translated into similar statements supported by the other database server. Although this is a very repetitive task, it requires a lot of manual work and is also error-prone. In a data platform with a sustainable data architecture, it must be almost as easy as replacing a laptop with a new one. In other words, a sustainable data architecture supports plug and play with regard to technologies.
It must be easy to replace the runtime platform on which a data platform runs. For example, a sustainable architecture supports easy migration of all or parts of a data platform from on-premises to a cloud platform (hybrid cloud). In addition, it must be possible to run different parts of one data platform on different cloud platforms from different vendors (multi-cloud). A sustainable data architecture can be distributed across multiple runtime platforms and is designed to simplify changes to the current setup.
A sustainable data architecture enables logically related data to be stored in different databases. For example, the data of a data warehouse or data lake can be distributed across multiple physical databases. Distributing the data across multiple databases can improve performance, and it allows data to be stored close to geographically distant user groups.
Another reason may be to use different storage technologies to store different types of data. For example, photos and videos are stored in object storage technology; structured sensor data is stored in an analytical SQL database server; frequently used structured data is stored in an in-memory database server; and ‘cold’ data is stored in a traditional SQL database server. No storage technology is perfect for every type of data and for every form of data consumption. Heterogeneity of data storage technology has become the new normal.
Most data architectures are based on a specific reference data architecture, such as the logical data warehouse architecture, the classic data warehouse architecture (also known as the corporate information factory), the data lake architecture, and the data mesh. Such a reference architecture may be the best at any given time, but it can change when data production increases, new forms of data consumption are introduced, or when data latency requirements change.
A sustainable data architecture is an architecture that enables organizations to switch to another (reference) data architecture. For example, an architecture with physical data marts may be the ideal solution today to support dashboards and self-service BI, but virtual solutions may be the right solution tomorrow. Or, a data warehouse needs to be replaced by a centralized data lake. A sustainable data architecture is adaptable at this architectural level.
Some components of a data architecture can be designed with different techniques. For example, an enterprise data warehouse can be designed using data vault or normalization techniques, a physical data mart can be designed as star schemas or snowflake schemas, and a data lake can be designed using one or more tiers or zones. Another example relates to data copying and data transformation: an ETL or ELT approach can be used, and data integration can take place before the data is loaded into the data warehouse or when it is loaded into the physical data marts. These are all examples of architectural design techniques. A sustainable data architecture makes it possible to easily change design techniques when new requirements dictate.
Many technologies store technical and/or business metadata. This metadata must be accessible to IT specialists and business users. The latter group is primarily interested in business metadata. All the metadata must remain available to both groups in a sustainable data architecture, even if technologies for capturing, storing, and exposing metadata change.
History Independent – Many different approaches exist to keep track of historic data. For example, all the historic data can be kept in a separate database, or an approach could be used that is similar to how data vault organizes historical data. A sustainable data architecture should enable organizations to switch approaches when new requirements demand it.
Different approaches exist to develop data platforms with sustainable data architectures. The approach described in this whitepaper is based on code generators (generators for short). Generators transform higher-level specifications to lower-level specifications executed by specific runtime technologies, such as compilers, database servers, messaging products, or ETL tools. If changes need to be made, the high-level specifications are changed and everything or parts are regenerated. All the repetitive work is handled by the generator. In general, working with generators accelerates development (shorter time-to-market), is less error-prone (because generators do not make mistakes) and new requirements are implemented faster.
Generators automate tasks. Automation is not a new concept in the IT industry. In fact, IT is and has always been about automating tasks. IT systems have been developed to fully or partially automate administrative, industrial, and technical processes. For example, no organization does its accounting completely by hand anymore, cars are no longer built by hand, and ordered products are no longer picked from warehouses manually, but by robots. Nowadays, many products and services would be too expensive if automation were not applied.
Even in the early days, IT was used to speed up tasks or make them more cost effective. For example, to speed up the 1890 US census, Herman Hollerith was asked to create a machine that could accelerate the counting of the American population. The previous census of 1880 had been done manually and had taken eight years to complete. By automating the process, a ten-year job was reduced to a few months. Another example from long ago is the machine created by Alan Turing and his colleagues, who worked for the Government Code and Cypher School at Bletchley Park in the United Kingdom during the Second World War. This machine was designed to speed up code breaking, which it did successfully by automating a highly repetitive process.
The first traditional IT systems were also created to speed up accounting, inventory management, finance, and so on. Nowadays, robot arms are deployed in car factories to automate repetitive manual work. IT is all about automation.
Not every task is suitable for automation, and suitable generators cannot be developed for every task. The tasks best suited for automation are repetitive by nature and can be expressed as formal algorithms. The above examples are all repetitive: counting the census, breaking ciphers, and installing windshields in cars in a factory. Many administrative tasks are also repetitive, such as entering new orders and calculating net wages. All these processes can be formally specified as algorithms that indicate which steps to perform, what to do in special cases, and how to react when something goes wrong. In other words, these tasks can be formalized.
Note that artificial intelligence and machine learning make it easier to automate less repetitive and less formalized tasks, such as discovering cracks in dikes by analyzing photos made by drones. This increases the potential set of tasks suited for automation.
As indicated, IT systems are developed to automate tasks, but automation can also be applied to the development of IT systems. In other words, IT tasks such as software design, development, and testing can be automated as well. Especially when they are highly repetitive and formalizable, generators can be developed.
The first generators were probably the compilers for programming languages such as COBOL and Fortran. They were introduced to accelerate development. Rather than having to program in low-level assembler languages, more high-level languages could be used. One language construct in COBOL would replace numerous statements in assembler. In other words, repetitive forms of programming were replaced by simpler constructs. The same was true with the introduction of more modern languages, such as Java and C#. And SQL has replaced many statements in older database languages.
Currently, generators exist for all kinds of IT tasks, including the development of ETL programs and database structures. For example, ETL tools that transform high-level specifications into lower-level code to do the actual ETL work have been widely accepted. Many advanced BI tools can also be considered generators, because business developers use them to develop their KPIs and dashboards using drag-and-drop features. These tools generate SQL statements or whatever is required to pull the data from databases. Some data science tools also allow scientists to work at a high conceptual level from which code is generated. And the list goes on.
If all these generators had not been developed, all programming would still be done in assembler, or worse, in machine code. And business users would not even be able to develop their own dashboards. Note that some generators are technically not compilers but interpreters; both forms, however, replace repetitive tasks. Whatever the implementation of the generator, compiler or interpreter, the goal is the same: improve productivity.
As indicated, the approach described in this whitepaper to develop data platforms with sustainable data architectures is based on generators that transform high-level specifications to lower-level specifications. This chapter lists the benefits of generators and especially generators for data architectures.
The primary benefit of automation is development acceleration. A generator often hides many technical details of the underlying technologies, allowing developers to focus on the functionality that the data platform must support. A generator also prevents the wheel from being reinvented over and over by human developers as they try to develop solutions to similar problems.
Generators create an abstraction layer. Generating code makes the original specifications independent of the run-time platform. A run-time platform consists of an operating system, database server, security system, and so on. Due to the run-time platform independence, the data platform can be generated for other platforms, simplifying a migration; see Figure 3.
No matter how perfect IT systems are, eventually they have to be changed to support new requirements. Because many technical details are hidden by the generator, there are fewer lines of code or specifications (depending on the system), which makes it easier to implement changes. To take the analogy of assembler languages again, changing code in a C++ program takes much less time than a comparable change to the assembler equivalent.
Generators generate generic code. Some think that generic code is by definition less efficient and slower than handwritten code. Whether this is true depends on the quality of the generator and the manual developer. For example, if a generator always generates standard SQL code regardless of the database server it uses, it may not exploit all the proprietary and performance enhancing features that a database server supports, nor generate the most efficient SQL formulations. Good generators generate specific code for database servers and the most efficient formulation for each target platform.
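As a minimal illustration of dialect-aware generation (a generic sketch, not taken from any particular product), consider a generator that must emit a "first n rows" query. The most efficient formulation differs per database server, so a good generator picks the dialect-specific form instead of one lowest-common-denominator statement:

```python
# Sketch: a generator emitting the dialect-specific "first n rows" query.
# The SQL syntax per dialect is real; the generator structure is invented.

def top_n_query(table: str, n: int, dialect: str) -> str:
    if dialect == "sqlserver":
        # T-SQL uses the TOP keyword
        return f"SELECT TOP {n} * FROM {table}"
    if dialect in ("postgresql", "mysql"):
        # Both support the LIMIT clause
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "oracle":
        # Oracle 12c+ row-limiting clause
        return f"SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unsupported dialect: {dialect}")
```

A handwritten port between servers would repeat this translation for every statement in the platform; the generator applies it uniformly on regeneration.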
In fact, the efficiency of handwritten code depends entirely on the expertise level of the developers and experts, and on whether they are given the time to create the most efficient code. The number of technologies for which they can develop efficient handwritten code is also limited.
Developers of generators are given the time to make sure that their products generate the most efficient code. The generated code is used by countless organizations that return feedback on how it can be improved. Developers of generators thus automatically become super experts, which is reflected in the generated code. With each version of the generator, the generated code can become more efficient. So, generated code can definitely be more efficient than handwritten code.
The correctness of generated code depends on the quality of the generator. But once a generator produces correct code for a specific type of task, it always produces correct code for that task. It does not make sudden mistakes by forgetting something or generating the wrong code. It is much harder for human developers to always produce correct code.
Correct generators can also generate the right metadata. For example, because some generate the entire architecture, they also know how to present a full picture of how all the data is connected. Users can see where the data comes from and how it has been processed and transformed. In other words, a complete and up-to-date lineage or impact analysis can be generated.
As indicated, the quality and efficiency of non-generated code is dependent on the quality of the developer. This approach has certain drawbacks. First, the scalability for development productivity is low, because developers can only work a limited number of hours per day. To increase productivity, more developers must be hired. Second, new developers may need to build up experience before they can write efficient code. Third, when developers change jobs, their organizations lose expertise. All these drawbacks do not apply to generators.
The ability to easily migrate to other technologies makes a data architecture less dependent on the vendor of a technology. Switching to alternative technologies can be limited to a simple regeneration. A vendor dependency that remains is the dependency on the generator itself.
All generator examples described in Chapter 4 produce one component of an entire data platform, such as an application or database. For example, an ETL tool generates ETL programs, BI tools generate SQL statements, and data modeling tools generate data structures. Multiple, independent generators are required to generate an entire data platform. Since the generators require similar specifications, they are defined multiple times, or in other words they are duplicated; see Figure 4. It is a challenge to keep all those specifications consistent, to make sure they work together optimally, and to guarantee that if one specification is changed all the duplicate specifications are changed accordingly.
Many of the tasks involved in designing and developing data architectures are quite repetitive and formalizable. That makes these tasks suitable for automation. For example, when an enterprise data warehouse uses a data vault design technique and the physical data marts use star schemas, both can be generated from a central data model, including the ETL code to copy the data from the warehouse to the data marts.
The principles that apply to generators of individual platform components can also be applied to generators of entire data platforms. Currently, the category of generators operating on the architectural level are the data warehouse automation tools. They do not generate code for one component of the architecture, but for several. Traditional data warehouse automation tools generate, for example, staging areas, enterprise data warehouses, physical data marts, the ETL solutions that copy data from one database to another, and metadata. Several of these tools have been on the market for many years and have proven their worth.
A disadvantage of several data warehouse automation tools is that they only generate data platforms with traditional data warehouse architectures. Such platforms are only suitable for a restricted set of data consumption forms. In other words, they are single-purpose data platforms.
To develop truly sustainable data architectures, generators are required that can generate other data architectures in addition to the more traditional data warehouse architectures, such as data lake and data hub architectures. Such generators can be used to generate data platforms for other forms of data consumption than those supported by data warehouse architectures. Note: The term data warehouse automation is probably a misnomer for such tools. Data architecture automation tool might be more appropriate.
WhereScape is a data architecture automation tool that has been on the market since 2002. The product supports two approaches to data warehouse automation: data-driven and model-driven automation. These two approaches roughly correspond to the two products that make up WhereScape: WhereScape 3D and WhereScape RED. The products augment each other and can also be used separately. In this whitepaper the two products are considered one product and are simply referred to as WhereScape.
With the model-driven approach a neutral data model, consisting of entities, attributes and relationships, is defined and stored in the WhereScape repository. These models can be automatically transformed to comparable models with different design techniques. For example, a normalized model can be transformed to a data vault model. Since WhereScape knows the source and target model, it can also generate the logic to copy the data from one model to another.
How a model is transformed is described in detail in so-called model conversion rules; see Figure 5. These rules define in detail how one data model is transformed to another. The rules can be changed and adapted to the needs of an organization. For example, if an organization uses its own, slightly adapted version of data vault, it only needs to change the model conversion rules once. The generator applies these changed model conversion rules whenever a new data model needs to be transformed. This guarantees that the change is executed consistently throughout the model. Work that would have been highly repetitive, time-consuming, and error-prone is now handled automatically and error-free by the generator.
New model conversion rules can be added. For example, model conversion rules can be defined to transform a normalized model to a data model for a data hub.
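Conceptually, such a rule maps constructs of the source model onto constructs of the target model. The sketch below is hypothetical and does not use WhereScape's actual rule format: it derives, from one normalized entity, a data vault hub carrying the business key and a satellite carrying the descriptive attributes. The column naming conventions (hash key suffix, load timestamp, record source) follow common data vault practice but are invented here.

```python
# Hypothetical model conversion rule: normalized entity -> data vault
# hub + satellite. Rule format and naming conventions are illustrative.

def to_data_vault(entity, business_key, attributes):
    """Derive hub and satellite table definitions from one entity."""
    hub_key = f"{entity}_hk"  # surrogate hash key shared by hub and satellite
    return {
        f"hub_{entity}": [hub_key, business_key, "load_dts", "record_src"],
        f"sat_{entity}": [hub_key, "load_dts"] + list(attributes),
    }

model = to_data_vault("customer", "customer_id", ["name", "city"])
```

Because the source and target structures are both known to the rule, the generator can also derive the copy logic between them, as described above for WhereScape's model-driven approach.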
WhereScape also uses templates when it generates real code. The templates specify in detail the code to be generated; see Figure 6. WhereScape supports different types of templates, including DDL (Data Definition Language), DML (Data Manipulation Language), and load templates.
Templates are designed for specific data platforms. For example, there will be a DDL template to generate a CREATE TABLE statement for Oracle and another to generate the MySQL equivalent. These templates differ slightly because SQL products use slightly different dialects. Some data platforms also support proprietary features that can improve performance or minimize storage. Where relevant, templates exploit those proprietary features, and if they do not yet, they can be changed to do so.
DML templates are used to generate code to insert, update, and delete data, and are used, for example, to generate code that moves data from one model to another. They are also used when changes need to be implemented. For example, splitting a table into two tables may require multiple statements to move data from the old table to the two new tables. As with DDL templates, DML templates also take into account the dialect differences and proprietary features of data platforms.
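The table-split example can be made concrete with a sketch of the DML a generator might emit. All names are hypothetical, and in practice the statements would come from the dialect-specific template rather than hardcoded strings:

```python
# Hypothetical sketch: statements emitted when one table is split into
# two, each half keeping the shared key, before the old table is dropped.

def split_table_dml(old, part1, cols1, part2, cols2, key):
    """Generate INSERT..SELECT statements for both halves, then a DROP."""
    def copy(target, cols):
        col_list = ", ".join([key] + cols)
        return f"INSERT INTO {target} ({col_list}) SELECT {col_list} FROM {old};"
    return [copy(part1, cols1), copy(part2, cols2), f"DROP TABLE {old};"]
```

The generator derives this whole sequence from the change to the model, so no statement has to be written or ordered by hand.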
Load templates, as the name suggests, specify the statements to load data into a specific database or file. These generated statements may be specified in the language of the loading tool belonging to a database server or target cloud platform.
The data-driven approach starts with defining the model of a source database within WhereScape. Next, developers specify how that model needs to be transformed into a target model. WhereScape uses these specifications to generate the code required to copy data from the source to the target database. With this data-driven approach, templates are also used to generate real code to create and change databases.
The two approaches do not work in isolation, but augment each other. Models and transformations defined with the model-driven approach can be imported in the data-driven approach. Figure 7 contains a screenshot of a data model created with WhereScape.
Summary – The generator provided by WhereScape uses model conversion rules and templates to transform models and generate databases. The product is not a fixed generator. The model conversion rules and templates can be adapted to the requirements of an organization. When needed, developers can also add their own code that will be used every time the generator is used.
This chapter describes how WhereScape supports the requirements for generating sustainable data architectures as described in Chapter 4.
WhereScape stores the definitions of fields, tables and relationships in its repository. When these definitions change, code is regenerated and redeployed to reflect those changes within the real data platform in which the objects are implemented. Depending on the change, DDL statements, such as ALTER TABLE and DROP TABLE, are generated, as well as DML and Load statements, because data must be unloaded and reloaded, or data must be moved from one table to another. In other words, many templates may be used to generate all the right statements to implement the change.
It is important to understand that almost every type of change ever made to a data model has been made countless times before. Without a generator, this results in repetitive, time-consuming, and error-prone work. WhereScape automates this entire process, performing the repetitive work automatically and without errors.
WhereScape offers technology independence in many different ways, but primarily through its templates. Technology is ‘hidden’ in the templates. For example, if a table needs to be moved to a different database technology, it is purely a matter of regenerating with a different set of templates, and WhereScape creates all the new tables and indexes. Moreover, it also generates the code to copy the data to the new database technology. Beyond the migration itself, if organizations want to change how certain aspects of tables or fields are generated for the new technology, they just need to change the template. Only typical DBA procedures, such as those for backup and recovery and database server tuning, require manual changes.
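How technology can be ‘hidden’ in a template is sketched below. The dialect mappings are assumptions for illustration, not WhereScape's actual templates, but they show how regenerating with another mapping retargets the same model to a different database technology.

```python
# Per-technology type mappings: the model uses abstract types, the
# template resolves them to the dialect of the target platform.
TYPE_MAP = {
    "postgresql": {"string": "TEXT",          "moment": "TIMESTAMPTZ"},
    "sqlserver":  {"string": "NVARCHAR(MAX)", "moment": "DATETIME2"},
}

def create_table_ddl(dialect, table, columns):
    """columns is a list of (name, abstract_type) pairs from the model."""
    types = TYPE_MAP[dialect]
    cols = ", ".join(f"{name} {types[atype]}" for name, atype in columns)
    return f"CREATE TABLE {table} ({cols});"

model = [("NAME", "string"), ("CREATED_AT", "moment")]
print(create_table_ddl("postgresql", "CUSTOMER", model))
print(create_table_ddl("sqlserver", "CUSTOMER", model))
```

The model itself never mentions a concrete data type; switching technologies only swaps the mapping.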
After this regeneration and migration process, the data platform continues to operate in exactly the same way and keeps producing the same results.
In the same way that a database server can be replaced, runtime platforms can be replaced. If an organization wants to migrate their on-premises database to a cloud platform, only some minor template changes are required to cater for the specifics of the cloud platform. If they want to exploit some of the new features that are only available for the database server on the cloud platform, then this can also be done through the templates.
WhereScape makes a generated data platform highly runtime platform independent. It enables organizations to easily switch from a fully on-premises solution to a cloud-based solution (multi-cloud or hybrid-cloud).
Metadata can be defined for all the objects defined within WhereScape. Organizations can introduce their own types of metadata tags. For example, a metadata tag can be introduced to indicate with a percentage the estimated quality level of the data in each table. These metadata tags can be used in templates to influence the generated code.
If an organization wants to distribute the tables of a data warehouse across two physical databases using different technologies, they can use a specific metadata tag. For example, a metadata tag can be introduced that indicates for each table in which database it needs to be implemented. Each time a table is generated, the tag affects the code generated by the template. This enables development of a distributed architecture where, for example, part of a database runs on-premises using a specific database server and the other part runs on a cloud platform using another database server.
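Such tag-driven distribution can be sketched as follows. The `target` tag and table names are hypothetical, and real generated DDL would of course include the column definitions; this only illustrates how a metadata tag steers where each table's code ends up.

```python
def batches_per_target(tables):
    """Group generated statements by the 'target' metadata tag, so each
    batch can be deployed to its own physical database."""
    batches = {}
    for name, tags in tables.items():
        batches.setdefault(tags["target"], []).append(
            f"CREATE TABLE {name} (...);"  # column list elided in this sketch
        )
    return batches

tables = {
    "CUSTOMER":   {"target": "on_premises"},
    "SALES_FACT": {"target": "cloud"},
}
print(batches_per_target(tables))
```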
New technologies can enable organizations to develop simpler data architectures or data architectures that can support many forms of data consumption. This requires that the current data architecture be easy to change. For example, physical data marts may have to be replaced by virtual ones, or the staging area and enterprise data warehouse need to be merged into one physical database. The latter change could also mean that the current ETL process that physically copies data from the staging area to the data warehouse is replaced by an ELT process running entirely within the database server that manages both databases.
Implementing such changes with WhereScape is merely a matter of changing the relevant model conversion rules and templates. For example, if physical data marts need to be replaced by virtual ones, instead of generating CREATE TABLE and INSERT statements, WhereScape generates CREATE VIEW statements to develop a virtual data mart. The data models themselves remain unchanged.
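The template switch from physical to virtual data marts can be sketched like this. This is illustrative Python, not WhereScape syntax: the same data mart definition yields either a physical table plus a load statement or a single view, depending on one setting.

```python
def data_mart_statements(name, source, columns, virtual):
    """Generate the statements for one data mart object: a view when
    virtual, otherwise a physical table and its load statement."""
    collist = ", ".join(columns)
    if virtual:
        return [f"CREATE VIEW {name} AS SELECT {collist} FROM {source};"]
    return [
        # CREATE TABLE AS with an always-false predicate copies the
        # structure without data; the INSERT then performs the load.
        f"CREATE TABLE {name} AS SELECT {collist} FROM {source} WHERE 1 = 0;",
        f"INSERT INTO {name} SELECT {collist} FROM {source};",
    ]

print(data_mart_statements("DM_SALES", "DWH.SALES", ["YEAR", "AMOUNT"], True))
print(data_mart_statements("DM_SALES", "DWH.SALES", ["YEAR", "AMOUNT"], False))
```

The data mart definition (`name`, `source`, `columns`) is identical in both cases; only the generated implementation differs.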
The model conversion rules enable switching to other data architectures.
For each database in a data platform a specific design technique for table structures is applied, such as normalization, data vault, anchor modeling, star schema, and snowflake. Organizations may also have invented their own design techniques. All of these techniques have their pros and cons, and they may have been selected for a specific database server because of the forms of data consumption they must support. If new forms of data usage need to be supported or if new data storage requirements apply, it may be necessary to change the design technique. For example, it may be necessary to switch from a normalized to a data vault structure or from a snowflake to a star schema structure.
In WhereScape, switching to another design technique involves changing the model conversion rules and templates, followed by a regeneration process.
Organizations can also adapt a design technique itself. For example, in a standard data vault, each hub table has specific columns, such as the source of the data and the start and end time. If an organization wants to add geo-spatial columns to each hub table, this only needs to be defined once in the model conversion rules and templates; the generator then applies those rules and templates consistently every time a hub table is generated.
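Such a rule can be sketched as follows. The standard hub columns below follow common data vault practice and the geo-spatial columns represent the hypothetical organization-specific addition; neither list is WhereScape's own, and the types are illustrative.

```python
# Columns every hub table receives under this (assumed) data vault convention.
STANDARD_HUB_COLUMNS = [
    ("RECORD_SOURCE", "VARCHAR(100)"),
    ("LOAD_START",    "TIMESTAMP"),
    ("LOAD_END",      "TIMESTAMP"),
]
# Organization-specific extension, defined once and applied to every hub.
EXTRA_HUB_COLUMNS = [
    ("GEO_LATITUDE",  "DECIMAL(9,6)"),
    ("GEO_LONGITUDE", "DECIMAL(9,6)"),
]

def hub_table_ddl(entity, business_key):
    """Generate the DDL for one hub table from its entity name and
    business key, applying the standard and extra columns consistently."""
    cols = ([(f"HUB_{entity}_KEY", "CHAR(32)"), (business_key, "VARCHAR(50)")]
            + STANDARD_HUB_COLUMNS + EXTRA_HUB_COLUMNS)
    body = ", ".join(f"{name} {ctype}" for name, ctype in cols)
    return f"CREATE TABLE HUB_{entity} ({body});"

print(hub_table_ddl("CUSTOMER", "CUSTOMER_NUMBER"))
```

Changing `EXTRA_HUB_COLUMNS` once changes every hub table at the next regeneration.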
Metadata can be defined within WhereScape. This can be technical and business metadata. As indicated, organizations can define their own metadata tags. Metadata can affect the generator process, and it can be exported for import in other tools.
There are many ways to store historical data in databases, such as the one used by data vault. If this needs to be changed, only the conversion rules and templates must be changed and the generator process restarted. This could mean regenerating the entire database. This may take a lot of processing time, but it leads to a consistent and error-free result. It also takes a lot less time than manually making those changes for each table individually.
As many roads lead to Rome, different approaches exist to develop data platforms with sustainable data architectures that are future-ready. This whitepaper describes the generator approach deployed by data architecture automation tools. By separating the model definitions from the data platform, development is accelerated and it is easier to change the architecture of the platform and apply changes that enable the platform to support other forms of data consumption. This concept makes data architectures more sustainable. Without sustainable data architectures, making those changes is time-consuming, expensive and error-prone.
Practice has shown that the IT industry has used generators from its early days to improve the productivity of repetitive tasks, and the market has widely accepted them. So far, however, generators have mainly been popular within a single component of the platform. In fact, some developers do not even realize that they are using generators anymore.
It is time for data architecture automation tools to gain the same acceptance. Tools such as WhereScape have been on the market for many years and have proven their worth. If organizations have not studied the latest state of these automation tools, it is time they did.
WhereScape is an adaptable data architecture automation tool. It uses model conversion rules and templates to generate the implementation from data models. This makes the generated data platform independent of definitions, technologies, runtime platforms, distribution, architectures, design techniques, metadata and the way historical data is stored. In other words, the generated architecture is a sustainable data architecture.
Data platforms with a sustainable data architecture support current requirements, known upcoming requirements, and still-unknown future requirements. They enable organizations to adapt more easily to the changing role of data within their organization.