Data Warehouse Life Cycle Management

WhereScape RED – Data Warehouse Life Cycle Management – The Principles

 >  Design  >  Prototype    >  Deploy    >  Operate    >  Enhance  

Introduction

WhereScape RED is the first lifecycle management software specifically designed for data warehousing.  Lifecycle management means treating data warehousing as a process rather than a project.  An added benefit of taking a process view is that enables data warehouses to be built, and managed, fast.  We recommend a basic lifecycle model that has been proven across many projects in a range of industries.  It covers multiple phases – design, prototype, deployment, operation and enhancement.  WhereScape RED can also be used to support other data warehouse specific lifecycle models that may be in use.

Our white paper “Understanding the Data Warehouse Lifecycle Model” is recommended as further reading. It describes the common problems associated with data warehouse projects when they do not use a lifecycle model approach to development and implementation.

Methodology Principles

WhereScape RED has been developed with a single objective: produce a complete, pragmatic, methodologically aware toolset that allows data warehouse and database professionals to
build, prototype and enhance normalized or dimensional data warehouses rapidly.

In the service of that objective, WhereScape's designers focused on several key principles:

  • Prototype-and-iterate cycles are inevitable, and essential, in the design process. Business decision-makers often cannot articulate precisely what data elements they need to make business decisions, either because they have never seen the data elements available to them, or because the artificial (and often rote) nature of requirements-gathering sessions fail to spark their interest and elicit meaningful responses. As such - and as Inmon predicted more than a decade ago - data warehouse designers inevitably have to build prototypes of data warehouse environments in order to gather substantive feedback from user communities, and often have to prototype-and-iterate three or four times in order to arrive at a data warehouse design that is user-optimized and certain of acceptance by user communities.
  • Dimensional data warehouse layers are a fixture of modern data warehouse design. Although some cutting-edge analytical environments are now largely or wholly automated, using code that can navigate complex normalized schema to perform analyses, the vast majority of data warehouse and decision support environments are used by human beings, manipulating data in commercial off-the-shelf query and reporting tools. Dimensional data models - also known as star schema – and cubes are essential for these users, since these schema provide the inherent legibility and navigability that business decision-makers require to formulate questions quickly, and receive accurate answers from the data warehouse.
  • Normalized data warehouse models are a prerequisite for large scale enterprise data warehouses. Users think dimensionally, but storing data dimensionally, unless tightly managed, can lead to multiple copies of conflicting data. Consolidating data into its base components in the form of a data model that reflects the key business entities (product, customer, orders etc) provides a scalable architecture that can be used to feed the dimensional layer providing both ease of management and ease of use. 
  • Protect the transactional source systems from the data warehouse and from business decision-makers. Transactional source systems, because they are more accessible than ever before, invite the data warehouse designer to lean on them: to place code on transactional server platforms, to pass queries through the data warehouse layer to the production system when transaction-level detail is required, to employ wasteful drop-and-reload data warehouse population models that tax the processing capacity of transactional servers. But data warehouses exist, architecturally, to protect transactional systems from high levels of unpredictable query traffic, and a properly-built data warehouse environment should make minimal demands on transactional source systems, moving all and only the data required for decision-making onto the data warehouse in the most efficient manner possible.
  • Protect the data warehouse from the end-user tools environment. A properly-designed data warehouse environment becomes the decisional system of record for a department, organization or enterprise, capable of meeting the demands of literally hundreds of different decision-making processes, and of supporting the functionality and needs of all the decision-making tools present in the typical organization, from spreadsheets and personal databases to the most sophisticated business intelligence and statistical analysis tools. In no case should a data warehouse design and development toolset allow the peculiarities of a particular tool to impinge on, direct or hijack the data warehouse's logical design.
  • Avoid an obsessive focus on extraction, transformation and load (ETL) scenarios. A decade ago, the single largest problem for data warehouse practitioners was extraction, transformation and loading (ETL) - getting data out of inscrutable, non-relational transactional source systems, and into (often non-relational) data warehouse data bases. As a result, traditional data warehouse 'design' tools focus almost exclusively on this problem, treating design, deployment, operations and enhancement issues as afterthoughts if they dealt with these areas at all. In some heterogeneous environments data management and movement is still very much an issue, and WhereScape RED rapidly builds data warehouses following data movement through ETL or EAI oriented architectures. But merchant relational databases from Oracle, Microsoft, Teradata and other vendors, coupled with the standardization of operating systems around UNIX and Win32 variants, have undercut traditional ETL tools' strengths - heterogeneity. Every day, data warehouse professionals are operating in a world that is increasingly homogeneous - merchant SQL databases running on near-commodity operating systems - and should therefore be demanding that their data warehouse tools support the way these professionals work now, not the way they worked in the early 1990s.
  • Manage what you design, where you design it. Every seasoned data warehouse or business intelligence practitioner has had the experience of trying to design a data warehouse environment in one set of tools, 'port' that design to a separate and disconnected physical implementation, and then manage the physical implementation using yet a third set of tools. The results speak for themselves: technical failure rates for data warehouses, more than a decade after the founding of the discipline, still top 50% in nearly every study done on the subject. While it is true that data warehouses fail for multiple reasons, one of those reasons is certainly the lack of an integrated design-construct-operate (DCO) environment for data warehouse professionals to leverage in their work.
  • Assume success means enhancement, and that enhancements happen continually and perpetually. It was once common wisdom that data warehouses were static: once built, the logical design at the heart of the data warehouse would, essentially, remain unchanged. The irony of this assumption is startling - having complained about the 'data vault' nature of transactional systems for more than a decade, data warehouse practitioners the world over were building yet more data vaults: static data warehouse and business intelligence repositories that refused to recognize that, as decision-makers become more proficient at making data-driven business decisions, their informational requirements both become more refined, and migrate over time toward higher-added-value decisions. Today, every data warehouse environment we build must be built on the fundamental assumption that it will be repeatedly redeveloped and redeployed - perpetual enhancement - for as long as the decision-making processes it supports are relevant to the organization, and that the redeployment cycle, in today's fast-paced, closely-watched, highly-regulated business environment, will often be measured in days or weeks, rather than in months or years. This requires a focus on the data warehouse lifecycle not just the project.
  • To succeed in data warehousing projects, define a simple methodology, underpin it with process, and employ it repeatedly. Two decades of the rise, fall and rise again of CASE and RAD tools ought to have taught us, collectively, two things: 
    • methodology and process are essential for project success; and
    • lightweight methodologies are followed, while heavyweight methodologies are either ignored, or stifle innovation and rapid project completion.