The concept of data warehousing dates back to the late 1980s, but data warehouse elements in some form have been around even longer. Many early fiascos led practitioners to try and de-risk projects by using a requirements-driven, inflexible, sequential, and interdependent process, adapting techniques from the Software Development Life Cycle (SDLC) methodology. As a consequence, many of the current data warehouse tools and techniques are designed around this approach, and consulting firms thrive on these resource-hungry and lengthy projects.
WhereScape supports a different approach. Our tools and methods adapt Agile techniques to mitigate the risk of data warehouse projects through collaboration with the users.
The solution we offer in this article will allow you and your organization to jumpstart your data warehouse projects, utilizing an Agile methodology and providing results for your decision-makers in hours or days, instead of weeks or months.
Problems associated with traditional data warehouse projects
The data warehouse project painfully grapples with various pitfalls, including:
- Inaccurate or outdated business requirements;
- Poor development productivity;
- High Total Cost of Ownership (TCO) and, especially, cost of resources;
- Slow development cycles;
- Unacceptable data quality;
- Dismal documentation.
Most data warehouses utilize the SDLC serial development approach. The traditional approach dictates that structured processes must be followed to ensure success. But this can result in serious issues with a data warehousing project:
- Rigidity (i.e. difficulty to change);
- Failure to respond quickly to ever-changing business, and, as a result,
Data warehousing is experiencing a pattern change
We need a faster and more effective approach to address the current issues with data warehouse projects.
Agile data warehousing is a relatively new approach that treats the entire data warehousing lifecycle as one process, breaking the traditional development methods. This technology includes:
- Source data discovery and analysis;
- Data warehouse schema design for both Foundation and Presentation layers;
- Data transformation and integration;
- Metadata repository;
- Designing data warehouse objects and procedural code;
- Full self-documentation;
- Scheduling the loading routines;
- Data lineage, impact analysis, data traceability, and auditability.
The principles of Agile development are fully applicable to data warehousing projects. The main principles are:
- Frequent collaboration with business users;
- Constant delivery of the working product (i.e. datamart);
- Acceptance of changes, even in later stages of the project;
- Self-organizing, self-supporting teams;
- Complete documentation at delivery;
- “Cycle of Cycles” approach;
- Data warehousing at the speed of business changes;
- Use in an Integrated Development Environment (IDE);
- Test-driven development.
Agile Data Warehousing Project Stages
Stage I: Using WhereScape 3D:
- outline data warehouse requirements and guidelines at a high level during a project initiation meeting, bringing together business experts and developers;
- pinpoint major business artifacts and their relationships for one business area;
- draw the objects on a white board;
- define grain, keys, and some attributes;
- identify data sources for the chosen business area;
- use WhereScape 3D to test the assumption: model the relational diagram of the future datamart, discover and profile source data;
- document the findings using WhereScape 3D’s auto-documenting capability;
- socialize the model with business community.
Stage II: Using WhereScape RED:
- retrofit datamart objects, designed using WhereScape 3D, into WhereScape RED;
- build a presentation layer of the chosen business area (datamart) from existing sources;
- populate the datamart with live production data from data sources;
- provide means of displaying the data to business users such as OLAP cubes, PowerPivot, and the like;
- socialize the data with the developers and the business community;
- allow business users to “play” with the data;
- iterate based on feedback from the business.
Stage III: Using WhereScape RED:
- create an enterprise data warehouse layer;
- build a raw data vault based on the structure and data lineage of the datamart;
- dimensions ? hubs + hub satellites; facts ? links + link satellites;
- re-target source tables from feeding the datamart to feeding the data vault; point data vault tables to populate datamart tables;
- schedule/run the whole process;
- socialize the results with the business community;
- adjust accordingly.
Stage IV: Using WhereScape RED and 3D:
- repeat the process with another business area;
- as changes are required they can be added to the datamart in the development environment, then added to the data vault and deployed;
- extend the data vault with business structures - where data needs to be integrated, joined or has common transformations across multiple business areas, then these business areas can be built into the data vault;
- keep the business community engaged every step of the way;
- document the data warehousing areas as they reach completion;
- automate the process of building data warehouse according to Agile principles.
These deliverables are based on a standard three-day proof of concept, performed onsite using an experienced WhereScape consultant and WhereScape software.
- ER diagrams of the datamart and data vault;
- A datamart populated with (a subset of) production data;
- A data vault designed and built based on the datamart;
- Processes and workflows to populated the data vault and datamart;
- Indexing based on a default indexing strategy;
- Full documentation showing data lineage, technical and business metadata;
- Workshops to demonstrate the datamart to subject matter experts and end user community;
- Workshops to show the processing and code to technical folks.
WhereScape is a leading global provider of data warehouse development productivity software that help organizations build complex solutions that afford them with access to one of their most precious resources - data.
Our family of products is comprised of:
WhereScape 3D: a data warehouse planning and modeling tool. 3D enables you to expose and explore the risk areas of a data project up front, before expectations and budgets have been set.
WhereScape RED: an Agile data warehouse Integrated Development Environment (IDE) that builds, operates, controls, and documents the entire data warehouse.
Contact Info: WhereScape USA
Headquarters: 2100 NW 133rd Place, Suite 76 Portland, OR 97229
Phone: (503) 807-5024
Fax: (503) 466-3978