Microsoft SQL Server Integration Services (SSIS) is an ETL (Extract Transform and Load) tool that has been built to efficiently extract and transform data in memory on a row by row basis and then load them into a data warehouse database structure. Row-level operations are very flexible, and can also be fast because they are performed in memory. SSIS provides useful data transformation tasks out of the box. However, support staff needs technical knowledge of data warehousing best practices and SSIS behavior to troubleshoot any technical issues. SSIS does not come with any frameworks or data warehousing best practices built-in.
WhereScape RED generates code that is performed by the database itself using best practice frameworks. Data is extracted and loaded very quickly. Simply using bulk operations written into staging tables during intermediate steps of transformation. Then, published into permanent presentation tables like Data Vault (Hubs, Links, and Satellites) or Dimensional Models (Facts and dimensions). The set-based nature of the transforms provides performance, and the use of staging tables between steps makes support and development much simpler as support staff just needs to troubleshoot SQL code and data stored in native database objects.
Both approaches and architectures described above can provide high performance. We have found that experienced data warehouse managers will always choose supportability, the ability to troubleshoot, and enhance over a few percent of processing time. They recognize that user experience is what makes data warehouse successful, and the factors that make the user experience better are:
Initial development of the data warehouse falls into the following main areas:
Whilst SQL Server Integration Services excels at loading data from different sources. Once the data is loaded, WhereScape RED can build all the downstream SQL Server objects such as facts, dimensions, and Microsoft Analysis Services cubes in the database 10 times faster than with traditional, hand-coding methods.
The recommended approach for building SSIS packages is to extract the data from source systems, transform the data in memory using pipeline processing tasks, and then publish the data into presentation tables (dimensions and facts). Minimizing the number of writes to disk. This approach is very much in line with mainstream ETL tools and suits development against a source to target mapping design based on SDLC the requirements gathering approach.
WhereScape RED enables short, fast iterations of working software during the design phase that allows business users and stakeholders to see and experience the actual Data Warehouse during development. WhereScape RED generates the Data Warehouse tables, indexes along with the load routines that populate these objects at the same time. They are managed as one object, which provides huge productivity gains when building and maintaining the Data Warehouse. Thus, reducing technical risk.
WhereScape developers enjoy full technical and user documentation for change management and data lineage.
Once developed, anyone wanting to examine or change an SSIS package uses the graphical interface or Package Explorer to try and track down the process they are maintaining. The actual Data Warehouse design is managed separately using a modeling tool such as Management Studio, Visio, or ER/Studio. SSIS’s lack of integration between database design and process design slows both development and maintenance and increases the risk of error. Iterations take between 2 and 10 times longer using SSIS over WhereScape RED.
The agile development style involves users early in the project, aligns, expectations, and reduces the risk that the data warehouse will not be useful. Consistency of naming standards, indexing strategy, object placement, and code templates are crucial to reduce costs of ongoing support and maintenance. WhereScape RED allows these to be defined upfront and then applies them automatically to all database objects, load code, and documentation.
WhereScape RED’s built-in metadata reports across the entire data warehouse that are essential for larger projects - for example, impact analysis reports, data lineage diagrams, and workflow dependency diagrams.
Once data is loaded into a SQL Server data warehouse, WhereScape RED works within SQL Server’s powerful database engine to transform data and build data warehouse objects within the data warehouse or project mart, in a fraction of the time that it typically takes with traditional methods. WhereScape RED will continue to support the native SQL Server load utilities it has used to import data into the database, such as BULK INSERT, linked servers, and ODBC.
A data warehouse (when it has been developed) is a significant organizational asset, it will change often and will become business-critical. Not only that, but it is likely to have different developers building and maintaining different parts of it over its lifespan. An important function of any Data Warehouse tool should be to reduce the cost and technical risk of ongoing maintenance and enhancements.
Due to the free format of SSIS packages, the only way to implement code consistency is to use external guidelines imposed on the developers. These guidelines are rarely prescriptive enough to save much time when troubleshooting or extending SSIS packages and, naturally, there is no guarantee that, over time, developers will follow guidelines consistently. Task precedence and flow are easy to follow in SSIS packages as they are shown graphically, however crucial object relationships are maintained and shown in other tools. For example, if support wanted to find out which SSIS packages, and which tasks “touch” a table during the data warehouse process. They would need to open each package and examine each task.
The WhereScape RED framework ensures that process flow is easy to implement and follow for troubleshooting or development in the data warehouse. Most, if not all, of the processing logic that loads the Data Warehouse, is automatically generated by WhereScape RED. Coding standards can be predefined in RED so that all generated code conforms to the corporate standard. The entire data processing flow of the Data Warehouse, from source tables to the presentation layer, can be viewed graphically, either interactively in WhereScape RED or in the automatically generated HTML documentation.
SSIS does not generate Metadata that is easy to integrate across the data warehouse. This makes documentation a manual process for the development team to hand over to support staff or end-users. Any documentation that is produced is likely to be out of date when it is needed so the development team would need to check through the actual packages before making any changes.
WhereScape RED is a Metadata driven tool. All data warehouse objects (tables, views, database links, indexes, etc.) are held in Metadata. This Metadata can be exported, deployed, backed up, and documented using the tools supplied with WhereScape RED. Developers can run reports, many built-in, to perform essential tasks such as impact analysis, data lineage diagrams, and workflow dependency diagrams.
WhereScape RED can also generate HTML documentation right across the data warehouse from two different perspectives to support End users and separately to support Technical Support staff. WhereScape RED integrates business Metadata with technical Metadata and this can also be accessed from end-user query tools.
Data warehousing suits an agile project approach. Agile practices are becoming common in data warehouse projects because there is a recognition that users need to be involved during project development to drive requirements and prioritization. The project needs to deliver something with limited resources or time.
Agile practices in data warehousing boil down to:
Building data warehouses in an agile way requires purpose-built software that supports the methodology and process. Doing it any other way becomes difficult real quick. WhereScape RED has been designed specifically to address the challenge of building data warehouses in an agile way. RED supports processing, modeling, workflow, changes, and documentation in a single toolset that covers the entire cycle of data warehouse development. WhereScape RED is a comprehensive data warehouse automation and lifecycle product that scales as large or small as your data warehouse.
Traditional non-agile data warehouse projects do not accept users that do not know what they want at the start of a project. The typical project phases are:
This traditional process is lengthy, requires multiple toolsets and specification phases. It does not easily cope with changing requirements, this leads to well-publicized high failure rates of projects. Typical reasons for project failure are as follows:
For traditionally run projects these problems are not exposed to the users until user acceptance testing which occurs close to the end of the project. By comparison, projects following the best agile approaches supported by WhereScape RED, reduce risk by rapidly closing the feedback loop with the business. So, the project can move forward with confidence.
The data warehouse needs to change frequently. The data warehouse is part of a business intelligence solution that provides feedback on organizational strategy and operations. These change frequently as the organization reacts to internal and external forces. The data warehouse lifecycle is made up of a series of cycles:
Microsoft SQL Server Integration Services (SSIS) and WhereScape RED are very different products, however both can be used to build a data warehouse. It is difficult to purely compare SSIS and WhereScape RED because SSIS would need to be part of a suite of products to adequately compare functionality.
SSIS is a generic data processing platform that can be used to build data integration and workflow solutions, including extraction, transformation, and loading (ETL) operations for data warehousing. WhereScape RED is a complete data warehouse lifecycle product that incorporates data processing together with data modeling, workflow, metadata, and documentation. It is used to build, change, migrate, operate, and document a data warehouse, using one interface, one technology, and one set of skills.
The diagram above shows the scope of SSIS as the “ETL” component - this provides WhereScape customers with another option to load data that is extremely fast, efficient, and flexible. It also shows WhereScape RED covering the entire journey between source systems and the target BI Solution including metadata and documentation. Metadata is crucial in maintaining the relationships between objects and to provide the ability to describe and document the data warehouse.
The same WhereScape agile integrated development environment is used to build SQL Server data warehouses as well as Analysis Services cubes. The benefits are compelling – organizations only need to learn one development environment, leveraging end-to-end metadata from the source system through to the cube layer (and where the supported client tool is, as well). WhereScape delivers an integrated workflow and scheduling for the data warehouse and the cubes, automatically maintained lineage information and documentation from source system extract through to the data warehouse and the cubes.
WhereScape RED makes your users advocates, not critics - Live Prototyping means business users can see what their reports will look like, with real data, sooner than ever before. This means that changes can be made earlier in the process and the final solution more targeted.
While SSIS is a useful tool for extracting and moving data into or out of Microsoft SQL Server, it is not specifically designed for the development and maintenance of Data Warehouses. Since much of the upfront cost of a data warehouse is in the development effort and the longer-term costs in the effort required to maintain the data warehouse. IT Managers should look carefully at the cost-benefit of using WhereScape RED. WhereScape RED is a tool designed from the ground up to build high-quality SQL Server data warehouses. Quickly and efficiently with a lower risk of failure and low TCO.