RED operates with dimension types that are recognized by modern data warehouse theory. When a developer starts working on dimension tables he gets to chose from four options for the default generation of the dimension table and its update procedure via the following dialog box:
A Normal Dimension is where a dimension record is updated and changed whenever any of the non-business key information changes. This is commonly called a type 1 dimension and is used when historical values of the change are not kept.
Overwriting the old values. A Normal dimension type is the most common dimension type. RED auto-generates a stored procedure to handle update and insert paths. Consider this simple example. The "Product" table contains a product named, “Thingamabob” with Product ID being the primary key. In the year 2007, the price of Thingamabob was $150 and over the time, the product's price changes from $150 to $350.
In the year 2007, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In this Normal Type, there is no way to find out the old value of the product "Thingamabob" in year 2006 since the table now contains only the new price and year information.
The logic that WhereScape RED applies to the stored procedure handling the Normal dimension type works as follows:
- The stored procedure gets all unknown records from the dim table. If an unknown dim record does not exist, then the procedure allows an explicit value to be inserted into IDENTITY field
- The procedure loops through all the input records, attempting to update the dimension records based on the business key value. If the update attempt fails, then the procedure performs an insert.
A Slowly Changing dimension is where new dimension records are created when certain identified columns in the dimension change. This is commonly called a type 2 dimension and tracks point in time historical changes.
Creating another added record. In this type, the old values will not be updated but a new row containing the new values will be inserted in the product table. Consequently, at any point in time, the difference between the old values and new values can be retrieved and easily compared. This can be very useful for reporting purposes.
The problem with the above mentioned data structure is the "Product ID" column cannot store duplicate values for "Thingamabob" since "Product ID" is the business /primary key. Also, the current data structure doesn't clearly specify the effective date and expiration date of the product when the change to its price happened. So, it would be better to change the current data structure to overcome the above primary key violation.
In the changed Product table's Data structure, "Product ID" and "Effective Datetime” are now a composite business / primary key. Therefore, there will be no violation of the primary key constraint. Addition of the new columns, "Effective Datetime" and "Expiration Datetime" provides the information about the Thingamabob’s Effective date and Expiration date adding more clarity and noticeably improving the scope of this table. The slowly changing dimension (SCD), type 2 approach, may need added space in the database, since for every changed record; an extra row has to be stored. Since dimensions are usually not that big in the real world, extra space is negligible.
RED performs the following additional tasks in the stored procedures to handle Slowly Changing Dimensions (SCD), type 2 dimensions.
- Three new special variables are added: insert indicator, version indicator and version number.
- Four extra columns are added to support slowly changing dimensions: current_flag; version; start_date; end_date.
- RED automatically generates an alter table SQL script, based on type. It also adds a unique, business key index to support the slowly changing columns.
- In the update stored procedure loop, if the record is not found, the procedure inserts a new record and sets the insert flag on. Otherwise, the procedure checks to see if we have an existing dimension row and have just had a field change for data that is not part of the slowly changing data that we are interested in and performs an update. If there is an existing dimension row and it verifies that any of slowly-changing fields have changed. The procedure then updates the version indicator and version number, and inserts a new row.
A Previous Values dimension commonly called a type 3 dimension, allows the storing of the last value of selected fields in secondary columns.
Creating new fields. In this dimension type, the latest update to the changed values can be seen. The example shown below illustrates how to add new columns and keep track of the changes. From the example, we can see the current price and the previous price of the product, “Thingamabob”.
The problem with the Previous Values approach is over time if the product price has multiple changes; the complete history may not be stored, only the latest change will be stored. For instance, in year 2008, if the Thingamabob's price changes to $350, then we would not be able to see the complete history of 2007 prices, since the old values would have been updated with 2008 product information.
- RED guides you through the process of adding secondary columns.
A Date Ranged dimension supports source systems that provide date ranged information. It rolls over time based on specified columns and date range and the history is kept. Otherwise, it looks just like the Slowly Changing dimension type. The difference in that the date range dimension is valid for a given range or set of dates on specified columns.
The following example might be appropriate here. Thingamabobs are on sale for the date range 1/1/08 – 1/31/08 at $150 and on sale for the date range 2/1/08 - 2/29/08 for $200. So the dimensional records would be for 1/1/08…1/31/08, 2/1/08…2/29/08 and 3/1/08 to present each range would be represented by a date range dimension key and the product on sale for that dimension key could be reported on.
- RED will guide you through the process of identifying which column(s) will be date ranged columns, i.e. “from” datetime column followed by the “until” datetime column. E.g. such “from” column could be “Effective Datetime” and “until” column - Expiration Datetime
Note: RED is using slightly different naming convention.
The above table will look like this:
(*) = (Business Key)
(**) = (Primary Artificial Key)
(+) = (Slowly Changing Business Key)
A Get Dimension Key Stored Procedure
The 'Get Key' stored procedure is auto-generated by RED to return an artificial key when a business key is supplied. The normal syntax is to call the function (usually from update_stage_% stored procedure) passing the business key and be returned a status indicating the result of the lookup.
EXEC @v_return_status = get_dim_product_key
,'N' -- Auto add flag
,'N' -- Write error trail entry if not found flag
On a successful lookup, the artificial key for the dimension record is also returned. If the lookup fails because the business key is not found in the dimension then an appropriate status is returned and the artificial key is set to 0.
The main logic as follows:
- Lookup the key: If dss_end_date is NULL, then we are just looking up the current record. If not null then we are trying to match to a specific slowly changing record, so look through all of the records until finding the appropriate one.
- The procedure then loops through all the old keys looking for a match on the date. If key is found then the procedure sets the key and the flag.
- If no match found then the procedure gets the current version.
- If a dimension key is not found, then the procedure will add it if we have the auto_add flag set.
- If requested (auto add log message), the procedure will log a message about the auto add and log a message about the lookup failure.