Let’s talk about something that is really painful….type 2 slowly changing dimension. Of course, it can be the best solution in certain cases and yes, it is a very basic data warehousing concept but sometimes, isn’t it a bit “challenging” (to avoid the word painful) for everyone (users included) to maintain and/ or query a slowly changing dimensioned fact table?
Several mechanisms such as snapshot fact tables can “save” the project from them. For that, please refer to the luminous article prepared by Doug Barrett for this blog (SnapShot Fact Tables vs Type 2 dimensions / 20 Dec ’07).
But what can be done when the dimension is already in place? Let’s assume in addition that the users are not very familiar with this type of object and that the new piece of reporting does not require any history on the concerned dimension…
The answer is simple: a dimension view! As close as possible to a type 1 dimension… The goal is to simplify the data and for that, stick to the last known version of the dimension record (as we would do it for a type 1). However, there will be no one-to-one relationship between the surrogate key and the business key components. We still need all surrogate keys per business key to be able to link the data back to the whole fact table.
Once again, it is possible when the context (data, users, requirements…) allows it, but this is typically something that will simplify everybody’s life. Let’s go back to our classic example: A basic sales transactional fact table dimensioned by a type 2 dim_customer that looks like the following table:
- dim_cust_key is the dim surrogate key (key type: 0)Ø cust_number is the primary business key (key type: A)Ø version and status_flag are the added business key (key type: 1)Ø address is the slowly changing column (key type: 3)
All we have to do is create a view with all fields from the customer dimension but valid_from, valid_to dates, status and version. Obviously, we leave the business key blank in the definition. Then, we transform the slowly changing column to make it the last known value (identified by dss_current_flag = ‘Y’ for instance) for each occurrence of the business key. The Oracle / MsSQL statement in the transformation of the column city in our example should looks like this:
Select b.cityfrom dim_customer b where dss_current_flag = ‘Y’ and b.cust_number = dim_customer.cust_number
The result view is as follows:
We have now two levels of information for customers. The dimension view for simple things is ready to help. Join it, grouped by the business key for instance to link all data from the fact table. When reporting is more elaborated, more complex, the Type 2 Dimension table is still to be used for tracking changes properly.
There are different ways to do this kind of thing; we could for instance as well add a “current” field directly in the slowly changing dimension and query (or build the view) from it. The goal is just to simplify the use of the dimension when possible. After that, it is up to the developer to deploy what suits the best depending on the context.