Menu Request Demo

SnapShot Fact Tables vs Type 2 dimensions

22 December 2007

SnapShot fact tables are marvellous things for tracking changes over time.  I often use them instead of type 2 dimensions which are too hard to business people to understand (and a pain to maintain) - take the example of a customer dimension, where the business users have said that they want to track attribute(s) over time (for example a customer category).  If these attributes are fairly static then a type 2 dimension would be OK, but if it changes relatively frequently (more than once a year) or the business actually want to report on changes / trends over time then a snapshot fact table is often the better solution - in that case I would have a type 1 customer dimension, a customer snapshot fact table and a customer category dimension table.

The grain of the fact table would be each customer per snapshot date (perhaps monthly).  Easy to understand and very easy to track changes over time - think of the SQL to do a trend of customers by category over time using just a type 2 customer dimension - possible, but yuk!

Where this approach is not so attractive is if there are tens of millions of dimensional members (eg customers) or if the changes need to be tracked at a daily level - in this case the fact table might get unpalatably large quite quickly.  For example 5 years of history on 10 million customers leads to a fact table of 5 * 12 * 10mln = 600mln.  Not absurd, but you would want to use partitioning, and makesure that your fact table is pretty tight (just keys and measures).  If the fact table is likely to get too large then slowly changing logic in the fact or dim is going to save space (it will still be a pain to report on though).

It would be pretty easy to build up a snapshot fact table from scratch over time by adding the current snapshot once a month.  To populate the snapshot fact table grain from a historical record (eg from an ODS) using a start and end date I often use a manual join to the date dimension (add the dim_date_key to the stage table but mark it as a manual dimension join in the key properties), then use a join like the following in the procedure build:

FROM load_customer
JOIN dim_date
ON dim_date.calendar_date BETWEEN load_customer.effective_from_date AND load_customer.effective_to_date
AND dim_date.cal_day_in_month = 1

This ensures that I get a record in the fact table for each customer for each month using generated code.  In RED I have to change the source table in the column properties of the dim_date_key to be the date dimension to ensure the meta data is correct downstream for documentation and for building OLAP cubes.


Comments 0

Leave a Comment

No comments.