Dimensions that change slowly over time, rather than changing on a regular schedule, time-based. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users to assign proper dimension’s attribute value for a given date. Example of such dimensions could be a customer, geography, employee.
There are many approaches on how to deal with SCD. The most popular are:
- Type 0– Fixed Dimension
- No changes allowed, dimension never changes
- Type 1– No History
- Update record directly, there is no record of historical values, the only current state
- Type 2– Row Versioning
- Track changes as version records with current flag & active dates and other metadata
- Type 3– Previous Value column
- Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur
- Type 4– History Table
- Show current value in dimension table but track all changes in the separate table
- Type 6– Hybrid SCD
- Utilize techniques from SCD Types 1, 2 and 3 to track change
In reality, only types 0, 1 and 2 are widely used, with the others reserved for very specific requirements. Confusingly, there is no SCD type 5 in commonly agreed definitions.
We have a very simple ‘customer’ dimension, with just 2 attributes – Customer Name and Country:
However, Santhi Lakshmi has just informed us that she has now moved to the UK and we want to update our dimension record to reflect this. We can see how the different SCD types will handle this change and the pro/cons of each method.
Our table remains the same. This means our existing reports will continue to show the same figures, maybe it is a business requirement that each customer is always allocated to the country they signed up from.
All future transactions associated with Santhi Lakshmi will also be allocated to the ‘India’ country.
The table is updated to reflect Santhi Lakshmi’s new country:
All fact records associated with Santhi Lakshmi will now be associated with the ‘United Kingdom’ country, regardless of when they occurred.
In order to support type 2 changes, we need to add four columns to our table:
Surrogate Key – The original ID will no longer be sufficient to identify the specific record we require, we, therefore, need to create a new ID that the fact records can join to specifically.
Current Flag – A quick method of returning only the current version of each record
Start Date – The date from which the specific historical version is active
End Date – The date to which the specific historical version record is active
With these elements in place, our table will now look like:
This method is very powerful – you maintain the history for the entire record and can easily perform a change-over-time analysis. However, it also comes with more maintenance overhead, increased storage requirement and potential performance impacts if used on very large dimensions.
Type 2 is the most common method of tracking a change in data warehouses.
Here, we add a new column called “Previous Country” to track what the last value for our attribute was.
Note how this will only provide a single historical value for Country. If the customer changes her name, we will not be able to track it without adding a new column. Likewise, if Santhi Lakshmi moved to a new country again, we would either need to add further “Previous Country” columns or lose the fact that she once lived in India.
There is no change to our existing table here, we simply update the record as if a Type 1 change had occurred. However, we simultaneously maintain a history table to keep track of these changes:
Our Dimension table reads:
Whilst our Type 4 historical table is created as: