When designing a data warehouse, how you handle changes to dimensional data over time is one of the most important decisions you will make. It is rare for a dimension to remain static. For example, a customer may change their phone number or address, or a salesperson may change sales territory. For a customer phone number change, it is usually not important to track the history of phone numbers: you just overwrite the previous phone number. However, for an address change, you will usually want to track the history of the customer’s address in order to properly report the history of sales by geography. The same goes for tracking the history of a salesperson’s sales territory.
For example, say a dimension record assigns a salesperson to Territory A in 2010, and that territory makes $10 million in sales, on which the salesperson earns a commission. Let’s say Territory B made $5 million in sales that year. Then, in 2011, this salesperson is reassigned to Territory B. If you just overwrote his dimension record with Territory B, a sales commission report for 2010 would show him with only $5 million in sales. That would make for one angry salesperson. This is why you need to record the history of changes.
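The commission example can be reproduced in a few lines of Python. This is only a minimal sketch: the dictionary layout, the name `Pat`, and the `reported_sales` helper are assumptions made up for illustration, not part of any real warehouse schema.

```python
# Hypothetical figures from the example: sales per territory and year, in dollars.
territory_sales = {
    ("Territory A", 2010): 10_000_000,
    ("Territory B", 2010): 5_000_000,
}

# The salesperson's dimension record as it stood in 2010 (illustrative layout).
salesperson = {"name": "Pat", "territory": "Territory A"}


def reported_sales(person, year):
    """Sales credited to whatever territory the record currently holds."""
    return territory_sales[(person["territory"], year)]


before_overwrite = reported_sales(salesperson, 2010)  # 10_000_000

# The 2011 reassignment is handled by overwriting the record in place.
salesperson["territory"] = "Territory B"

# The 2010 report now picks up Territory B's figure instead.
after_overwrite = reported_sales(salesperson, 2010)  # 5_000_000
```

Because the record only knows the current territory, the 2010 report silently changes after the overwrite.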
The term Slowly Changing Dimension (SCD) refers to tracking the variation in dimensional attributes over time. Don’t let the word “slowly” fool you: the changes could very well happen rapidly. But in general they will happen “slowly” over time.
The way you handle changes falls into three categories:
Type 1 (overwrite): No history information is stored. Existing data is overwritten by new values.
Type 2 (add a row): The history of data changes is preserved. A new record is inserted each time a change is made. Every data row has a valid from date and a valid to date indicating the time period of the data’s validity, and each row usually has an isCurrent type of field that is set to Yes for the active record, with the others set to No. When a fact table record is inserted, it is given the surrogate key of the dimension record that was valid at the time of the fact.
Type 3 (add a column): This method tracks changes using separate columns (but no new rows). This means there is a limit to history preservation based on the number of columns in each row that are designated for storing historical data. For example, a record may have the fields Territory1, Territory1EffectiveDate, Territory2, Territory2EffectiveDate, etc. Type 3 is rarely used.
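The three types above can be sketched in Python against plain in-memory rows. This is a rough illustration, not a definitive implementation: the function names, the snake_case field names (`surrogate_key`, `business_key`, `valid_from`, `valid_to`, `is_current`), the 9999-12-31 end date, and the choice to treat `valid_to` as exclusive are all assumptions made for this example. Only the Territory1/Territory2 column names come from the text.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "no end date yet"


def type1_overwrite(row, attribute, new_value):
    """Type 1: overwrite in place, so the previous value is lost."""
    row[attribute] = new_value


def type2_add_row(rows, business_key, attribute, new_value, change_date):
    """Type 2: close the current row, then append a new current row."""
    current = next(r for r in rows
                   if r["business_key"] == business_key and r["is_current"])
    current["valid_to"] = change_date
    current["is_current"] = False
    new_row = {
        **current,  # copy the unchanged attributes
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        attribute: new_value,
        "valid_from": change_date,
        "valid_to": OPEN_END,
        "is_current": True,
    }
    rows.append(new_row)
    return new_row


def surrogate_key_on(rows, business_key, on_date):
    """Key a fact row should use: valid_from inclusive, valid_to exclusive."""
    for r in rows:
        if (r["business_key"] == business_key
                and r["valid_from"] <= on_date < r["valid_to"]):
            return r["surrogate_key"]
    raise LookupError(f"no dimension row for {business_key} on {on_date}")


def type3_add_column(row, new_value, effective_date, slots=2):
    """Type 3: store the change in the next unused TerritoryN column."""
    for n in range(1, slots + 1):
        if row.get(f"Territory{n}") is None:
            row[f"Territory{n}"] = new_value
            row[f"Territory{n}EffectiveDate"] = effective_date
            return
    raise ValueError("no history columns left")


# Type 1: a phone number change simply replaces the old value.
customer = {"name": "Sam", "phone": "555-0100"}
type1_overwrite(customer, "phone", "555-0199")

# Type 2: the territory change from the earlier example keeps both rows.
salespeople = [{
    "surrogate_key": 1, "business_key": "SP-001", "territory": "Territory A",
    "valid_from": date(2010, 1, 1), "valid_to": OPEN_END, "is_current": True,
}]
type2_add_row(salespeople, "SP-001", "territory", "Territory B", date(2011, 1, 1))
key_2010 = surrogate_key_on(salespeople, "SP-001", date(2010, 6, 30))  # 1
key_2011 = surrogate_key_on(salespeople, "SP-001", date(2011, 6, 30))  # 2

# Type 3: history lives in columns, so it runs out when the columns do.
wide_row = {"Territory1": "Territory A",
            "Territory1EffectiveDate": date(2010, 1, 1),
            "Territory2": None, "Territory2EffectiveDate": None}
type3_add_column(wide_row, "Territory B", date(2011, 1, 1))
```

With the Type 2 rows in place, a 2010 fact resolves to the Territory A row and a 2011 fact to the Territory B row, so the 2010 commission report stays correct. A third territory change on `wide_row` would raise `ValueError`, which is the column limit described under Type 3.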
Note that Type 0 (retain original) means a record is never updated: it remains exactly as it was when it was first created. There are also a Type 4 and a Type 6, and some “combination” types, but I have never seen those used, so they are not worth discussing here.