Slowly changing dimensions or SCD (Slowly Changing Dimensions) are dimensions in which their data tend to change over time, either occasional or constant, or involving a single record or the entire table. When these changes occur, you can opt to follow one of these two great options:
- Register the change history.
- Replace the values that are necessary.
Ralph Kimball initially considered three strategies to follow when treating SCD type 1, type 2 and type 3, but through the years the community of people who took care to model databases deepened initial definitions and included several types SCD more, for example, type 4 and type 6.
Below are each strategy type SCD:
- SCD Type 1: Overwrite.
- SCD Type 2: Add a row.
- SCD Type 3: Add column.
- SCD Type 4: Table of separate stories.
- SCD Type 6: Hybrid.
According to the nature of change must select what type SCD is used in some cases be appropriate to combine several techniques.
Importantly, although there are different ways to implement each technique, is essential surrogate keys in dimension tables to apply to apply these techniques.
By applying different techniques SCD, in many cases you must change the structure of the dimension table to which it is working, so these modifications are recommended to model them at the time table, but can also be done once and has been modeled and contains data for which to add such a new column must specify default values to be taken by records in the table.
NOTE: For all the examples below, "product_id" is a surrogate key is primary key of the table used.
SCD Type 1: Overwrite
This guy is the most basic and simple to implement, because while not saving the changes historical, does not require any special modeling and does not need to add new records to the table.
In this case when a record this change in any of the values of their fields, there should be simply to update the data in question, overwriting the old. To illustrate this case, they shall refer the following table:
Product_id | Name | Type | Product |
1 | Name 1 | Type 1 | Product 1 |
Now, it is assumed that this product has changed name, and now become "name2", then get the following:
Product_id | Name | Type | Product |
1 | Name 2 | Type 1 | Product 1 |
Usually this type is used in cases where historical information is not important to maintain, as happens when you must modify the value of a record because it has spelling errors. The example raised is only for practical purposes, because with this technique, all movements performed in "Product 1", which belonged to the "Name 1", now will become the "Name2", which creates a great inconsistency in the DW.
SCD Type 2: Add row
This strategy requires that you add some additional columns to the dimension table for storing the history of changes.
The columns that are added are:
- FechaInicio: date took effect from the current record. By default a date is often used very old, eg "01/01/1000".
- FechaFin: the date on which the current record was no longer in effect. By default a date is often used very futuristic, eg "01/01/9999".
- Revision: sequence number that increases each new change. Default generally begins at "1".
- Current version: specifies whether the current field is the force. This value can be if true: "true" or "1" and if false: "flase" or "0".
Then, when any change occurs in the values of the records, add a new row must be completed and data relating to the history of changes.
To illustrate this case, they shall refer the following table:
Product_id | Name | Type | Product |
1 | Name 1 | Type 1 | Product 1 |
The following columns are added to the record store:
Product_id | Name | Type | Product | Startdate | Enddate | Version | CurrentVersion |
1 | Name 1 | Type 1 | Product 1 | 01/01/1000 | 01/01/9999 | 1 | true |
Now, it is assumed that this product has changed Name, and now become "Name 2", then get the following:
Producto_id | Name | Type | Product | Startdate | Enddate | Version | CurrentVersion |
1 | Name 1 | Type 1 | Product 1 | 01/01/1000 | 06/11/2009 | 1 | false |
2 | Name 2 | Type 1 | Product 1 | 07/11/2009 | 01/01/9999 | 2 | true |
As can be seen, it performs the following process:
- It adds a new row with its corresponding surrogate key ( "product_id").
- Change is recorded ( "Name").
- It updates the values of "Startdate" and "Enddate" both of the new row, as the former (which introduced the change).
- It increases by one field value "Version" that holds the old row.
- It updates the values of "CurrentVersionl" both of the new row, as the old, leaving the new row as the existing one (true).
This technique allows you to save unlimited information exchange.
SCD Type 3: Add Column
This strategy requires that you add to the dimension table an additional column for each column whose values you want to keep track of changes.
To illustrate this case, they shall refer the following table:
Producto_id | Name | Type | Product |
1 | Name 1 | Type 1 | Product 1 |
Then a column is added to maintain the historical changes to data column "Name"
Producto_id | Name | PreviousName | Type | Product |
1 | Name 1 | - | Type 1 | Product 1 |
Now, it is assumed that this product has changed Name, and now become "Name 2", then get the following:
Producto_id | Name | PreviousName | Type | Product |
1 | Name 2 | Name 1 | Type 1 | Product 1 |
As can be seen, it performs the following process:
- In the column "PreviousName" places the old value.
- In the column "Name" the new value is placed in force.
This technique allows you to save a limited information exchange.
SCD Type 4: Table of separate History
This technique is used in combination with one another and their basic function is stored in an additional table the details of historical changes made to a dimension table.
This historical table indicate for example what type of transaction took place (Insert, Update, Delete) on that field and on what date. The goal of maintaining this table is to have a record of all changes, then analyze and make decisions about which SCD technique could be applied better. For example, the table below records the changes historical dimension table "Products", which employs assume the SCD Type 2:
Producto_id | Name_change | Type_Change | Product_Change | Date modified |
1 | Insert | - | - | 05/06/2000 |
2 | Insert | Insert | - | 25/10/2002 |
3 | - | Insert | - | 17/01/2005 |
4 | - | - | Insert | 18/02/2009 |
Taking as example the first record of this table, the information stored there stated:
- The day "05/06/2000", recording the dimension table "Products" with "product_id" equal to "1" had a change of "Name", and therefore should be inserted ( "Insert") a new line with prevailing values.
SCD Type 6: Hybrid
This technique combines the SCD Type 1, 2 and 3.
SCD type is called "6", simply because: 6 = 1 + 2 +3.