Slowly Changing Dimensions (SCD) using Databricks (python +SQL)

Sanajit Ghosh
2 min readAug 10, 2020

Slowly Changing Dimensions (SCD) — These are dimensions records which slowly changes with time or specific periods. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension’s attribute value for given date.

There are many ways to deal with SCD. I have worked on two popular types which is commonly used in most use cases:

  • Type 1— Overwriting the old value
  • Type 2— Creating an addition records and expiring the previous record

Type 1 — Overwriting the old value. In this method no history of dimension records changes are kept in the database. The old dimension value is simply overwritten with the new one.

SCD 1

Type 2 — Creating a new additional record. In this method history of the old record is kept in the database. You capture attribute change by adding a new row with a new surrogate key to the dimension table. ‘Effective date’ and ‘Row Expiration time’ columns are used in this method. Effective date states the date on which the record was created and expiration date tells the date on which the record was expired .This method, however makes the database operation costly as there can be huge volumes of expired or historical records in the database.

In this example, one can clearly see that on 9th August John’s present address was USA,Florida and on 10th it got changed to NYC. So in type 2, the previous record or row is expired with the date on which the record got changed and a null value is inserted to make sure that it is an active record.

SCD 2

In databricks platform the implementation can be done in both ways. The type 1 is quite straightforward and easy to implement. The logic implemented in type 2 can be done in this below ways. You can refer my github repo for complete code structure.

  1. In python notebook
python notebook

2. In SQL notebook

SQL notebook

--

--