What is Slowly Changing Dimensions (SCD)?

Learn SCD Type 1, 2, and 3 strategies for tracking historical changes in dimension tables

Slowly Changing Dimensions (SCD) - Data Engineering Tutorial | TechLead

What Are Slowly Changing Dimensions?

Slowly Changing Dimensions (SCDs) describe how to handle changes to dimension table attributes over time. When a customer moves to a new city, when a product's price changes, or when an employee gets promoted, how should the data warehouse reflect this? Should it overwrite the old value, keep a history of all values, or store both? The answer depends on whether historical context matters for your analytics.

Ralph Kimball defined several SCD strategies (Types 0 through 7), but in practice, Types 1, 2, and 3 cover the vast majority of use cases. Understanding when to use each type is essential for building data warehouses that provide accurate historical analysis while remaining maintainable.

SCD Type 0: Retain Original

The dimension attribute never changes once assigned. This is appropriate for truly immutable attributes like a customer's original signup date, a product's launch date, or a transaction's creation timestamp. No special handling is needed — the value is set once and never updated.

SCD Type 1: Overwrite

Type 1 simply overwrites the old value with the new value. No history is preserved. This is the simplest approach and is appropriate when historical values are not analytically important — for example, fixing a misspelled customer name or updating a phone number.

-- SCD Type 1: Overwrite — No history preserved
-- Customer moved from New York to San Francisco

-- Before:
-- | customer_key | customer_id | name  | city          |
-- | 1            | 501         | Alice | New York      |

-- After Type 1 update:
-- | customer_key | customer_id | name  | city          |
-- | 1            | 501         | Alice | San Francisco |

-- Implementation: Simple UPDATE
UPDATE analytics.dim_customer
SET
    city = 'San Francisco',
    state = 'California',
    updated_at = CURRENT_TIMESTAMP
WHERE customer_id = 501;

-- dbt implementation using incremental model with merge
-- models/marts/dim_customer_type1.sql
{{
    config(
        materialized='incremental',
        unique_key='customer_id',
        merge_update_columns=['name', 'email', 'city', 'state', 'country', 'updated_at']
    )
}}

SELECT
    customer_id,
    name,
    email,
    city,
    state,
    country,
    CURRENT_TIMESTAMP AS updated_at
FROM {{ ref('stg_customers') }}

{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

Type 1 Trade-offs

Pros: Simple to implement, no table bloat, always shows current state
Cons: History is lost — you cannot answer "what city was this customer in when they placed order X?"
Use When: Correcting data errors, updating non-analytically relevant attributes, or when you genuinely do not care about historical values

SCD Type 2: Add New Row

Type 2 preserves full history by adding a new row for each change, with effective date ranges and a current flag. The original row is marked as no longer current, and a new row is inserted with the updated values. This is the most common and most powerful SCD type, enabling accurate historical analysis.

-- SCD Type 2: Full history with versioned rows
-- Customer moved from New York to San Francisco on 2025-03-15

-- Result: Two rows for the same customer
-- | customer_key | customer_id | name  | city          | effective_from | effective_to | is_current |
-- | 1            | 501         | Alice | New York      | 2024-01-15     | 2025-03-14   | FALSE      |
-- | 47           | 501         | Alice | San Francisco | 2025-03-15     | 9999-12-31   | TRUE       |

-- Querying current state (most common):
SELECT * FROM analytics.dim_customer WHERE is_current = TRUE;

-- Querying historical state (point-in-time):
-- "What city was customer 501 in on 2025-02-01?"
SELECT *
FROM analytics.dim_customer
WHERE customer_id = 501
  AND '2025-02-01' BETWEEN effective_from AND effective_to;
-- Returns: New York (correct!)

-- Joining with facts for historical accuracy:
-- Each order joins to the dimension version that was active at order time
SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    c.city,          -- City at the TIME OF THE ORDER, not current city
    o.total_amount
FROM analytics.fct_orders o
JOIN analytics.dim_customer c
    ON o.customer_key = c.customer_key;
-- The customer_key in the fact table points to the specific version

-- SCD Type 2 Implementation: MERGE statement
MERGE INTO analytics.dim_customer AS target
USING staging.stg_customers AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE

-- When a current row has changed attributes, close it and insert new version
WHEN MATCHED AND (
    target.city != source.city OR
    target.email != source.email OR
    target.customer_segment != source.customer_segment
) THEN UPDATE SET
    effective_to = CURRENT_DATE - INTERVAL '1 day',
    is_current = FALSE

-- New customers: insert directly
WHEN NOT MATCHED THEN INSERT (
    customer_id, customer_name, email, city, state, country,
    customer_segment, effective_from, effective_to, is_current
) VALUES (
    source.customer_id, source.name, source.email, source.city,
    source.state, source.country, source.segment,
    CURRENT_DATE, '9999-12-31', TRUE
);

-- After MERGE, insert new versions for changed rows
INSERT INTO analytics.dim_customer (
    customer_id, customer_name, email, city, state, country,
    customer_segment, effective_from, effective_to, is_current
)
SELECT
    s.customer_id, s.name, s.email, s.city, s.state, s.country,
    s.segment, CURRENT_DATE, '9999-12-31', TRUE
FROM staging.stg_customers s
JOIN analytics.dim_customer t
    ON s.customer_id = t.customer_id
WHERE t.is_current = FALSE
  AND t.effective_to = CURRENT_DATE - INTERVAL '1 day';

SCD Type 2 with dbt Snapshots

dbt provides built-in support for SCD Type 2 through snapshots. Snapshots automatically track changes and manage effective dates, making Type 2 implementation much simpler:

-- snapshots/snp_customers.sql
{% snapshot snp_customers %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes=True,
    )
}}

SELECT
    customer_id,
    name AS customer_name,
    email,
    city,
    state,
    country,
    segment AS customer_segment,
    updated_at
FROM {{ source('app', 'customers') }}

{% endsnapshot %}

-- dbt automatically adds:
-- dbt_scd_id (surrogate key)
-- dbt_updated_at
-- dbt_valid_from (effective_from)
-- dbt_valid_to (effective_to, NULL for current)

-- Query the snapshot for current state:
SELECT * FROM snapshots.snp_customers WHERE dbt_valid_to IS NULL;

SCD Type 3: Add New Column

Type 3 tracks limited history by adding columns for the previous value. Instead of adding rows, you add columns like previous_city and city_changed_date. This preserves only one level of history and is rarely used in practice because it requires schema changes for each tracked attribute.

-- SCD Type 3: Previous value columns
-- | customer_id | name  | city          | previous_city | city_changed_on |
-- | 501         | Alice | San Francisco | New York      | 2025-03-15      |

ALTER TABLE analytics.dim_customer
    ADD COLUMN previous_city VARCHAR(100),
    ADD COLUMN city_changed_on DATE;

UPDATE analytics.dim_customer
SET
    previous_city = city,
    city = 'San Francisco',
    city_changed_on = '2025-03-15'
WHERE customer_id = 501;

Choosing the Right SCD Type

SCD Type	History	Complexity	When to Use
Type 0	None (never changes)	None	Immutable attributes (signup date)
Type 1	None (overwrite)	Low	Error corrections, non-important attributes
Type 2	Full history	Medium	Analytically important changes (address, segment)
Type 3	One previous value	Low	Rarely used — need both current and previous

Key Takeaways

SCD Type 1 overwrites old values — simple but loses history. Use for corrections and non-important attributes.
SCD Type 2 adds new rows with effective dates — preserves full history. Use for analytically important changes like customer address or segment.
SCD Type 3 adds columns for previous values — limited history, rarely used in practice.
Most warehouses use a mix: Type 2 for important attributes (city, segment), Type 1 for corrections (name typos).
dbt snapshots provide built-in SCD Type 2 support, dramatically simplifying implementation.
Always use surrogate keys in Type 2 dimensions — the fact table foreign key points to a specific version of the dimension row.

Slowly Changing Dimensions (SCD)

What Are Slowly Changing Dimensions?

SCD Type 0: Retain Original

SCD Type 1: Overwrite

Type 1 Trade-offs

SCD Type 2: Add New Row

SCD Type 2 with dbt Snapshots

SCD Type 3: Add New Column

Choosing the Right SCD Type

Key Takeaways

Continue Learning

Python

PostgreSQL

System Design

Cloud & Kubernetes

AI Agents & RAG