What Are Slowly Changing Dimensions?
Slowly Changing Dimensions (SCDs) describe how to handle changes to dimension table attributes over time. When a customer moves to a new city, when a product's price changes, or when an employee gets promoted, how should the data warehouse reflect this? Should it overwrite the old value, keep a history of all values, or store both? The answer depends on whether historical context matters for your analytics.
Ralph Kimball defined several SCD strategies (Types 0 through 7), but in practice, Types 1, 2, and 3 cover the vast majority of use cases. Understanding when to use each type is essential for building data warehouses that provide accurate historical analysis while remaining maintainable.
SCD Type 0: Retain Original
The dimension attribute never changes once assigned. This is appropriate for truly immutable attributes like a customer's original signup date, a product's launch date, or a transaction's creation timestamp. No special handling is needed — the value is set once and never updated.
SCD Type 1: Overwrite
Type 1 simply overwrites the old value with the new value. No history is preserved. This is the simplest approach and is appropriate when historical values are not analytically important — for example, fixing a misspelled customer name or updating a phone number.
-- SCD Type 1: Overwrite — No history preserved
-- Customer moved from New York to San Francisco
-- Before:
-- | customer_key | customer_id | name | city |
-- | 1 | 501 | Alice | New York |
-- After Type 1 update:
-- | customer_key | customer_id | name | city |
-- | 1 | 501 | Alice | San Francisco |
-- Implementation: Simple UPDATE
UPDATE analytics.dim_customer
SET
city = 'San Francisco',
state = 'California',
updated_at = CURRENT_TIMESTAMP
WHERE customer_id = 501;
-- dbt implementation using incremental model with merge
-- models/marts/dim_customer_type1.sql
{{
config(
materialized='incremental',
unique_key='customer_id',
merge_update_columns=['name', 'email', 'city', 'state', 'country', 'updated_at']
)
}}
SELECT
customer_id,
name,
email,
city,
state,
country,
CURRENT_TIMESTAMP AS updated_at
FROM {{ ref('stg_customers') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Type 1 Trade-offs
- Pros: Simple to implement, no table bloat, always shows current state
- Cons: History is lost — you cannot answer "what city was this customer in when they placed order X?"
- Use When: Correcting data errors, updating non-analytically relevant attributes, or when you genuinely do not care about historical values
SCD Type 2: Add New Row
Type 2 preserves full history by adding a new row for each change, with effective date ranges and a current flag. The original row is marked as no longer current, and a new row is inserted with the updated values. This is the most common and most powerful SCD type, enabling accurate historical analysis.
-- SCD Type 2: Full history with versioned rows
-- Customer moved from New York to San Francisco on 2025-03-15
-- Result: Two rows for the same customer
-- | customer_key | customer_id | name | city | effective_from | effective_to | is_current |
-- | 1 | 501 | Alice | New York | 2024-01-15 | 2025-03-14 | FALSE |
-- | 47 | 501 | Alice | San Francisco | 2025-03-15 | 9999-12-31 | TRUE |
-- Querying current state (most common):
SELECT * FROM analytics.dim_customer WHERE is_current = TRUE;
-- Querying historical state (point-in-time):
-- "What city was customer 501 in on 2025-02-01?"
SELECT *
FROM analytics.dim_customer
WHERE customer_id = 501
AND '2025-02-01' BETWEEN effective_from AND effective_to;
-- Returns: New York (correct!)
-- Joining with facts for historical accuracy:
-- Each order joins to the dimension version that was active at order time
SELECT
o.order_id,
o.order_date,
c.customer_name,
c.city, -- City at the TIME OF THE ORDER, not current city
o.total_amount
FROM analytics.fct_orders o
JOIN analytics.dim_customer c
ON o.customer_key = c.customer_key;
-- The customer_key in the fact table points to the specific version
-- SCD Type 2 Implementation: MERGE statement
MERGE INTO analytics.dim_customer AS target
USING staging.stg_customers AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE
-- When a current row has changed attributes, close it and insert new version
WHEN MATCHED AND (
target.city != source.city OR
target.email != source.email OR
target.customer_segment != source.customer_segment
) THEN UPDATE SET
effective_to = CURRENT_DATE - INTERVAL '1 day',
is_current = FALSE
-- New customers: insert directly
WHEN NOT MATCHED THEN INSERT (
customer_id, customer_name, email, city, state, country,
customer_segment, effective_from, effective_to, is_current
) VALUES (
source.customer_id, source.name, source.email, source.city,
source.state, source.country, source.segment,
CURRENT_DATE, '9999-12-31', TRUE
);
-- After MERGE, insert new versions for changed rows
INSERT INTO analytics.dim_customer (
customer_id, customer_name, email, city, state, country,
customer_segment, effective_from, effective_to, is_current
)
SELECT
s.customer_id, s.name, s.email, s.city, s.state, s.country,
s.segment, CURRENT_DATE, '9999-12-31', TRUE
FROM staging.stg_customers s
JOIN analytics.dim_customer t
ON s.customer_id = t.customer_id
WHERE t.is_current = FALSE
AND t.effective_to = CURRENT_DATE - INTERVAL '1 day';
SCD Type 2 with dbt Snapshots
dbt provides built-in support for SCD Type 2 through snapshots. Snapshots automatically track changes and manage effective dates, making Type 2 implementation much simpler:
-- snapshots/snp_customers.sql
{% snapshot snp_customers %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
invalidate_hard_deletes=True,
)
}}
SELECT
customer_id,
name AS customer_name,
email,
city,
state,
country,
segment AS customer_segment,
updated_at
FROM {{ source('app', 'customers') }}
{% endsnapshot %}
-- dbt automatically adds:
-- dbt_scd_id (surrogate key)
-- dbt_updated_at
-- dbt_valid_from (effective_from)
-- dbt_valid_to (effective_to, NULL for current)
-- Query the snapshot for current state:
SELECT * FROM snapshots.snp_customers WHERE dbt_valid_to IS NULL;
SCD Type 3: Add New Column
Type 3 tracks limited history by adding columns for the previous value. Instead of adding rows, you add
columns like previous_city and city_changed_date. This preserves only one level
of history and is rarely used in practice because it requires schema changes for each tracked attribute.
-- SCD Type 3: Previous value columns
-- | customer_id | name | city | previous_city | city_changed_on |
-- | 501 | Alice | San Francisco | New York | 2025-03-15 |
ALTER TABLE analytics.dim_customer
ADD COLUMN previous_city VARCHAR(100),
ADD COLUMN city_changed_on DATE;
UPDATE analytics.dim_customer
SET
previous_city = city,
city = 'San Francisco',
city_changed_on = '2025-03-15'
WHERE customer_id = 501;
Choosing the Right SCD Type
| SCD Type | History | Complexity | When to Use |
|---|---|---|---|
| Type 0 | None (never changes) | None | Immutable attributes (signup date) |
| Type 1 | None (overwrite) | Low | Error corrections, non-important attributes |
| Type 2 | Full history | Medium | Analytically important changes (address, segment) |
| Type 3 | One previous value | Low | Rarely used — need both current and previous |
Key Takeaways
- SCD Type 1 overwrites old values — simple but loses history. Use for corrections and non-important attributes.
- SCD Type 2 adds new rows with effective dates — preserves full history. Use for analytically important changes like customer address or segment.
- SCD Type 3 adds columns for previous values — limited history, rarely used in practice.
- Most warehouses use a mix: Type 2 for important attributes (city, segment), Type 1 for corrections (name typos).
- dbt snapshots provide built-in SCD Type 2 support, dramatically simplifying implementation.
- Always use surrogate keys in Type 2 dimensions — the fact table foreign key points to a specific version of the dimension row.