TechLead
Lesson 12 of 22
6 min read
Data Engineering

Dimensional Modeling

Master the star schema design pattern with fact tables, dimension tables, and grain definition for analytics-optimized data warehouses

What is Dimensional Modeling?

Dimensional modeling is a data modeling technique designed specifically for analytical querying and reporting. Introduced by Ralph Kimball, it organizes data into fact tables (measurements/metrics) and dimension tables (descriptive context). The resulting structure — called a star schema — is intuitive, performant, and the most widely used pattern in data warehousing.

Dimensional modeling succeeds because it aligns with how business users think about data. Business questions follow the pattern: "Show me [metric] by [dimension]" — for example, "Show me revenue by country by month" or "Show me order count by product category by customer segment." Dimensional models make these queries natural and fast by separating metrics (facts) from descriptive attributes (dimensions).

Fact Tables

Fact tables store quantitative measurements — the numeric values that the business wants to analyze. Each row represents a business event at the lowest level of detail (the grain). Fact tables are typically very large (millions to billions of rows) and narrow (mostly foreign keys and numeric measures).

Types of Fact Tables

  • Transaction Facts: One row per business event (each order, each payment, each click). Most common type. Grain: one row per transaction.
  • Periodic Snapshot Facts: One row per entity per time period (daily account balance, monthly inventory levels). Grain: one row per entity per period.
  • Accumulating Snapshot Facts: One row per entity lifecycle with columns for each milestone (order placed, shipped, delivered). Row is updated as milestones are reached.
  • Factless Facts: Record the occurrence of events without numeric measures (student attendance, product coverage). The fact is that the event happened.
-- Transaction Fact Table: One row per order line item
CREATE TABLE analytics.fct_order_items (
    -- Surrogate key
    order_item_sk     BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,

    -- Degenerate dimension (from the transaction itself)
    order_id          BIGINT NOT NULL,

    -- Foreign keys to dimension tables
    date_key          INT NOT NULL REFERENCES analytics.dim_date(date_key),
    customer_key      INT NOT NULL REFERENCES analytics.dim_customer(customer_key),
    product_key       INT NOT NULL REFERENCES analytics.dim_product(product_key),
    store_key         INT NOT NULL REFERENCES analytics.dim_store(store_key),

    -- Measures (additive facts)
    quantity          INT NOT NULL,
    unit_price        DECIMAL(10,2) NOT NULL,
    discount_amount   DECIMAL(10,2) DEFAULT 0,
    total_amount      DECIMAL(12,2) NOT NULL,  -- quantity * unit_price - discount
    cost_amount       DECIMAL(12,2),           -- cost to the business

    -- Semi-additive or non-additive measures
    margin_pct        DECIMAL(5,2)             -- Non-additive: cannot sum percentages
);

-- Periodic Snapshot Fact Table: One row per product per day
CREATE TABLE analytics.fct_daily_inventory (
    date_key          INT NOT NULL,
    product_key       INT NOT NULL,
    warehouse_key     INT NOT NULL,

    -- Snapshot measures (semi-additive: can sum across products but not across dates)
    quantity_on_hand  INT NOT NULL,
    quantity_reserved INT NOT NULL,
    quantity_available INT NOT NULL,
    days_of_supply    DECIMAL(5,1),

    PRIMARY KEY (date_key, product_key, warehouse_key)
);

Dimension Tables

Dimension tables provide the descriptive context for facts — the "who, what, where, when, why, and how" of business events. They are typically wide (many columns), shallow (thousands to millions of rows), and contain textual attributes and hierarchies used for filtering, grouping, and labeling in reports.

-- Customer Dimension: Rich descriptive attributes
CREATE TABLE analytics.dim_customer (
    customer_key      INT PRIMARY KEY,       -- Surrogate key
    customer_id       BIGINT NOT NULL,       -- Natural/business key
    customer_name     VARCHAR(200) NOT NULL,
    email             VARCHAR(255),

    -- Demographic attributes
    age_group         VARCHAR(20),           -- '18-24', '25-34', etc.
    gender            VARCHAR(20),
    income_bracket    VARCHAR(20),

    -- Geographic hierarchy
    city              VARCHAR(100),
    state             VARCHAR(100),
    country           VARCHAR(100),
    region            VARCHAR(50),           -- 'North America', 'Europe', etc.

    -- Behavioral attributes
    customer_segment  VARCHAR(50),           -- 'Premium', 'Standard', 'New'
    lifetime_value    DECIMAL(12,2),
    first_order_date  DATE,
    acquisition_channel VARCHAR(50),         -- 'organic', 'paid', 'referral'

    -- SCD Type 2 fields (for tracking historical changes)
    effective_from    DATE NOT NULL,
    effective_to      DATE DEFAULT '9999-12-31',
    is_current        BOOLEAN DEFAULT TRUE
);

-- Date Dimension: Pre-computed calendar attributes
CREATE TABLE analytics.dim_date (
    date_key          INT PRIMARY KEY,       -- YYYYMMDD format
    full_date         DATE NOT NULL,
    day_of_week       VARCHAR(10),           -- 'Monday', 'Tuesday', etc.
    day_of_month      INT,
    day_of_year       INT,
    week_of_year      INT,
    month_number      INT,
    month_name        VARCHAR(20),
    quarter           INT,
    year              INT,
    fiscal_quarter    INT,
    fiscal_year       INT,
    is_weekend        BOOLEAN,
    is_holiday        BOOLEAN,
    holiday_name      VARCHAR(100)
);

-- Product Dimension with hierarchy
CREATE TABLE analytics.dim_product (
    product_key       INT PRIMARY KEY,
    product_id        BIGINT NOT NULL,
    product_name      VARCHAR(200) NOT NULL,
    sku               VARCHAR(50),
    brand             VARCHAR(100),

    -- Product hierarchy (drill-down)
    category          VARCHAR(100),          -- 'Electronics'
    subcategory       VARCHAR(100),          -- 'Laptops'
    department        VARCHAR(100),          -- 'Computing'

    unit_cost         DECIMAL(10,2),
    unit_price        DECIMAL(10,2),
    is_active         BOOLEAN DEFAULT TRUE,
    launch_date       DATE,

    effective_from    DATE NOT NULL,
    effective_to      DATE DEFAULT '9999-12-31',
    is_current        BOOLEAN DEFAULT TRUE
);

The Star Schema

The star schema places the fact table at the center with dimension tables radiating outward like the points of a star. Each dimension connects to the fact table through a foreign key. This structure minimizes joins (one hop from fact to any dimension) and is naturally understood by BI tools.

-- Star Schema Query: Revenue by category by region by quarter
-- Simple, intuitive, and fast on columnar warehouses

SELECT
    d.year,
    d.quarter,
    c.region,
    c.customer_segment,
    p.category,
    p.brand,
    COUNT(DISTINCT f.order_id) AS total_orders,
    SUM(f.quantity) AS total_units,
    SUM(f.total_amount) AS total_revenue,
    SUM(f.total_amount - f.cost_amount) AS total_profit,
    AVG(f.total_amount) AS avg_order_value
FROM analytics.fct_order_items f
JOIN analytics.dim_date d ON f.date_key = d.date_key
JOIN analytics.dim_customer c ON f.customer_key = c.customer_key
JOIN analytics.dim_product p ON f.product_key = p.product_key
WHERE d.year = 2025
  AND c.is_current = TRUE
  AND p.is_current = TRUE
GROUP BY d.year, d.quarter, c.region, c.customer_segment, p.category, p.brand
ORDER BY total_revenue DESC;

Defining the Grain

The grain is the most critical decision in dimensional modeling. It defines what a single row in the fact table represents. Getting the grain wrong undermines the entire model. Always declare the grain explicitly before adding any foreign keys or measures.

Grain Examples

  • fct_order_items: One row per order line item (order + product combination)
  • fct_page_views: One row per page view event (user + page + timestamp)
  • fct_daily_revenue: One row per store per day (periodic snapshot)
  • fct_subscriptions: One row per subscription lifecycle (accumulating snapshot with milestones)

Snowflake Schema

A snowflake schema normalizes dimensions into sub-dimensions, reducing redundancy at the cost of additional joins. For example, instead of storing country directly on the customer dimension, you create a separate country dimension. Modern columnar warehouses make the performance difference negligible, so star schemas are generally preferred for their simplicity.

Building Dimensional Models with dbt

-- models/marts/dim_customer.sql
-- dbt model to build the customer dimension

WITH source AS (
    SELECT * FROM {{ ref('stg_customers') }}
),

enriched AS (
    SELECT
        customer_id,
        customer_name,
        email,
        city,
        state,
        country,
        CASE
            WHEN country IN ('US', 'CA', 'MX') THEN 'North America'
            WHEN country IN ('UK', 'DE', 'FR', 'ES', 'IT') THEN 'Europe'
            WHEN country IN ('JP', 'CN', 'IN', 'AU') THEN 'Asia Pacific'
            ELSE 'Other'
        END AS region,
        signup_date AS first_order_date,
        acquisition_channel,
        CURRENT_DATE AS effective_from,
        CAST('9999-12-31' AS DATE) AS effective_to,
        TRUE AS is_current
    FROM source
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['customer_id']) }} AS customer_key,
    *
FROM enriched

Key Takeaways

  • Dimensional modeling separates measurements (fact tables) from descriptive context (dimension tables)
  • The grain defines what one row represents — declare it explicitly before designing anything else
  • Star schemas are the most common and recommended pattern — one fact table surrounded by dimension tables
  • Fact tables are large and narrow (FKs + measures); dimension tables are wide and relatively small (descriptive attributes)
  • The date dimension is always present and should include pre-computed calendar attributes
  • Use surrogate keys (auto-incrementing integers) as primary keys in dimensions, not natural business keys

Continue Learning