What is Dimensional Modeling?
Dimensional modeling is a data modeling technique designed specifically for analytical querying and reporting. Introduced by Ralph Kimball, it organizes data into fact tables (measurements/metrics) and dimension tables (descriptive context). The resulting structure — called a star schema — is intuitive, performant, and the most widely used pattern in data warehousing.
Dimensional modeling succeeds because it aligns with how business users think about data. Business questions follow the pattern: "Show me [metric] by [dimension]" — for example, "Show me revenue by country by month" or "Show me order count by product category by customer segment." Dimensional models make these queries natural and fast by separating metrics (facts) from descriptive attributes (dimensions).
Fact Tables
Fact tables store quantitative measurements — the numeric values that the business wants to analyze. Each row represents a business event at the lowest level of detail (the grain). Fact tables are typically very large (millions to billions of rows) and narrow (mostly foreign keys and numeric measures).
Types of Fact Tables
- Transaction Facts: One row per business event (each order, each payment, each click). Most common type. Grain: one row per transaction.
- Periodic Snapshot Facts: One row per entity per time period (daily account balance, monthly inventory levels). Grain: one row per entity per period.
- Accumulating Snapshot Facts: One row per entity lifecycle with columns for each milestone (order placed, shipped, delivered). Row is updated as milestones are reached.
- Factless Facts: Record the occurrence of events without numeric measures (student attendance, product coverage). The fact is that the event happened.
-- Transaction Fact Table: One row per order line item
CREATE TABLE analytics.fct_order_items (
-- Surrogate key
order_item_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
-- Degenerate dimension (from the transaction itself)
order_id BIGINT NOT NULL,
-- Foreign keys to dimension tables
date_key INT NOT NULL REFERENCES analytics.dim_date(date_key),
customer_key INT NOT NULL REFERENCES analytics.dim_customer(customer_key),
product_key INT NOT NULL REFERENCES analytics.dim_product(product_key),
store_key INT NOT NULL REFERENCES analytics.dim_store(store_key),
-- Measures (additive facts)
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
discount_amount DECIMAL(10,2) DEFAULT 0,
total_amount DECIMAL(12,2) NOT NULL, -- quantity * unit_price - discount
cost_amount DECIMAL(12,2), -- cost to the business
-- Semi-additive or non-additive measures
margin_pct DECIMAL(5,2) -- Non-additive: cannot sum percentages
);
-- Periodic Snapshot Fact Table: One row per product per day
CREATE TABLE analytics.fct_daily_inventory (
date_key INT NOT NULL,
product_key INT NOT NULL,
warehouse_key INT NOT NULL,
-- Snapshot measures (semi-additive: can sum across products but not across dates)
quantity_on_hand INT NOT NULL,
quantity_reserved INT NOT NULL,
quantity_available INT NOT NULL,
days_of_supply DECIMAL(5,1),
PRIMARY KEY (date_key, product_key, warehouse_key)
);
Dimension Tables
Dimension tables provide the descriptive context for facts — the "who, what, where, when, why, and how" of business events. They are typically wide (many columns), shallow (thousands to millions of rows), and contain textual attributes and hierarchies used for filtering, grouping, and labeling in reports.
-- Customer Dimension: Rich descriptive attributes
CREATE TABLE analytics.dim_customer (
customer_key INT PRIMARY KEY, -- Surrogate key
customer_id BIGINT NOT NULL, -- Natural/business key
customer_name VARCHAR(200) NOT NULL,
email VARCHAR(255),
-- Demographic attributes
age_group VARCHAR(20), -- '18-24', '25-34', etc.
gender VARCHAR(20),
income_bracket VARCHAR(20),
-- Geographic hierarchy
city VARCHAR(100),
state VARCHAR(100),
country VARCHAR(100),
region VARCHAR(50), -- 'North America', 'Europe', etc.
-- Behavioral attributes
customer_segment VARCHAR(50), -- 'Premium', 'Standard', 'New'
lifetime_value DECIMAL(12,2),
first_order_date DATE,
acquisition_channel VARCHAR(50), -- 'organic', 'paid', 'referral'
-- SCD Type 2 fields (for tracking historical changes)
effective_from DATE NOT NULL,
effective_to DATE DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
-- Date Dimension: Pre-computed calendar attributes
CREATE TABLE analytics.dim_date (
date_key INT PRIMARY KEY, -- YYYYMMDD format
full_date DATE NOT NULL,
day_of_week VARCHAR(10), -- 'Monday', 'Tuesday', etc.
day_of_month INT,
day_of_year INT,
week_of_year INT,
month_number INT,
month_name VARCHAR(20),
quarter INT,
year INT,
fiscal_quarter INT,
fiscal_year INT,
is_weekend BOOLEAN,
is_holiday BOOLEAN,
holiday_name VARCHAR(100)
);
-- Product Dimension with hierarchy
CREATE TABLE analytics.dim_product (
product_key INT PRIMARY KEY,
product_id BIGINT NOT NULL,
product_name VARCHAR(200) NOT NULL,
sku VARCHAR(50),
brand VARCHAR(100),
-- Product hierarchy (drill-down)
category VARCHAR(100), -- 'Electronics'
subcategory VARCHAR(100), -- 'Laptops'
department VARCHAR(100), -- 'Computing'
unit_cost DECIMAL(10,2),
unit_price DECIMAL(10,2),
is_active BOOLEAN DEFAULT TRUE,
launch_date DATE,
effective_from DATE NOT NULL,
effective_to DATE DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
The Star Schema
The star schema places the fact table at the center with dimension tables radiating outward like the points of a star. Each dimension connects to the fact table through a foreign key. This structure minimizes joins (one hop from fact to any dimension) and is naturally understood by BI tools.
-- Star Schema Query: Revenue by category by region by quarter
-- Simple, intuitive, and fast on columnar warehouses
SELECT
d.year,
d.quarter,
c.region,
c.customer_segment,
p.category,
p.brand,
COUNT(DISTINCT f.order_id) AS total_orders,
SUM(f.quantity) AS total_units,
SUM(f.total_amount) AS total_revenue,
SUM(f.total_amount - f.cost_amount) AS total_profit,
AVG(f.total_amount) AS avg_order_value
FROM analytics.fct_order_items f
JOIN analytics.dim_date d ON f.date_key = d.date_key
JOIN analytics.dim_customer c ON f.customer_key = c.customer_key
JOIN analytics.dim_product p ON f.product_key = p.product_key
WHERE d.year = 2025
AND c.is_current = TRUE
AND p.is_current = TRUE
GROUP BY d.year, d.quarter, c.region, c.customer_segment, p.category, p.brand
ORDER BY total_revenue DESC;
Defining the Grain
The grain is the most critical decision in dimensional modeling. It defines what a single row in the fact table represents. Getting the grain wrong undermines the entire model. Always declare the grain explicitly before adding any foreign keys or measures.
Grain Examples
- fct_order_items: One row per order line item (order + product combination)
- fct_page_views: One row per page view event (user + page + timestamp)
- fct_daily_revenue: One row per store per day (periodic snapshot)
- fct_subscriptions: One row per subscription lifecycle (accumulating snapshot with milestones)
Snowflake Schema
A snowflake schema normalizes dimensions into sub-dimensions, reducing redundancy at the cost of additional joins. For example, instead of storing country directly on the customer dimension, you create a separate country dimension. Modern columnar warehouses make the performance difference negligible, so star schemas are generally preferred for their simplicity.
Building Dimensional Models with dbt
-- models/marts/dim_customer.sql
-- dbt model to build the customer dimension
WITH source AS (
SELECT * FROM {{ ref('stg_customers') }}
),
enriched AS (
SELECT
customer_id,
customer_name,
email,
city,
state,
country,
CASE
WHEN country IN ('US', 'CA', 'MX') THEN 'North America'
WHEN country IN ('UK', 'DE', 'FR', 'ES', 'IT') THEN 'Europe'
WHEN country IN ('JP', 'CN', 'IN', 'AU') THEN 'Asia Pacific'
ELSE 'Other'
END AS region,
signup_date AS first_order_date,
acquisition_channel,
CURRENT_DATE AS effective_from,
CAST('9999-12-31' AS DATE) AS effective_to,
TRUE AS is_current
FROM source
)
SELECT
{{ dbt_utils.generate_surrogate_key(['customer_id']) }} AS customer_key,
*
FROM enriched
Key Takeaways
- Dimensional modeling separates measurements (fact tables) from descriptive context (dimension tables)
- The grain defines what one row represents — declare it explicitly before designing anything else
- Star schemas are the most common and recommended pattern — one fact table surrounded by dimension tables
- Fact tables are large and narrow (FKs + measures); dimension tables are wide and relatively small (descriptive attributes)
- The date dimension is always present and should include pre-computed calendar attributes
- Use surrogate keys (auto-incrementing integers) as primary keys in dimensions, not natural business keys