Common Table Expressions (CTEs)
Common Table Expressions (CTEs) are named, temporary result sets defined with the WITH
keyword. They make complex queries readable, maintainable, and self-documenting by breaking them into logical
steps. CTEs are the foundation of well-written analytical SQL and are used extensively in dbt models,
pipeline transformations, and ad-hoc analysis.
Think of CTEs as creating named "building blocks" that you assemble into a final query. Each CTE can reference CTEs defined before it, creating a clear, top-to-bottom flow of data transformations. This is far more readable than deeply nested subqueries and makes debugging much easier — you can run any CTE independently to inspect its output.
-- CTE basics: Named result sets that compose into a final query
WITH daily_orders AS (
-- Step 1: Aggregate orders by date
SELECT
order_date,
COUNT(DISTINCT order_id) AS order_count,
SUM(total_amount) AS revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM analytics.fct_orders
WHERE order_date >= '2025-01-01'
GROUP BY order_date
),
daily_with_prev AS (
-- Step 2: Add previous day's values for comparison
SELECT
*,
LAG(revenue) OVER (ORDER BY order_date) AS prev_day_revenue,
LAG(order_count) OVER (ORDER BY order_date) AS prev_day_orders
FROM daily_orders
),
final AS (
-- Step 3: Calculate growth metrics
SELECT
order_date,
order_count,
revenue,
unique_customers,
revenue - prev_day_revenue AS revenue_change,
ROUND((revenue - prev_day_revenue) / NULLIF(prev_day_revenue, 0) * 100, 1) AS revenue_growth_pct,
ROUND(revenue / NULLIF(unique_customers, 0), 2) AS revenue_per_customer
FROM daily_with_prev
)
SELECT * FROM final ORDER BY order_date DESC;
CTE Patterns for Data Engineering
The Import-Transform-Export Pattern
This is the standard pattern used in dbt models — import your sources, transform step by step, then select the final result:
-- Import CTEs: pull in source data (one CTE per source)
WITH orders AS (
SELECT * FROM {{ ref('stg_orders') }}
),
customers AS (
SELECT * FROM {{ ref('stg_customers') }}
),
products AS (
SELECT * FROM {{ ref('stg_products') }}
),
-- Transform CTEs: apply business logic step by step
enriched_orders AS (
SELECT
o.order_id,
o.order_date,
o.customer_id,
c.customer_name,
c.segment,
c.country,
o.product_id,
p.product_name,
p.category,
o.quantity,
o.unit_price,
o.total_amount
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
LEFT JOIN products p ON o.product_id = p.product_id
),
with_metrics AS (
SELECT
*,
SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_cumulative_spend,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS customer_order_number,
CASE
WHEN total_amount > 500 THEN 'high'
WHEN total_amount > 100 THEN 'medium'
ELSE 'low'
END AS value_tier
FROM enriched_orders
),
-- Export CTE: final select
final AS (
SELECT * FROM with_metrics
WHERE total_amount > 0
)
SELECT * FROM final
Cohort Analysis with CTEs
-- Retention cohort analysis: track customer retention by signup month
WITH customer_first_order AS (
-- Find each customer's first order month (cohort assignment)
SELECT
customer_id,
DATE_TRUNC('month', MIN(order_date)) AS cohort_month
FROM analytics.fct_orders
GROUP BY customer_id
),
customer_activities AS (
-- Track each customer's activity months
SELECT DISTINCT
o.customer_id,
c.cohort_month,
DATE_TRUNC('month', o.order_date) AS activity_month,
DATEDIFF('month', c.cohort_month, DATE_TRUNC('month', o.order_date)) AS months_since_first
FROM analytics.fct_orders o
JOIN customer_first_order c ON o.customer_id = c.customer_id
),
cohort_sizes AS (
-- Count customers in each cohort
SELECT cohort_month, COUNT(DISTINCT customer_id) AS cohort_size
FROM customer_first_order
GROUP BY cohort_month
),
retention AS (
-- Calculate retention rate for each cohort at each month
SELECT
a.cohort_month,
a.months_since_first,
COUNT(DISTINCT a.customer_id) AS active_customers,
s.cohort_size,
ROUND(COUNT(DISTINCT a.customer_id)::FLOAT / s.cohort_size * 100, 1) AS retention_pct
FROM customer_activities a
JOIN cohort_sizes s ON a.cohort_month = s.cohort_month
GROUP BY a.cohort_month, a.months_since_first, s.cohort_size
)
SELECT * FROM retention
ORDER BY cohort_month, months_since_first;
Recursive CTEs
Recursive CTEs allow a CTE to reference itself, enabling queries over hierarchical or graph-like data structures. They are essential for traversing org charts, category trees, bill-of-materials, and any parent-child relationships stored in a single table.
-- Recursive CTE: Traverse an organizational hierarchy
-- Table: employees (id, name, title, manager_id)
WITH RECURSIVE org_tree AS (
-- Base case: Start with the CEO (no manager)
SELECT
id,
name,
title,
manager_id,
0 AS level,
name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Find all direct reports
SELECT
e.id,
e.name,
e.title,
e.manager_id,
t.level + 1 AS level,
t.path || ' > ' || e.name AS path
FROM employees e
JOIN org_tree t ON e.manager_id = t.id
)
SELECT
REPEAT(' ', level) || name AS indented_name,
title,
level,
path
FROM org_tree
ORDER BY path;
-- Output:
-- Alice CEO 0 Alice
-- Bob VP Engineering 1 Alice > Bob
-- Charlie Staff Engineer 2 Alice > Bob > Charlie
-- Diana Senior Engineer 2 Alice > Bob > Diana
-- Eve VP Marketing 1 Alice > Eve
-- Frank Marketing Lead 2 Alice > Eve > Frank
Recursive CTE: Category Breadcrumbs
-- Build full category paths from a self-referential categories table
-- Table: categories (id, name, parent_id)
WITH RECURSIVE category_paths AS (
-- Root categories (no parent)
SELECT
id,
name,
parent_id,
name::TEXT AS full_path,
1 AS depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Child categories
SELECT
c.id,
c.name,
c.parent_id,
cp.full_path || ' > ' || c.name AS full_path,
cp.depth + 1
FROM categories c
JOIN category_paths cp ON c.parent_id = cp.id
)
SELECT id, name, full_path, depth
FROM category_paths
ORDER BY full_path;
-- Output:
-- 1 Electronics Electronics 1
-- 5 Computers Electronics > Computers 2
-- 9 Laptops Electronics > Computers > Laptops 3
-- 10 Desktops Electronics > Computers > Desktops 3
-- 6 Audio Electronics > Audio 2
Recursive CTE: Date Spine Generation
-- Generate a continuous date series (useful for filling gaps in time-series)
WITH RECURSIVE date_spine AS (
SELECT DATE '2025-01-01' AS date_day
UNION ALL
SELECT date_day + INTERVAL '1 day'
FROM date_spine
WHERE date_day < DATE '2025-12-31'
)
-- Use the date spine to fill gaps in daily revenue
SELECT
d.date_day,
COALESCE(r.revenue, 0) AS revenue,
COALESCE(r.order_count, 0) AS order_count
FROM date_spine d
LEFT JOIN daily_revenue r ON d.date_day = r.order_date
ORDER BY d.date_day;
Advanced CTE Techniques
-- Materialized CTE hint (supported in some databases)
-- Forces intermediate result to be stored temporarily
WITH customer_orders AS MATERIALIZED (
SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
)
SELECT * FROM customer_orders WHERE order_count > 5
UNION ALL
SELECT * FROM customer_orders WHERE total_spent > 1000;
-- Chained CTEs for funnel analysis
WITH page_views AS (
SELECT DISTINCT user_id FROM events WHERE event_type = 'page_view'
),
add_to_cart AS (
SELECT DISTINCT user_id FROM events WHERE event_type = 'add_to_cart'
),
checkout AS (
SELECT DISTINCT user_id FROM events WHERE event_type = 'checkout'
),
purchase AS (
SELECT DISTINCT user_id FROM events WHERE event_type = 'purchase'
),
funnel AS (
SELECT
(SELECT COUNT(*) FROM page_views) AS viewed,
(SELECT COUNT(*) FROM add_to_cart) AS added,
(SELECT COUNT(*) FROM checkout) AS checked_out,
(SELECT COUNT(*) FROM purchase) AS purchased
)
SELECT
viewed,
added,
ROUND(added::FLOAT / viewed * 100, 1) AS view_to_cart_pct,
checked_out,
ROUND(checked_out::FLOAT / added * 100, 1) AS cart_to_checkout_pct,
purchased,
ROUND(purchased::FLOAT / checked_out * 100, 1) AS checkout_to_purchase_pct,
ROUND(purchased::FLOAT / viewed * 100, 1) AS overall_conversion_pct
FROM funnel;
Key Takeaways
- CTEs make complex queries readable by breaking them into named, logical steps
- Follow the import-transform-export pattern used in dbt: source CTEs first, then transformations, then final select
- Recursive CTEs traverse hierarchical data (org charts, category trees) by referencing themselves
- Date spines fill gaps in time-series data — essential for accurate daily/weekly/monthly reports
- CTEs are not automatically materialized — use explicit MATERIALIZED hints or temp tables for performance-critical intermediate results
- Most dbt models are structured as a series of CTEs — mastering CTEs is mastering dbt