How Indexes Work
Without an index, Postgres performs a Sequential Scan: it reads every row in the table to find matches. For a table with millions of rows, this is extremely slow. An index is a separate data structure (usually a B-tree) that keeps a sorted copy of one or more column values along with pointers to the corresponding heap pages, allowing the planner to jump directly to matching rows in O(log n) time.
The trade-off: indexes consume disk space and slow down writes (INSERT/UPDATE/DELETE must maintain every index on the table). Only index columns you actually query on.
Index Types
- B-tree (default) — equality, range, sorting. Works with =, <, >, BETWEEN, LIKE ‘foo%’.
- Hash — equality only (=). Slightly smaller than B-tree for simple equality lookups.
- GIN — inverted index for JSONB, arrays, and full-text search (tsvector).
- GiST — generalized search tree for geometry, ranges, and nearest-neighbor queries.
- BRIN — block range index for very large, naturally ordered tables (timestamps on an append-only log).
Creating Indexes
-- Basic B-tree (equality + range)
CREATE INDEX idx_users_email ON users (email);
-- Composite index (order matters: leftmost prefix rule)
CREATE INDEX idx_orders_status_date ON orders (status, created_at DESC);
-- Partial index (only index a subset of rows)
CREATE INDEX idx_users_active ON users (email)
WHERE deleted_at IS NULL;
-- Expression index (index the result of a function)
CREATE INDEX idx_users_lower_email ON users (lower(email));
-- GIN index for JSONB
CREATE INDEX idx_events_payload ON webhook_events USING GIN (payload);
-- GIN index for full-text search
CREATE INDEX idx_posts_search ON posts
USING GIN (to_tsvector(‘english’, title || ‘ ‘ || body));
-- Unique index (also enforces the constraint)
CREATE UNIQUE INDEX idx_products_sku ON products (sku)
WHERE deleted_at IS NULL; -- partial unique constraint
Reading EXPLAIN ANALYZE
Use EXPLAIN ANALYZE to see how Postgres plans and executes a query. The key things to look for are scan type, actual rows vs. estimated rows, and total execution time.
EXPLAIN ANALYZE
SELECT u.name, p.title
FROM posts p
JOIN users u ON u.id = p.user_id
WHERE p.published = true
AND p.created_at > NOW() - INTERVAL ‘7 days’;
Key terms in EXPLAIN output:
- Seq Scan — reading the whole table. Fine for small tables; slow for large ones.
- Index Scan — using an index to find rows, then fetching heap pages. Good.
- Index Only Scan — satisfying the query entirely from the index (no heap fetch). Best.
- Hash Join / Merge Join / Nested Loop — different join strategies; the planner chooses based on row estimates.
- actual time vs. estimated rows — large discrepancies suggest stale statistics (run ANALYZE).
Indexing Checklist
- • Always index foreign key columns (Postgres does not do this automatically).
- • Index columns used in WHERE, JOIN ON, and ORDER BY clauses that appear in slow queries.
- • Use partial indexes to cover just the rows you commonly query (e.g., active users).
- • Use
CREATE INDEX CONCURRENTLYin production to avoid locking the table. - • Regularly check for unused indexes with
pg_stat_user_indexesand drop them. - • Run
ANALYZE table_nameafter bulk loads so the planner has fresh statistics.