What is Data Modeling & Schema Design?

Design tables, keys, and constraints that keep data correct

Data Modeling & Schema Design - PostgreSQL Tutorial | TechLead

The Goal of Data Modeling

A well-designed schema does more than organize data — it encodes your domain's business rules directly into the database. This means bugs from incorrect data (null where a value is required, duplicate emails, orphaned rows) are caught by the database before they reach your application layer. Good modeling also makes queries simpler and faster because the data is structured in a way that mirrors how you need to retrieve it.

Normalization: Eliminate Redundancy

Normalization is the process of organizing columns and tables to reduce duplicated data. The most common target is Third Normal Form (3NF): each non-key column depends on the primary key, and nothing else.

1NF: Each column holds a single atomic value (no comma-separated lists in a cell).
2NF: All columns depend on the whole primary key (no partial dependencies in composite keys).
3NF: No column depends on another non-key column (no transitive dependencies).

When in doubt: if you find yourself copying the same value into multiple rows, it probably belongs in its own table.

Constraints: Enforcing Business Rules

CREATE TABLE users (
  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email      TEXT   NOT NULL UNIQUE,
  username   TEXT   NOT NULL UNIQUE CHECK (length(username) >= 3),
  role       TEXT   NOT NULL DEFAULT 'member'
                   CHECK (role IN ('member', 'admin', 'moderator')),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  deleted_at TIMESTAMPTZ              -- NULL = active
);

CREATE TABLE posts (
  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id    BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  slug       TEXT   NOT NULL UNIQUE,
  title      TEXT   NOT NULL CHECK (length(title) > 0),
  body       TEXT,
  published  BOOLEAN NOT NULL DEFAULT false,
  published_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Foreign Keys and Referential Integrity

Foreign keys ensure that a referenced row actually exists. The ON DELETE clause controls what happens when the referenced row is deleted:

-- Cascade: delete comments when the post is deleted
post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE

-- Restrict (default): prevent deletion if child rows exist
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT

-- Set NULL: keep the comment but clear the author reference
author_id BIGINT REFERENCES users(id) ON DELETE SET NULL

-- Always index foreign key columns — Postgres does NOT do this automatically
CREATE INDEX idx_posts_user_id ON posts (user_id);
CREATE INDEX idx_comments_post_id ON comments (post_id);

Many-to-Many Relationships

-- Junction table for post tags
CREATE TABLE tags (
  id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE post_tags (
  post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  tag_id  BIGINT NOT NULL REFERENCES tags(id)  ON DELETE CASCADE,
  PRIMARY KEY (post_id, tag_id)  -- composite PK prevents duplicates
);

-- Query all tags for a post
SELECT t.name
FROM   post_tags pt
JOIN   tags t ON t.id = pt.tag_id
WHERE  pt.post_id = 42;

Schema Design Best Practices

• Use NOT NULL on every column that must always have a value. Nullable columns should be the exception.
• Add UNIQUE constraints for natural keys like email addresses, slugs, and SKUs.
• Use TIMESTAMPTZ (with timezone) rather than plain TIMESTAMP for all date/time columns.
• Soft-delete with a deleted_at column instead of physically removing rows when you need an audit trail.
• Add indexes on every foreign key column — Postgres never creates them automatically.
• Use CHECK constraints for simple domain validation (minimum length, allowed values).

Data Modeling & Schema Design

The Goal of Data Modeling

Normalization: Eliminate Redundancy

Constraints: Enforcing Business Rules

Foreign Keys and Referential Integrity

Many-to-Many Relationships

Schema Design Best Practices

Continue Learning

Data Engineering

Node.js

REST & APIs

System Design

Python