Why Chunking Matters
Chunking is the process of splitting documents into smaller pieces for embedding and retrieval. It's arguably the most impactful factor in RAG quality. Poor chunking leads to poor retrieval, and poor retrieval leads to irrelevant or incorrect answers — no matter how good your LLM is.
The goal is to create chunks that are semantically coherent (each chunk contains a complete thought or topic), appropriately sized (large enough for context, small enough for precision), and well-overlapped (important information at boundaries isn't lost).
Chunking Strategy Quick Guide
- Fixed-Size: Simple, predictable — good baseline for homogeneous documents
- Recursive Character: Splits on natural boundaries (paragraphs, sentences) — best general-purpose
- Semantic: Uses embeddings to find natural topic boundaries — best quality, higher cost
- Parent-Child: Small chunks for retrieval, large chunks for context — best of both worlds
- Document-Specific: Markdown headers, code functions, HTML tags — best for structured docs
1. Fixed-Size Chunking
The simplest approach — split text into chunks of a fixed number of characters or tokens, with optional overlap.
// Fixed-size chunking
function fixedSizeChunk(
text: string,
chunkSize: number = 1000,
overlap: number = 200
): string[] {
const chunks: string[] = [];
let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // avoid a trailing chunk that is pure overlap
    start += chunkSize - overlap;
  }
return chunks;
}
// Token-based chunking (more accurate for LLMs)
import { encoding_for_model } from "tiktoken";
function tokenBasedChunk(
text: string,
maxTokens: number = 512,
overlapTokens: number = 50
): string[] {
const encoder = encoding_for_model("gpt-4");
const tokens = encoder.encode(text);
const chunks: string[] = [];
let start = 0;
  while (start < tokens.length) {
    const end = Math.min(start + maxTokens, tokens.length);
    const chunkTokens = tokens.slice(start, end);
    // decode() returns UTF-8 bytes; TextDecoder turns them back into a string
    chunks.push(new TextDecoder().decode(encoder.decode(chunkTokens)));
    if (end === tokens.length) break; // avoid a trailing chunk that is pure overlap
    start += maxTokens - overlapTokens;
  }
encoder.free();
return chunks;
}
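A quick sanity check of the character-based splitter; the sample string is synthetic so the chunk counts are easy to verify by hand:
// Usage sketch: 3,000 synthetic characters
const sample = "x".repeat(3000);
console.log(fixedSizeChunk(sample, 1000, 0).length);   // 3 chunks (no overlap)
console.log(fixedSizeChunk(sample, 1000, 200).length); // 4 chunks (step is 800, not 1000)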
2. Recursive Character Splitting
The most popular general-purpose strategy. It tries to split on natural boundaries (paragraphs, then sentences, then words) while respecting a maximum chunk size.
# Recursive character splitting with LangChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Standard configuration for most documents
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # Target chunk size in characters
chunk_overlap=200, # Overlap between consecutive chunks
length_function=len, # How to measure chunk length
separators=[
"\n\n", # First try: split on double newlines (paragraphs)
"\n", # Then: single newlines
". ", # Then: sentences
", ", # Then: clauses
" ", # Then: words
"", # Last resort: characters
],
is_separator_regex=False,
)
text = """
Machine Learning Fundamentals
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience. It focuses on developing
algorithms that can access data and use it to learn for themselves.
Types of Machine Learning
Supervised learning uses labeled datasets to train algorithms to classify
data or predict outcomes. The model learns by comparing its output with
the correct answers during training.
Unsupervised learning finds hidden patterns in data without labeled
responses. Clustering and dimensionality reduction are common techniques.
"""
chunks = splitter.split_text(text)
for i, chunk in enumerate(chunks):
print(f"Chunk {i} ({len(chunk)} chars): {chunk[:80]}...")
3. Semantic Chunking
Semantic chunking uses embeddings to detect topic boundaries. Adjacent sentences with similar embeddings stay together; when similarity drops significantly, a new chunk begins.
# Semantic chunking using embeddings
from sentence_transformers import SentenceTransformer
import numpy as np
def semantic_chunk(
text: str,
threshold: float = 0.5,
min_chunk_size: int = 100,
max_chunk_size: int = 2000,
) -> list[str]:
"""Split text into semantically coherent chunks."""
model = SentenceTransformer("all-MiniLM-L6-v2")
    # Split into sentences (naive period split; nltk or spaCy would be more robust)
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
if len(sentences) <= 1:
return [text]
# Embed all sentences
embeddings = model.encode(sentences)
# Calculate cosine similarity between consecutive sentences
similarities = []
for i in range(len(embeddings) - 1):
sim = np.dot(embeddings[i], embeddings[i + 1]) / (
np.linalg.norm(embeddings[i]) * np.linalg.norm(embeddings[i + 1])
)
similarities.append(sim)
# Find split points where similarity drops below threshold
chunks = []
current_chunk = [sentences[0]]
for i, sim in enumerate(similarities):
current_text = ". ".join(current_chunk)
if sim < threshold and len(current_text) >= min_chunk_size:
chunks.append(current_text + ".")
current_chunk = [sentences[i + 1]]
elif len(current_text) >= max_chunk_size:
chunks.append(current_text + ".")
current_chunk = [sentences[i + 1]]
else:
current_chunk.append(sentences[i + 1])
    if current_chunk:
        tail = ". ".join(current_chunk)
        # The final sentence keeps its original period, so avoid doubling it
        chunks.append(tail if tail.endswith(".") else tail + ".")
return chunks
# LangChain also provides SemanticChunker
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
semantic_splitter = SemanticChunker(
OpenAIEmbeddings(model="text-embedding-3-small"),
breakpoint_threshold_type="percentile",
breakpoint_threshold_amount=90,
)
chunks = semantic_splitter.split_text(text)
for chunk in chunks:
print(f"[{len(chunk)} chars] {chunk[:100]}...")
4. Parent-Child (Hierarchical) Chunking
This strategy creates small chunks for precise retrieval but returns the larger parent chunk (or full document section) for context. You get the precision of small chunks with the context of large ones.
// Parent-child chunking strategy
interface ParentChunk {
id: string;
content: string;
children: ChildChunk[];
}
interface ChildChunk {
id: string;
content: string;
parentId: string;
}
function parentChildChunk(
text: string,
parentSize: number = 2000,
childSize: number = 400,
childOverlap: number = 50
): { parents: ParentChunk[]; children: ChildChunk[] } {
const parents: ParentChunk[] = [];
const children: ChildChunk[] = [];
// Create parent chunks (large)
  const parentTexts = fixedSizeChunk(text, parentSize, 200); // reuses fixedSizeChunk from section 1
parentTexts.forEach((parentText, pi) => {
const parentId = `parent_${pi}`;
const parent: ParentChunk = {
id: parentId,
content: parentText,
children: [],
};
// Create child chunks (small) within each parent
    const childTexts = fixedSizeChunk(parentText, childSize, childOverlap);
childTexts.forEach((childText, ci) => {
const child: ChildChunk = {
id: `child_${pi}_${ci}`,
content: childText,
parentId,
};
parent.children.push(child);
children.push(child);
});
parents.push(parent);
});
return { parents, children };
}
// During retrieval: search children, return parents
async function retrieveWithParentContext(
query: string,
childCollection: any,
parentMap: Map<string, string>,
k: number = 3
): Promise<string[]> {
// Search over small child chunks (precise matching)
const results = await childCollection.query({
queryTexts: [query],
nResults: k,
});
// Return the larger parent chunks (rich context)
const parentIds = new Set(
results.metadatas[0].map((m: any) => m.parentId)
);
return [...parentIds].map(id => parentMap.get(id) || "");
}
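To wire this together end to end: chunk once, index only the children, and keep the parents in a plain lookup. A hedged sketch, assuming a Chroma-style collection (the add() shape below follows Chroma's JS client; any vector store that supports metadata works the same way):
// Hypothetical end-to-end wiring; indexing details vary by vector store
async function indexAndRetrieve(documentText: string, childCollection: any) {
  const { parents, children } = parentChildChunk(documentText);
  const parentMap = new Map<string, string>();
  for (const p of parents) parentMap.set(p.id, p.content);
  // Embed and store only the small child chunks, each tagged with its parent
  await childCollection.add({
    ids: children.map(c => c.id),
    documents: children.map(c => c.content),
    metadatas: children.map(c => ({ parentId: c.parentId })),
  });
  return retrieveWithParentContext("example query", childCollection, parentMap);
}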
5. Document-Specific Chunking
Structured documents carry their own natural boundaries: markdown headers, function and class definitions in code, HTML tags. Splitting along those boundaries keeps each chunk aligned with the author's organization, and the boundary labels (like header paths) make useful metadata.
# Markdown header-based chunking
from langchain.text_splitter import MarkdownHeaderTextSplitter
markdown_text = """
# Introduction
This is the introduction section.
## Background
Here is some background information with important context.
## Methods
### Data Collection
We collected data from multiple sources.
### Analysis
Statistical analysis was performed using Python.
# Results
The results show significant improvement.
"""
headers_to_split = [
("#", "h1"),
("##", "h2"),
("###", "h3"),
]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split)
chunks = splitter.split_text(markdown_text)
for chunk in chunks:
print(f"Headers: {chunk.metadata} -> {chunk.page_content[:60]}...")
# Code-aware chunking
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON,
chunk_size=1000,
chunk_overlap=100,
)
js_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.JS,
chunk_size=1000,
chunk_overlap=100,
)
Choosing Chunk Size
There is no single right chunk size; the ranges below are starting points to tune against your own documents and queries.
| Document Type | Chunk Size (chars) | Overlap (chars) | Strategy |
|---|---|---|---|
| General text | 500-1000 | 100-200 | Recursive |
| Technical docs | 1000-1500 | 200-300 | Markdown-aware |
| Legal/medical | 800-1200 | 200 | Semantic |
| Code files | 1000-2000 | 100 | Language-aware |
| Q&A / FAQ | 200-500 | 0-50 | Fixed or by item |
Chunking Best Practices
- Always include metadata: Store source file, page number, section header, and chunk index with each chunk
- Test with real queries: The best chunk size depends on your actual questions — experiment and measure
- Use overlap: 10-20% overlap prevents losing information at chunk boundaries
- Preserve structure: Don't split in the middle of tables, code blocks, or lists
- Consider augmenting chunks: Add a summary or title to each chunk for better embedding quality (a minimal sketch follows this list)
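A minimal sketch of the metadata and augmentation practices above. The field names here are illustrative, not a standard schema:
// Hypothetical chunk enrichment: store metadata and prepend the section
// title so the embedding carries topical context
interface EnrichedChunk {
  content: string;     // the text that gets embedded
  source: string;      // source file or URL
  section: string;     // nearest section header
  chunkIndex: number;  // position within the document
}

function enrichChunk(
  rawChunk: string,
  source: string,
  section: string,
  chunkIndex: number
): EnrichedChunk {
  return {
    content: `${section}\n\n${rawChunk}`, // title-augmented embedding text
    source,
    section,
    chunkIndex,
  };
}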
Summary
Chunking is the foundation of RAG quality. Start with recursive character splitting as your baseline, measure retrieval quality, then try semantic or parent-child chunking if needed. The right strategy depends on your document types, query patterns, and quality requirements. Always validate with real queries and iterate.