The Role of Planning in AI Agents
Effective AI agents don't just react to inputs — they plan. Planning enables agents to decompose complex tasks, anticipate challenges, consider alternative approaches, and execute multi-step workflows with intention. The quality of an agent's planning directly determines the quality of its outputs.
In this lesson, we explore the key reasoning and planning techniques that make agents more capable: Chain-of-Thought (CoT), Tree-of-Thought (ToT), self-reflection, and iterative re-planning. These techniques can be combined to build agents capable of structured, deliberate, multi-step problem-solving.
Planning Techniques Overview
- Chain-of-Thought (CoT): Sequential step-by-step reasoning
- Tree-of-Thought (ToT): Exploring multiple reasoning paths in parallel
- Self-Reflection: The agent evaluates and critiques its own outputs
- Iterative Re-planning: Updating the plan based on execution results
- Decomposition: Breaking complex tasks into smaller, manageable sub-tasks
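Decomposition underlies the other techniques, so it is worth seeing in isolation. A minimal sketch of prompt-based task decomposition; the `generate` callable is a hypothetical stand-in for a model call (returning one sub-task per line), injected so the parsing is easy to test:

```python
from typing import Callable

def decompose(task: str, generate: Callable[[str], str]) -> list[str]:
    """Split a complex task into sub-tasks via a model call.

    `generate` stands in for any LLM call that returns one sub-task
    per line; inject a real client call in production.
    """
    prompt = f"Break this task into small, ordered sub-tasks, one per line:\n{task}"
    raw = generate(prompt)
    # Strip list markers and drop blank lines
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

# A stubbed model response makes the parsing behaviour easy to verify
stub = lambda prompt: "- Research competitors\n- Draft outline\n\n- Write report"
print(decompose("Write a market report", stub))
# → ['Research competitors', 'Draft outline', 'Write report']
```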
Chain-of-Thought (CoT) Prompting
Chain-of-Thought prompting encourages the model to reason step-by-step before arriving at an answer. This dramatically improves performance on math, logic, and multi-step reasoning tasks. CoT can be elicited with simple phrases like "Think step by step" or through few-shot examples.
// Chain-of-Thought with structured prompting
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function chainOfThought(question: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    system: `You are an analytical reasoning agent. When solving problems:
1. First, identify the key components of the problem
2. Break down the problem into clear steps
3. Work through each step showing your reasoning
4. Verify your answer before stating it
5. Provide a clear final answer

Format:
## Analysis
[Identify key components]
## Step-by-Step Reasoning
[Work through each step]
## Verification
[Check your work]
## Answer
[Final answer]`,
    messages: [{ role: "user", content: question }],
  });
  return response.content[0].type === "text" ? response.content[0].text : "";
}
// Zero-shot CoT (simplest form)
async function zeroShotCoT(question: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: `${question}\n\nLet's think through this step by step.`,
      },
    ],
  });
  return response.content[0].type === "text" ? response.content[0].text : "";
}
# Chain-of-Thought in Python
import anthropic
client = anthropic.Anthropic()
def chain_of_thought(question: str) -> str:
    """Use structured CoT prompting for complex reasoning."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="""You are an analytical reasoning agent. For every problem:
1. Identify the key components
2. Break down into clear steps
3. Show reasoning for each step
4. Verify your answer
5. Provide a clear final answer""",
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
# Few-shot CoT with examples
def few_shot_cot(question: str) -> str:
    """Use examples to demonstrate the reasoning pattern."""
    examples = """
Q: If a store has 3 shelves with 8 items each, and they remove 5 items, how many remain?
A: Let me think step by step.
- Start: 3 shelves x 8 items = 24 total items
- Remove: 24 - 5 = 19 items
- Answer: 19 items remain.

Q: A train travels 120km in 2 hours. If it speeds up by 50%, how long for 180km?
A: Let me think step by step.
- Original speed: 120km / 2h = 60 km/h
- New speed: 60 * 1.5 = 90 km/h
- Time for 180km: 180 / 90 = 2 hours
- Answer: 2 hours.
"""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[
            {"role": "user", "content": f"{examples}\nQ: {question}\nA: Let me think step by step."}
        ],
    )
    return response.content[0].text
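Few-shot CoT responses follow the "- Answer:" convention set by the examples, which makes the final answer easy to extract downstream. A small parser sketch, assuming that convention holds (`extract_answer` is a hypothetical helper, not part of the SDK):

```python
def extract_answer(cot_response: str) -> str:
    """Pull the final answer out of a CoT response that follows
    the "- Answer: ..." convention used in the few-shot examples."""
    for line in reversed(cot_response.splitlines()):
        stripped = line.strip().lstrip("- ")
        if stripped.lower().startswith("answer:"):
            return stripped[len("answer:"):].strip()
    # Fall back to the last non-empty line if no marker is found
    non_empty = [l for l in cot_response.splitlines() if l.strip()]
    return non_empty[-1].strip() if non_empty else ""

response = "Let me think step by step.\n- New speed: 90 km/h\n- Answer: 2 hours."
print(extract_answer(response))  # → 2 hours.
```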
Tree-of-Thought (ToT)
Tree-of-Thought extends CoT by exploring multiple reasoning paths simultaneously, evaluating each path, and selecting the most promising one. This is particularly useful for problems where the first approach might not be optimal — like puzzle-solving, creative tasks, or strategic planning.
// Tree-of-Thought implementation
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
interface ThoughtNode {
  thought: string;
  score: number;
  children: ThoughtNode[];
}

async function treeOfThought(
  problem: string,
  breadth: number = 3,
  depth: number = 3
): Promise<string> {
  // Step 1: Generate multiple initial approaches
  const approaches = await generateThoughts(problem, breadth);
  // Step 2: Evaluate and score each approach
  const scoredApproaches = await evaluateThoughts(problem, approaches);
  // Step 3: Expand the top approaches
  const topApproaches = scoredApproaches
    .sort((a, b) => b.score - a.score)
    .slice(0, 2);

  let bestSolution = "";
  let bestScore = 0;
  for (const approach of topApproaches) {
    // Recursively expand promising branches
    const solution = await expandThought(problem, approach, depth - 1);
    if (solution.score > bestScore) {
      bestScore = solution.score;
      bestSolution = solution.thought;
    }
  }
  return bestSolution;
}
async function generateThoughts(
  problem: string,
  n: number
): Promise<string[]> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: `Problem: ${problem}
Generate exactly ${n} different approaches to solve this problem.
Format each as: APPROACH N: [description]
Be creative — each approach should be meaningfully different.`,
      },
    ],
  });
  const text = response.content[0].type === "text" ? response.content[0].text : "";
  // Drop any preamble before the first "APPROACH N:" marker, then trim each entry
  return text.split(/APPROACH \d+:/).slice(1).map(s => s.trim()).filter(Boolean);
}
async function evaluateThoughts(
  problem: string,
  thoughts: string[]
): Promise<ThoughtNode[]> {
  const scored: ThoughtNode[] = [];
  for (const thought of thoughts) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 256,
      messages: [
        {
          role: "user",
          content: `Problem: ${problem}
Approach: ${thought}
Rate this approach from 1-10 for feasibility, correctness, and completeness.
Respond with just the number.`,
        },
      ],
    });
    const scoreText = response.content[0].type === "text" ? response.content[0].text : "5";
    // Fall back to a neutral score if the model returns extra text
    const score = parseInt(scoreText.trim(), 10) || 5;
    scored.push({ thought, score, children: [] });
  }
  return scored;
}
# Tree-of-Thought in Python
import anthropic
from dataclasses import dataclass
client = anthropic.Anthropic()
@dataclass
class ThoughtNode:
    thought: str
    score: float
    children: list

def tree_of_thought(problem: str, breadth: int = 3, depth: int = 3) -> str:
    # Generate multiple approaches
    approaches = generate_thoughts(problem, breadth)
    # Evaluate each approach
    scored = evaluate_thoughts(problem, approaches)
    # Expand the best approaches
    scored.sort(key=lambda x: x.score, reverse=True)
    top_approaches = scored[:2]

    best_solution = ""
    best_score = 0
    for approach in top_approaches:
        solution = expand_thought(problem, approach, depth - 1)
        if solution.score > best_score:
            best_score = solution.score
            best_solution = solution.thought
    return best_solution
def generate_thoughts(problem: str, n: int) -> list[str]:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Problem: {problem}
Generate {n} different approaches to solve this. Format: APPROACH N: [description]"""
        }],
    )
    text = response.content[0].text
    parts = [s.strip() for s in text.split("APPROACH")[1:] if s.strip()]
    # Drop the leading "N:" marker from each entry
    return [p.split(":", 1)[1].strip() if ":" in p else p for p in parts]
def evaluate_thoughts(problem: str, thoughts: list[str]) -> list[ThoughtNode]:
    scored = []
    for thought in thoughts:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=256,
            messages=[{
                "role": "user",
                "content": f"Problem: {problem}\nApproach: {thought}\nRate 1-10. Just the number."
            }],
        )
        try:
            score = int(response.content[0].text.strip())
        except ValueError:
            score = 5  # fall back to a neutral score if the model adds extra text
        scored.append(ThoughtNode(thought=thought, score=score, children=[]))
    return scored
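Both implementations above call an expand step that is not shown. One way to sketch it, with the generation and scoring functions injected as plain callables so the recursion can be exercised without API calls; the stubs and the `ThoughtNode` redefinition (repeated so the sketch runs standalone) are assumptions, not SDK code:

```python
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:  # same shape as the dataclass above, repeated for standalone use
    thought: str
    score: float
    children: list = field(default_factory=list)

def expand_thought(problem, node, depth, generate, evaluate):
    """Recursively deepen a thought: generate follow-on thoughts,
    score them, and return the best-scoring node found."""
    if depth <= 0:
        return node
    prompt = f"{problem}\nBuilding on: {node.thought}"
    node.children = evaluate(prompt, generate(prompt, 3))
    best = node
    for child in node.children:
        leaf = expand_thought(problem, child, depth - 1, generate, evaluate)
        if leaf.score > best.score:
            best = leaf
    return best

# Deterministic stubs: three follow-on thoughts with fixed scores 6, 7, 8
def stub_generate(prompt, n):
    return [f"step-{i}" for i in range(n)]

def stub_evaluate(prompt, thoughts):
    return [ThoughtNode(thought=t, score=i + 6) for i, t in enumerate(thoughts)]

root = ThoughtNode(thought="initial approach", score=5)
best = expand_thought("toy problem", root, 2, stub_generate, stub_evaluate)
print(best.score)  # → 8
```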
Self-Reflection
Self-reflection enables agents to evaluate their own outputs, identify errors, and improve iteratively. The agent acts as both the generator and the critic, creating a feedback loop that converges on better solutions.
// Self-reflection pattern
async function reflectiveAgent(
  task: string,
  maxReflections: number = 3
): Promise<string> {
  let currentSolution = "";

  // Initial attempt
  const initialResponse = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [{ role: "user", content: task }],
  });
  currentSolution = initialResponse.content[0].type === "text"
    ? initialResponse.content[0].text : "";

  // Reflection loop
  for (let i = 0; i < maxReflections; i++) {
    // Critique the current solution
    const critique = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [
        {
          role: "user",
          content: `Task: ${task}
Current solution:
${currentSolution}
Critically evaluate this solution. Identify:
1. Errors or inaccuracies
2. Missing information
3. Areas for improvement
4. Overall quality score (1-10)
If the score is 8 or above, respond with "APPROVED".
Otherwise, provide specific feedback for improvement.`,
        },
      ],
    });
    const critiqueText = critique.content[0].type === "text"
      ? critique.content[0].text : "";
    if (critiqueText.includes("APPROVED")) {
      return currentSolution;
    }

    // Revise based on feedback
    const revision = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 2048,
      messages: [
        {
          role: "user",
          content: `Task: ${task}
Previous solution:
${currentSolution}
Feedback:
${critiqueText}
Please revise the solution addressing all the feedback.`,
        },
      ],
    });
    currentSolution = revision.content[0].type === "text"
      ? revision.content[0].text : "";
  }
  return currentSolution;
}
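The same generate-critique-revise loop can be written in Python with the model calls injected as plain functions, so the control flow can be exercised deterministically. A sketch under that assumption; the `generate` and `critique` callables are stand-ins, not SDK methods:

```python
from typing import Callable

def reflective_loop(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    max_reflections: int = 3,
) -> str:
    """Generate a solution, then alternate critique and revision
    until the critic approves or the round limit is reached."""
    solution = generate(task)
    for _ in range(max_reflections):
        feedback = critique(task, solution)
        if "APPROVED" in feedback:
            break
        solution = generate(f"{task}\nPrevious: {solution}\nFeedback: {feedback}")
    return solution

# Stubs: the critic approves once the solution mentions "edge cases"
gen = lambda prompt: "handles edge cases" if "Feedback" in prompt else "naive draft"
crit = lambda task, sol: "APPROVED" if "edge cases" in sol else "Missing edge cases"
print(reflective_loop("sort a list", gen, crit))  # → handles edge cases
```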
Iterative Re-Planning
Real-world tasks often don't go as planned. Iterative re-planning allows agents to adjust their approach based on what they learn during execution. The agent creates an initial plan, executes steps, and periodically re-evaluates and updates the plan.
# Iterative re-planning agent
import anthropic
import json
client = anthropic.Anthropic()
def replan_agent(task: str, max_replans: int = 3) -> str:
    # Create initial plan
    plan = create_plan(task)
    results = []
    replans = 0

    while plan:
        step = plan.pop(0)
        # Execute the current step
        result = execute_step(step, results)
        results.append({"step": step, "result": result})
        # Re-plan if needed, capping revisions so the remaining plan still runs
        if replans < max_replans and should_replan(task, plan, results):
            plan = create_updated_plan(task, results, plan)
            replans += 1

    # Synthesize final answer
    return synthesize_results(task, results)
def create_plan(task: str) -> list[str]:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Create a step-by-step plan for: {task}\nReturn as JSON array of strings."
        }],
    )
    text = response.content[0].text.strip()
    # The model may wrap the JSON in a Markdown code fence; strip it before parsing
    if text.startswith("```"):
        text = text.split("```")[1].removeprefix("json").strip()
    return json.loads(text)
def should_replan(task: str, remaining_plan: list, results: list) -> bool:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Task: {task}
Completed: {json.dumps(results[-1] if results else {})}
Remaining plan: {json.dumps(remaining_plan)}
Should we adjust the plan? Answer YES or NO with brief reason."""
        }],
    )
    # startswith avoids matching a "YES" buried inside the reason for a NO
    return response.content[0].text.strip().upper().startswith("YES")
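The execute-check-revise control flow above can be exercised without any API calls by stubbing the helpers. A sketch with hypothetical stand-ins for `execute_step`, `should_replan`, and `create_updated_plan` passed in as callables:

```python
def run_plan(plan, execute, needs_replan, revise, max_replans=3):
    """Execute steps from a plan in order, revising the remaining
    plan at most max_replans times when a check says to."""
    plan = list(plan)
    results = []
    replans = 0
    while plan:
        step = plan.pop(0)
        results.append((step, execute(step)))
        if replans < max_replans and needs_replan(plan, results):
            plan = revise(plan, results)
            replans += 1
    return results

# Stubs: step "b" fails, triggering one revision that inserts a retry step
execute = lambda step: "error" if step == "b" else "ok"
check = lambda plan, results: results[-1][1] == "error"
revise = lambda plan, results: ["b-retry"] + plan
steps = [s for s, _ in run_plan(["a", "b", "c"], execute, check, revise)]
print(steps)  # → ['a', 'b', 'b-retry', 'c']
```

Because revisions are capped, a plan whose steps keep failing still runs to completion instead of looping forever.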
Planning Best Practices
- Use CoT for all reasoning tasks: Even a simple "think step by step" significantly improves accuracy
- Reserve ToT for high-stakes decisions: The parallel exploration is expensive — use it where quality matters most
- Limit reflection rounds: Diminishing returns after 2-3 rounds; set a quality threshold for early termination
- Re-plan sparingly: Only re-plan when execution results materially differ from expectations
- Combine techniques: Use CoT within a ReAct loop, add reflection for final answers, re-plan when tools fail
Summary
Planning and reasoning techniques transform simple LLM calls into sophisticated problem-solving systems. Chain-of-Thought provides the foundation, Tree-of-Thought adds exploration breadth, self-reflection adds quality assurance, and re-planning adds adaptability. The best agents combine these techniques judiciously, using the simplest approach that produces reliable results.