The Role of Planning in AI Agents
Effective AI agents don't just react to inputs — they plan. Planning enables agents to decompose complex tasks, anticipate challenges, consider alternative approaches, and execute multi-step workflows with intention. The quality of an agent's planning directly determines the quality of its outputs.
In this lesson, we explore the key reasoning and planning techniques that make agents more capable: Chain-of-Thought (CoT), Tree-of-Thought (ToT), self-reflection, and iterative re-planning. These techniques can be combined to build agents capable of structured, deliberate, multi-step problem-solving.
Planning Techniques Overview
- Chain-of-Thought (CoT): Sequential step-by-step reasoning
- Tree-of-Thought (ToT): Exploring multiple reasoning paths in parallel
- Self-Reflection: The agent evaluates and critiques its own outputs
- Iterative Re-planning: Updating the plan based on execution results
- Decomposition: Breaking complex tasks into smaller, manageable sub-tasks
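Decomposition underlies the other techniques, so it is worth seeing in isolation. A minimal sketch of prompt-based task decomposition; the `generate` callable is a hypothetical stand-in for a model call (returning one sub-task per line), injected so the parsing is easy to test:

```python
from typing import Callable

def decompose(task: str, generate: Callable[[str], str]) -> list[str]:
    """Split a complex task into sub-tasks via a model call.

    `generate` stands in for any LLM call that returns one sub-task
    per line; inject a real client call in production.
    """
    prompt = f"Break this task into small, ordered sub-tasks, one per line:\n{task}"
    raw = generate(prompt)
    # Strip list markers and drop blank lines
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

# A stubbed model response makes the parsing behaviour easy to verify
stub = lambda prompt: "- Research competitors\n- Draft outline\n\n- Write report"
print(decompose("Write a market report", stub))
# → ['Research competitors', 'Draft outline', 'Write report']
```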
Chain-of-Thought (CoT) Prompting
Chain-of-Thought prompting encourages the model to reason step-by-step before arriving at an answer. This dramatically improves performance on math, logic, and multi-step reasoning tasks. CoT can be elicited with simple phrases like "Think step by step" or through few-shot examples.
// Chain-of-Thought with structured prompting
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function chainOfThought(question: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    system: `You are an analytical reasoning agent. When solving problems:
1. First, identify the key components of the problem
2. Break down the problem into clear steps
3. Work through each step showing your reasoning
4. Verify your answer before stating it
5. Provide a clear final answer

Format:
## Analysis
[Identify key components]
## Step-by-Step Reasoning
[Work through each step]
## Verification
[Check your work]
## Answer
[Final answer]`,
    messages: [{ role: "user", content: question }],
  });
  return response.content[0].type === "text" ? response.content[0].text : "";
}
// Zero-shot CoT (simplest form)
async function zeroShotCoT(question: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: `${question}\n\nLet's think through this step by step.`,
      },
    ],
  });
  return response.content[0].type === "text" ? response.content[0].text : "";
}
# Chain-of-Thought in Python
import anthropic
client = anthropic.Anthropic()
def chain_of_thought(question: str) -> str:
    """Use structured CoT prompting for complex reasoning."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="""You are an analytical reasoning agent. For every problem:
1. Identify the key components
2. Break down into clear steps
3. Show reasoning for each step
4. Verify your answer
5. Provide a clear final answer""",
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
# Few-shot CoT with examples
def few_shot_cot(question: str) -> str:
    """Use examples to demonstrate the reasoning pattern."""
    examples = """
Q: If a store has 3 shelves with 8 items each, and they remove 5 items, how many remain?
A: Let me think step by step.
- Start: 3 shelves x 8 items = 24 total items
- Remove: 24 - 5 = 19 items
- Answer: 19 items remain.

Q: A train travels 120km in 2 hours. If it speeds up by 50%, how long for 180km?
A: Let me think step by step.
- Original speed: 120km / 2h = 60 km/h
- New speed: 60 * 1.5 = 90 km/h
- Time for 180km: 180 / 90 = 2 hours
- Answer: 2 hours.
"""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[
            {"role": "user", "content": f"{examples}\nQ: {question}\nA: Let me think step by step."}
        ],
    )
    return response.content[0].text
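Few-shot CoT responses follow the "- Answer:" convention set by the examples, which makes the final answer easy to extract downstream. A small parser sketch, assuming that convention holds (`extract_answer` is a hypothetical helper, not part of the SDK):

```python
def extract_answer(cot_response: str) -> str:
    """Pull the final answer out of a CoT response that follows
    the "- Answer: ..." convention used in the few-shot examples."""
    for line in reversed(cot_response.splitlines()):
        stripped = line.strip().lstrip("- ")
        if stripped.lower().startswith("answer:"):
            return stripped[len("answer:"):].strip()
    # Fall back to the last non-empty line if no marker is found
    non_empty = [l for l in cot_response.splitlines() if l.strip()]
    return non_empty[-1].strip() if non_empty else ""

response = "Let me think step by step.\n- New speed: 90 km/h\n- Answer: 2 hours."
print(extract_answer(response))  # → 2 hours.
```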
Tree-of-Thought (ToT)
Tree-of-Thought extends CoT by exploring multiple reasoning paths simultaneously, evaluating each path, and selecting the most promising one. This is particularly useful for problems where the first approach might not be optimal — like puzzle-solving, creative tasks, or strategic planning.
// Tree-of-Thought implementation
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
interface ThoughtNode {
  thought: string;
  score: number;
  children: ThoughtNode[];
}

async function treeOfThought(
  problem: string,
  breadth: number = 3,
  depth: number = 3
): Promise<string> {
  // Step 1: Generate multiple initial approaches
  const approaches = await generateThoughts(problem, breadth);
  // Step 2: Evaluate and score each approach
  const scoredApproaches = await evaluateThoughts(problem, approaches);
  // Step 3: Expand the top approaches
  const topApproaches = scoredApproaches
    .sort((a, b) => b.score - a.score)
    .slice(0, 2);

  let bestSolution = "";
  let bestScore = 0;
  for (const approach of topApproaches) {
    // Recursively expand promising branches
    const solution = await expandThought(problem, approach, depth - 1);
    if (solution.score > bestScore) {
      bestScore = solution.score;
      bestSolution = solution.thought;
    }
  }
  return bestSolution;
}
async function generateThoughts(
  problem: string,
  n: number
): Promise<string[]> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: `Problem: ${problem}
Generate exactly ${n} different approaches to solve this problem.
Format each as: APPROACH N: [description]
Be creative — each approach should be meaningfully different.`,
      },
    ],
  });
  const text = response.content[0].type === "text" ? response.content[0].text : "";
  // Drop any preamble before the first "APPROACH N:" marker, then trim each entry
  return text.split(/APPROACH \d+:/).slice(1).map(s => s.trim()).filter(Boolean);
}
async function evaluateThoughts(
  problem: string,
  thoughts: string[]
): Promise<ThoughtNode[]> {
  const scored: ThoughtNode[] = [];
  for (const thought of thoughts) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 256,
      messages: [
        {
          role: "user",
          content: `Problem: ${problem}
Approach: ${thought}
Rate this approach from 1-10 for feasibility, correctness, and completeness.
Respond with just the number.`,
        },
      ],
    });
    const scoreText = response.content[0].type === "text" ? response.content[0].text : "5";
    // Fall back to a neutral score if the model returns extra text
    const score = parseInt(scoreText.trim(), 10) || 5;
    scored.push({ thought, score, children: [] });
  }
  return scored;
}
# Tree-of-Thought in Python
import anthropic
from dataclasses import dataclass
client = anthropic.Anthropic()
@dataclass
class ThoughtNode:
    thought: str
    score: float
    children: list

def tree_of_thought(problem: str, breadth: int = 3, depth: int = 3) -> str:
    # Generate multiple approaches
    approaches = generate_thoughts(problem, breadth)
    # Evaluate each approach
    scored = evaluate_thoughts(problem, approaches)
    # Expand the best approaches
    scored.sort(key=lambda x: x.score, reverse=True)
    top_approaches = scored[:2]

    best_solution = ""
    best_score = 0
    for approach in top_approaches:
        solution = expand_thought(problem, approach, depth - 1)
        if solution.score > best_score:
            best_score = solution.score
            best_solution = solution.thought
    return best_solution
def generate_thoughts(problem: str, n: int) -> list[str]:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Problem: {problem}
Generate {n} different approaches to solve this. Format: APPROACH N: [description]"""
        }],
    )
    text = response.content[0].text
    parts = [s.strip() for s in text.split("APPROACH")[1:] if s.strip()]
    # Drop the leading "N:" marker from each entry
    return [p.split(":", 1)[1].strip() if ":" in p else p for p in parts]
def evaluate_thoughts(problem: str, thoughts: list[str]) -> list[ThoughtNode]:
    scored = []
    for thought in thoughts:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=256,
            messages=[{
                "role": "user",
                "content": f"Problem: {problem}\nApproach: {thought}\nRate 1-10. Just the number."
            }],
        )
        try:
            score = int(response.content[0].text.strip())
        except ValueError:
            score = 5  # fall back to a neutral score if the model adds extra text
        scored.append(ThoughtNode(thought=thought, score=score, children=[]))
    return scored
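Both implementations above call an expand step that is not shown. One way to sketch it, with the generation and scoring functions injected as plain callables so the recursion can be exercised without API calls; the stubs and the `ThoughtNode` redefinition (repeated so the sketch runs standalone) are assumptions, not SDK code:

```python
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:  # same shape as the dataclass above, repeated for standalone use
    thought: str
    score: float
    children: list = field(default_factory=list)

def expand_thought(problem, node, depth, generate, evaluate):
    """Recursively deepen a thought: generate follow-on thoughts,
    score them, and return the best-scoring node found."""
    if depth <= 0:
        return node
    prompt = f"{problem}\nBuilding on: {node.thought}"
    node.children = evaluate(prompt, generate(prompt, 3))
    best = node
    for child in node.children:
        leaf = expand_thought(problem, child, depth - 1, generate, evaluate)
        if leaf.score > best.score:
            best = leaf
    return best

# Deterministic stubs: three follow-on thoughts with fixed scores 6, 7, 8
def stub_generate(prompt, n):
    return [f"step-{i}" for i in range(n)]

def stub_evaluate(prompt, thoughts):
    return [ThoughtNode(thought=t, score=i + 6) for i, t in enumerate(thoughts)]

root = ThoughtNode(thought="initial approach", score=5)
best = expand_thought("toy problem", root, 2, stub_generate, stub_evaluate)
print(best.score)  # → 8
```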
Self-Reflection
Self-reflection enables agents to evaluate their own outputs, identify errors, and improve iteratively. The agent acts as both the generator and the critic, creating a feedback loop that converges on better solutions.
// Self-reflection pattern
async function reflectiveAgent(
  task: string,
  maxReflections: number = 3
): Promise<string> {
  let currentSolution = "";

  // Initial attempt
  const initialResponse = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [{ role: "user", content: task }],
  });
  currentSolution = initialResponse.content[0].type === "text"
    ? initialResponse.content[0].text : "";

  // Reflection loop
  for (let i = 0; i < maxReflections; i++) {
    // Critique the current solution
    const critique = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [
        {
          role: "user",
          content: `Task: ${task}
Current solution:
${currentSolution}
Critically evaluate this solution. Identify:
1. Errors or inaccuracies
2. Missing information
3. Areas for improvement
4. Overall quality score (1-10)
If the score is 8 or above, respond with "APPROVED".
Otherwise, provide specific feedback for improvement.`,
        },
      ],
    });
    const critiqueText = critique.content[0].type === "text"
      ? critique.content[0].text : "";
    if (critiqueText.includes("APPROVED")) {
      return currentSolution;
    }

    // Revise based on feedback
    const revision = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 2048,
      messages: [
        {
          role: "user",
          content: `Task: ${task}
Previous solution:
${currentSolution}
Feedback:
${critiqueText}
Please revise the solution addressing all the feedback.`,
        },
      ],
    });
    currentSolution = revision.content[0].type === "text"
      ? revision.content[0].text : "";
  }
  return currentSolution;
}
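The same generate-critique-revise loop can be written in Python with the model calls injected as plain functions, so the control flow can be exercised deterministically. A sketch under that assumption; the `generate` and `critique` callables are stand-ins, not SDK methods:

```python
from typing import Callable

def reflective_loop(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    max_reflections: int = 3,
) -> str:
    """Generate a solution, then alternate critique and revision
    until the critic approves or the round limit is reached."""
    solution = generate(task)
    for _ in range(max_reflections):
        feedback = critique(task, solution)
        if "APPROVED" in feedback:
            break
        solution = generate(f"{task}\nPrevious: {solution}\nFeedback: {feedback}")
    return solution

# Stubs: the critic approves once the solution mentions "edge cases"
gen = lambda prompt: "handles edge cases" if "Feedback" in prompt else "naive draft"
crit = lambda task, sol: "APPROVED" if "edge cases" in sol else "Missing edge cases"
print(reflective_loop("sort a list", gen, crit))  # → handles edge cases
```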
Iterative Re-Planning
Real-world tasks often don't go as planned. Iterative re-planning allows agents to adjust their approach based on what they learn during execution. The agent creates an initial plan, executes steps, and periodically re-evaluates and updates the plan.
# Iterative re-planning agent
import anthropic
import json
client = anthropic.Anthropic()
def replan_agent(task: str, max_replans: int = 3) -> str:
    # Create initial plan
    plan = create_plan(task)
    results = []
    replans = 0

    while plan:
        step = plan.pop(0)
        # Execute the current step
        result = execute_step(step, results)
        results.append({"step": step, "result": result})
        # Re-plan if needed, capping revisions so the remaining plan still runs
        if replans < max_replans and should_replan(task, plan, results):
            plan = create_updated_plan(task, results, plan)
            replans += 1

    # Synthesize final answer
    return synthesize_results(task, results)
def create_plan(task: str) -> list[str]:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Create a step-by-step plan for: {task}\nReturn as JSON array of strings."
        }],
    )
    text = response.content[0].text.strip()
    # The model may wrap the JSON in a Markdown code fence; strip it before parsing
    if text.startswith("```"):
        text = text.split("```")[1].removeprefix("json").strip()
    return json.loads(text)
def should_replan(task: str, remaining_plan: list, results: list) -> bool:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Task: {task}
Completed: {json.dumps(results[-1] if results else {})}
Remaining plan: {json.dumps(remaining_plan)}
Should we adjust the plan? Answer YES or NO with brief reason."""
        }],
    )
    # startswith avoids matching a "YES" buried inside the reason for a NO
    return response.content[0].text.strip().upper().startswith("YES")
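The execute-check-revise control flow above can be exercised without any API calls by stubbing the helpers. A sketch with hypothetical stand-ins for `execute_step`, `should_replan`, and `create_updated_plan` passed in as callables:

```python
def run_plan(plan, execute, needs_replan, revise, max_replans=3):
    """Execute steps from a plan in order, revising the remaining
    plan at most max_replans times when a check says to."""
    plan = list(plan)
    results = []
    replans = 0
    while plan:
        step = plan.pop(0)
        results.append((step, execute(step)))
        if replans < max_replans and needs_replan(plan, results):
            plan = revise(plan, results)
            replans += 1
    return results

# Stubs: step "b" fails, triggering one revision that inserts a retry step
execute = lambda step: "error" if step == "b" else "ok"
check = lambda plan, results: results[-1][1] == "error"
revise = lambda plan, results: ["b-retry"] + plan
steps = [s for s, _ in run_plan(["a", "b", "c"], execute, check, revise)]
print(steps)  # → ['a', 'b', 'b-retry', 'c']
```

Because revisions are capped, a plan whose steps keep failing still runs to completion instead of looping forever.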
Planning Best Practices
- Use CoT for all reasoning tasks: Even a simple "think step by step" significantly improves accuracy
- Reserve ToT for high-stakes decisions: The parallel exploration is expensive — use it where quality matters most
- Limit reflection rounds: Diminishing returns after 2-3 rounds; set a quality threshold for early termination
- Re-plan sparingly: Only re-plan when execution results materially differ from expectations
- Combine techniques: Use CoT within a ReAct loop, add reflection for final answers, re-plan when tools fail
Summary
Planning and reasoning techniques transform simple LLM calls into sophisticated problem-solving systems. Chain-of-Thought provides the foundation, Tree-of-Thought adds exploration breadth, self-reflection adds quality assurance, and re-planning adds adaptability. The best agents combine these techniques judiciously, using the simplest approach that produces reliable results.