Why Agents Need Memory
Without memory, every interaction with an AI agent starts from zero. The agent has no recollection of previous conversations, past decisions, or learned preferences. Memory systems give agents the ability to retain information across interactions, build up knowledge over time, and personalize their behavior based on history.
Memory is what separates a stateless chatbot from a true assistant. A customer support agent that remembers your previous issues, a coding agent that learns your project conventions, or a research agent that builds on previous findings — all require well-designed memory systems.
Types of Agent Memory
- Short-Term (Working) Memory: The current conversation context and recent tool results
- Long-Term Memory: Persistent knowledge stored in vector databases or key-value stores
- Episodic Memory: Specific past interactions and their outcomes
- Semantic Memory: General knowledge and facts about the world or domain
- Procedural Memory: Learned procedures, workflows, and successful strategies
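These categories usually share one storage substrate and differ mainly in how records are tagged and retrieved. A minimal sketch of that idea (the `MemoryRecord` and `MemoryStore` names here are illustrative, not from any particular library):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # specific past interactions
    SEMANTIC = "semantic"      # general facts about the world or domain
    PROCEDURAL = "procedural"  # workflows and strategies that worked


@dataclass
class MemoryRecord:
    content: str
    memory_type: MemoryType
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MemoryStore:
    """One backing store; the memory 'type' is just a retrieval filter."""

    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def add(self, content: str, memory_type: MemoryType) -> None:
        self.records.append(MemoryRecord(content, memory_type))

    def query(self, memory_type: MemoryType) -> list[MemoryRecord]:
        return [r for r in self.records if r.memory_type == memory_type]


store = MemoryStore()
store.add("Deploy failed when skipping migrations", MemoryType.EPISODIC)
store.add("The API rate limit is 100 req/min", MemoryType.SEMANTIC)
print(len(store.query(MemoryType.EPISODIC)))  # 1
```

In practice the filter is a metadata field on a vector store entry rather than a Python list scan, but the separation of "one store, many memory types" carries over directly.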
Short-Term Memory
Short-term memory is the simplest form — it's the conversation history (message array) that you pass to the LLM. The challenge is that LLMs have finite context windows (typically 128K-200K tokens). When conversations exceed this limit, you need strategies to manage the window.
// Short-term memory with sliding window and summarization
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

class ShortTermMemory {
  private messages: Message[] = [];
  private maxMessages: number;
  private maxTokens: number;

  constructor(maxMessages = 50, maxTokens = 100000) {
    this.maxMessages = maxMessages;
    this.maxTokens = maxTokens;
  }

  add(message: Message): void {
    this.messages.push(message);
    // Sliding window: drop the oldest messages when over the count limit
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
    // Rough token budget (~4 characters per token): trim oldest until under limit
    while (this.estimateTokens() > this.maxTokens && this.messages.length > 1) {
      this.messages.shift();
    }
  }

  private estimateTokens(): number {
    return this.messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  }

  getMessages(): Message[] {
    return [...this.messages];
  }

  // Summarize older messages to compress context
  async summarize(client: any): Promise<void> {
    if (this.messages.length < 20) return;
    const oldMessages = this.messages.slice(0, -10);
    const recentMessages = this.messages.slice(-10);
    const summary = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 500,
      messages: [
        {
          role: "user",
          content: `Summarize this conversation concisely, preserving key facts and decisions:\n${
            oldMessages.map(m => `${m.role}: ${m.content}`).join("\n")
          }`,
        },
      ],
    });
    const summaryText = summary.content[0].type === "text" ? summary.content[0].text : "";
    this.messages = [
      { role: "system", content: `Previous conversation summary: ${summaryText}` },
      ...recentMessages,
    ];
  }

  clear(): void {
    this.messages = [];
  }
}
Long-Term Memory with Vector Stores
Long-term memory persists across sessions. The most common approach is to embed and store important information in a vector database, then retrieve relevant memories using semantic search when needed.
# Long-term memory with ChromaDB
import chromadb
from sentence_transformers import SentenceTransformer
from datetime import datetime


class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.PersistentClient(path="./memory_db")
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"},
        )
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def store(self, content: str, metadata: dict | None = None) -> str:
        """Store a memory with optional metadata."""
        memory_id = f"mem_{datetime.now().timestamp()}"
        embedding = self.encoder.encode(content).tolist()
        self.collection.add(
            ids=[memory_id],
            embeddings=[embedding],
            documents=[content],
            metadatas=[{
                "timestamp": datetime.now().isoformat(),
                "type": "general",  # default; overridden if metadata sets "type"
                **(metadata or {}),
            }],
        )
        return memory_id

    def recall(self, query: str, n_results: int = 5, filter_type: str | None = None) -> list:
        """Retrieve the memories most relevant to a query."""
        query_embedding = self.encoder.encode(query).tolist()
        where_filter = {"type": filter_type} if filter_type else None
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results,
            where=where_filter,
        )
        return [
            {
                "content": doc,
                "metadata": meta,
                "relevance": 1 - dist,  # convert cosine distance to similarity
            }
            for doc, meta, dist in zip(
                results["documents"][0],
                results["metadatas"][0],
                results["distances"][0],
            )
        ]

    def forget(self, memory_id: str) -> None:
        """Delete a specific memory."""
        self.collection.delete(ids=[memory_id])


# Usage
memory = LongTermMemory()

# Store memories
memory.store(
    "User prefers Python over JavaScript for backend work",
    {"type": "preference"},
)
memory.store(
    "User's project uses PostgreSQL with SQLAlchemy ORM",
    {"type": "project_context"},
)
memory.store(
    "User had a bug with async database connections on 2026-03-15",
    {"type": "episode"},
)

# Recall relevant memories
relevant = memory.recall("database connection issues")
for mem in relevant:
    print(f"[{mem['relevance']:.2f}] {mem['content']}")
Episodic Memory
Episodic memory stores specific past interactions and their outcomes. This allows agents to learn from experience — remembering what worked, what failed, and why. It's particularly useful for agents that perform recurring tasks.
// Episodic memory system
import { ChromaClient, Collection } from "chromadb";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

interface Episode {
  id: string;
  task: string;
  actions: string[];
  outcome: "success" | "failure" | "partial";
  learnings: string;
  timestamp: Date;
}

class EpisodicMemory {
  private episodes: Episode[] = [];
  private collection: Collection | null = null;

  async init(): Promise<void> {
    const chroma = new ChromaClient();
    this.collection = await chroma.getOrCreateCollection({
      name: "episodes",
    });
  }

  async recordEpisode(episode: Omit<Episode, "id" | "timestamp">): Promise<string> {
    const id = `ep_${Date.now()}`;
    const fullEpisode: Episode = {
      ...episode,
      id,
      timestamp: new Date(),
    };
    const document = `Task: ${episode.task}\nActions: ${episode.actions.join(" -> ")}\nOutcome: ${episode.outcome}\nLearnings: ${episode.learnings}`;
    await this.collection?.add({
      ids: [id],
      documents: [document],
      metadatas: [{
        outcome: episode.outcome,
        timestamp: fullEpisode.timestamp.toISOString(),
      }],
    });
    this.episodes.push(fullEpisode);
    return id;
  }

  async recallSimilarEpisodes(
    currentTask: string,
    nResults = 3
  ): Promise<{ document: string; outcome: string }[]> {
    if (!this.collection) return [];
    const results = await this.collection.query({
      queryTexts: [currentTask],
      nResults,
    });
    return (results.documents?.[0] || []).map((doc, i) => ({
      document: doc || "",
      outcome: (results.metadatas?.[0]?.[i] as any)?.outcome || "unknown",
    }));
  }
}

// Integrate episodic memory into the agent loop
async function agentWithEpisodicMemory(task: string): Promise<string> {
  const memory = new EpisodicMemory();
  await memory.init();

  // Recall similar past experiences
  const pastEpisodes = await memory.recallSimilarEpisodes(task);
  const context = pastEpisodes.length > 0
    ? `\n\nRelevant past experiences:\n${pastEpisodes.map(e =>
        `- ${e.document} (Outcome: ${e.outcome})`
      ).join("\n")}`
    : "";

  // Use past experience to inform the current task
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    system: `You are an agent that learns from experience.${context}`,
    messages: [{ role: "user", content: task }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}
Memory with LangGraph Persistence
# LangGraph with built-in memory persistence
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-sonnet-4-20250514")


def chat_node(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": [response]}


# Build graph with checkpointing
graph = StateGraph(MessagesState)
graph.add_node("chat", chat_node)
graph.add_edge(START, "chat")
graph.add_edge("chat", END)

# Add memory persistence
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Thread-based conversations persist automatically
config = {"configurable": {"thread_id": "user-123"}}

# First interaction
result1 = app.invoke(
    {"messages": [{"role": "user", "content": "My name is Alice and I work on project Phoenix"}]},
    config=config,
)

# Second interaction - agent remembers!
result2 = app.invoke(
    {"messages": [{"role": "user", "content": "What project am I working on?"}]},
    config=config,
)
# Agent responds along the lines of: "You're working on project Phoenix"
Memory Design Considerations
- Privacy: Memory systems store user data — ensure compliance with GDPR, CCPA, and other regulations
- Decay: Not all memories should persist forever. Implement TTL or relevance-based cleanup
- Capacity: Vector databases grow with usage. Monitor storage costs and set retention policies
- Accuracy: Memories can contain errors. Add mechanisms to update or correct stored information
- Retrieval Quality: Poor retrieval degrades agent performance. Tune embedding models and similarity thresholds
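The decay point is straightforward to prototype. Here is a minimal sketch of TTL-based cleanup over an in-memory store (the `DecayingMemory` name and structure are illustrative; with a vector database you would apply the same idea by filtering on a stored timestamp in metadata and deleting expired IDs):

```python
import time


class DecayingMemory:
    """Memories expire after ttl_seconds; expired entries are purged on access."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.items: dict[str, tuple[str, float]] = {}  # id -> (content, stored_at)

    def store(self, memory_id: str, content: str) -> None:
        self.items[memory_id] = (content, time.monotonic())

    def _purge_expired(self) -> None:
        now = time.monotonic()
        self.items = {
            mid: (content, ts)
            for mid, (content, ts) in self.items.items()
            if now - ts < self.ttl
        }

    def recall_all(self) -> list[str]:
        self._purge_expired()
        return [content for content, _ in self.items.values()]


memory = DecayingMemory(ttl_seconds=0.05)
memory.store("m1", "short-lived note")
print(memory.recall_all())  # ['short-lived note']
time.sleep(0.1)
print(memory.recall_all())  # [] -- the memory has decayed
```

Relevance-based cleanup follows the same pattern: instead of a fixed TTL, score each memory (by access frequency, recency, or importance) and evict the lowest-scoring entries when the store exceeds its capacity budget.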
Summary
Memory transforms agents from stateless tools into intelligent assistants that improve over time. Short-term memory manages the current conversation, long-term memory provides persistent knowledge, and episodic memory enables learning from experience. The right combination of memory types depends on your use case — start with short-term memory and add persistence as the application demands it.