Why Agents Need Memory
Without memory, every interaction with an AI agent starts from zero. The agent has no recollection of previous conversations, past decisions, or learned preferences. Memory systems give agents the ability to retain information across interactions, build up knowledge over time, and personalize their behavior based on history.
Memory is what separates a stateless chatbot from a true assistant. A customer support agent that remembers your previous issues, a coding agent that learns your project conventions, or a research agent that builds on previous findings — all require well-designed memory systems.
Types of Agent Memory
- Short-Term (Working) Memory: The current conversation context and recent tool results
- Long-Term Memory: Persistent knowledge stored in vector databases or key-value stores
- Episodic Memory: Specific past interactions and their outcomes
- Semantic Memory: General knowledge and facts about the world or domain
- Procedural Memory: Learned procedures, workflows, and successful strategies
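These categories usually share one storage substrate and differ mainly in how records are tagged and retrieved. A minimal sketch of that idea (the `MemoryRecord` and `MemoryStore` names here are illustrative, not from any particular library):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # specific past interactions
    SEMANTIC = "semantic"      # general facts about the world or domain
    PROCEDURAL = "procedural"  # workflows and strategies that worked


@dataclass
class MemoryRecord:
    content: str
    memory_type: MemoryType
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MemoryStore:
    """One backing store; the memory 'type' is just a retrieval filter."""

    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def add(self, content: str, memory_type: MemoryType) -> None:
        self.records.append(MemoryRecord(content, memory_type))

    def query(self, memory_type: MemoryType) -> list[MemoryRecord]:
        return [r for r in self.records if r.memory_type == memory_type]


store = MemoryStore()
store.add("Deploy failed when skipping migrations", MemoryType.EPISODIC)
store.add("The API rate limit is 100 req/min", MemoryType.SEMANTIC)
print(len(store.query(MemoryType.EPISODIC)))  # 1
```

In practice the filter is a metadata field on a vector store entry rather than a Python list scan, but the separation of "one store, many memory types" carries over directly.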
Short-Term Memory
Short-term memory is the simplest form — it's the conversation history (message array) that you pass to the LLM. The challenge is that LLMs have finite context windows (typically 128K-200K tokens). When conversations exceed this limit, you need strategies to manage the window.
// Short-term memory with sliding window and summarization
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

class ShortTermMemory {
  private messages: Message[] = [];
  private maxMessages: number;
  private maxTokens: number;

  constructor(maxMessages = 50, maxTokens = 100000) {
    this.maxMessages = maxMessages;
    this.maxTokens = maxTokens;
  }

  add(message: Message): void {
    this.messages.push(message);
    // Sliding window: drop the oldest messages when over the count limit
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
    // Rough token budget (~4 characters per token): trim oldest until under limit
    while (this.estimateTokens() > this.maxTokens && this.messages.length > 1) {
      this.messages.shift();
    }
  }

  private estimateTokens(): number {
    return this.messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
  }

  getMessages(): Message[] {
    return [...this.messages];
  }

  // Summarize older messages to compress context
  async summarize(client: any): Promise<void> {
    if (this.messages.length < 20) return;
    const oldMessages = this.messages.slice(0, -10);
    const recentMessages = this.messages.slice(-10);
    const summary = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 500,
      messages: [
        {
          role: "user",
          content: `Summarize this conversation concisely, preserving key facts and decisions:\n${
            oldMessages.map(m => `${m.role}: ${m.content}`).join("\n")
          }`,
        },
      ],
    });
    const summaryText = summary.content[0].type === "text" ? summary.content[0].text : "";
    this.messages = [
      { role: "system", content: `Previous conversation summary: ${summaryText}` },
      ...recentMessages,
    ];
  }

  clear(): void {
    this.messages = [];
  }
}
Long-Term Memory with Vector Stores
Long-term memory persists across sessions. The most common approach is to embed and store important information in a vector database, then retrieve relevant memories using semantic search when needed.
# Long-term memory with ChromaDB
import chromadb
from sentence_transformers import SentenceTransformer
from datetime import datetime


class LongTermMemory:
    def __init__(self, collection_name: str = "agent_memory"):
        self.client = chromadb.PersistentClient(path="./memory_db")
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"},
        )
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def store(self, content: str, metadata: dict | None = None) -> str:
        """Store a memory with optional metadata."""
        memory_id = f"mem_{datetime.now().timestamp()}"
        embedding = self.encoder.encode(content).tolist()
        self.collection.add(
            ids=[memory_id],
            embeddings=[embedding],
            documents=[content],
            metadatas=[{
                "timestamp": datetime.now().isoformat(),
                "type": "general",  # default; overridden if metadata sets "type"
                **(metadata or {}),
            }],
        )
        return memory_id

    def recall(self, query: str, n_results: int = 5, filter_type: str | None = None) -> list:
        """Retrieve the memories most relevant to a query."""
        query_embedding = self.encoder.encode(query).tolist()
        where_filter = {"type": filter_type} if filter_type else None
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results,
            where=where_filter,
        )
        return [
            {
                "content": doc,
                "metadata": meta,
                "relevance": 1 - dist,  # convert cosine distance to similarity
            }
            for doc, meta, dist in zip(
                results["documents"][0],
                results["metadatas"][0],
                results["distances"][0],
            )
        ]

    def forget(self, memory_id: str) -> None:
        """Delete a specific memory."""
        self.collection.delete(ids=[memory_id])


# Usage
memory = LongTermMemory()

# Store memories
memory.store(
    "User prefers Python over JavaScript for backend work",
    {"type": "preference"},
)
memory.store(
    "User's project uses PostgreSQL with SQLAlchemy ORM",
    {"type": "project_context"},
)
memory.store(
    "User had a bug with async database connections on 2026-03-15",
    {"type": "episode"},
)

# Recall relevant memories
relevant = memory.recall("database connection issues")
for mem in relevant:
    print(f"[{mem['relevance']:.2f}] {mem['content']}")
Episodic Memory
Episodic memory stores specific past interactions and their outcomes. This allows agents to learn from experience — remembering what worked, what failed, and why. It's particularly useful for agents that perform recurring tasks.
// Episodic memory system
import { ChromaClient, Collection } from "chromadb";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

interface Episode {
  id: string;
  task: string;
  actions: string[];
  outcome: "success" | "failure" | "partial";
  learnings: string;
  timestamp: Date;
}

class EpisodicMemory {
  private episodes: Episode[] = [];
  private collection: Collection | null = null;

  async init(): Promise<void> {
    const chroma = new ChromaClient();
    this.collection = await chroma.getOrCreateCollection({
      name: "episodes",
    });
  }

  async recordEpisode(episode: Omit<Episode, "id" | "timestamp">): Promise<string> {
    const id = `ep_${Date.now()}`;
    const fullEpisode: Episode = {
      ...episode,
      id,
      timestamp: new Date(),
    };
    const document = `Task: ${episode.task}\nActions: ${episode.actions.join(" -> ")}\nOutcome: ${episode.outcome}\nLearnings: ${episode.learnings}`;
    await this.collection?.add({
      ids: [id],
      documents: [document],
      metadatas: [{
        outcome: episode.outcome,
        timestamp: fullEpisode.timestamp.toISOString(),
      }],
    });
    this.episodes.push(fullEpisode);
    return id;
  }

  async recallSimilarEpisodes(
    currentTask: string,
    nResults = 3
  ): Promise<{ document: string; outcome: string }[]> {
    if (!this.collection) return [];
    const results = await this.collection.query({
      queryTexts: [currentTask],
      nResults,
    });
    return (results.documents?.[0] || []).map((doc, i) => ({
      document: doc || "",
      outcome: (results.metadatas?.[0]?.[i] as any)?.outcome || "unknown",
    }));
  }
}

// Integrate episodic memory into the agent loop
async function agentWithEpisodicMemory(task: string): Promise<string> {
  const memory = new EpisodicMemory();
  await memory.init();

  // Recall similar past experiences
  const pastEpisodes = await memory.recallSimilarEpisodes(task);
  const context = pastEpisodes.length > 0
    ? `\n\nRelevant past experiences:\n${pastEpisodes.map(e =>
        `- ${e.document} (Outcome: ${e.outcome})`
      ).join("\n")}`
    : "";

  // Use past experience to inform the current task
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    system: `You are an agent that learns from experience.${context}`,
    messages: [{ role: "user", content: task }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}
Memory with LangGraph Persistence
# LangGraph with built-in memory persistence
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-sonnet-4-20250514")


def chat_node(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": [response]}


# Build graph with checkpointing
graph = StateGraph(MessagesState)
graph.add_node("chat", chat_node)
graph.add_edge(START, "chat")
graph.add_edge("chat", END)

# Add memory persistence
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Thread-based conversations persist automatically
config = {"configurable": {"thread_id": "user-123"}}

# First interaction
result1 = app.invoke(
    {"messages": [{"role": "user", "content": "My name is Alice and I work on project Phoenix"}]},
    config=config,
)

# Second interaction - agent remembers!
result2 = app.invoke(
    {"messages": [{"role": "user", "content": "What project am I working on?"}]},
    config=config,
)
# Agent responds along the lines of: "You're working on project Phoenix"
Memory Design Considerations
- Privacy: Memory systems store user data — ensure compliance with GDPR, CCPA, and other regulations
- Decay: Not all memories should persist forever. Implement TTL or relevance-based cleanup
- Capacity: Vector databases grow with usage. Monitor storage costs and set retention policies
- Accuracy: Memories can contain errors. Add mechanisms to update or correct stored information
- Retrieval Quality: Poor retrieval degrades agent performance. Tune embedding models and similarity thresholds
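The decay point is straightforward to prototype. Here is a minimal sketch of TTL-based cleanup over an in-memory store (the `DecayingMemory` name and structure are illustrative; with a vector database you would apply the same idea by filtering on a stored timestamp in metadata and deleting expired IDs):

```python
import time


class DecayingMemory:
    """Memories expire after ttl_seconds; expired entries are purged on access."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.items: dict[str, tuple[str, float]] = {}  # id -> (content, stored_at)

    def store(self, memory_id: str, content: str) -> None:
        self.items[memory_id] = (content, time.monotonic())

    def _purge_expired(self) -> None:
        now = time.monotonic()
        self.items = {
            mid: (content, ts)
            for mid, (content, ts) in self.items.items()
            if now - ts < self.ttl
        }

    def recall_all(self) -> list[str]:
        self._purge_expired()
        return [content for content, _ in self.items.values()]


memory = DecayingMemory(ttl_seconds=0.05)
memory.store("m1", "short-lived note")
print(memory.recall_all())  # ['short-lived note']
time.sleep(0.1)
print(memory.recall_all())  # [] -- the memory has decayed
```

Relevance-based cleanup follows the same pattern: instead of a fixed TTL, score each memory (by access frequency, recency, or importance) and evict the lowest-scoring entries when the store exceeds its capacity budget.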
Summary
Memory transforms agents from stateless tools into intelligent assistants that improve over time. Short-term memory manages the current conversation, long-term memory provides persistent knowledge, and episodic memory enables learning from experience. The right combination of memory types depends on your use case — start with short-term memory and add persistence as the application demands it.