Pinecone Tutorial

Step-by-step guide to setting up and using Pinecone for production RAG applications

Getting Started with Pinecone

Pinecone is a fully managed vector database widely used in production AI applications. It handles indexing, sharding, replication, and scaling automatically, so you can focus on building your RAG pipeline. In this tutorial, we'll build a complete document search system with Pinecone.

What You'll Build

  • Step 1: Set up a Pinecone account and create a serverless index
  • Step 2: Ingest and embed documents into the index
  • Step 3: Implement semantic search with metadata filtering
  • Step 4: Build a complete RAG query pipeline with Claude
  • Step 5: Add namespace management and batch operations

Setup and Installation

# TypeScript
npm install @pinecone-database/pinecone @anthropic-ai/sdk openai

# Python (note: the SDK package was renamed from pinecone-client to pinecone)
pip install pinecone anthropic openai

Creating an Index

// Pinecone setup and index creation
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const openai = new OpenAI();
const anthropic = new Anthropic();

// Create a serverless index
async function createIndex() {
  const indexName = "knowledge-base";

  // Check if index already exists
  const existingIndexes = await pc.listIndexes();
  const exists = existingIndexes.indexes?.some(i => i.name === indexName);

  if (!exists) {
    await pc.createIndex({
      name: indexName,
      dimension: 1536, // Match your embedding model's dimensions
      metric: "cosine",
      spec: {
        serverless: {
          cloud: "aws",
          region: "us-east-1",
        },
      },
    });

    // Poll until the index reports ready instead of sleeping a fixed 30 seconds
    console.log("Waiting for index to initialize...");
    let ready = false;
    while (!ready) {
      const description = await pc.describeIndex(indexName);
      ready = description.status?.ready ?? false;
      if (!ready) await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }

  return pc.index(indexName);
}

const index = await createIndex();

Ingesting Documents

// Document ingestion pipeline
interface Document {
  id: string;
  text: string;
  metadata: Record<string, string | number>;
}

async function embedTexts(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return response.data.map(d => d.embedding);
}

async function ingestDocuments(documents: Document[]): Promise<void> {
  const batchSize = 100;

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    const texts = batch.map(d => d.text);
    const embeddings = await embedTexts(texts);

    const vectors = batch.map((doc, j) => ({
      id: doc.id,
      values: embeddings[j],
      metadata: {
        ...doc.metadata,
        text: doc.text, // Store text in metadata for retrieval
      },
    }));

    await index.upsert(vectors);
    console.log(`Ingested batch ${Math.floor(i / batchSize) + 1}`);
  }
}

// Example: Ingest product documentation
await ingestDocuments([
  {
    id: "doc-001",
    text: "To reset your password, navigate to Settings > Security > Change Password. Enter your current password, then your new password twice.",
    metadata: { source: "help-center", category: "account", page: 1 },
  },
  {
    id: "doc-002",
    text: "Our premium plan includes unlimited API calls, priority support, and custom integrations. Pricing starts at $99/month for up to 10 team members.",
    metadata: { source: "pricing", category: "plans", page: 1 },
  },
  {
    id: "doc-003",
    text: "The API rate limit for free tier is 100 requests per minute. Premium users get 10,000 requests per minute with burst capacity up to 15,000.",
    metadata: { source: "api-docs", category: "limits", page: 3 },
  },
]);
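
To verify that an upsert landed, you can check record counts with describeIndexStats. This is a minimal sketch; note that serverless writes are eventually consistent, so counts can lag a few seconds behind a fresh upsert.

// Sanity check after ingestion (counts may lag briefly behind writes)
const stats = await index.describeIndexStats();
console.log("Total records:", stats.totalRecordCount);
console.log("Namespaces:", Object.keys(stats.namespaces ?? {}));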

Querying with RAG

// Complete RAG query pipeline
async function ragQuery(
  question: string,
  options?: {
    topK?: number;
    filter?: Record<string, any>;
    namespace?: string;
  }
): Promise<{ answer: string; sources: string[] }> {
  const { topK = 5, filter, namespace } = options || {};

  // Step 1: Embed the question
  const [queryEmbedding] = await embedTexts([question]);

  // Step 2: Search Pinecone
  const targetIndex = namespace ? index.namespace(namespace) : index;
  const searchResults = await targetIndex.query({
    vector: queryEmbedding,
    topK,
    includeMetadata: true,
    filter,
  });

  // Step 3: Build context from results
  const context = searchResults.matches
    ?.map((match, i) => {
      const meta = match.metadata as Record<string, any>;
      return `[Source ${i + 1}: ${meta.source || "unknown"} (score: ${match.score?.toFixed(3)})]\n${meta.text}`;
    })
    .join("\n\n") || "No relevant documents found.";

  const sources = searchResults.matches
    ?.map(m => (m.metadata as any)?.source || "unknown")
    .filter((v, i, a) => a.indexOf(v) === i) || [];

  // Step 4: Generate answer with Claude
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: `You are a helpful assistant that answers questions based on the provided context.
Rules:
- Only use information from the context below
- If the context doesn't contain the answer, say so clearly
- Cite the source number [Source N] when referencing information
- Be concise but thorough

Context:
${context}`,
    messages: [{ role: "user", content: question }],
  });

  const answer = response.content[0].type === "text" ? response.content[0].text : "";
  return { answer, sources };
}

// Usage
const result = await ragQuery("How do I reset my password?");
console.log(result.answer);
console.log("Sources:", result.sources);

// With filtering
const apiResult = await ragQuery("What are the API rate limits?", {
  filter: { category: { $eq: "limits" } },
});
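
Pinecone metadata filters use a MongoDB-style operator set ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or). As a sketch, here is a combined filter against the metadata fields used in this tutorial (source and page); the question itself is illustrative:

// Combine operators: match either source, restricted to early pages
const filtered = await ragQuery("What do premium users get?", {
  filter: {
    $and: [
      { source: { $in: ["api-docs", "pricing"] } },
      { page: { $lte: 3 } },
    ],
  },
});
console.log(filtered.answer);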

Python Implementation

# Complete Pinecone RAG in Python
import os

from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI
import anthropic

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

# Create index
if "knowledge-base" not in [idx.name for idx in pc.list_indexes()]:
    pc.create_index(
        name="knowledge-base",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("knowledge-base")

def embed_texts(texts: list[str]) -> list[list[float]]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [d.embedding for d in response.data]

def ingest(documents: list[dict]) -> None:
    batch_size = 100
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        embeddings = embed_texts([d["text"] for d in batch])
        vectors = [
            {
                "id": d["id"],
                "values": emb,
                "metadata": {**d.get("metadata", {}), "text": d["text"]},
            }
            for d, emb in zip(batch, embeddings)
        ]
        index.upsert(vectors=vectors)

def rag_query(question: str, top_k: int = 5) -> dict:
    query_embedding = embed_texts([question])[0]

    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True,
    )

    context = "\n\n".join(
        f"[Source {i+1}] {m.metadata.get('text', '')}"
        for i, m in enumerate(results.matches)
    )

    response = anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=f"Answer based on this context:\n{context}",
        messages=[{"role": "user", "content": question}],
    )

    return {
        "answer": response.content[0].text,
        "sources": list(set(m.metadata.get("source") for m in results.matches)),
    }

Pinecone Tips

  • Use namespaces: Separate data by tenant, environment, or document type within a single index (see the sketch after this list)
  • Store text in metadata: Pinecone doesn't store original text by default — include it in metadata for retrieval
  • Metadata size limit: Each vector's metadata is limited to 40KB. For long documents, store text externally
  • Batch upserts: Upsert in batches of around 100 vectors per request to stay within request size limits and keep throughput high
  • Monitor usage: Track query latency and index fullness in the Pinecone dashboard
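
As promised in Step 5, here is a minimal sketch of namespace management and batch deletes using the SDK's namespace(), deleteMany(), and deleteAll() methods. The tenant names and document IDs are hypothetical:

// Namespace management sketch (tenant names and IDs are hypothetical)
const tenantA = index.namespace("tenant-a");

// Upserts and queries are scoped to the namespace they run against
const [embedding] = await embedTexts(["Tenant A onboarding guide"]);
await tenantA.upsert([
  {
    id: "doc-101",
    values: embedding,
    metadata: { source: "internal", text: "Tenant A onboarding guide" },
  },
]);

const [queryVector] = await embedTexts(["How does onboarding work?"]);
const scoped = await tenantA.query({ vector: queryVector, topK: 3, includeMetadata: true });
console.log("Matches in tenant-a:", scoped.matches?.length);

// Batch delete specific IDs within a namespace
await tenantA.deleteMany(["doc-101", "doc-102"]);

// Or clear a namespace entirely, e.g. when a tenant offboards
await index.namespace("tenant-b").deleteAll();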

Summary

Pinecone offers one of the simplest paths to production-grade vector search. Its serverless architecture means you don't need to manage infrastructure, and the API is straightforward. The key steps: create an index, embed and upsert your documents, then embed each question and query. Combined with Claude for generation, you have a complete RAG system ready for production.