Vector Databases Overview - AI Agents & RAG Tutorial | TechLead

Compare Pinecone, Weaviate, ChromaDB, Qdrant, and pgvector for building RAG applications

What Are Vector Databases?

Vector databases are specialized storage systems designed to efficiently store, index, and search high-dimensional vector embeddings. They are the backbone of RAG systems, enabling fast semantic search over millions or billions of document embeddings.

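You could store vectors as plain arrays in a regular database, but then every similarity query becomes an O(n) linear scan that compares the query against every stored vector. Vector databases instead build approximate nearest neighbor (ANN) indexes such as HNSW or IVF, which inspect only a small fraction of the stored vectors. On large collections this is often orders of magnitude faster and returns results in milliseconds, at the cost of occasionally missing a true nearest neighbor.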
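To make the difference between a linear scan and an ANN index concrete, here is a plain-Python sketch (standard library only). `exact_top_k` is the O(n) brute-force search. `ToyIVF` is a deliberately simplified inverted-file index: it puts each vector in the bucket of its most similar centroid and, at query time, scans only the `nprobe` closest buckets. The names are illustrative, and real IVF implementations learn their centroids with k-means rather than sampling them as done here.

```python
import math
import random

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """O(n) linear scan: score every stored vector and keep the k best ids."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored, reverse=True)[:k]]

class ToyIVF:
    """Toy inverted-file (IVF) index: each vector is stored in the bucket of
    its most similar centroid; a query scans only the nprobe best buckets."""

    def __init__(self, vectors: list[list[float]], n_lists: int, seed: int = 0):
        self.vectors = vectors
        # Simplification: centroids are randomly sampled data vectors
        # (real IVF indexes learn them with k-means).
        self.centroids = random.Random(seed).sample(vectors, n_lists)
        self.lists: list[list[int]] = [[] for _ in range(n_lists)]
        for i, v in enumerate(vectors):
            self.lists[self._closest_buckets(v, 1)[0]].append(i)

    def _closest_buckets(self, v: list[float], n: int) -> list[int]:
        """Indices of the n centroids most similar to v."""
        ranked = sorted(
            range(len(self.centroids)),
            key=lambda c: cosine(v, self.centroids[c]),
            reverse=True,
        )
        return ranked[:n]

    def search(self, query: list[float], k: int, nprobe: int = 1) -> list[int]:
        """Score only the vectors that live in the nprobe closest buckets."""
        candidates = [i for b in self._closest_buckets(query, nprobe) for i in self.lists[b]]
        scored = [(cosine(query, self.vectors[i]), i) for i in candidates]
        return [i for _, i in sorted(scored, reverse=True)[:k]]

if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
    query = [rng.gauss(0, 1) for _ in range(8)]
    index = ToyIVF(data, n_lists=16)
    print("exact:        ", exact_top_k(query, data, 5))
    print("IVF nprobe=2: ", index.search(query, 5, nprobe=2))   # scans roughly 2/16 of the data
    print("IVF nprobe=16:", index.search(query, 5, nprobe=16))  # scans everything
```

Raising `nprobe` scans more buckets, trading speed for recall; with `nprobe` equal to the number of buckets the search degenerates into the exact scan. HNSW makes the same speed/recall trade-off with a layered proximity graph instead of buckets.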

Vector Database Comparison

Database	Type	Hosting	Free Tier	Best For
Pinecone	Managed	Cloud only	Yes (limited)	Production SaaS
Weaviate	Open-source	Cloud + Self-host	Yes	Hybrid search
ChromaDB	Open-source	Local + Cloud	Free (local)	Prototyping, local dev
Qdrant	Open-source	Cloud + Self-host	Yes	Advanced filtering
pgvector	Extension	Any PostgreSQL	Free	Existing PG stack

Pinecone

Pinecone is a fully managed vector database, so there is no infrastructure for you to run. It is a common choice for production RAG applications.

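// Pinecone quickstart
// Assumes embedding1, embedding2, and queryEmbedding are number[] vectors
// produced by a 1536-dimension embedding model.
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

// Create an index (dimension must match your embedding model)
await pc.createIndex({
  name: "my-rag-index",
  dimension: 1536,
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
  waitUntilReady: true,     // wait until the index can accept upserts
  suppressConflicts: true,  // don't throw if the index already exists
});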

const index = pc.index("my-rag-index");

// Upsert vectors
await index.upsert([
  {
    id: "doc-1",
    values: embedding1, // number[] of length 1536
    metadata: { source: "handbook.pdf", page: 5, text: "Company vacation policy..." },
  },
  {
    id: "doc-2",
    values: embedding2,
    metadata: { source: "handbook.pdf", page: 6, text: "Sick leave policy..." },
  },
]);

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  includeMetadata: true,
  filter: { source: { $eq: "handbook.pdf" } },
});

results.matches?.forEach(match => {
  console.log(`[${match.score?.toFixed(3)}] ${match.metadata?.text}`);
});
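Once a query returns, a RAG pipeline usually turns the matches into prompt context for the LLM. The Python sketch below works on plain dicts shaped like the `id`, `score`, and `metadata` fields used above; the helper name `build_context`, the `min_score` cutoff, the character budget, and the citation format are hypothetical choices, not part of the Pinecone SDK. With `metric: "cosine"` a higher score means a closer match, so the cutoff drops weak hits.

```python
from typing import Any

def build_context(
    matches: list[dict[str, Any]], min_score: float = 0.75, max_chars: int = 2000
) -> str:
    """Join the text of sufficiently similar matches into one context string.

    Each match is assumed to look like
    {"id": ..., "score": float, "metadata": {"text": ..., "source": ..., "page": ...}}.
    """
    parts: list[str] = []
    used = 0
    # Highest-scoring matches first; stop at the first one below the cutoff
    for m in sorted(matches, key=lambda m: m["score"], reverse=True):
        if m["score"] < min_score:
            break
        meta = m.get("metadata", {})
        chunk = f"[{meta.get('source', '?')} p.{meta.get('page', '?')}] {meta.get('text', '')}"
        # Stay within a rough character budget (separators are not counted)
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)

# Illustrative matches with made-up scores
example = [
    {"id": "doc-2", "score": 0.80, "metadata": {"source": "handbook.pdf", "page": 6, "text": "Sick leave policy..."}},
    {"id": "doc-1", "score": 0.91, "metadata": {"source": "handbook.pdf", "page": 5, "text": "Company vacation policy..."}},
]
print(build_context(example))
```

Keeping `source` and `page` in the metadata, as the upsert above does, is what lets the model cite where each passage came from.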

ChromaDB

ChromaDB is an open-source embedding database optimized for developer experience. It runs locally with zero configuration and is well suited to development and to small and medium-sized applications.

# ChromaDB quickstart
import chromadb

# Persistent local storage
client = chromadb.PersistentClient(path="./chroma_data")

# Create a collection
collection = client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"},  # cosine, l2, or ip
)

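# Add documents; when no embeddings are passed, ChromaDB embeds them with its
# default embedding function (all-MiniLM-L6-v2). upsert() makes reruns idempotent.
collection.upsert(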
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "The company offers 20 days of paid vacation per year.",
        "Remote work is available for all engineering roles.",
        "Health insurance covers dental and vision.",
    ],
    metadatas=[
        {"source": "hr_handbook", "section": "benefits"},
        {"source": "hr_handbook", "section": "remote_work"},
        {"source": "hr_handbook", "section": "insurance"},
    ],
)

# Query with auto-embedding
results = collection.query(
    query_texts=["How many vacation days do I get?"],
    n_results=3,
    where={"source": "hr_handbook"},  # Metadata filtering
)

for doc, meta, dist in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
):
    print(f"[{1-dist:.3f}] ({meta['section']}) {doc}")  # 1 - cosine distance = cosine similarity
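The loop above prints `1 - dist` as a similarity score. That conversion is only meaningful because the collection was created with `"hnsw:space": "cosine"`: ChromaDB reports cosine distance as 1 minus cosine similarity, squared Euclidean distance for `l2`, and 1 minus the dot product for `ip`. The plain-Python sketch below mirrors those three formulas so you can see how the numbers relate; the function names are illustrative and not part of the ChromaDB API.

```python
import math

def chroma_l2(a: list[float], b: list[float]) -> float:
    """'l2' space: squared Euclidean distance (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def chroma_ip(a: list[float], b: list[float]) -> float:
    """'ip' space: 1 minus the inner (dot) product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def chroma_cosine(a: list[float], b: list[float]) -> float:
    """'cosine' space: 1 minus cosine similarity, so 1 - distance is the similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

a, b = [1.0, 0.0], [0.6, 0.8]  # both vectors have unit length
print("l2:    ", chroma_l2(a, b))
print("ip:    ", chroma_ip(a, b))
print("cosine:", chroma_cosine(a, b))
```

For unit-length embeddings all three spaces rank results identically: the inner-product and cosine distances coincide, and the squared L2 distance is exactly twice the cosine distance, because |a - b|² = 2 - 2(a·b).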

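Qdrant

Qdrant is an open-source vector database that you can self-host or run on Qdrant Cloud. Each vector carries a JSON payload, and queries can filter on payload fields with compound conditions.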

// Qdrant - advanced filtering and payload support
// Assumes embedding1 and queryEmbedding are number[] vectors of length 1536
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: "http://localhost:6333" });

// Create collection
await client.createCollection("documents", {
  vectors: { size: 1536, distance: "Cosine" },
});

// Upsert with rich payload
await client.upsert("documents", {
  points: [
    {
      id: 1,
      vector: embedding1,
      payload: {
        text: "Vacation policy allows 20 days PTO",
        source: "handbook",
        department: "HR",
        updated_at: "2026-01-15",
      },
    },
  ],
});

// Query with complex filters
const results = await client.search("documents", {
  vector: queryEmbedding,
  limit: 5,
  with_payload: true, // return each hit's payload (text, source, ...)
  filter: {
    must: [
      { key: "department", match: { value: "HR" } },
      { key: "updated_at", range: { gte: "2025-01-01" } },
    ],
  },
});
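Qdrant combines field conditions with three clauses: `must` (every condition holds), `should` (at least one holds), and `must_not` (none holds). The query above uses only `must`, so a point is returned only if it is in HR and was updated on or after 2025-01-01. The Python sketch below evaluates the same filter shape locally to make that boolean logic concrete; `check` and `passes` are hypothetical helpers that support only `match` and `range`, not Qdrant's implementation. Qdrant parses the date strings as datetimes, while this sketch relies on plain string comparison, which works only because the ISO dates share one format.

```python
import operator
from typing import Any

Payload = dict[str, Any]
Condition = dict[str, Any]

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def check(cond: Condition, payload: Payload) -> bool:
    """Evaluate one field condition: {"key", "match": {"value"}} or {"key", "range": {...}}."""
    if cond["key"] not in payload:
        return False  # a missing field never satisfies a condition
    value = payload[cond["key"]]
    if "match" in cond:
        return value == cond["match"]["value"]
    if "range" in cond:
        bounds = cond["range"]
        # Every bound that is present (gt/gte/lt/lte) must hold
        return all(op(value, bounds[name]) for name, op in RANGE_OPS.items() if bounds.get(name) is not None)
    raise ValueError(f"unsupported condition: {cond}")

def passes(flt: dict[str, list[Condition]], payload: Payload) -> bool:
    """must: all hold; should: at least one holds (if any are given); must_not: none hold."""
    must = flt.get("must", [])
    should = flt.get("should", [])
    must_not = flt.get("must_not", [])
    return (
        all(check(c, payload) for c in must)
        and (not should or any(check(c, payload) for c in should))
        and not any(check(c, payload) for c in must_not)
    )

# Same filter shape as the client.search() call above
hr_recent = {
    "must": [
        {"key": "department", "match": {"value": "HR"}},
        {"key": "updated_at", "range": {"gte": "2025-01-01"}},
    ]
}
points = [
    (1, {"department": "HR", "updated_at": "2026-01-15"}),
    (2, {"department": "HR", "updated_at": "2024-06-30"}),
    (3, {"department": "Eng", "updated_at": "2026-02-01"}),
]
print([pid for pid, payload in points if passes(hr_recent, payload)])  # [1]
```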

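pgvector (PostgreSQL Extension)

pgvector is an open-source PostgreSQL extension that adds a vector column type, distance operators, and approximate nearest neighbor indexes, so embeddings can live in the same tables as the rest of your data.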

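# pgvector with SQLAlchemy
# Assumes embedding_vector and query_vector are lists of 1536 floats
# produced by your embedding model.
from sqlalchemy import create_engine, text, Column, Index, Integer, String, Text
from sqlalchemy.orm import declarative_base, Session
from pgvector.sqlalchemy import Vector

engine = create_engine("postgresql://user:pass@localhost/mydb")
Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    # HNSW index (pgvector 0.5.0+); the operator class must match the query's
    # distance function: vector_cosine_ops for cosine_distance (the <=> operator)
    __table_args__ = (
        Index(
            "documents_embedding_hnsw",
            "embedding",
            postgresql_using="hnsw",
            postgresql_ops={"embedding": "vector_cosine_ops"},
        ),
    )
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    source = Column(String)
    embedding = Column(Vector(1536))  # pgvector column

# Enable the extension before creating tables that use the vector type
with engine.begin() as conn:
    conn.execute(text("CREATE EXTENSION IF NOT EXISTS vector"))

Base.metadata.create_all(engine)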

# Insert documents with embeddings
with Session(engine) as session:
    doc = Document(
        content="Company vacation policy...",
        source="handbook.pdf",
        embedding=embedding_vector,  # list of 1536 floats
    )
    session.add(doc)
    session.commit()

# Semantic search using cosine distance
with Session(engine) as session:
    results = (
        session.query(Document)
        .order_by(Document.embedding.cosine_distance(query_vector))
        .limit(5)
        .all()
    )
    for doc in results:
        print(f"{doc.source}: {doc.content[:100]}...")

Choosing a Vector Database

Pinecone if you want zero ops and are building a production SaaS product
ChromaDB if you want the fastest path from zero to working prototype
Qdrant if you need advanced filtering and want to self-host with full control
Weaviate if you need built-in hybrid search (vector + keyword) and a GraphQL API
pgvector if you already use PostgreSQL and want to avoid adding another database
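Hybrid search, listed for Weaviate above, merges a vector-similarity ranking with a keyword (BM25) ranking. One common way to fuse them is to min-max normalize each set of scores and blend them with a weight `alpha`, where 1.0 means pure vector search and 0.0 means pure keyword search; Weaviate's hybrid queries expose an `alpha` parameter with this meaning and offer a relative-score fusion mode built on the same idea. The Python sketch below is a generic, simplified illustration: `min_max`, `hybrid_rank`, and the example scores are hypothetical, not Weaviate's API or real results.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(
    vector_scores: dict[str, float], keyword_scores: dict[str, float], alpha: float = 0.5
) -> list[tuple[str, float]]:
    """Blend normalized scores as alpha * vector + (1 - alpha) * keyword.
    A document missing from one result list contributes 0 on that side."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in v.keys() | k.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Made-up scores for illustration only
vector_scores = {"vacation": 0.91, "sick-leave": 0.80, "remote": 0.62}   # cosine similarities
keyword_scores = {"sick-leave": 7.2, "insurance": 3.1, "vacation": 2.0}  # BM25-style scores
for doc, score in hybrid_rank(vector_scores, keyword_scores, alpha=0.5):
    print(f"{doc}: {score:.3f}")
```

Normalizing first matters because cosine similarities and BM25 scores live on different scales; blending the raw numbers would let the keyword side dominate.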

Summary

Vector databases are essential infrastructure for RAG. Start with ChromaDB for development, then evaluate managed options like Pinecone for production, or self-hosted options like Qdrant if you need data sovereignty. If you already have PostgreSQL, pgvector is a pragmatic choice that avoids adding another database to your stack.
