What Are Embeddings?
Embeddings are numerical vector representations of text that capture semantic meaning. Similar texts produce similar vectors, enabling semantic search — finding relevant documents based on meaning rather than exact keyword matches. Embeddings are the engine that powers RAG retrieval.
When you embed a question like "How do I reset my password?" and a document chunk containing "To change your account password, go to Settings...", the vectors will be close together in the embedding space despite sharing few exact words. This semantic understanding is what makes RAG far more powerful than traditional keyword search.
Key Embedding Concepts
- Dimensions: The number of values in the vector (typically 256 to 3072). Higher dimensions can capture more nuance but increase storage and search cost
- Similarity Metrics: Cosine similarity, dot product, or Euclidean distance, used to compare vectors
- Context Window: The maximum number of tokens the model can embed at once (512 to 8192)
- Normalization: Most models L2-normalize vectors to unit length, so cosine similarity reduces to a simple dot product (see the sketch below)
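To make the normalization point concrete, here is a minimal standalone TypeScript sketch (no external libraries; the helper names are illustrative) showing that once vectors are scaled to unit length, their dot product equals their cosine similarity:

// Scale a vector to unit (L2) length
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map(x => x / norm);
}

// Dot product of two equal-length vectors
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

const a = normalize([3, 4]);
const b = normalize([4, 3]);
console.log(dot(a, b)); // 0.96, identical to the cosine similarity of the original vectors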
Popular Embedding Models (2025-2026)
Model Comparison
MTEB is the Massive Text Embedding Benchmark; higher scores are better, but treat them as a rough general-purpose signal rather than a guarantee for your domain.
| Model | Dimensions | Max Tokens | Cost | MTEB Score |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | 8191 | $0.13/M tokens | 64.6 |
| text-embedding-3-small | 1536 | 8191 | $0.02/M tokens | 62.3 |
| Cohere embed-v4 | 1024 | 512 | $0.10/M tokens | 66.3 |
| Voyage AI voyage-3 | 1024 | 32000 | $0.06/M tokens | 67.1 |
| all-MiniLM-L6-v2 | 384 | 512 | Free (local) | 56.3 |
| BGE-large-en-v1.5 | 1024 | 512 | Free (local) | 64.2 |
Using OpenAI Embeddings
// OpenAI embeddings in TypeScript
import OpenAI from "openai";
const openai = new OpenAI();
// Single text embedding
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text,
dimensions: 1536, // Can reduce dimensions for cost savings
});
return response.data[0].embedding;
}
// Batch embedding (more efficient)
async function embedBatch(texts: string[]): Promise<number[][]> {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: texts,
});
return response.data
.sort((a, b) => a.index - b.index)
.map(d => d.embedding);
}
// Cosine similarity between two vectors
function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
// Example: find similarity between texts
const query = await embedText("How do I deploy to production?");
const doc1 = await embedText("Production deployment guide using Docker and Kubernetes");
const doc2 = await embedText("Making chocolate chip cookies at home");
console.log("Relevant doc:", cosineSimilarity(query, doc1)); // ~0.82
console.log("Irrelevant doc:", cosineSimilarity(query, doc2)); // ~0.15
Using Open-Source Embeddings Locally
# Local embeddings with sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np
# Load model (downloads on first use, ~90MB for MiniLM)
model = SentenceTransformer("all-MiniLM-L6-v2")
# Single embedding
text = "How do I reset my password?"
embedding = model.encode(text)
print(f"Shape: {embedding.shape}") # (384,)
# Batch embedding
texts = [
"Password reset instructions",
"Account security settings",
"Chocolate cake recipe",
]
embeddings = model.encode(texts)
# Compute similarities
query_embedding = model.encode("How do I change my password?")
similarities = np.dot(embeddings, query_embedding) / (
np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_embedding)
)
for text, sim in zip(texts, similarities):
print(f"{sim:.3f} - {text}")
# For better quality, use a larger model
model_large = SentenceTransformer("BAAI/bge-large-en-v1.5")
# For retrieval, BGE models expect queries (not documents) to carry an instruction prefix
query_emb = model_large.encode("Represent this sentence for searching relevant passages: How to deploy?")
# Cohere embeddings (a hosted API, shown here for contrast with local models)
import cohere
co = cohere.Client()  # expects your Cohere API key via the environment or Client(api_key=...)
response = co.embed(
texts=["Hello world", "How are you?"],
model="embed-english-v3.0",
input_type="search_document", # or "search_query" for queries
)
print(f"Dimensions: {len(response.embeddings[0])}") # 1024
Choosing the Right Model
Decision Guide
| Scenario | Recommended Model | Reason |
|---|---|---|
| General RAG, budget-friendly | text-embedding-3-small | Low cost, good quality, easy API |
| Highest quality, cost not an issue | Voyage AI voyage-3 | Top MTEB scores, long context |
| Offline / privacy sensitive | BGE-large-en-v1.5 | Local, no data leaves your infra |
| Prototype / development | all-MiniLM-L6-v2 | Free, fast, good enough for testing |
| Multilingual | Cohere embed-multilingual | 100+ languages supported |
Embedding Best Practices
- Match query and document models: Always use the same embedding model for both queries and documents
- Batch your embeddings: API calls are more efficient in batches of 100-2000 texts
- Cache embeddings: Store and reuse vectors instead of re-embedding unchanged documents (see the caching sketch after this list)
- Normalize vectors: Ensure all vectors are L2-normalized for consistent cosine similarity
- Benchmark on your data: MTEB scores are general — test on your specific domain for best results
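As a concrete example of the caching advice above, here is a minimal TypeScript sketch that keys vectors by a hash of the chunk text. The embedCached helper is hypothetical, the in-memory Map is an illustrative stand-in for whatever persistent store you use, and embedText is the function defined earlier:

// Cache embeddings keyed by a content hash so unchanged text is never re-embedded
import { createHash } from "node:crypto";

const cache = new Map<string, number[]>(); // swap for a persistent store in production

async function embedCached(text: string): Promise<number[]> {
  const key = createHash("sha256").update(text).digest("hex");
  const hit = cache.get(key);
  if (hit) return hit; // unchanged text: reuse the stored vector
  const vector = await embedText(text);
  cache.set(key, vector);
  return vector;
}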
Summary
The embedding model you choose directly impacts RAG retrieval quality. OpenAI's text-embedding-3-small offers the best balance of cost and quality for most applications. For maximum quality, consider Voyage AI or Cohere. For privacy or offline use, open-source models like BGE-large work well. Always benchmark on your specific data and use case before committing to a model.