TechLead
Lesson 22 of 25
5 min read
AI-Native Engineering

Cost and Performance Optimization

Understand AI API pricing, optimize token usage, implement caching strategies, and make informed decisions about when to use expensive vs cheap models

Understanding the Cost of AI-Native Development

AI tools are not free. Between API calls, subscriptions, and compute, the costs can add up quickly — especially for teams. Understanding the cost structure helps you make smart decisions about when to use expensive models, when to use cheap ones, and when to use no AI at all.

Cost Components

  • Input Tokens: What you send to the model (prompts, code context, file contents). Typically cheaper than output.
  • Output Tokens: What the model generates (code, explanations, plans). The more it writes, the more it costs.
  • Tool Subscriptions: Monthly fees for Claude Code Pro, Cursor Pro, GitHub Copilot.
  • Compute (Local): GPU costs if running local models via Ollama or vLLM.
  • Infrastructure: Costs for MCP servers, vector databases, and custom AI tools.

Token Pricing Comparison

Model            Input (per 1M tokens)  Output (per 1M tokens)  Context Window  Best For
Claude Opus      $15.00                 $75.00                  200K            Complex reasoning, architecture
Claude Sonnet    $3.00                  $15.00                  200K            Daily coding workhorse
Claude Haiku     $0.25                  $1.25                   200K            Quick tasks, high volume
GPT-4o           $2.50                  $10.00                  128K            General purpose
GPT-4o-mini      $0.15                  $0.60                   128K            Cheap, fast tasks
Local (Ollama)   $0 (GPU cost)          $0 (GPU cost)           Varies          Privacy, no per-token cost

Prices as of early 2026 and subject to change. Check provider pricing pages for current rates.

Cost Calculation Examples

// Estimating costs for common tasks

// A typical Claude Code session:
// - Claude reads ~20 files (avg 200 lines each) = ~80K input tokens
// - You send 5 prompts (avg 200 tokens each) = ~1K input tokens
// - Claude generates code + explanations = ~10K output tokens
// Total: ~81K input + 10K output

// With Claude Sonnet pricing:
// Input: 81K * ($3.00/1M) = $0.24
// Output: 10K * ($15.00/1M) = $0.15
// Total per session: ~$0.39

// A full day of AI-native development (5-8 sessions):
// Daily cost: ~$2-3 per developer

// Monthly per developer: ~$50-75 in API costs
// Plus subscription: Claude Code Pro $20/month or Cursor Pro $20/month
// Total: ~$70-95/month per developer

// ROI: If AI saves 2+ hours per day at $75/hour loaded cost,
// that is $150+/day savings vs $3/day cost = 50x ROI
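The arithmetic above can be wrapped in a small estimator. The prices are hardcoded from the comparison table (early 2026) and will drift, so treat this as a back-of-the-envelope sketch, not a billing tool:

```typescript
// Per-1M-token prices from the comparison table (early 2026; will drift)
const PRICING: Record<string, { input: number; output: number }> = {
  "claude-opus": { input: 15.0, output: 75.0 },
  "claude-sonnet": { input: 3.0, output: 15.0 },
  "claude-haiku": { input: 0.25, output: 1.25 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Estimate the dollar cost of a single call
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// The typical session above: ~81K input + ~10K output on Sonnet
console.log(estimateCost("claude-sonnet", 81_000, 10_000).toFixed(2)); // "0.39"
```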

Optimization Techniques

1. Optimize Prompt Length

# Be concise — every token costs money

# EXPENSIVE (verbose prompt)
> I would like you to please look at the function called
  calculateTotalPrice which is located in the file called
  pricing.ts in the app/lib directory. This function seems
  to have a bug where it does not correctly handle the case
  where the discount percentage is greater than 100%...

# CHEAP (same information, fewer tokens)
> Bug in app/lib/pricing.ts calculateTotalPrice: does not
  handle discount > 100%. Fix it and add validation.
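A rough way to see the difference is the common ~4-characters-per-token heuristic for English text. It is an approximation, not the model's real tokenizer, but it is good enough for comparing prompt styles:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Heuristic only — real tokenizers vary by model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const verbose =
  "I would like you to please look at the function called calculateTotalPrice " +
  "which is located in the file called pricing.ts in the app/lib directory. " +
  "This function seems to have a bug where it does not correctly handle the " +
  "case where the discount percentage is greater than 100%.";
const concise =
  "Bug in app/lib/pricing.ts calculateTotalPrice: does not handle discount > 100%. " +
  "Fix it and add validation.";

// The concise prompt carries the same information in roughly a third of the tokens
console.log(estimateTokens(verbose), estimateTokens(concise));
```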

2. Use the Right Model for the Task

// Model routing for cost optimization
type TaskType =
  | "commit_message" | "simple_explanation" | "format_code" | "rename_variable"
  | "implement_feature" | "write_tests" | "review_code" | "refactor"
  | "architect_system" | "debug_complex_issue" | "design_database_schema";

function selectModel(task: TaskType): string {
  switch (task) {
    // Cheap tasks: use Haiku or GPT-4o-mini
    case "commit_message":
    case "simple_explanation":
    case "format_code":
    case "rename_variable":
      return "claude-haiku";

    // Standard tasks: use Sonnet
    case "implement_feature":
    case "write_tests":
    case "review_code":
    case "refactor":
      return "claude-sonnet";

    // Complex tasks: use Opus (sparingly)
    case "architect_system":
    case "debug_complex_issue":
    case "design_database_schema":
      return "claude-opus";
  }
}
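To see what routing buys, price the same small task at Haiku vs Sonnet rates from the comparison table (the token counts here are illustrative):

```typescript
// One commit-message call (~2K input, ~100 output) at table rates
const haiku = (2_000 / 1e6) * 0.25 + (100 / 1e6) * 1.25; // $0.000625
const sonnet = (2_000 / 1e6) * 3.0 + (100 / 1e6) * 15.0; // $0.0075

console.log(`Haiku: $${haiku.toFixed(6)}, Sonnet: $${sonnet.toFixed(6)}`);
// At ~50 such calls/day, routing them to Haiku saves roughly $0.34/day per developer —
// small in absolute terms, but the same logic applied across all task types compounds.
```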

3. Caching AI Responses

// Cache common AI operations to avoid redundant API calls.
// Assumes callAI(prompt, model) is your existing API client wrapper.
import { createHash } from "crypto";

const cache = new Map<string, { result: string; timestamp: number }>();
const CACHE_TTL = 3600000; // 1 hour

async function cachedAICall(
  prompt: string,
  model: string = "claude-sonnet-4-20250514"
): Promise<string> {
  const cacheKey = createHash("md5").update(prompt + model).digest("hex");

  const cached = cache.get(cacheKey);
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.result;
  }

  const result = await callAI(prompt, model);
  cache.set(cacheKey, { result, timestamp: Date.now() });
  return result;
}

// Use cases for caching:
// - Repeated code explanations (same code = same explanation)
// - Documentation generation (only regenerate when code changes)
// - Commit message patterns (similar diffs = similar messages)

Monitoring AI Usage

# Track your AI spending:

# 1. Anthropic Console — shows API usage and costs by day
# https://console.anthropic.com/usage

# 2. Set up billing alerts
# Configure alerts at $100/day or your threshold

# 3. Per-team tracking (for organizations)
# Use different API keys per team
# Track costs by team to identify optimization opportunities

# 4. Cost per PR/feature (advanced)
# Log API calls with metadata (PR number, feature name)
# Calculate AI cost per shipped feature
# Compare to time saved — validate ROI
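Point 4 can be sketched as a simple in-memory usage log. The record fields (prNumber, feature) are illustrative; a real setup would persist these to a database or metrics system:

```typescript
// Hypothetical usage log: record each API call with metadata so costs
// can be rolled up per PR or feature. Field names are illustrative.
interface UsageRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUSD: number;
  prNumber?: number;
  feature?: string;
  timestamp: number;
}

const usageLog: UsageRecord[] = [];

function logUsage(record: Omit<UsageRecord, "timestamp">): void {
  usageLog.push({ ...record, timestamp: Date.now() });
}

// Roll up total AI cost for a given PR
function costForPR(prNumber: number): number {
  return usageLog
    .filter((r) => r.prNumber === prNumber)
    .reduce((sum, r) => sum + r.costUSD, 0);
}

logUsage({ model: "claude-sonnet", inputTokens: 81_000, outputTokens: 10_000, costUSD: 0.39, prNumber: 1234 });
logUsage({ model: "claude-haiku", inputTokens: 2_000, outputTokens: 100, costUSD: 0.000625, prNumber: 1234 });
console.log(costForPR(1234).toFixed(2)); // "0.39"
```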

The /compact Command Saves Money

Long Claude Code sessions accumulate context that gets sent with every subsequent message. A 50K-token conversation history means you are paying for 50K input tokens on every single follow-up message. Use /compact regularly to summarize the conversation and reduce context size. This can cut session costs by 50% or more.
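The savings are easy to quantify. Compare ten follow-up messages carrying a 50K-token history against the same ten carrying a 5K-token compacted summary, at Sonnet input rates:

```typescript
// Input cost of N follow-ups, each re-sending the full context, at Sonnet rates
const inputRate = 3.0 / 1_000_000; // $ per input token (Sonnet, from the table)
const followUps = 10;

const full = 50_000 * inputRate * followUps;     // full 50K-token history
const compacted = 5_000 * inputRate * followUps; // 5K-token compacted summary

console.log(full.toFixed(2), compacted.toFixed(2)); // "1.50" "0.15"
// A 90% reduction on context cost for this session — well past the 50% figure above
```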

The Cost Perspective

At $50-100/month per developer in AI costs, the ROI is almost always overwhelming. A developer earning $150K-250K costs the company $200-350/hour when you include benefits, overhead, and office costs. If AI saves 1-2 hours per day, that is $200-700/day in recovered productivity vs $3-5/day in AI costs. Do not optimize AI costs at the expense of developer productivity. Optimize for developer output first, then optimize costs within that constraint.

Summary

AI-native development costs $50-100/month per developer in API and subscription costs, delivering 50x+ ROI through productivity gains. Optimize by using the right model for each task (cheap models for simple tasks, expensive models for complex reasoning), keeping prompts concise, caching repeated operations, and using /compact to manage context size. Monitor spending through your provider's dashboard and set alerts. Never sacrifice productivity to save a few dollars in API costs.
