🗂️ Choose a Vector Database
Build a five-axis scorecard — scale, hybrid search, filtering, ops, cost — that turns vector database selection from hype-driven guesswork into a defensible choice your future self will thank you for.
Phase 1: Mapping the Vector Database Landscape
Map what vector databases really are and aren't
A vector DB is a special-purpose index, not a database
6 min · Vector databases are approximate nearest neighbor (ANN) indexes wearing a database costume. The 'database' part is the thinnest layer.
pgvector might be enough — and that scares vendors
6 min · Postgres + pgvector handles tens of millions of vectors comfortably. The migration to a 'real' vector DB happens later than you think.
Recall is the metric vendors hide and you ignore
7 min · ANN search is approximate by design. Recall@k tells you what fraction of the true k nearest neighbors you're actually returning, and tuning it is the real performance lever.
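To make the recall@k definition concrete, here is a minimal sketch in Python. It assumes you already have two ranked ID lists for the same query: `exact_ids` from a brute-force search and `approx_ids` from the ANN index. The function name and the sample IDs are hypothetical and are not taken from any vendor's API.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the ANN result also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])          # ground truth from exact (brute-force) search
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical result lists for a single query.
exact = [7, 2, 9, 4, 1]    # true nearest neighbors, best first
approx = [7, 9, 3, 2, 8]   # what the ANN index returned

print(recall_at_k(approx, exact, k=5))  # 0.6 (3 of the 5 true neighbors were found)
```

In practice you would average this over a sample of real queries before and after changing index parameters, since a single query says little about overall recall.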
Hybrid search is the feature that quietly decides quality
7 min · Pure vector search loses to hybrid (vector + keyword) on most real-world RAG workloads. Whether your DB does this natively is a top-three selection criterion.
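One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below is an illustration of that technique only; it does not claim to match how any particular database implements hybrid search. The document IDs are hypothetical, and the smoothing constant `k=60` is a conventional default that you may want to tune.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from each retriever for the same query.
vector_hits = ["d3", "d1", "d7"]
keyword_hits = ["d1", "d9", "d3"]

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['d1', 'd3', 'd9', 'd7']
```

Documents that appear in both lists (`d1`, `d3`) rise above documents that only one retriever found, which is the behavior hybrid search relies on.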
Phase 2: Scoring the Big Four on Five Axes
Score Pinecone, Weaviate, Qdrant, and Chroma head-to-head
Build the five-axis scorecard you'll use for every choice
7 min · Scale, hybrid search, filtering, ops model, and cost. Five axes, weighted by your workload, scored 1-5. That's the entire framework.
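The scorecard arithmetic can be sketched as a weighted average over the five axes. Everything below is illustrative: the weights, the placeholder candidates `candidate_a` and `candidate_b`, and their scores are made up to show the calculation and are not assessments of any real product.

```python
AXES = ("scale", "hybrid_search", "filtering", "ops", "cost")


def weighted_score(scores, weights):
    """Weighted average of 1-5 axis scores; weights express workload priorities."""
    for axis in AXES:
        if not 1 <= scores[axis] <= 5:
            raise ValueError(f"{axis} score must be between 1 and 5")
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


# Hypothetical workload weights (higher means the axis matters more).
weights = {"scale": 3, "hybrid_search": 2, "filtering": 2, "ops": 1, "cost": 2}

# Placeholder candidates with made-up scores.
candidates = {
    "candidate_a": {"scale": 5, "hybrid_search": 3, "filtering": 3, "ops": 5, "cost": 2},
    "candidate_b": {"scale": 4, "hybrid_search": 4, "filtering": 5, "ops": 3, "cost": 4},
}

ranked = sorted(candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True)
for name in ranked:
    print(name, round(weighted_score(candidates[name], weights), 2))
# candidate_b 4.1
# candidate_a 3.6
```

Normalizing by the total weight keeps the result on the same 1-5 scale as the inputs, so scores stay comparable if you later change the weights.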
Score Pinecone — the hosted-only premium option
7 min · Pinecone optimizes for 'no ops, scales high, costs more.' That's the pitch and the limitation in one sentence.
Score Weaviate — the schema-first hybrid champion
7 min · Weaviate treats your data as objects with schemas, not vectors with metadata. That's a feature when filtering matters and a tax when it doesn't.
Score Qdrant — the performance-obsessed Rust option
7 min · Qdrant is the closest thing to 'pgvector but better and dedicated.' Self-host friendly, fast, with the cleanest filtering story of the four.
Score Chroma — the prototyping default that grew up
6 min · Chroma optimizes for developer experience first. That's perfect for week-one prototypes and risky for year-two production.
Phase 3: Spotting the Migration Triggers
Spot the migration triggers before they hit you
The filter that killed our query latency at 8M vectors
8 min · Post-filter scaling is the migration trigger most teams hit first, and the one vendor benchmarks hide.
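One plausible mechanism behind the post-filter problem is that filtering after the top-k search can discard most or all of the hits, which forces over-fetching and retries. The toy below uses brute-force dot products in plain Python rather than a real ANN index, and the items, the `tenant` field, and the function names are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, items, k):
    """Brute-force stand-in for an ANN search: k items with the highest dot product."""
    return sorted(items, key=lambda it: dot(query, it["vec"]), reverse=True)[:k]


def post_filter_search(query, items, k, predicate):
    # Search first, then drop hits that fail the filter (may return fewer than k).
    return [it for it in top_k(query, items, k) if predicate(it)]


def pre_filter_search(query, items, k, predicate):
    # Restrict to matching items first, then search only among them.
    return top_k(query, [it for it in items if predicate(it)], k)


items = [
    {"id": 1, "vec": [1.0, 0.0], "tenant": "a"},
    {"id": 2, "vec": [0.9, 0.1], "tenant": "a"},
    {"id": 3, "vec": [0.8, 0.2], "tenant": "b"},
    {"id": 4, "vec": [0.1, 0.9], "tenant": "b"},
]
query = [1.0, 0.0]


def is_tenant_b(item):
    return item["tenant"] == "b"


post = post_filter_search(query, items, 2, is_tenant_b)
pre = pre_filter_search(query, items, 2, is_tenant_b)

print([it["id"] for it in post])  # [] because both top-2 hits belong to tenant "a"
print([it["id"] for it in pre])   # [3, 4]
```

With a selective filter, the post-filter path has to request a much larger k to fill the result set, and that extra work is where latency tends to grow as the collection gets bigger.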
RAG quality plateaued — and embeddings weren't the problem
8 min · When your eval scores stop improving and the team starts blaming the embedding model, audit your retrieval mode first. Pure vector search hits a quality ceiling that hybrid search breaks through.
We hit 50M vectors and our hosted bill made the CFO call
8 min · Cost trajectory is a year-2 problem you create in week 1. The migration trigger is rarely scale itself; it's the bill at scale.
Embeddings don't migrate — and that quietly changes everything
8 min · Migrating between vector DBs is mostly painless. Migrating between embedding models means re-embedding your entire corpus. The lock-in you actually have is to your model, not your DB.
Phase 4: Defending the 50M-Document Choice
Pick a vector DB for 50M docs and defend it
Write the decision memo for a 50M-document RAG app
25 min · The deliverable that proves you understand vector DB selection is a one-page memo that survives a design-review cross-examination.
Frequently asked questions
- Is pgvector good enough for production RAG?
- What's the real difference between Pinecone, Weaviate, Qdrant, and Chroma?
- When does metadata filtering force you off a vector database?
- How much does a vector database actually cost at 50M documents?
- What is hybrid search and why does it matter for RAG quality?
Each of these questions is covered in the “Choose a Vector Database” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.