🗂️ Choose a Vector Database

Build a five-axis scorecard — scale, hybrid search, filtering, ops, cost — that turns vector database selection from hype-driven guesswork into a defensible choice your future self will thank you for.

Applied · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: Mapping the Vector Database Landscape

Map what vector databases really are and aren't

4 drops
  1. A vector DB is a special-purpose index, not a database

    6 min

    Vector databases are ANN indexes wearing a database costume. The 'database' part is the thinnest layer.

  2. pgvector might be enough — and that scares vendors

    6 min

    Postgres + pgvector handles tens of millions of vectors comfortably. The migration to a 'real' vector DB happens later than you think.

  3. Recall is the metric vendors hide and you ignore

    7 min

    ANN search is approximate by design. Recall@k tells you what fraction of true nearest neighbors you're actually returning — and tuning it is the real performance lever.

  4. Hybrid search is the feature that quietly decides quality

    7 min

    Pure vector search loses to hybrid (vector + keyword) on most real-world RAG workloads. Whether your DB does this natively is a top-three selection criterion.
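The recall@k idea from drop 3 can be made concrete with a few lines of code. This is a minimal sketch, not any vendor's benchmark method: `recall_at_k` is a hypothetical helper, and it assumes you already have the exact top-k IDs from a brute-force search to compare against the IDs your ANN index returned.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])          # ground truth from an exact (brute-force) search
    if not truth:
        return 0.0
    hits = len(truth & set(approx_ids[:k]))  # order within the top k is ignored
    return hits / len(truth)


# Illustrative IDs only: the ANN index found 3 of the 5 true neighbors.
exact = [7, 2, 9, 4, 1]
approx = [7, 9, 3, 4, 8]
score = recall_at_k(approx, exact, k=5)
print(score)
```

Averaging this value over a sample of real queries gives a single number to track while you adjust index parameters.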

Phase 2: Scoring the Big Four on Five Axes

Score Pinecone, Weaviate, Qdrant, and Chroma head-to-head

5 drops
  1. Build the five-axis scorecard you'll use for every choice

    7 min

    Scale, hybrid search, filtering, ops model, and cost. Five axes, weighted by your workload, scored 1-5. That's the entire framework.

  2. Score Pinecone — the hosted-only premium option

    7 min

    Pinecone optimizes for 'no ops, scales high, costs more.' That's the pitch and the limitation in one sentence.

  3. Score Weaviate — the schema-first hybrid champion

    7 min

    Weaviate treats your data as objects with schemas, not vectors with metadata. That's a feature when filtering matters and a tax when it doesn't.

  4. Score Qdrant — the performance-obsessed Rust option

    7 min

    Qdrant is the closest thing to 'pgvector but better and dedicated.' Self-host friendly, fast, with the cleanest filtering story of the four.

  5. Score Chroma — the prototyping default that grew up

    6 min

    Chroma optimizes for developer experience first. That's perfect for week-one prototypes and risky for year-two production.
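The five-axis scorecard from drop 1 of this phase boils down to a weighted average. The sketch below is one plausible way to compute it; the axis keys, the weights, and the two candidates' scores are all made-up placeholders, not ratings of any real product.

```python
AXES = ("scale", "hybrid_search", "filtering", "ops", "cost")


def weighted_score(scores, weights):
    """Weighted average of 1-5 axis scores; weights reflect your workload."""
    for axis in AXES:
        if not 1 <= scores[axis] <= 5:
            raise ValueError(f"{axis} score must be between 1 and 5")
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


# Hypothetical workload: scale matters most, ops matters least.
weights = {"scale": 3, "hybrid_search": 2, "filtering": 2, "ops": 1, "cost": 2}

# Hypothetical candidates, scored 1-5 on each axis.
candidate_a = {"scale": 5, "hybrid_search": 3, "filtering": 3, "ops": 5, "cost": 2}
candidate_b = {"scale": 3, "hybrid_search": 4, "filtering": 5, "ops": 3, "cost": 4}

for name, scores in (("candidate_a", candidate_a), ("candidate_b", candidate_b)):
    print(name, round(weighted_score(scores, weights), 2))
```

Dividing by the total weight keeps the result on the same 1-5 scale as the inputs, so scores stay comparable even if you change the weights between workloads.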

Phase 3: Spotting the Migration Triggers

Spot the migration triggers before they hit you

4 drops
  1. The filter that killed our query latency at 8M vectors

    8 min

    Post-filter scaling is the migration trigger most teams hit first — and the one vendor benchmarks hide.

  2. RAG quality plateaued — and embeddings weren't the problem

    8 min

    When your eval scores stop improving and the team starts blaming the embedding model, audit your retrieval mode first. Pure vector search hits a quality ceiling that hybrid search breaks through.

  3. We hit 50M vectors and our hosted bill made the CFO call

    8 min

    Cost trajectory is a year-2 problem you create in week 1. The migration trigger is rarely scale itself; it's the bill at scale.

  4. Embeddings don't migrate — and that quietly changes everything

    8 min

    Migrating between vector DBs is mostly painless. Migrating between embedding models means re-embedding your entire corpus. The lock-in you actually have is to your model, not your DB.
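One way to see why post-filtering (drop 1 of this phase) becomes a problem is to compare it with pre-filtering on a toy dataset. This is a simplified sketch under stated assumptions: an exact brute-force search stands in for the ANN index, and the `tenant` field, the item layout, and both helper functions are hypothetical.

```python
import math


def top_k(query, items, k):
    """Exact nearest neighbors by Euclidean distance (stand-in for an ANN index)."""
    return sorted(items, key=lambda it: math.dist(query, it["vec"]))[:k]


def post_filter_search(query, items, k, predicate):
    # Search first, filter afterwards: can return fewer than k results.
    return [it for it in top_k(query, items, k) if predicate(it)]


def pre_filter_search(query, items, k, predicate):
    # Filter first, then search only among the matching items.
    return top_k(query, [it for it in items if predicate(it)], k)


# Toy data: 20 points on a line; every fourth point belongs to tenant "a".
items = [
    {"id": i, "vec": (float(i), 0.0), "tenant": "a" if i % 4 == 0 else "b"}
    for i in range(20)
]
query = (0.0, 0.0)


def is_tenant_a(item):
    return item["tenant"] == "a"


post = post_filter_search(query, items, 3, is_tenant_a)
pre = pre_filter_search(query, items, 3, is_tenant_a)
print([it["id"] for it in post])  # only the matches that happened to be in the top 3
print([it["id"] for it in pre])   # three matches, because filtering came first
```

With a selective filter, the post-filter path has to fetch far more than k candidates to end up with k matches, and that over-fetching is where the extra latency tends to come from as the collection grows.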

Phase 4: Defending the 50M-Document Choice

Pick a vector DB for 50M docs and defend it

1 drop
  1. Write the decision memo for a 50M-document RAG app

    25 min

    The deliverable that proves you understand vector DB selection is a one-page memo that survives a design-review cross-examination.

Frequently asked questions

Is pgvector good enough for production RAG?
What's the real difference between Pinecone, Weaviate, Qdrant, and Chroma?
When does metadata filtering force you off a vector database?
How much does a vector database actually cost at 50M documents?
What is hybrid search and why does it matter for RAG quality?

Each of these questions is covered in the “Choose a Vector Database” learning path. Start with daily 5–8 minute micro-lessons that build from fundamentals to hands-on application.