🧬 Sentence vs Token Embeddings

Stop grabbing BERT's [CLS] token and calling it a sentence embedding. By the end you'll know exactly when token, pooled, and contrastively-trained vectors each win — and design a 100K-doc semantic search you can defend.

Applied · 14 drops · ~2-week path · 5–8 min/day · Technology

Phase 1: What Each Vector Actually Represents

What token vs sentence embeddings actually represent

4 drops
  1. A token vector is a context-aware fragment, not a meaning

    6 min

  2. What a sentence embedding actually has to do

    6 min

  3. Why [CLS] looks like a sentence embedding but isn't

    7 min

  4. Mean-pooling is better than [CLS] and still not enough (see the sketch after this list)

    7 min
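
To make Phase 1 concrete, here is a minimal sketch (assuming the Hugging Face transformers library and bert-base-uncased; the sentence and the word "bank" are illustrative choices, not part of the path) that extracts the three vectors this phase dissects: one token vector, the [CLS] vector, and a mask-aware mean-pool over token vectors.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "The bank raised interest rates."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)

# A token vector: a context-aware fragment of this sentence, not a meaning.
bank_id = tokenizer.convert_tokens_to_ids("bank")
bank_pos = inputs.input_ids[0].tolist().index(bank_id)
token_vec = hidden[0, bank_pos]                  # "bank" as used here

# [CLS]: position 0. It looks like a sentence vector, but BERT pretrained it
# for next-sentence prediction, not for similarity.
cls_vec = hidden[0, 0]

# Mean-pooling: average only the real tokens, using the attention mask.
mask = inputs.attention_mask.unsqueeze(-1)       # (1, seq_len, 1)
mean_vec = (hidden * mask).sum(1) / mask.sum(1)  # (1, 768)
```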

Phase 2: Three Embeddings on One Task

Compare [CLS], mean-pool, and sentence-transformers head-to-head

5 drops
  1. Pick a single task and lock the rest down

    6 min

  2. Run [CLS], mean-pool, and SBERT head-to-head (see the sketch after this list)

    9 min

  3. What contrastive training actually changes

    7 min

  4. Pooling tricks: mean, max, CLS, attention

    7 min

  5. When token embeddings are still the right tool

    6 min
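
A hedged sketch of the head-to-head (the model names are common defaults, not choices the path prescribes): score one paraphrase pair and one unrelated pair under CLS, mean, and max pooling of raw BERT, then under a contrastively trained sentence-transformers model. Attention pooling is omitted here because it requires learned weights.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
sbert = SentenceTransformer("all-MiniLM-L6-v2")  # contrastively trained

pairs = [
    ("A man is playing guitar.", "Someone performs music on a guitar."),
    ("A man is playing guitar.", "The stock market fell sharply today."),
]

def pool(texts, strategy):
    """Embed texts with raw BERT under one pooling strategy."""
    enc = tok(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        h = bert(**enc).last_hidden_state        # (batch, seq, 768)
    mask = enc.attention_mask.unsqueeze(-1)      # (batch, seq, 1)
    if strategy == "cls":
        return h[:, 0]
    if strategy == "mean":
        return (h * mask).sum(1) / mask.sum(1)
    if strategy == "max":
        return h.masked_fill(mask == 0, -1e9).max(dim=1).values

for a, b in pairs:
    print(f"\n{a!r} vs {b!r}")
    for strategy in ("cls", "mean", "max"):
        va, vb = pool([a, b], strategy)
        print(f"  bert/{strategy}: {F.cosine_similarity(va, vb, dim=0).item():.3f}")
    va, vb = sbert.encode([a, b], convert_to_tensor=True)
    print(f"  sbert:     {F.cosine_similarity(va, vb, dim=0).item():.3f}")
```

If contrastive training does its job, the gap between the paraphrase pair's score and the unrelated pair's score should be widest for SBERT.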

Phase 3: Pipelines, Not Single Choices

Place bi-encoders, cross-encoders, and rerankers in a pipeline

3 drops
  1. Bi-encoders are the only embeddings that scale

    7 min

  2. Cross-encoders are the only models that capture nuance

    7 min

  3. Two-stage retrieve-and-rerank is the canonical shape (see the sketch after this list)

    7 min
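
A minimal retrieve-and-rerank sketch (the model names and the toy corpus are illustrative assumptions, not part of the path): the bi-encoder stage searches precomputed vectors cheaply, while the cross-encoder jointly encodes query and document, so it runs only on the shortlist.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

docs = [
    "How to reset a forgotten password.",
    "Resetting your account credentials step by step.",
    "Our office hours and holiday schedule.",
]
doc_embs = bi_encoder.encode(docs, convert_to_tensor=True)  # built once, offline

query = "I forgot my password"
q_emb = bi_encoder.encode(query, convert_to_tensor=True)

# Stage 1: fast vector search over the whole corpus.
hits = util.semantic_search(q_emb, doc_embs, top_k=2)[0]

# Stage 2: slow, accurate joint scoring over the shortlist only.
candidates = [(query, docs[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(candidates)
for (_, doc), score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```

The shape scales because stage 1 cost grows with the corpus while stage 2 cost is capped by top_k.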

Phase 4: Design the 100K Search

Design a 100K-doc semantic search and defend it

2 drops
  1. Choose the bi-encoder for 100K documents

    7 min

  2. Design and defend a 100K-doc semantic search (see the sketch after this list)

    20 min
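
One defensible baseline for the capstone, sketched under stated assumptions (all-MiniLM-L6-v2 as the bi-encoder, FAISS as the index, a stand-in corpus): at 100K documents an exact inner-product index is still fast on one machine, so approximate indexes like HNSW or IVF are an optimization you argue for later, not a default you start from.

```python
import faiss  # pip install faiss-cpu
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [f"placeholder document {i}" for i in range(100_000)]  # stand-in corpus

# Encode once, offline. Normalizing makes inner product equal cosine similarity.
embs = model.encode(docs, batch_size=256, normalize_embeddings=True,
                    show_progress_bar=True)

index = faiss.IndexFlatIP(embs.shape[1])  # exact search: no recall loss to defend
index.add(np.asarray(embs, dtype="float32"))

query = model.encode(["how do I reset my password"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=10)
for i, s in zip(ids[0], scores[0]):
    print(f"{s:.3f}  {docs[i]}")
```

Adding the Phase 3 cross-encoder rerank over these top-k hits completes the design.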

Frequently asked questions

What's the difference between a token embedding and a sentence embedding?
Phase 1 of the path opens with exactly this distinction: a token vector is a context-aware fragment of one sentence, while a sentence embedding has to compress the whole sentence's meaning into a single comparable vector. Daily 5–8 minute drops build from fundamentals to hands-on application.
Why is BERT's [CLS] token a bad sentence embedding out of the box?
Phase 1's last two drops tackle this head-on: [CLS] looks like a sentence embedding but was pretrained for next-sentence prediction rather than similarity, and even mean-pooling over tokens is better yet still not enough.
When should I use mean-pooled BERT vs a sentence-transformers model?
Phase 2 answers this empirically: it runs [CLS], mean-pooled BERT, and a sentence-transformers model head-to-head on one locked-down task, then unpacks what contrastive training actually changes.
Do I need a cross-encoder reranker on top of bi-encoder retrieval?
Phase 3 makes the case that you usually do: bi-encoders scale but miss nuance, cross-encoders capture nuance but don't scale, and two-stage retrieve-and-rerank is the canonical shape that combines them.
How do I pick an embedding model for a 100K-document semantic search?
Phase 4 is the capstone: you choose a bi-encoder suited to 100K documents, then design the full semantic search and defend every choice in it.