
🎲 Understand Temperature, Top-P, and Sampling

See exactly what temperature and top-p do to a model's probability distribution, then justify the sampling settings for your real tasks instead of guessing. Stop tweaking knobs and start engineering output behavior.

Applied · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: The Distribution Behind Every Token

See next-token prediction as a real probability distribution

4 drops
  1. The model never picks one word: it ranks all of them

    6 min

  2. Logits are the model's raw vote; softmax is the ballot

    7 min

  3. Temperature 0 isn't deterministic: it's just greedy

    6 min

  4. Every sampling parameter answers one question: which tail to trust

    7 min
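Phase 1's core idea can be sketched in a few lines of Python. This is a minimal illustration with made-up logits, not the course's own code; the values are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution over tokens.
    Temperature divides the logits before the exponential."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens.
logits = [4.0, 2.5, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # every token gets a probability, not just one

# "Temperature 0" in practice means greedy decoding: always take the argmax.
greedy = max(range(len(logits)), key=lambda i: logits[i])
print(greedy)
```

Note that greedy decoding is a policy over the distribution, not a different distribution, which is why "temperature 0" can still vary across real serving stacks.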

Phase 2: Watching Softmax Bend Under Heat

Walk softmax through temperatures and watch curves flatten

5 drops
  1. Cold temperatures crush the tail and worship the peak

    6 min

  2. Temperature 0.7 keeps the shape but lets the tail breathe

    6 min

  3. High temperatures flatten the distribution into noise

    7 min

  4. Top-k draws a fixed line, and that's both its strength and its flaw

    6 min

  5. Top-p adapts to confidence: keeps a few tokens or many

    7 min
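The whole of Phase 2 can be compressed into one runnable sketch: sweep the temperature to watch the distribution bend, then compare top-k's fixed cut against top-p's adaptive one. The logits and thresholds below are illustrative assumptions, not the course's numbers.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    t = sum(exps)
    return [e / t for e in exps]

logits = [4.0, 2.5, 1.0, 0.1]
for temp in (0.2, 0.7, 1.5):
    # Cold temperatures sharpen the peak; hot ones flatten toward uniform.
    print(temp, [round(p, 3) for p in softmax(logits, temp)])

def top_k_filter(probs, k=2):
    """Top-k: keep a fixed number of tokens, whatever the distribution's shape."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

def top_p_filter(probs, p=0.9):
    """Top-p (nucleus): keep the smallest set of tokens whose cumulative
    probability reaches p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = softmax(logits)
print(top_k_filter(probs, k=2))    # always exactly k tokens survive
print(top_p_filter(probs, p=0.9))  # the nucleus size depends on confidence
```

The key contrast: `top_k_filter` returns the same number of candidates regardless of how confident the model is, while `top_p_filter`'s kept set grows on flat distributions and shrinks on peaked ones.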

Phase 3: Picking the Right Sampler for the Job

Choose between top-k, top-p, and min-p deliberately

4 drops
  1. You're extracting an email, but the model returns three different ones across runs

    7 min

  2. You're brainstorming product names and getting the same five every time

    7 min

  3. Your code generation is syntactically valid, and subtly wrong

    7 min

  4. Min-p cuts based on the peak, fixing top-p's edge cases

    7 min
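The min-p rule from that last drop fits in a few lines. This is an illustrative sketch; the 0.1 threshold and the two toy distributions are assumptions chosen to show the contrast with top-p.

```python
def min_p_filter(probs, min_p=0.1):
    """Min-p: keep any token whose probability is at least min_p times the
    top token's probability, then renormalize. The cutoff scales with the
    model's confidence instead of using a fixed k or cumulative mass."""
    ceiling = max(probs)
    threshold = min_p * ceiling
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}

# Confident (peaked) distribution: the threshold is high, only the peak survives.
print(min_p_filter([0.90, 0.05, 0.03, 0.02], min_p=0.1))
# Flat distribution: the threshold drops, so every plausible token survives.
print(min_p_filter([0.30, 0.28, 0.25, 0.17], min_p=0.1))
```

This is exactly the edge case where top-p struggles: on a flat distribution a fixed cumulative cutoff can slice arbitrarily through a cluster of near-equal tokens, while min-p keeps or drops the whole cluster together.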

Phase 4: Defending Your Sampling Choices

Lock in sampling choices for three real tasks

1 drop
  1. Pick three real tasks and lock in defensible sampling configs

    20 min
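Phase 4's deliverable might look something like this: one named config per task, with the reasoning attached. The task names and numbers below are illustrative starting points, not the course's prescriptions.

```python
# Hypothetical sampling configs for three task types. Each value is a
# defensible starting point, not a universal recommendation.
SAMPLING_CONFIGS = {
    "structured_extraction": {"temperature": 0.0, "top_p": 1.0},   # greedy: one answer, every run
    "brainstorming":         {"temperature": 1.1, "top_p": 0.95},  # let the tail breathe
    "code_generation":       {"temperature": 0.2, "min_p": 0.1},   # precise, but not blindly greedy
}

for task, cfg in SAMPLING_CONFIGS.items():
    print(task, cfg)
```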

Frequently asked questions

What is the difference between temperature and top-p in LLMs?
Temperature rescales the logits before softmax, reshaping the entire distribution, while top-p truncates it, sampling only from the smallest set of tokens whose cumulative probability reaches p. This path's Phase 2 walks through both visually.
Should I use temperature or top-p for creative writing?
Usually both together: a moderate-to-high temperature (roughly 0.8–1.0) with top-p around 0.9 is a common starting point that keeps output varied without dissolving into noise. Phase 3 shows how to justify a setting per task.
Why does temperature 0 not always give deterministic output?
Temperature 0 means greedy decoding (always take the highest-probability token), but ties, floating-point non-determinism, and batching effects in real serving stacks can still produce different outputs across runs. Phase 1 covers why "deterministic" is the wrong mental model.
What does top-p (nucleus sampling) actually do?
Nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches p, renormalizes over that set, and samples from it, so the candidate pool shrinks when the model is confident and widens when it isn't. Phase 2 demonstrates this on real distributions.
When should I use min-p instead of top-p?
Min-p keeps only tokens whose probability is at least a chosen fraction of the top token's, so it stays strict on peaked distributions and permissive on flat ones, the edge cases where top-p can admit near-noise tokens. Phase 3's final drop compares the two.