
🪟 Understand Context Windows in LLMs

See past the 'context length exceeded' error and pick the right fix every time — trim, summarize, retrieve, or upgrade. By the end you can sketch a memory strategy for a chatbot answering from a 500-page handbook without guessing.

Foundations · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: What the Window Actually Is

See the window as a token budget per call

4 drops
  1. The model has no memory between calls

    6 min

    Each LLM call is a fresh forward pass that only sees what you put in the request — no memory persists between calls.

  2. Tokens aren't words, and the math matters

    6 min

    Context windows are measured in tokens, which are sub-word fragments — and your rough word count almost always under-estimates the real cost.

  3. Context, conversation, retrieval — three different things

    7 min

    Most 'memory' problems are actually one of three different layers — the context window, the conversation log, or retrieval — and the fix depends on which layer is broken.

  4. Bigger windows aren't free — and aren't always better

    7 min

    Larger context windows cost more, run slower, and degrade in quality past a certain point — so 'just use the 1M model' is rarely the right answer.
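The statelessness and token-math ideas above can be sketched in a few lines of Python. Everything here is illustrative: the 4-characters-per-token heuristic is a rough stand-in for a real tokenizer, and the message shapes only mimic a typical chat API.

```python
# Each "call" sees only what we send: the history must be resent every time.
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real tokenizers split on sub-word fragments, so plain word counts
    # under-estimate; always verify with the model's own tokenizer.
    return max(1, len(text) // 4)

def build_request(system_prompt: str, history: list[dict], user_msg: str) -> list[dict]:
    # The model has no memory between calls, so every request carries
    # the system prompt plus the entire conversation so far.
    return [{"role": "system", "content": system_prompt}, *history,
            {"role": "user", "content": user_msg}]

system = "You are a helpful assistant."
history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello! How can I help?"}]
request = build_request(system, history, "What is a token?")

total = sum(estimate_tokens(m["content"]) for m in request)
print(len(request), total)  # → 4 17
```

Note that the token cost of a call grows with the whole history, not just the newest message; that single fact drives everything in the later phases.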

Phase 2: Measure What's in the Window

Count tokens in a real call and predict overflow

5 drops
  1. Count before you cut

    7 min

    You can't budget what you don't measure — every context strategy starts with knowing the actual token cost of every piece of your prompt.

  2. Use the model's actual tokenizer, not a heuristic

    6 min

    Char-count and word-count heuristics are off by 10–30% — fine for back-of-envelope, dangerous for production budgeting.

  3. Conversation history grows linearly until it doesn't

    7 min

    Every turn in a chat appends roughly the user's question plus the model's answer to the next call — and model answers are often the longest part.

  4. Predict the cliff before users hit it

    7 min

    If you know your per-turn growth rate and your context window, you can compute exactly when a session will overflow — before it does.

  5. Most APIs don't gracefully truncate — they reject

    7 min

    When you exceed the context window, the API returns an error — your app, not the model, has to decide what to drop and how.
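The "predict the cliff" arithmetic from this phase fits in one function. The window size, fixed prompt cost, and per-turn growth below are assumed numbers for illustration; plug in your own measurements.

```python
def turns_until_overflow(window: int, fixed_cost: int, per_turn: int) -> int:
    """How many full turns fit before the next call would exceed the window.

    window:     the model's context limit in tokens
    fixed_cost: system prompt + instructions resent on every call
    per_turn:   average tokens added per turn (question + answer)
    """
    budget = window - fixed_cost
    if budget <= 0:
        return 0  # the fixed prompt alone already fills the window
    return budget // per_turn

# Assumed: an 8k window, 1,200 tokens of fixed prompt, ~350 tokens/turn.
print(turns_until_overflow(8_192, 1_200, 350))  # → 19
```

Because most APIs reject rather than truncate an oversized request, running this check before each call lets your app trim history deliberately instead of failing mid-session.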

Phase 3: Pick the Right Memory Tool

Choose between long-context, RAG, summarization, and sliding windows

4 drops
  1. Your support team uploaded the 800-page PDF

    7 min

    Long-context windows are a tool for whole-document reasoning — not a substitute for retrieval over knowledge bases.

  2. The chatbot can't find facts in your own docs

    8 min

    When RAG fails, the bug is almost always in one specific step — chunking, embedding, retrieval, or injection — not in the architecture as a whole.

  3. Long sessions are losing the thread

    8 min

    Summarization compresses old context into structured memory while keeping recent turns verbatim — the right tool when sessions are long but session-level continuity matters.

  4. The fix depends on the failure mode

    8 min

    There are at least four distinct 'forgetting' failure modes — same-session, cross-session, retrieval, and attention — and each maps to a different tool.
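The summarization pattern from drop 3 (keep recent turns verbatim, compress the rest) can be sketched as below. The `naive_summary` function is a placeholder assumption; in a real system that step would itself be an LLM call.

```python
def compress_history(history: list[str], keep_recent: int, summarize) -> list[str]:
    # Sliding window with summarization: the most recent turns stay
    # verbatim, older turns collapse into a single summary entry.
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [f"[summary of {len(old)} earlier turns] " + summarize(old)] + recent

def naive_summary(turns: list[str]) -> str:
    # Placeholder: a real summarizer would preserve names, decisions,
    # and open questions, not just truncate text.
    return "; ".join(t[:20] for t in turns)

history = [f"turn {i}: ..." for i in range(10)]
compressed = compress_history(history, keep_recent=4, summarize=naive_summary)
print(len(compressed))  # → 5
```

This is the right shape only for the same-session failure mode; cross-session and retrieval failures need different tools, which is the point of drop 4.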

Phase 4: Design a Handbook Bot's Memory

Design a memory plan for a 500-page handbook bot

1 drop
  1. Design memory for a 500-page handbook bot

    25 min

    A real memory strategy is six explicit decisions — corpus storage, chunking, prompt layout, conversation memory, truncation, failure detection — with numbers attached.
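"With numbers attached" can look as simple as a budget table that must sum to less than the window. Every figure below is an assumption chosen to illustrate the exercise on a hypothetical 8k-context model, not a recommendation.

```python
# Hypothetical token budget for a 500-page-handbook bot.
WINDOW = 8_192

budget = {
    "system_prompt": 400,         # persona + answer-format rules
    "retrieved_chunks": 3_000,    # e.g. 6 handbook chunks x ~500 tokens
    "conversation_memory": 1_500, # summary + recent verbatim turns
    "user_question": 300,
    "response_headroom": 2_000,   # reserved for the model's answer
}

spent = sum(budget.values())
slack = WINDOW - spent
assert slack >= 0, "over budget: trim chunks or history"
print(spent, slack)  # → 7200 992
```

Writing the plan this way makes the trade-offs explicit: more retrieved chunks means less conversation memory or less headroom, and the failure-detection decision becomes "what do we drop when `slack` goes negative".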

Frequently asked questions

What is a context window in an LLM?
This is covered in the “Understand Context Windows in LLMs” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
Why do I get 'context length exceeded' errors and how do I fix them?
This is covered in the “Understand Context Windows in LLMs” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
What's the difference between context window, conversation memory, and RAG?
This is covered in the “Understand Context Windows in LLMs” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
When should I use a bigger context window versus retrieval?
This is covered in the “Understand Context Windows in LLMs” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
How do I estimate how many tokens my conversation is using?
This is covered in the “Understand Context Windows in LLMs” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.