
🧠 Learn Extended Thinking and Reasoning Modes

Stop guessing whether 'thinking mode' helps and start measuring it. By the end, you'll be able to run the same prompt with and without extended reasoning, see where it lifts accuracy and where it just burns cost, and make the call per task class instead of by vibe.

Applied · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: Why Thinking at Inference Time Beats a Bigger Model

See why inference-time compute fixes different bugs

4 drops
  1. More compute at answer time, not more parameters

    6 min

    Thinking modes scale a different axis, time spent per answer, and that axis fixes a different bug than a bigger model does.

  2. Thinking helps where the answer needs steps, not facts

    6 min

    Extended thinking lifts multi-step reasoning and self-checking; it barely moves recall, style, or one-shot pattern matching.

  3. Thinking buys accuracy with latency and tokens

    6 min

    Every reasoning token is billed and waited for, so thinking only wins when its accuracy lift outruns its 3–10x cost.

  4. Three task classes, three different verdicts

    7 min

    Calculation, ambiguous spec, and summary respond to thinking in three different ways, and those three classes will be your benchmark for the rest of the path.
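The cost rule in drop 3 can be made concrete with a little expected-value arithmetic: thinking wins only when the accuracy lift outruns the per-call multiplier. A minimal sketch; the accuracies and the 3x multiplier are hypothetical numbers chosen for illustration, not measurements from this path:

```python
def cost_per_correct(cost_per_call: float, accuracy: float) -> float:
    """Expected spend to get one correct answer (retry until success)."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return cost_per_call / accuracy

# Hypothetical numbers: thinking triples the call cost (low end of the
# 3-10x range) but lifts accuracy from 0.30 to 0.95 on a hard calculation.
base = cost_per_correct(cost_per_call=1.0, accuracy=0.30)      # ~3.33
thinking = cost_per_correct(cost_per_call=3.0, accuracy=0.95)  # ~3.16
# Per correct answer, thinking narrowly wins here; at a 10x multiplier
# on the same lift it would lose. That's the break-even you measure.
```

The same function applied to a summary task, where accuracy barely moves, shows the multiplier with nothing to offset it.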

Phase 2: Run the Same Prompt Both Ways

Run three task classes with and without thinking

5 drops
  1. Build a two-row test rig before you measure anything

    6 min

    A useful comparison is two rows: same prompt, same model, thinking off vs. on, logging tokens, latency, and a verdict you can defend.

  2. On math, thinking earns its tax

    7 min

    Multi-step calculation is where thinking modes shine: accuracy lifts from coin-flip to consistent on the same model.

  3. On ambiguous specs, thinking surfaces what was unsaid

    7 min

    When a request hides assumptions, thinking turns 'plausible answer to the wrong question' into 'careful answer that names the ambiguity.'

  4. On summaries, thinking is mostly a waste

    6 min

    Compression tasks barely benefit from a reasoning trace, because there's nothing to deduce; the answer is already a function of the input.

  5. Read your rig: three rows, three verdicts

    6 min

    Six rows of evidence collapse into three rules: thinking on for calculation, on for ambiguous spec, off for summary. That's the policy you ship.
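The two-row rig from drop 1 can be sketched as plain bookkeeping: run the same prompt with thinking off and on, and log one row per run. `run_model` and `grade` are hypothetical stand-ins for your actual client call and correctness check:

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Row:
    task_class: str   # "calculation", "ambiguous_spec", or "summary"
    thinking: bool
    latency_s: float
    tokens: int
    correct: bool

def run_rig(prompt: str, task_class: str,
            run_model: Callable[[str, bool], tuple[str, int]],
            grade: Callable[[str], bool]) -> list[Row]:
    """Same prompt, same model, thinking off vs. on: one logged row each."""
    rows = []
    for thinking in (False, True):
        start = time.perf_counter()
        answer, tokens = run_model(prompt, thinking)
        rows.append(Row(task_class, thinking,
                        time.perf_counter() - start, tokens, grade(answer)))
    return rows
```

A fake `run_model` that returns a canned answer is enough to validate the bookkeeping before wiring in a real client; three task classes times two modes gives you the six rows the verdict collapses from.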

Phase 3: Three Shapes of the Same Idea

Compare inline CoT, hidden thinking, and scratchpads

4 drops
  1. Chain-of-thought prompting was the first thinking mode

    6 min

    Asking the model to 'think step by step' was the original inference-time-compute hack: visible, controllable, and still useful where hidden thinking is overkill.

  2. Hidden thinking is CoT you don't get to read

    6 min

    Vendors hide the reasoning trace partly to protect their model's process and partly because answers read as more confident without it, but the opacity has real costs.

  3. Agents externalize thinking on purpose

    7 min

    Agent loops, ReAct, and tool-use patterns make reasoning visible and controllable by writing it to an external scratchpad: a third shape of the same axis.

  4. Pick the shape, not just the toggle

    7 min

    The right question isn't 'thinking on or off?' It's 'CoT, hidden thinking, or scratchpad?', chosen against the task's shape and visibility needs.
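The scratchpad shape from drop 3 can be sketched as a loop that writes every thought and observation to an external list the caller can read and steer. The `think` and `act` callables are hypothetical stand-ins for a model call and a tool call, and the `FINAL:` convention is an assumption of this sketch, not a standard:

```python
from typing import Callable

def scratchpad_loop(question: str,
                    think: Callable[[str, list[str]], str],
                    act: Callable[[str], str],
                    max_steps: int = 5) -> tuple[str, list[str]]:
    """ReAct-style loop: the reasoning trace lives in an inspectable scratchpad."""
    pad: list[str] = []
    for _ in range(max_steps):
        thought = think(question, pad)   # model proposes the next step
        pad.append(f"THOUGHT: {thought}")
        if thought.startswith("FINAL:"):
            return thought.removeprefix("FINAL:").strip(), pad
        observation = act(thought)       # run a tool, record the evidence
        pad.append(f"OBSERVATION: {observation}")
    return "no answer within budget", pad
```

Unlike hidden thinking, every intermediate step here is yours to log, truncate, or audit; that visibility is exactly the trade the phase asks you to weigh.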

Phase 4: Decide Per-Prompt Whether Thinking Earns Its Cost

Turn the benchmark verdicts into a policy you actually ship

1 drop
  1. Audit your real prompts and ship a thinking policy

    22 min

    Classify your own prompt inventory into the three task classes, spot-check each class with the two-row rig, and write down the on/off default your team ships.
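The policy the capstone asks you to ship can be as small as a lookup from task class to a thinking toggle, seeded with the Phase 2 verdicts. The class names follow this path; the defaults are yours to overwrite with your own measurements:

```python
# Defaults from the Phase 2 benchmark: thinking on for calculation and
# ambiguous spec, off for summary. Unknown classes fall back to on,
# the safer-but-costlier choice until you've measured them.
THINKING_POLICY = {
    "calculation": True,
    "ambiguous_spec": True,
    "summary": False,
}

def use_thinking(task_class: str) -> bool:
    """Per-prompt decision: does this task class earn the thinking tax?"""
    return THINKING_POLICY.get(task_class, True)
```

Keeping the policy as data rather than scattered if-statements means the audit can update it without touching call sites.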

Frequently asked questions

What is extended thinking in an LLM and how is it different from chain-of-thought?
Extended thinking is extra inference-time compute the model spends reasoning before it answers. In hidden-thinking modes the trace is kept from you, while chain-of-thought prompting ('think step by step') produces the same kind of reasoning in plain sight; Phase 3 of this path compares the two directly.
When does turning on thinking mode actually improve accuracy?
When the answer needs steps rather than facts: multi-step calculation and ambiguous specs improve markedly, while recall, style, and summarization barely move. Phases 1 and 2 of this path walk through measuring the difference on your own prompts.
How much does extended thinking increase latency and cost?
Every reasoning token is billed and waited for, so expect roughly 3–10x the cost and latency of the same prompt with thinking off. Whether that's worth it depends on the accuracy lift for your task class, which is what the two-row rig in Phase 2 measures.
Is hidden thinking the same as chain-of-thought prompting?
They sit on the same axis, more compute per answer, but differ in visibility: chain-of-thought is a prompt-level technique you can read and steer, while hidden thinking is a vendor-run trace you don't get to see. Phase 3 treats them as two of three shapes, alongside agent scratchpads.
How do I decide whether a prompt needs a reasoning model?
Classify the prompt by task class and check it against your own benchmark. In this path's running example, calculation and ambiguous specs earn the thinking tax while summaries don't; the Phase 4 capstone turns that audit into a shipped policy.