Learn Extended Thinking and Reasoning Modes
Stop guessing whether 'thinking mode' helps and start measuring it. By the end you can run the same prompt with and without extended reasoning, see where it lifts accuracy and where it just burns cost, and make the call per task class instead of by vibe.
Phase 1: Why Thinking at Inference Time Beats a Bigger Model
See why inference-time compute fixes different bugs
More compute at answer time, not more parameters
6 min · Thinking modes scale a different axis, time spent per answer, and that axis fixes a different bug than a bigger model does.
Thinking helps where the answer needs steps, not facts
6 min · Extended thinking lifts multi-step reasoning and self-checking; it barely moves recall, style, or one-shot pattern matching.
Thinking buys accuracy with latency and tokens
6 min · Every reasoning token is billed and waited for, so thinking only wins when its accuracy lift outruns its 3-10x cost.
Three task classes, three different verdicts
7 min · Calculation, ambiguous spec, and summary respond to thinking in three different ways, and those three classes will be your benchmark for the rest of the path.
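Before running anything, the claim that the accuracy lift must outrun the cost is worth seeing as arithmetic. A minimal sketch, where every number (per-call prices, accuracies, the cost of handling a wrong answer) is an illustrative assumption, not a benchmark:

```python
# Illustrative break-even arithmetic for thinking mode. All numbers are
# made-up assumptions; the point is the shape of the comparison.

def expected_cost(call_cost: float, accuracy: float, failure_cost: float) -> float:
    """One call's price plus the expected downstream cost of a wrong
    answer (retry, human review, bad decision)."""
    return call_cost + (1.0 - accuracy) * failure_cost

FAILURE_COST = 0.50  # assumed cost of handling one wrong answer

off = expected_cost(call_cost=0.01, accuracy=0.55, failure_cost=FAILURE_COST)
on = expected_cost(call_cost=0.05, accuracy=0.92, failure_cost=FAILURE_COST)  # ~5x token bill

print(f"thinking off: ${off:.3f} per task")  # $0.235
print(f"thinking on:  ${on:.3f} per task")   # $0.090

# On a calculation-style task the 5x price pays for itself. Rerun with a
# summary-style lift (0.90 -> 0.93) and the toggle flips: $0.060 vs $0.085.
```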
Phase 2: Run the Same Prompt Both Ways
Run three task classes with and without thinking
Build a two-row test rig before you measure anything
6 min · A useful comparison is two rows (same prompt, same model, thinking off vs. on) logging tokens, latency, and a verdict you can defend.
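A minimal sketch of that rig, assuming a hypothetical `call_model` helper in place of your vendor's SDK (wire it to whatever parameter enables extended thinking in your client):

```python
import time

# Minimal two-row rig: same prompt, same model, thinking off vs. on.
# `call_model` is a hypothetical stand-in for your vendor's SDK; it is
# assumed to return the answer text plus input/output token counts.

def call_model(prompt: str, thinking: bool) -> dict:
    """Placeholder: wire this to your real client, using whatever
    parameter enables extended thinking there. Expected return shape:
    {"text": str, "input_tokens": int, "output_tokens": int}."""
    raise NotImplementedError

def run_row(prompt: str, thinking: bool) -> dict:
    start = time.perf_counter()
    result = call_model(prompt, thinking=thinking)
    return {
        "thinking": thinking,
        "latency_s": round(time.perf_counter() - start, 2),
        "tokens": result["input_tokens"] + result["output_tokens"],
        "answer": result["text"],
        "verdict": None,  # fill in by hand: correct / wrong / partial
    }

def compare(prompt: str) -> list[dict]:
    # Two rows per prompt: everything held constant except the toggle.
    return [run_row(prompt, thinking=False), run_row(prompt, thinking=True)]
```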
On math, thinking earns its tax
7 min · Multi-step calculation is where thinking modes shine: accuracy lifts from coin-flip to consistent on the same model.
On ambiguous specs, thinking surfaces what was unsaid
7 min · When a request hides assumptions, thinking turns 'plausible answer to the wrong question' into 'careful answer that names the ambiguity.'
On summaries, thinking is mostly a waste
6 min · Compression tasks barely benefit from a reasoning trace, because there's nothing to deduce: the answer is already a function of the input.
Read your rig: six rows, three verdicts
6 min · Six rows of evidence collapse into three rules: thinking on for calculation, on for ambiguous spec, off for summary. That's the policy you ship.
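Those three rules are small enough to ship as code. A sketch, assuming your prompts arrive tagged with one of the three task classes from this path:

```python
# The Phase 2 verdicts as a shippable policy. The three task classes are
# the ones this path benchmarks; extend the dict as your audit grows.
THINKING_POLICY: dict[str, bool] = {
    "calculation": True,     # multi-step math: the accuracy lift pays the tax
    "ambiguous_spec": True,  # surfacing hidden assumptions needs the reasoning
    "summary": False,        # compression has nothing to deduce
}

def thinking_enabled(task_class: str) -> bool:
    # Default to off: an unknown task class pays the thinking tax only
    # after it earns a row in your rig.
    return THINKING_POLICY.get(task_class, False)
```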
Phase 3: Three Shapes of the Same Idea
Compare inline CoT, hidden thinking, and scratchpads
Chain-of-thought prompting was the first thinking mode
6 min · Asking the model to 'think step by step' was the original inference-time-compute hack: visible, controllable, and still useful where hidden thinking is overkill.
Hidden thinking is CoT you don't get to read
6 min · Vendors hide the reasoning trace partly to protect their model's process and partly because a clean final answer reads as more confident, but the opacity has real costs.
Agents externalize thinking on purpose
7 min · Agent loops, ReAct, and tool-use patterns make reasoning visible and controllable by writing it to an external scratchpad, a third shape of the same axis.
Pick the shape, not just the toggle
7 min · The right question isn't 'thinking on or off?' It's 'CoT, hidden thinking, or scratchpad?', chosen against the task's shape and visibility needs.
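The three shapes side by side, as a sketch. `call_model` is the same hypothetical stub as in the Phase 2 rig, and the question and scratchpad steps are illustrative:

```python
# One question, three shapes of inference-time compute.
QUESTION = "A train leaves at 9:04 and arrives at 11:31. How long is the trip?"

# Shape 1: inline chain-of-thought. Reasoning is requested in the prompt
# and comes back visible in the output.
cot_prompt = QUESTION + "\nThink step by step, then state the answer."

# Shape 2: hidden thinking. Reasoning happens behind a vendor toggle; you
# see the tokens billed, not the trace.
#   call_model(QUESTION, thinking=True)

# Shape 3: external scratchpad. The loop writes intermediate reasoning to
# state you control, then answers from it (each entry would normally come
# from a model or tool call).
scratchpad: list[str] = []
for step in ("parse both timestamps", "compute the difference", "sanity-check a.m./p.m."):
    scratchpad.append(f"{step}: ...")
final_prompt = "Notes so far:\n" + "\n".join(scratchpad) + "\n\n" + QUESTION
```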
Phase 4: Decide Per-Prompt Whether Thinking Earns Its Cost
Decide per-prompt whether thinking earns its cost
Audit your real prompts and ship a thinking policy
22 min · Classify a sample of your real prompts into the three task classes, run each through the two-row rig, and turn the verdicts into a thinking policy you can defend.
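A sketch of what that audit can look like, assuming a hypothetical `classify` heuristic standing in for your own judgment (or a cheap model call):

```python
from collections import Counter

# Walk a log of real prompts, tag each with a task class, and see where a
# thinking policy would actually spend its budget.

def classify(prompt: str) -> str:
    """Placeholder classifier: a keyword heuristic here, human judgment or
    a cheap model call in practice."""
    p = prompt.lower()
    if any(w in p for w in ("calculate", "how many", "total", "compute")):
        return "calculation"
    if any(w in p for w in ("summarize", "tl;dr", "shorten", "condense")):
        return "summary"
    return "ambiguous_spec"  # the default most worth double-checking by hand

def audit(prompts: list[str]) -> Counter:
    return Counter(classify(p) for p in prompts)

# counts = audit(load_your_prompt_log())  # load_your_prompt_log: yours to write
# Pair the counts with the policy from Phase 2 to estimate how much of your
# traffic actually earns the thinking tax.
```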
Frequently asked questions
- What is extended thinking in an LLM and how is it different from chain-of-thought?
- Extended thinking spends extra compute at answer time on a reasoning trace before the final answer, while chain-of-thought prompting asks for visible step-by-step reasoning in the output itself. Phase 3 of this path treats them as two shapes of the same inference-time-compute idea, alongside external scratchpads.
- When does turning on thinking mode actually improve accuracy?
- When the answer needs steps rather than facts: multi-step calculation and requests that hide assumptions both see real accuracy lifts, while recall, style, and compression tasks like summarization barely move. Phase 2 walks through measuring this on your own prompts.
- How much does extended thinking increase latency and cost?
- Every reasoning token is billed and waited for, so expect roughly a 3-10x token cost plus the extra latency of generating the trace. Phase 1 covers how to weigh that tax against the accuracy lift.
- Is hidden thinking the same as chain-of-thought prompting?
- They scale the same axis, but hidden thinking is a reasoning trace you don't get to read, while chain-of-thought prompting keeps the steps visible and controllable. Phase 3 compares both against external scratchpads.
- How do I decide whether a prompt needs a reasoning model?
- Classify the prompt into a task class and measure: run it with thinking off and on, log tokens, latency, and correctness, and keep thinking only where the lift outruns the cost. Phase 4 turns that audit into a per-prompt policy.
Related paths
Python Decorators Introduction
Build one mental model for Python decorators that covers closures, argument passing, functools.wraps, and stacking, then ship a working caching or logging decorator from scratch in under 30 lines.
Rust Lifetimes Explained
Stop reading `'a` as line noise and start reading it as scope arithmetic, one failing snippet at a time, until you can thread lifetimes through a small parser or iterator adapter without fighting the borrow checker.
Kubernetes Core Concepts
Stop drowning in 30+ resource types. Build the mental model one primitive at a time -- pods, deployments, services, ingress, config -- then deploy a real app with rolling updates and health checks.
Big O Intuition
Stop treating Big O as math you memorized for an interview: build the intuition to spot O(n²) disasters, pick the right data structure without thinking, and rewrite a slow function from O(n²) to O(n) in under five minutes.