Learn Extended Thinking and Reasoning Modes
Stop guessing whether 'thinking mode' helps and start measuring it. By the end you can run the same prompt with and without extended reasoning, see where it lifts accuracy and where it just burns cost, and make the call per task class instead of by vibe.
Phase 1: Why Thinking at Inference Time Beats a Bigger Model
See why inference-time compute fixes different bugs
More compute at answer time, not more parameters
6 min · Thinking modes scale a different axis, time spent per answer, and that axis fixes a different bug than a bigger model does.
Thinking helps where the answer needs steps, not facts
6 min · Extended thinking lifts multi-step reasoning and self-checking; it barely moves recall, style, or one-shot pattern matching.
Thinking buys accuracy with latency and tokens
6 min · Every reasoning token is billed and waited for, so thinking only wins when its accuracy lift outruns its 3-10x cost.
Three task classes, three different verdicts
7 min · Calculation, ambiguous spec, and summary respond to thinking in three different ways, and those three classes will be your benchmark for the rest of the path.
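Before running anything, the claim that the accuracy lift must outrun the cost is worth seeing as arithmetic. A minimal sketch, where every number (per-call prices, accuracies, the cost of handling a wrong answer) is an illustrative assumption, not a benchmark:

```python
# Illustrative break-even arithmetic for thinking mode. All numbers are
# made-up assumptions; the point is the shape of the comparison.

def expected_cost(call_cost: float, accuracy: float, failure_cost: float) -> float:
    """One call's price plus the expected downstream cost of a wrong
    answer (retry, human review, bad decision)."""
    return call_cost + (1.0 - accuracy) * failure_cost

FAILURE_COST = 0.50  # assumed cost of handling one wrong answer

off = expected_cost(call_cost=0.01, accuracy=0.55, failure_cost=FAILURE_COST)
on = expected_cost(call_cost=0.05, accuracy=0.92, failure_cost=FAILURE_COST)  # ~5x token bill

print(f"thinking off: ${off:.3f} per task")  # $0.235
print(f"thinking on:  ${on:.3f} per task")   # $0.090

# On a calculation-style task the 5x price pays for itself. Rerun with a
# summary-style lift (0.90 -> 0.93) and the toggle flips: $0.060 vs $0.085.
```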
Phase 2: Run the Same Prompt Both Ways
Run three task classes with and without thinking
Build a two-row test rig before you measure anything
6 min · A useful comparison is two rows (same prompt, same model, thinking off vs. on) logging tokens, latency, and a verdict you can defend.
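A minimal sketch of that rig, assuming a hypothetical `call_model` helper in place of your vendor's SDK (wire it to whatever parameter enables extended thinking in your client):

```python
import time

# Minimal two-row rig: same prompt, same model, thinking off vs. on.
# `call_model` is a hypothetical stand-in for your vendor's SDK; it is
# assumed to return the answer text plus input/output token counts.

def call_model(prompt: str, thinking: bool) -> dict:
    """Placeholder: wire this to your real client, using whatever
    parameter enables extended thinking there. Expected return shape:
    {"text": str, "input_tokens": int, "output_tokens": int}."""
    raise NotImplementedError

def run_row(prompt: str, thinking: bool) -> dict:
    start = time.perf_counter()
    result = call_model(prompt, thinking=thinking)
    return {
        "thinking": thinking,
        "latency_s": round(time.perf_counter() - start, 2),
        "tokens": result["input_tokens"] + result["output_tokens"],
        "answer": result["text"],
        "verdict": None,  # fill in by hand: correct / wrong / partial
    }

def compare(prompt: str) -> list[dict]:
    # Two rows per prompt: everything held constant except the toggle.
    return [run_row(prompt, thinking=False), run_row(prompt, thinking=True)]
```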
On math, thinking earns its tax
7 min · Multi-step calculation is where thinking modes shine: accuracy lifts from coin-flip to consistent on the same model.
On ambiguous specs, thinking surfaces what was unsaid
7 min · When a request hides assumptions, thinking turns 'plausible answer to the wrong question' into 'careful answer that names the ambiguity.'
On summaries, thinking is mostly a waste
6 min · Compression tasks barely benefit from a reasoning trace, because there's nothing to deduce: the answer is already a function of the input.
Read your rig: six rows, three verdicts
6 min · Six rows of evidence collapse into three rules: thinking on for calculation, on for ambiguous spec, off for summary. That's the policy you ship.
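Those three rules are small enough to ship as code. A sketch, assuming your prompts arrive tagged with one of the three task classes from this path:

```python
# The Phase 2 verdicts as a shippable policy. The three task classes are
# the ones this path benchmarks; extend the dict as your audit grows.
THINKING_POLICY: dict[str, bool] = {
    "calculation": True,     # multi-step math: the accuracy lift pays the tax
    "ambiguous_spec": True,  # surfacing hidden assumptions needs the reasoning
    "summary": False,        # compression has nothing to deduce
}

def thinking_enabled(task_class: str) -> bool:
    # Default to off: an unknown task class pays the thinking tax only
    # after it earns a row in your rig.
    return THINKING_POLICY.get(task_class, False)
```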
Phase 3: Three Shapes of the Same Idea
Compare inline CoT, hidden thinking, and scratchpads
Chain-of-thought prompting was the first thinking mode
6 min · Asking the model to 'think step by step' was the original inference-time-compute hack: visible, controllable, and still useful where hidden thinking is overkill.
Hidden thinking is CoT you don't get to read
6 min · Vendors hide the reasoning trace partly to protect their model's process and partly because a clean final answer reads as more confident, but the opacity has real costs.
Agents externalize thinking on purpose
7 min · Agent loops, ReAct, and tool-use patterns make reasoning visible and controllable by writing it to an external scratchpad, a third shape of the same axis.
Pick the shape, not just the toggle
7 min · The right question isn't 'thinking on or off?' It's 'CoT, hidden thinking, or scratchpad?', chosen against the task's shape and visibility needs.
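The three shapes side by side, as a sketch. `call_model` is the same hypothetical stub as in the Phase 2 rig, and the question and scratchpad steps are illustrative:

```python
# One question, three shapes of inference-time compute.
QUESTION = "A train leaves at 9:04 and arrives at 11:31. How long is the trip?"

# Shape 1: inline chain-of-thought. Reasoning is requested in the prompt
# and comes back visible in the output.
cot_prompt = QUESTION + "\nThink step by step, then state the answer."

# Shape 2: hidden thinking. Reasoning happens behind a vendor toggle; you
# see the tokens billed, not the trace.
#   call_model(QUESTION, thinking=True)

# Shape 3: external scratchpad. The loop writes intermediate reasoning to
# state you control, then answers from it (each entry would normally come
# from a model or tool call).
scratchpad: list[str] = []
for step in ("parse both timestamps", "compute the difference", "sanity-check a.m./p.m."):
    scratchpad.append(f"{step}: ...")
final_prompt = "Notes so far:\n" + "\n".join(scratchpad) + "\n\n" + QUESTION
```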
Phase 4: Decide Per-Prompt Whether Thinking Earns Its Cost
Decide per-prompt whether thinking earns its cost
Audit your real prompts and ship a thinking policy
22 min · Classify a sample of your real prompts into the three task classes, run each through the two-row rig, and turn the verdicts into a thinking policy you can defend.
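A sketch of what that audit can look like, assuming a hypothetical `classify` heuristic standing in for your own judgment (or a cheap model call):

```python
from collections import Counter

# Walk a log of real prompts, tag each with a task class, and see where a
# thinking policy would actually spend its budget.

def classify(prompt: str) -> str:
    """Placeholder classifier: a keyword heuristic here, human judgment or
    a cheap model call in practice."""
    p = prompt.lower()
    if any(w in p for w in ("calculate", "how many", "total", "compute")):
        return "calculation"
    if any(w in p for w in ("summarize", "tl;dr", "shorten", "condense")):
        return "summary"
    return "ambiguous_spec"  # the default most worth double-checking by hand

def audit(prompts: list[str]) -> Counter:
    return Counter(classify(p) for p in prompts)

# counts = audit(load_your_prompt_log())  # load_your_prompt_log: yours to write
# Pair the counts with the policy from Phase 2 to estimate how much of your
# traffic actually earns the thinking tax.
```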
Frequently asked questions
- What is extended thinking in an LLM and how is it different from chain-of-thought?
- Extended thinking spends extra compute at answer time on a reasoning trace before the final answer, while chain-of-thought prompting asks for visible step-by-step reasoning in the output itself. Phase 3 of this path treats them as two shapes of the same inference-time-compute idea, alongside external scratchpads.
- When does turning on thinking mode actually improve accuracy?
- When the answer needs steps rather than facts: multi-step calculation and requests that hide assumptions both see real accuracy lifts, while recall, style, and compression tasks like summarization barely move. Phase 2 walks through measuring this on your own prompts.
- How much does extended thinking increase latency and cost?
- Every reasoning token is billed and waited for, so expect roughly a 3-10x token cost plus the extra latency of generating the trace. Phase 1 covers how to weigh that tax against the accuracy lift.
- Is hidden thinking the same as chain-of-thought prompting?
- They scale the same axis, but hidden thinking is a reasoning trace you don't get to read, while chain-of-thought prompting keeps the steps visible and controllable. Phase 3 compares both against external scratchpads.
- How do I decide whether a prompt needs a reasoning model?
- Classify the prompt into a task class and measure: run it with thinking off and on, log tokens, latency, and correctness, and keep thinking only where the lift outruns the cost. Phase 4 turns that audit into a per-prompt policy.
Related paths
Python Decorators Introduction
Build one mental model for Python decorators that covers closures, argument passing, functools.wraps, and stacking, then ship a working caching or logging decorator from scratch in under 30 lines.
Rust Lifetimes Explained
Stop reading `'a` as line noise and start reading it as scope arithmetic, one failing snippet at a time, until you can thread lifetimes through a small parser or iterator adapter without fighting the borrow checker.
Kubernetes Core Concepts
Stop drowning in 30+ resource types. Build the mental model one primitive at a time -- pods, deployments, services, ingress, config -- then deploy a real app with rolling updates and health checks.
Big O Intuition
Stop treating Big O as math you memorized for an interview: build the intuition to spot O(n²) disasters, pick the right data structure without thinking, and rewrite a slow function from O(n²) to O(n) in under five minutes.