💰 Optimize Cost in LLM Applications
Stop watching your LLM bill scale linearly with traffic. By the end, you'll be able to take any feature, name three cost cuts with dollar estimates, and defend the tradeoffs to your team.
Phase 1: The Five Cost Levers Hiding in Plain Sight
See the 60-300x price gap most teams ignore
Haiku is roughly 60x cheaper than Opus for the same call
6 min · Within one model family, the cheap tier costs 60-300x less per token than the flagship, and for many calls the answer quality is indistinguishable.
Route by complexity, not by convenience
6 min · A two-line router that sends easy calls to Haiku and hard calls to Sonnet typically cuts model spend 70-90% with zero quality loss on the easy path.
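A minimal sketch of the idea in Python, assuming a crude length-plus-keyword heuristic as the complexity check; the model IDs and threshold are placeholders to swap for your own classifier and current model names:

```python
# Route by complexity: cheap model for easy calls, strong model for hard ones.
# The heuristic, threshold, and model IDs below are illustrative placeholders.
def pick_model(prompt: str) -> str:
    looks_hard = len(prompt) > 2_000 or "step by step" in prompt.lower()
    return "claude-sonnet-4-5" if looks_hard else "claude-3-5-haiku-latest"
```

The heuristic matters less than having a routing point at all: once every call goes through `pick_model`, upgrading the check to a trained classifier is a one-function change.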
Prompt caching turns repeated context into a 90% discount
7 min · Anthropic and OpenAI prompt caching charge full input price once (Anthropic adds a 25% write premium), then 10% (Anthropic) or 50% (OpenAI) of that price for every cache hit on the same prefix.
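A sketch of opt-in caching using the Anthropic Python SDK's `cache_control` block as documented at the time of writing; `LONG_STABLE_INSTRUCTIONS` is a stand-in for your reused prefix, and the model ID is a placeholder:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_STABLE_INSTRUCTIONS = "..."  # imagine several thousand tokens of reused context

# Mark the big, stable prefix as cacheable. The first call pays the write
# premium; later calls with the identical prefix bill at the cache-read rate.
response = client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder model ID
    max_tokens=300,
    system=[
        {
            "type": "text",
            "text": LONG_STABLE_INSTRUCTIONS,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "First question about the docs"}],
)
```

Caching only pays off when the prefix is byte-identical across calls, so keep anything per-user or per-request out of the cached block.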
Long context windows are a billing trap, not a feature
6 min · Stuffing 200K tokens into the context window on every call is the most expensive way to give a model knowledge; RAG over the same content costs 10-100x less.
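Back-of-envelope arithmetic, assuming an illustrative $3 per million input tokens:

```python
# Full-context stuffing vs. RAG at an assumed $3/M input price.
price_per_mtok = 3.00
stuffed = 200_000 * price_per_mtok / 1_000_000  # $0.60 per call
rag = 2_000 * price_per_mtok / 1_000_000        # $0.006 per call
print(f"stuffed ${stuffed:.3f}, rag ${rag:.4f}, ratio {stuffed / rag:.0f}x")
```

At a retrieved slice of 2K tokens, that's the 100x end of the range; fatter retrievals land closer to 10x.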
Phase 2: Tracing Tokens to Dollars
Trace tokens to dollars and predict cache savings
Estimate any call's cost in 30 seconds with one formula
7 min · Cost per call equals (input_tokens × input_price + output_tokens × output_price) ÷ 1,000,000, with prices quoted in USD per million tokens. That's the whole formula, and most teams have never written it down.
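The same formula as a Python helper; the example prices are assumptions, not a current price sheet:

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Dollar cost of one call; prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 5K tokens in, 800 out, at assumed $3/M input and $15/M output prices:
print(cost_per_call(5_000, 800, 3.00, 15.00))  # 0.027 -> about 2.7 cents
```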
Predict caching savings before you ship the cache
7 min · Caching savings = cacheable_tokens × hits_per_window × (input_price × 0.9). Three numbers tell you whether to bother.
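The same estimate as a helper, kept consistent with the cost formula above; it assumes Anthropic's roughly 90% cache-read discount and ignores the one-time write premium, so treat the result as an upper bound:

```python
def cache_savings_per_window(cacheable_tokens: int, hits_per_window: int,
                             input_price: float) -> float:
    """Dollars saved per cache window; input_price is USD per million tokens.
    Assumes a 90% discount on cache hits; ignores the cache-write premium."""
    return cacheable_tokens * hits_per_window * (input_price * 0.9) / 1_000_000

# An 8K-token prefix hit 50 times per window, at an assumed $3/M input price:
print(cache_savings_per_window(8_000, 50, 3.00))  # 1.08 -> about $1.08 per window
```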
Compress prompts before you cache them
6 min · Most production prompts are 30-50% padding: boilerplate instructions, redundant examples, and polite filler that costs real money on every call.
Cap output tokens to cap your bill
6 min · max_tokens is the simplest cost control you have, and it's the one most teams forget to set, leaving the model free to generate a 4,000-token response when 200 would do.
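Setting the cap is one argument; the sketch below assumes a summary endpoint where ~200 tokens is plenty (model ID is a placeholder):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder model ID
    max_tokens=200,  # hard ceiling on billed output tokens for this endpoint
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
```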
Every cost cut needs an eval to defend it
7 min · A 70% cost cut that breaks 5% of outputs is a regression, not a win. Without an eval, you can't tell the difference until customers complain.
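One way to make that concrete is a pass-rate gate over a golden set; `golden_cases` and `run_cheap_path` here are hypothetical stand-ins for your own fixtures and the cheaper code path under test:

```python
def passes_eval(golden_cases, run_cheap_path, required_pass_rate=0.98):
    """Gate a cost cut: ship only if the cheap path clears the golden set.

    golden_cases: list of {"input": ..., "check": callable} fixtures (hypothetical).
    run_cheap_path: the cheaper pipeline under test (hypothetical).
    """
    passed = sum(
        1 for case in golden_cases if case["check"](run_cheap_path(case["input"]))
    )
    return passed / len(golden_cases) >= required_pass_rate
```

Run the same gate on the expensive path first; if it only scores 97%, a 98% bar would block the cut for the wrong reason.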
Phase 3: Choosing the Right Lever per Workload
Choose batch, cache, or RAG for real workloads
Your nightly job is paying real-time prices for no reason
7 min · The Anthropic and OpenAI batch APIs price tokens at roughly 50% off for jobs that can wait; anything running on a schedule instead of in front of a user qualifies.
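A sketch against the Anthropic Message Batches API as documented at the time of writing; `nightly_docs` is a hypothetical stand-in for your queued inputs, and the request shape should be verified against the current SDK:

```python
import anthropic

client = anthropic.Anthropic()
nightly_docs = ["first document text...", "second document text..."]  # placeholder inputs

# Queue the whole nightly job at batch rates (roughly 50% off); results
# come back asynchronously rather than per-request.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-haiku-latest",  # placeholder model ID
                "max_tokens": 300,
                "messages": [{"role": "user", "content": f"Summarize:\n{doc}"}],
            },
        }
        for i, doc in enumerate(nightly_docs)
    ]
)
print(batch.id)  # poll for results later; nothing here needs a real-time reply
```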
Half your customer support questions are basically the same question
7 min · When the same questions arrive again and again, caching the shared prompt prefix (or the answer itself) turns your most common tickets into near-free responses.
Your codebase Q&A bot is loading the entire repo every call
8 min · Indexing the repo and retrieving only the files relevant to each question cuts input tokens 10-100x compared with loading the whole codebase into context.
Three levers on one feature compound to a 95% cost cut
8 min · Cuts multiply rather than add: routing, caching, and batching each scale what's left of the bill, so three moderate levers compound into roughly a 95% reduction.
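The compounding is plain multiplication; the ratios below are assumptions to illustrate the shape, not measurements:

```python
# Each lever scales what's left of the bill, so cuts multiply rather than add.
routing = 0.30    # assumed: cheap-model routing keeps 30% of spend
caching = 0.30    # assumed: prompt caching keeps 30% of the remainder
batching = 0.50   # assumed: batch pricing keeps 50% of what's left
remaining = routing * caching * batching  # 0.045
print(f"total cut: {1 - remaining:.1%}")  # total cut: 95.5%
```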
Phase 4: Auditing a Real Feature
Audit a real feature and propose three cost cuts
Audit a real feature and propose three cost cuts with measured tradeoffs
20 min · Capstone: trace one production feature's tokens to dollars, pick three levers, put a dollar estimate on each, and write the eval that defends the tradeoffs.
Frequently asked questions
- How much cheaper is Haiku than Opus per token?
- Roughly 60x per token, and across a model family the gap between the cheap tier and the flagship can run 60-300x. For many calls the cheap tier's answers are indistinguishable.
- Does prompt caching actually save money or just latency?
- Both. Anthropic bills cache hits at 10% of the normal input price and OpenAI at 50%, so a reused prefix saves real money on every hit, not just round-trip time.
- When is the Anthropic batch API worth using?
- Whenever the job doesn't need a real-time answer. Batch pricing runs about 50% off standard rates, so nightly summarization, backfills, and scheduled reports all qualify.
- Should I use RAG or just stuff everything into a long context window?
- If the same large corpus goes into the context window on every call, RAG over that content typically costs 10-100x less; save long context for genuinely one-off material.
- How do I estimate token cost before shipping a feature?
- Multiply expected input and output tokens by their per-million-token prices: cost per call = (input_tokens × input_price + output_tokens × output_price) ÷ 1,000,000.
Related paths
🐍 Python Decorators Introduction
Build one mental model for Python decorators that covers closures, argument passing, functools.wraps, and stacking, then ship a working caching or logging decorator from scratch in under 30 lines.
🦀 Rust Lifetimes Explained
Stop reading `'a` as line noise and start reading it as scope arithmetic, one failing snippet at a time, until you can thread lifetimes through a small parser or iterator adapter without fighting the borrow checker.
☸️ Kubernetes Core Concepts
Stop drowning in 30+ resource types. Build the mental model one primitive at a time -- pods, deployments, services, ingress, config -- then deploy a real app with rolling updates and health checks.
📈 Big O Intuition
Stop treating Big O as math you memorized for an interview: build the intuition to spot O(n²) disasters, pick the right data structure without thinking, and rewrite a slow function from O(n²) to O(n) in under five minutes.