📉 Understand Gradient Descent
Stop treating the optimizer as a black box — walk a 2D loss surface by hand, feel why a learning rate that's too big diverges and one that's too small stalls, and learn to read SGD, momentum, and Adam loss curves the way a doctor reads a chart.
Phase 1: Why Models Walk Downhill
See gradients as steepest-descent arrows on a real surface
Every model trains by walking downhill on a loss surface (6 min)
The gradient points uphill — you walk the other way (6 min)
The learning rate is your step size, not your speed (6 min)
One equation runs every neural network on Earth (7 min)
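For a preview of where Phase 1 lands, this is the update rule that last lesson refers to, written for parameters θ, loss L, and learning rate η:

$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

Every optimizer in Phase 3 is a variation on this one line: change how the gradient is estimated, or how the step is scaled, and you get SGD, momentum, or Adam.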
Phase 2: Stepping Across a 2D Bowl by Hand
Step across a 2D bowl by hand and plot the path
A 2D bowl is the smallest model that teaches everything (6 min)
Take ten steps and watch the path bend toward zero (7 min)
Set η = 0.6 and watch the bowl spit you out (7 min)
Set η = 0.001 and watch the bowl never end (6 min)
Three runs, three curves, one equation (6 min)
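To preview what Phase 2 has you do by hand, here is a minimal Python sketch of the same exercise. The bowl L(w1, w2) = w1² + 4·w2², the starting point, and the learning rates are illustrative assumptions, not necessarily the surface the lessons use, but they reproduce the three behaviours the titles promise.

```python
# Ten gradient-descent steps on the bowl L(w1, w2) = w1**2 + 4 * w2**2.
# The surface and starting point are illustrative, not the course's exact ones.

def loss(w1, w2):
    return w1**2 + 4 * w2**2

def grad(w1, w2):
    # Partial derivatives of the bowl: dL/dw1 = 2*w1, dL/dw2 = 8*w2.
    return 2 * w1, 8 * w2

for lr in (0.1, 0.6, 0.001):          # healthy, too big, too small
    w1, w2 = 3.0, 2.0                  # arbitrary starting point on the rim
    print(f"\nlearning rate = {lr}")
    for step in range(10):
        g1, g2 = grad(w1, w2)
        w1 -= lr * g1                  # step against the gradient
        w2 -= lr * g2
        print(f"  step {step + 1:2d}: w = ({w1:9.3f}, {w2:9.3f})  loss = {loss(w1, w2):12.3f}")
```

Run it and the η = 0.1 path bends steadily toward (0, 0), the η = 0.6 path gets thrown out of the bowl along the steep w2 direction, and the η = 0.001 path has barely moved after ten steps.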
Phase 3: What SGD, Momentum, and Adam Each Fix
Compare what SGD, momentum, and Adam each fix
You don't have time to compute the full gradient (6 min)
Your model needs memory of where it was going (6 min)
Different parameters need different learning rates (7 min)
The optimizer you pick is the one that fixes your bottleneck (7 min)
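As a preview of Phase 3, here is a sketch of the three update rules on a single parameter, assuming their common textbook forms; the hyperparameter defaults are typical values, and the lessons may present equivalent variants.

```python
import math

# One update step for each optimizer, acting on a single parameter w
# with gradient g. Hyperparameter values are typical defaults, not prescriptions.

def sgd_step(w, g, lr=0.01):
    # Plain SGD: step straight against the (mini-batch) gradient.
    return w - lr * g

def momentum_step(w, g, v, lr=0.01, mu=0.9):
    # Momentum: keep a running velocity so recent step directions persist.
    v = mu * v - lr * g
    return w + v, v

def adam_step(w, g, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: track the running mean (m) and uncentered variance (s) of gradients,
    # correct their startup bias, and scale each parameter's step by 1/sqrt(s).
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)          # bias correction; t counts from 1
    s_hat = s / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(s_hat) + eps)
    return w, m, s

# Example: one step of each on the same parameter and gradient.
w, g = 3.0, 2.0
print(sgd_step(w, g))
print(momentum_step(w, g, v=0.0))
print(adam_step(w, g, m=0.0, s=0.0, t=1))
```

SGD fixes the cost of computing the full gradient, momentum fixes the zig-zagging of noisy steps, and Adam fixes the fact that different parameters want different step sizes, which is the bottleneck framing of the phase's last lesson.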
Phase 4: Diagnose Three Real Loss Curves
Diagnose three real loss curves like an ML engineer
Read three loss curves and prescribe the fix (8 min)
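To give a flavor of the final exercise, here is a crude triage function of my own (illustrative thresholds, not the course's rubric) that looks at a recorded loss history and names the likely problem.

```python
def diagnose(losses):
    """Crude triage of a training loss history (illustrative thresholds only)."""
    start, end = losses[0], losses[-1]
    recent = losses[len(losses) // 2:]
    if end > start or max(recent) > 10 * start:
        return "diverging: learning rate is likely too high"
    if end > 0.9 * start:
        return "stalled: learning rate may be too low, or the optimizer is stuck on a plateau"
    return "healthy: loss is decreasing, keep going"

# Example curves shaped like the three runs from Phase 2:
print(diagnose([25, 40, 90, 300, 1200]))        # exploding
print(diagnose([25, 24.7, 24.5, 24.2, 24.0]))   # crawling
print(diagnose([25, 12, 6, 3, 1.5]))            # converging
```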
Frequently asked questions
- What is gradient descent in plain English?
- Gradient descent is how a model learns: it checks which direction makes the loss increase (the gradient), steps a small amount the other way, and repeats until the loss stops shrinking. Phase 1 of this path builds that picture on a real loss surface.
- Why does a learning rate that's too high cause loss to explode?
- Each step moves the weights by the learning rate times the gradient. If that step overshoots the bottom of the valley, the new gradient is even steeper, the next step is even bigger, and the loss ratchets upward instead of down. Phase 2 has you trigger exactly this with η = 0.6.
- What's the difference between SGD, momentum, and Adam?
- SGD estimates the gradient from a small batch instead of the whole dataset, momentum keeps a running velocity so the path stops zig-zagging across narrow valleys, and Adam also gives each parameter its own effective step size. Phase 3 compares what each one fixes.
- How do I tell from a loss curve that my learning rate is wrong?
- A loss that climbs or oscillates wildly usually means the learning rate is too high, while a loss that creeps down in a nearly flat line means it is too low. The final phase walks you through three real curves and the fix for each.
- Why does the loss sometimes plateau and then drop again?
- A plateau often means the optimizer is crawling across a flat stretch or saddle of the loss surface; once momentum or a learning-rate change carries it past that region, the gradient picks up again and the loss resumes falling.
Related paths
🔵 Learn Set Theory Basics: The Language Every Math Class Assumes
Stop squinting at ∪, ∩, and ∁ — shade Venn diagrams first, then translate them into clean notation, until you can model anything from music genres to probability events as sets you can actually picture.
🐍 Python Decorators Introduction
Build one mental model for Python decorators that covers closures, argument passing, functools.wraps, and stacking — then ship a working caching or logging decorator from scratch in under 30 lines.
🦀 Rust Lifetimes Explained
Stop reading `'a` as line noise and start reading it as scope arithmetic — one failing snippet at a time — until you can thread lifetimes through a small parser or iterator adapter without fighting the borrow checker.
📈 Big O Intuition
Stop treating Big O as math you memorized for an interview — build the intuition to spot O(n²) disasters, pick the right data structure without thinking, and rewrite a slow function from O(n²) to O(n) in under five minutes.