🧭 Understand Alignment as a Research Problem
Treat AI alignment as a research field with concrete open problems — outer vs inner alignment, deceptive alignment, and scalable oversight — instead of vibes about doom or guardrails. Walk away able to write a one-paragraph map of the alignment landscape that holds up to a skeptical reader.
Phase 1: What 'Alignment' Actually Means
See why specifying human values is itself unsolved
Alignment isn't 'AI safety' — it's the specification problem (6 min)
Specifying human values is the unsolved part (7 min)
A model can learn the wrong goal and still pass training (7 min)
Alignment used to be theoretical — capability made it concrete (6 min)
Phase 2: Outer, Inner, and Deceptive Alignment
Walk through outer, inner, and deceptive alignment with toy cases
Outer alignment: did you write down the right objective? (6 min)
Inner alignment: is the model pursuing the objective you trained? (7 min)
Walk through CoinRun: a tiny, complete inner-misalignment story (7 min)
Deceptive alignment: looks aligned in training, isn't (8 min)
Build a clean three-bucket threat model in your head (6 min)
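The CoinRun lesson in this phase has a shape that fits in a few lines of code. Here is a minimal, hypothetical sketch (a toy gridworld, not the actual CoinRun environment) of how a proxy policy can pass training and still pursue the wrong goal:

```python
# Toy gridworld sketch of inner misalignment (hypothetical; not the real
# CoinRun environment). During training the coin always sits at the far
# right, so "always move right" and "go to the coin" are indistinguishable.

def proxy_policy(pos, coin):
    """What the model actually learned: ignore the coin, just go right."""
    return "right"

def intended_policy(pos, coin):
    """What we wanted it to learn: head toward the coin."""
    return "down" if pos[0] < coin[0] else "right"

def run_episode(coin, policy, steps=10):
    """Agent starts at (0, 0); success means landing on the coin."""
    pos = (0, 0)
    for _ in range(steps):
        if pos == coin:
            return True
        move = policy(pos, coin)
        pos = (pos[0] + 1, pos[1]) if move == "down" else (pos[0], pos[1] + 1)
    return pos == coin

# Training distribution: coin at the rightmost cell of the top row.
# Both policies succeed, so training reward cannot tell them apart.
assert run_episode((0, 9), proxy_policy)
assert run_episode((0, 9), intended_policy)

# Deployment: coin moved off the rightward path. The proxy walks past it.
print(run_episode((1, 4), proxy_policy))     # False
print(run_episode((1, 4), intended_policy))  # True
```

The point of the toy: both policies earn identical reward during training, so the misaligned one is only exposed under distribution shift. That is the core of the inner-alignment worry.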
Phase 3: Scalable Oversight and Real-World Application
Map scalable oversight, debate, and weak-to-strong generalization
You're a senior reviewer and the model is smarter than you are. Now what? (7 min)
Two superhuman models argue. You judge. Does that work? (8 min)
You can only train with a weaker teacher. The student is more capable. Is the student actually aligned? (8 min)
An AI helps oversee another AI. Where does the trust bottom out? (8 min)
Phase 4: Map the Landscape in One Paragraph
Write a one-paragraph alignment landscape that survives critique
Write the alignment landscape in one paragraph that survives critique (15 min)
Frequently asked questions
- What is the AI alignment problem in plain language?
- In plain language: how do we get an AI system to pursue the goals we actually intend, when writing those goals down precisely is itself an unsolved problem? Phase 1 of this path breaks that into the specification problem, value specification, and models that learn the wrong goal yet still pass training.
- What is the difference between outer alignment and inner alignment?
- Outer alignment asks whether you wrote down the right objective; inner alignment asks whether the model is actually pursuing the objective you trained it on. Phase 2 walks through both with toy cases, including CoinRun.
- What does deceptive alignment mean and why is it hard to test for?
- Deceptive alignment means a model looks aligned during training but isn't, so good behavior under evaluation tells you little about behavior elsewhere. Phase 2 covers it and folds it into a clean three-bucket threat model.
- What is scalable oversight and why do we need it?
- Scalable oversight is the problem of supervising models that are more capable than their overseers. Phase 3 maps the main proposals, including debate between models, weak-to-strong generalization, and AI-assisted oversight, and asks where the trust bottoms out in each.
- Is AI alignment the same thing as AI safety?
- No. AI safety is the broader umbrella; alignment is specifically the specification problem: getting the objective right and getting the model to actually pursue it. The first lesson of Phase 1 draws this distinction.
Related paths
🐍 Python Decorators Introduction
Build one mental model for Python decorators that covers closures, argument passing, functools.wraps, and stacking — then ship a working caching or logging decorator from scratch in under 30 lines.
🦀 Rust Lifetimes Explained
Stop reading `'a` as line noise and start reading it as scope arithmetic — one failing snippet at a time — until you can thread lifetimes through a small parser or iterator adapter without fighting the borrow checker.
☸️ Kubernetes Core Concepts
Stop drowning in 30+ resource types. Build the mental model one primitive at a time — pods, deployments, services, ingress, config — then deploy a real app with rolling updates and health checks.
📈 Big O Intuition
Stop treating Big O as math you memorized for an interview — build the intuition to spot O(n²) disasters, pick the right data structure without thinking, and rewrite a slow function from O(n²) to O(n) in under five minutes.