
🧭 Understand Alignment as a Research Problem

Treat AI alignment as a research field with concrete open problems — outer vs inner alignment, deceptive alignment, and scalable oversight — instead of vibes about doom or guardrails. Walk away able to write a one-paragraph map of the alignment landscape that holds up to a skeptical reader.

Advanced · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: What 'Alignment' Actually Means

See why specifying human values is itself unsolved

4 drops
  1. Alignment isn't 'AI safety' — it's the specification problem

    6 min


  2. Specifying human values is the unsolved part

    7 min


  3. A model can learn the wrong goal and still pass training

    7 min


  4. Alignment used to be theoretical — capability made it concrete

    6 min


Phase 2: Outer, Inner, and Deceptive Alignment

Walk through outer, inner, and deceptive alignment with toy cases

5 drops
  1. Outer alignment: did you write down the right objective?

    6 min


  2. Inner alignment: is the model pursuing the objective you trained?

    7 min


  3. Walk through CoinRun: a tiny, complete inner-misalignment story

    7 min


  4. Deceptive alignment: looks aligned in training, isn't

    8 min


  5. Build a clean three-bucket threat model in your head

    6 min


Phase 3: Scalable Oversight and Real-World Application

Map scalable oversight, debate, and weak-to-strong generalization

4 drops
  1. You're a senior reviewer and the model is smarter than you are. Now what?

    7 min


  2. Two superhuman models argue. You judge. Does that work?

    8 min


  3. You can only train with a weaker teacher. The student is more capable. Is the student actually aligned?

    8 min


  4. An AI helps oversee another AI. Where does the trust bottom out?

    8 min


Phase 4: Map the Landscape in One Paragraph

Write a one-paragraph alignment landscape that survives critique

1 drop
  1. Write the alignment landscape in one paragraph that survives critique

    15 min


Frequently asked questions

What is the AI alignment problem in plain language?
Phase 1 of the “Understand Alignment as a Research Problem” path covers this. It frames alignment as the specification problem and shows why specifying human values is itself the unsolved part.
What is the difference between outer alignment and inner alignment?
Phase 2 of the path covers both. Outer alignment asks whether you wrote down the right objective; inner alignment asks whether the model is pursuing the objective you trained.
What does deceptive alignment mean and why is it hard to test for?
Phase 2 of the path covers this. A deceptively aligned model looks aligned in training but isn't, so passing training does not by itself show that the model learned the intended goal.
What is scalable oversight and why do we need it?
Phase 3 of the path covers this. Scalable oversight addresses how to supervise a model that is more capable than its reviewer; the phase maps debate and weak-to-strong generalization as approaches.
Is AI alignment the same thing as AI safety?
Not as this path uses the terms. Phase 1 opens by separating the two: alignment is the specification problem, not a synonym for 'AI safety'.