What is the actual difference between DPO and IPO?
When should you use KTO instead of DPO?
Why does ORPO drop the reference model, and what does that cost?
Is SimPO strictly better than DPO at scale?
How do you choose a preference algorithm from your data shape?

Each of these questions is covered in the "Compare DPO, IPO, KTO, ORPO, and SimPO" learning path on Droplet. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.

⚖️ Compare DPO, IPO, KTO, ORPO, and SimPO

Map each post-DPO algorithm — IPO, KTO, ORPO, SimPO — to the exact failure mode it fixes, so picking one stops being a coin flip. By the end, you'll match three real datasets to the right algorithm and justify each call in a paragraph.

Applied · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: Why DPO needed five sequels in eighteen months

See the three DPO failure modes that spawned every variant

4 drops

DPO didn't get replaced; it got patched five times · 6 min
DPO will happily learn from noise without telling you · 7 min
Pairs are a luxury most annotation pipelines can't afford · 6 min
DPO secretly demands two copies of your model in memory · 7 min

Phase 2: Match each algorithm to the bug it fixes

Match each algorithm to the failure mode it fixes

5 drops

IPO turns DPO's loss into something that knows when to stop · 7 min
KTO ships when your annotators only see one response at a time · 7 min
ORPO folds preference into SFT and drops the reference model · 7 min
SimPO normalizes by length and drops the reference model too · 7 min
Five algorithms, one decision tree, three questions · 7 min

Phase 3: Choose by data shape and compute budget

Decide which variant fits real annotation and compute constraints

4 drops

Your annotators rate one at a time. What now? · 7 min
The 70B fits at inference. It doesn't fit for DPO. · 8 min
Your DPO runs got unstable past 100k preferences · 8 min
Two algorithms feel equal. How do you actually decide? · 8 min

Phase 4: Pick the right algorithm for three real datasets

1 drop

Three datasets, three algorithms, three paragraphs · 8 min

Related paths

🐍 Python Decorators Introduction

🦀 Rust Lifetimes Explained

☸️ Kubernetes Core Concepts

📈 Big O Intuition