⚖️Compare DPO, IPO, KTO, ORPO, and SimPO
Map each post-DPO algorithm — IPO, KTO, ORPO, SimPO — to the exact failure mode it fixes, so picking one stops being a coin flip. By the end, you'll match three real datasets to the right algorithm and justify each call in a paragraph.
Phase 1: Why DPO needed four sequels in about a year
See the three DPO failure modes that spawned every variant
DPO didn't get replaced; it got patched four times (6 min)
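Every variant in this path modifies one per-pair objective, so it helps to have that baseline in front of you. Below is a minimal sketch of the DPO loss for a single pair. It assumes each input is a summed sequence log-probability log π(y|x) that you have already computed, and it uses β = 0.1 only as an illustrative default.

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is a summed sequence log-probability log pi(y|x).
    """
    chosen_logratio = policy_chosen - ref_chosen        # log pi_theta(y_w|x) - log pi_ref(y_w|x)
    rejected_logratio = policy_rejected - ref_rejected  # same quantity for the rejected response
    margin = chosen_logratio - rejected_logratio
    return -log_sigmoid(beta * margin)

# Policy identical to the reference: the margin is 0, so the loss is log 2 (about 0.693).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
```

Note that the reference model appears in two of the four inputs; that dependency is what ORPO and SimPO later remove.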
DPO will happily learn from noise without telling you (7 min)
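A small calculation shows why. Assume, as a toy simplification, that every pair shares one scaled margin m = β × (chosen log-ratio − rejected log-ratio) and that a fraction ε of pairs has its labels swapped. Setting the derivative of the expected loss to zero gives e^m = (1 − ε)/ε, so the optimum margin quietly encodes the noise rate; with ε = 0 there is no finite optimum at all and the loss keeps rewarding a larger margin. The loss value itself does not tell you which regime you are in.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def expected_dpo_grad(m: float, flip_rate: float) -> float:
    """Derivative in m of the expected DPO loss when `flip_rate` of pairs are mislabeled.

    Expected loss: (1 - eps) * -log sigmoid(m) + eps * -log sigmoid(-m).
    """
    eps = flip_rate
    return -(1 - eps) * sigmoid(-m) + eps * sigmoid(m)

def optimal_margin(flip_rate: float) -> float:
    # Gradient is zero where e^m = (1 - eps) / eps.
    if flip_rate == 0:
        return math.inf  # clean, deterministic labels: no finite minimizer
    return math.log((1 - flip_rate) / flip_rate)

for eps in (0.0, 0.1, 0.3):
    print(eps, optimal_margin(eps))
```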
Pairs are a luxury most annotation pipelines can't afford (6 min)
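To see what pairing costs, consider forcing unary labels (thumbs-up or thumbs-down on one response) into DPO pairs. A pair needs at least one good and one bad response to the same prompt, so every prompt that only ever got one kind of label is thrown away. The sketch below uses hypothetical toy data; `build_pairs` is an illustrative helper, not a library function.

```python
from collections import defaultdict

def build_pairs(ratings):
    """Turn unary (prompt, response, is_good) labels into (prompt, chosen, rejected) pairs.

    Prompts without at least one good AND one bad response produce no pairs.
    """
    by_prompt = defaultdict(lambda: {"good": [], "bad": []})
    for prompt, response, is_good in ratings:
        by_prompt[prompt]["good" if is_good else "bad"].append(response)

    pairs = []
    for prompt, groups in by_prompt.items():
        for chosen in groups["good"]:
            for rejected in groups["bad"]:
                pairs.append((prompt, chosen, rejected))
    return pairs

ratings = [
    ("p1", "a", True), ("p1", "b", False),   # pairable
    ("p2", "c", True),                       # only a thumbs-up: unusable for DPO
    ("p3", "d", False), ("p3", "e", False),  # only thumbs-down: unusable for DPO
]
pairs = build_pairs(ratings)
print(len(pairs), "pair(s) from", len(ratings), "ratings")
```

Here five ratings yield one pair; the other four labels carry real signal that a pairwise loss simply cannot use.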
DPO secretly demands two copies of your model in memory (7 min)
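The second copy is the frozen reference model, and it only ever contributes log π_ref(y|x) for each response. Because it is frozen, those numbers never change during training, so one common workaround is to compute them once in an offline pass and cache them. The sketch below assumes a hypothetical `ref_logprob(prompt, response)` callable; `toy_ref_logprob` is a stand-in, not a real model.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str, str]  # (prompt, chosen, rejected)

def precompute_reference(pairs: List[Pair],
                         ref_logprob: Callable[[str, str], float]) -> Dict[Tuple[str, str], float]:
    """One offline pass with the frozen reference model.

    Afterwards the reference can be unloaded; training only reads this cache.
    """
    cache: Dict[Tuple[str, str], float] = {}
    for prompt, chosen, rejected in pairs:
        for response in (chosen, rejected):
            key = (prompt, response)
            if key not in cache:  # score each unique (prompt, response) once
                cache[key] = ref_logprob(prompt, response)
    return cache

def toy_ref_logprob(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a real model: -0.5 nats per character.
    return -0.5 * len(response)

pairs = [("q", "short", "much longer answer"), ("q", "short", "meh")]
cache = precompute_reference(pairs, toy_ref_logprob)
print(len(cache))  # 3 unique (prompt, response) keys
```

The training step then looks up `cache[(prompt, chosen)]` and `cache[(prompt, rejected)]` instead of running a live reference forward pass. This only holds while the reference stays frozen and the prompt formatting is identical between the offline pass and training.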
Phase 2: Match each algorithm to the bug it fixes
Match each algorithm to the failure mode it fixes
IPO turns DPO's loss into something that knows when to stop (7 min)
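"Knows when to stop" is literal in the loss. IPO keeps DPO's pairwise log-ratio margin but replaces the log-sigmoid with a squared loss around a fixed target of 1/(2τ). Below is a minimal per-pair sketch; τ = 0.1 is an illustrative choice, not a recommended setting.

```python
def ipo_loss(chosen_logratio: float, rejected_logratio: float, tau: float = 0.1) -> float:
    """IPO-style squared loss for one pair.

    *_logratio = log pi_theta(y|x) - log pi_ref(y|x) for each response.
    Zero when the margin equals 1 / (2 * tau); grows on either side, so
    overshooting the target is penalized instead of rewarded.
    """
    margin = chosen_logratio - rejected_logratio
    target = 1.0 / (2.0 * tau)
    return (margin - target) ** 2

tau = 0.1  # target margin = 5.0
for margin in (0.0, 5.0, 10.0):
    print(margin, ipo_loss(margin, 0.0, tau))
```

Compare with DPO, whose loss only ever decreases as the margin grows: here a margin of 10 costs exactly as much as a margin of 0.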
KTO ships when your annotators only see one response at a time (7 min)
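KTO scores each response on its own: a desirable response is rewarded for a log-ratio above a reference point z₀, an undesirable one for a log-ratio below it. The sketch below is per example and assumes z₀ is passed in as a number; real implementations estimate it (a KL term between policy and reference) from the batch. β and the λ weights are illustrative defaults.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(logratio: float, desirable: bool, z0: float,
             beta: float = 0.1, lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """KTO loss for ONE unpaired example.

    logratio = log pi_theta(y|x) - log pi_ref(y|x) for this single response.
    z0 is the reference point, assumed here to be supplied by the caller.
    """
    if desirable:
        value = lambda_d * sigmoid(beta * (logratio - z0))
        return lambda_d - value
    value = lambda_u * sigmoid(beta * (z0 - logratio))
    return lambda_u - value

# A thumbs-up the policy already favors costs little; a thumbs-down it favors costs more.
print(kto_loss(logratio=20.0, desirable=True, z0=0.0))
print(kto_loss(logratio=20.0, desirable=False, z0=0.0))
```

No second response appears anywhere in the signature, which is exactly why KTO fits one-at-a-time annotation. It still needs reference log-probs, though.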
ORPO folds preference into SFT and drops the reference model (7 min)
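ORPO's loss is the ordinary SFT negative log-likelihood on the chosen response plus a weighted odds-ratio penalty. The odds are computed from p = exp(mean per-token log-prob), so the inputs below are length-averaged log-probs. The weight λ = 0.1 is an illustrative assumption.

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_odds(avg_logprob: float) -> float:
    """log(p / (1 - p)) with p = exp(mean per-token log-prob of a response)."""
    if avg_logprob >= 0:
        raise ValueError("mean log-prob must be negative (p < 1)")
    # log1p(-exp(x)) computes log(1 - p) without forming 1 - p directly.
    return avg_logprob - math.log1p(-math.exp(avg_logprob))

def orpo_loss(chosen_avg_logprob: float, rejected_avg_logprob: float,
              lam: float = 0.1) -> float:
    """SFT negative log-likelihood on the chosen response plus lam * odds-ratio penalty.

    There is no reference-model argument: the SFT term is what anchors the policy.
    """
    sft = -chosen_avg_logprob
    log_odds_ratio = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    return sft + lam * -log_sigmoid(log_odds_ratio)

print(orpo_loss(-0.5, -2.0))
```

The function signature is the point: only policy quantities go in, so no frozen copy has to sit in memory.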
SimPO normalizes by length and drops the reference model too (7 min)
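SimPO's implicit reward is β times the average per-token log-prob of a response under the policy alone, and the loss asks the chosen reward to beat the rejected one by a target margin γ. Below is a per-pair sketch; β = 2.0 and γ = 0.5 are illustrative values, not tuned recommendations.

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def simpo_loss(chosen_logprob_sum: float, chosen_len: int,
               rejected_logprob_sum: float, rejected_len: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO loss for one pair: length-normalized rewards, target margin gamma, no reference."""
    chosen_reward = beta * chosen_logprob_sum / chosen_len
    rejected_reward = beta * rejected_logprob_sum / rejected_len
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)

# Same per-token log-prob (-0.8) at lengths 100 and 10: the rewards tie, so with
# gamma = 0 the loss is log 2 and extra length alone earns nothing.
print(simpo_loss(-80.0, 100, -8.0, 10, gamma=0.0))
```

With summed log-probs instead, the 100-token response would score -80 against -8, so length would dominate the comparison; dividing by length removes that effect.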
Five algorithms, one decision tree, three questions (7 min)
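The three questions are not spelled out on this page. One plausible reading, inferred from the Phase 3 lesson titles (data shape, memory, stability), is sketched below as a shortlist function. The function and argument names are hypothetical, and the output is a starting point, not a verdict.

```python
from typing import List

def shortlist(paired: bool, reference_fits: bool, unstable_or_noisy: bool) -> List[str]:
    """Hypothetical three-question shortlist over DPO, IPO, KTO, ORPO, SimPO."""
    if not paired:
        # Unary thumbs-up/down labels: KTO trains on them directly
        # (it still uses a reference model).
        return ["KTO"]
    if not reference_fits:
        # No room for a frozen reference copy: both of these drop it.
        return ["SimPO", "ORPO"]
    if unstable_or_noisy:
        # Squared objective that stops pushing past a target margin.
        return ["IPO"]
    return ["DPO"]

print(shortlist(paired=True, reference_fits=False, unstable_or_noisy=False))
```

The order of the checks encodes a judgment: data shape is asked first because it is the hardest constraint to change after collection.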
Phase 3: Choose by data shape and compute budget
Decide which variant fits real annotation and compute constraints
Your annotators rate one at a time. What now? (7 min)
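If ratings arrive as 1-5 stars, they first have to become KTO's desirable/undesirable labels, and the two classes are rarely balanced. The KTO paper recommends keeping λ_D·n_D / (λ_U·n_U) roughly within [1, 4/3]. The star thresholds below (≥ 4 desirable, ≤ 2 undesirable, 3 dropped) are assumptions, and both helpers are hypothetical.

```python
def to_kto_labels(star_ratings, good_at=4, bad_at=2):
    """Map (prompt, response, stars) to (prompt, response, desirable).

    Ratings strictly between bad_at and good_at are dropped as ambiguous.
    """
    labeled = []
    for prompt, response, stars in star_ratings:
        if stars >= good_at:
            labeled.append((prompt, response, True))
        elif stars <= bad_at:
            labeled.append((prompt, response, False))
    return labeled

def balance_weights(labeled, ratio=1.0, lambda_u=1.0):
    """Choose lambda_d so that lambda_d * n_d / (lambda_u * n_u) equals `ratio`."""
    n_d = sum(1 for *_, good in labeled if good)
    n_u = len(labeled) - n_d
    if n_d == 0 or n_u == 0:
        raise ValueError("need at least one desirable and one undesirable example")
    lambda_d = ratio * lambda_u * n_u / n_d
    return lambda_d, lambda_u

stars = [("p", "a", 5), ("q", "b", 1), ("r", "c", 2), ("s", "d", 1), ("t", "e", 3)]
labeled = to_kto_labels(stars)
print(balance_weights(labeled))  # one desirable vs three undesirable
```

With one thumbs-up against three thumbs-down, upweighting desirable examples by 3 brings the weighted ratio back to 1.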
The 70B fits at inference. It doesn't fit for DPO. (8 min)
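Back-of-the-envelope arithmetic shows the gap. The byte counts below are common rules of thumb, not measurements: 2 bytes per parameter for a bf16 copy of the weights, and about 16 bytes per parameter for full fine-tuning with Adam in mixed precision. Activations, KV cache and framework overhead are ignored.

```python
def gb(params: float, bytes_per_param: float) -> float:
    # Decimal gigabytes.
    return params * bytes_per_param / 1e9

PARAMS = 70e9
BF16_WEIGHTS = 2   # bytes/param: one bf16 copy of the weights
FULL_FT_ADAM = 16  # bf16 weights + bf16 grads + fp32 master weights + two fp32 Adam moments

inference = gb(PARAMS, BF16_WEIGHTS)
policy_training = gb(PARAMS, FULL_FT_ADAM)
frozen_reference = gb(PARAMS, BF16_WEIGHTS)

print(f"inference weights:        {inference:,.0f} GB")
print(f"policy training state:    {policy_training:,.0f} GB")
print(f"+ frozen reference (DPO): {policy_training + frozen_reference:,.0f} GB")
```

Under these assumptions, training state dominates, and the reference adds a fixed 140 GB on top. When sharding or parameter-efficient methods shrink the policy's share, that fixed reference cost becomes a larger fraction of what remains, which is where reference-free ORPO and SimPO become attractive.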
Your DPO runs got unstable past 100k preferences (8 min)
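One mechanism worth ruling out is a runaway margin. Comparing the gradient of each loss with respect to the margin shows the difference directly: DPO's gradient is always negative, so every step keeps widening the margin, while IPO's gradient flips sign at its target. β and τ are illustrative values.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_grad(margin: float, beta: float = 0.1) -> float:
    # d/dmargin of -log sigmoid(beta * margin): negative for every margin,
    # so descent always pushes the margin wider, however large it is.
    return -beta * sigmoid(-beta * margin)

def ipo_grad(margin: float, tau: float = 0.1) -> float:
    # d/dmargin of (margin - 1/(2*tau))**2: zero at the target and positive
    # beyond it, so overshooting pulls the margin back.
    return 2.0 * (margin - 1.0 / (2.0 * tau))

for m in (0.0, 5.0, 20.0, 80.0):
    print(f"margin={m:5.1f}  dpo={dpo_grad(m):+.6f}  ipo={ipo_grad(m):+.1f}")
```

DPO's push shrinks as the margin grows but never reaches zero. IPO's push is proportional to the distance from the target and reverses once past it.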
Two algorithms feel equal. How do you actually decide? (8 min)
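When two candidates look equal, a quick check is whether your held-out comparison can separate them at all. The sketch below computes a percentile-bootstrap confidence interval for a head-to-head win rate. The outcome encoding (1 = A wins, 0 = B wins, 0.5 = tie) and the 56-of-100 data are hypothetical.

```python
import random

def bootstrap_win_rate(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for model A's win rate over B."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(outcomes)
    point = sum(outcomes) / n
    stats = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n  # resample n outcomes with replacement
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return point, lo, hi

outcomes = [1] * 56 + [0] * 44  # hypothetical: A wins 56 of 100 prompts
print(bootstrap_win_rate(outcomes))
```

If the interval straddles 0.5, the evaluation has not separated the two, and the decision is better made on operational grounds such as data shape, memory, and how many hyperparameters you can afford to tune.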
Phase 4: Pick the right algorithm for three real datasets
Three datasets, three algorithms, three paragraphs (8 min)
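The first call on any real dataset is its shape, and that can be read directly off the records. The sketch below assumes paired records carry `prompt`, `chosen` and `rejected` fields and unary records carry `prompt`, `completion` and `label`. These field names are assumptions, so rename them to match your data.

```python
def data_shape(records):
    """Classify a dataset by the fields each record carries.

    Assumed layouts:
      paired: prompt, chosen, rejected   -> DPO, IPO, ORPO, SimPO all apply
      unary:  prompt, completion, label  -> KTO is the natural fit
    """
    if not records:
        return "empty"
    shapes = set()
    for r in records:
        keys = r.keys()
        if keys >= {"prompt", "chosen", "rejected"}:
            shapes.add("paired")
        elif keys >= {"prompt", "completion", "label"}:
            shapes.add("unary")
        else:
            shapes.add("unknown")
    return shapes.pop() if len(shapes) == 1 else "mixed"

print(data_shape([{"prompt": "p", "chosen": "a", "rejected": "b"}]))
print(data_shape([{"prompt": "p", "completion": "a", "label": True}]))
```

A "mixed" result is informative too: paired records can be split into unary ones for KTO, but not the other way round.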
Frequently asked questions
- What is the actual difference between DPO and IPO?
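- Both train on chosen/rejected pairs against a frozen reference model. DPO minimizes a log-sigmoid loss on the reference-adjusted log-likelihood margin, which keeps rewarding a larger margin no matter how large it already is, so it can overfit noisy or near-deterministic preferences. IPO replaces it with a squared loss that pulls the margin toward a fixed target of 1/(2τ) and penalizes overshooting, so it knows when to stop.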
- When should you use KTO instead of DPO?
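- When your labels are unary: each response is marked desirable or undesirable on its own, with no chosen/rejected pair for the same prompt. KTO trains on those single labels directly, while DPO needs both responses to the same prompt. KTO still uses a reference model, so it does not help with memory.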
- Why does ORPO drop the reference model, and what does that cost?
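- ORPO adds an odds-ratio penalty to the ordinary SFT negative log-likelihood on the chosen response, and that SFT term, rather than a frozen reference model, keeps the policy anchored. Dropping the reference saves a second model copy and its forward passes, and it merges SFT and preference tuning into one stage. The cost is that no explicit KL-style term bounds how far the policy drifts from its starting point, so the weight on the odds-ratio term becomes the main knob to tune.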
- Is SimPO strictly better than DPO at scale?
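- Not strictly. SimPO drops the reference model and length-normalizes its implicit reward, which saves memory and counters the length bias that summed log-probabilities can introduce. It also adds its own hyperparameters (β and a target margin γ), and whether it beats DPO depends on the data and the tuning budget.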
- How do you choose a preference algorithm from your data shape?
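- One framing, following Phase 3 of this path, is three questions in order. Are your labels paired or unary? Unary points to KTO. Does a frozen reference model fit in memory next to the policy being trained? If not, look at ORPO or SimPO. Are runs unstable, or are your preferences noisy? That points to IPO. If none of these applies, DPO remains a reasonable default.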