⚖️ Compare DPO, IPO, KTO, ORPO, and SimPO

Map each post-DPO algorithm — IPO, KTO, ORPO, SimPO — to the exact failure mode it fixes, so picking one stops being a coin flip. By the end, you'll match three real datasets to the right algorithm and justify each call in a paragraph.

Applied · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: Why DPO needed five sequels in eighteen months

See the three DPO failure modes that spawned every variant

4 drops
  1. DPO didn't get replaced — it got patched five times

    6 min

  2. DPO will happily learn from noise without telling you

    7 min

  3. Pairs are a luxury most annotation pipelines can't afford

    6 min

  4. DPO secretly demands two copies of your model in memory

    7 min

Phase 2: Match each algorithm to the bug it fixes

Match each algorithm to the failure mode it fixes

5 drops
  1. IPO turns DPO's loss into something that knows when to stop

    7 min

  2. KTO ships when your annotators only see one response at a time

    7 min

  3. ORPO folds preference into SFT and drops the reference model

    7 min

  4. SimPO normalizes by length and drops the reference model too

    7 min

  5. Five algorithms, one decision tree, three questions

    7 min

Phase 3: Choose by data shape and compute budget

Decide which variant fits real annotation and compute constraints

4 drops
  1. Your annotators rate one at a time. What now?

    7 min

  2. The 70B fits at inference. It doesn't fit for DPO.

    8 min

  3. Your DPO runs got unstable past 100k preferences

    8 min

  4. Two algorithms feel equal. How do you actually decide?

    8 min

Phase 4: Pick the right algorithm for three real datasets

1 drop
  1. Three datasets, three algorithms, three paragraphs

    8 min

Frequently asked questions

What is the actual difference between DPO and IPO?
When should you use KTO instead of DPO?
Why does ORPO drop the reference model, and what does that cost?
Is SimPO strictly better than DPO at scale?
How do you choose a preference algorithm from your data shape?

Each of these questions is covered in the “Compare DPO, IPO, KTO, ORPO, and SimPO” learning path. Start with daily 5–8 minute micro-lessons that build from fundamentals to hands-on application.