
📊 Understand Cross-Validation

Stop running k-fold on autopilot — see why a single train-test split lies, watch variance shrink across folds you split by hand, and pick stratified, group, or time-series CV for three real datasets without ever leaking the future into the past.

Foundations · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: Why One Split Lies

See why one train-test split lies about your model

4 drops
  1. Your 80/20 split is one roll of a noisy die

    6 min

  2. Five rolls beat one roll, every time

    6 min

  3. Why folding the same data gives a more honest score

    6 min

  4. Cross-validation does not replace a holdout set

    7 min
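The core claim of this phase can be sketched in a few lines of plain Python. The toy data and the fixed decision rule below are invented for illustration, not taken from the path: the rule "predict 1 iff x > 0.5" stands in for a trained model, five random 80/20 splits produce five different scores, while one 5-fold pass scores every row exactly once, so its mean is pinned to the full-data accuracy.

```python
import random

def make_data(n=50, noise=0.2, seed=0):
    # assumed toy data: label is 1 when x > 0.5, flipped with 20% probability
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        out.append((x, y))
    return out

def accuracy(rows):
    # the fixed rule "predict 1 iff x > 0.5" stands in for a trained model
    return sum((x > 0.5) == y for x, y in rows) / len(rows)

data = make_data()
rng = random.Random(1)

# five different 80/20 splits: five scores, one per roll of the die
split_scores = []
for _ in range(5):
    shuffled = data[:]
    rng.shuffle(shuffled)
    split_scores.append(accuracy(shuffled[40:]))  # last 10 rows as test

# one 5-fold pass: every row lands in exactly one test fold
shuffled = data[:]
rng.shuffle(shuffled)
fold_scores = [accuracy(shuffled[i::5]) for i in range(5)]
cv_mean = sum(fold_scores) / len(fold_scores)

print("single-split scores:", split_scores)
print("5-fold scores:", fold_scores)
print("5-fold mean:", cv_mean)
```

Because the folds partition the data into equal sizes, the 5-fold mean equals the full-data accuracy no matter how the rows are shuffled; the single-split scores have no such anchor.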

Phase 2: K-Fold by Hand on Ten Rows

Run k-fold by hand and watch variance shrink

5 drops
  1. Slice ten rows into five folds with a pencil

    6 min

  2. Five fits, five scores, one honest mean

    6 min

  3. When folds disagree, the model is telling you something

    6 min

  4. Stratified k-fold: every fold reflects every class

    6 min

  5. Repeated k-fold buys more confidence — at a cost

    7 min
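The pencil-and-paper exercise of this phase can be mirrored in a few lines of Python. The row indices and binary labels below are made up for illustration: ten rows become five folds of two, and the stratified variant deals one row of each class into every fold so each fold reflects both classes.

```python
rows = list(range(10))                    # ten row indices, as in the exercise
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # made-up binary labels, 5 per class
k = 5

# plain k-fold: five folds of two consecutive rows
plain_folds = [rows[2 * i:2 * i + 2] for i in range(k)]

# stratified k-fold: deal one row of each class into every fold
class0 = [r for r in rows if labels[r] == 0]
class1 = [r for r in rows if labels[r] == 1]
strat_folds = [[class0[i], class1[i]] for i in range(k)]

for i, test_fold in enumerate(strat_folds):
    train = [r for r in rows if r not in test_fold]
    # in the real exercise you would fit on `train` and score on `test_fold`
    print(f"fold {i}: train={train} test={test_fold}")
```

Five fits, five scores, one mean: each of the five loop iterations is one fit, and every row appears in exactly one test fold.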

Phase 3: Pick the Right CV Variant

Pick stratified, group, or time-series — and avoid leaks

4 drops
  1. Your model thinks it's a genius — it just memorized the patient

    7 min

  2. You can't fold time — only walk it forward

    7 min

  3. Tuning needs its own validation, or you're scoring the search

    7 min

  4. Most CV failures are leaks, not splitters

    7 min
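The two splitters this phase introduces can be sketched without any library. The patient IDs and the time series below are invented: group-aware CV holds out whole patients so the model can never score a patient it trained on, and walk-forward CV only ever tests on observations later than everything in the training window.

```python
# group-aware split: hold out whole patients (invented IDs), never single rows
samples = [("p1", 0), ("p1", 1), ("p2", 2), ("p2", 3), ("p3", 4), ("p3", 5)]
group_splits = []
for held_out in sorted({g for g, _ in samples}):
    train = [i for g, i in samples if g != held_out]
    test = [i for g, i in samples if g == held_out]
    group_splits.append((train, test))

# walk-forward split: train on everything before t, test at t
series = list(range(8))  # eight time-ordered observations
time_splits = []
for t in range(4, 8):
    time_splits.append((series[:t], [series[t]]))

for train, test in group_splits + time_splits:
    print(f"train={train} test={test}")
```

The invariants are what matter: no group ever straddles train and test, and every test point in the walk-forward splits lies strictly after its training window.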

Phase 4: Choose CV for Three Real Datasets

Choose the right CV for three real datasets

1 drop
  1. Pick the right CV for three real datasets at work

    8 min
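The decision this final drop asks for can be compressed into a rule of thumb. The function below is a sketch, not part of the path, and its precedence order is one reasonable choice rather than the only one: time order trumps everything, then grouping, then class balance.

```python
def pick_cv(is_temporal: bool, has_groups: bool, imbalanced: bool) -> str:
    # precedence: leaking the future is worst, leaking a group is next,
    # and class imbalance only changes how you fold, not whether you can
    if is_temporal:
        return "walk-forward (time-series) CV"
    if has_groups:
        return "group k-fold"
    if imbalanced:
        return "stratified k-fold"
    return "plain k-fold"

# three hypothetical datasets standing in for the exercise
print(pick_cv(is_temporal=True, has_groups=False, imbalanced=False))
print(pick_cv(is_temporal=False, has_groups=True, imbalanced=True))
print(pick_cv(is_temporal=False, has_groups=False, imbalanced=True))
```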

Frequently asked questions

What is cross-validation and why do I need it?
Cross-validation scores a model on data it never trained on, repeatedly: the data is split into k folds, the model is fit k times, and each fold serves as the test set exactly once. Averaging the k scores gives a lower-variance estimate of generalization than any single train-test split. The path builds this up from a single 80/20 split in Phase 1.
When should I use stratified k-fold instead of plain k-fold?
Use stratified k-fold when the target classes are imbalanced or the dataset is small: it keeps each fold's class proportions close to the full dataset's, so no fold ends up with too few (or zero) minority-class examples. With large, balanced data, plain k-fold behaves almost the same.
Why is regular k-fold wrong for time-series data?
Plain k-fold shuffles rows, so some folds train on observations that come after the ones they are tested on; the model effectively peeks at the future. Time-series data needs walk-forward (expanding- or rolling-window) splits, where every test window lies strictly after its training window.
What's the difference between group k-fold and stratified k-fold?
Group k-fold keeps all rows from the same group (a patient, a user, a session) in a single fold, so the model is never tested on a group it trained on. Stratified k-fold balances class-label proportions across folds. They fix different leaks and can be combined when you have both grouping and imbalance.
How do I prevent data leakage when doing cross-validation?
Fit every preprocessing step (scaling, imputation, feature selection, target encoding) on the training fold only, never on the full dataset; keep rows from the same group in the same fold; respect time order for temporal data; and tune hyperparameters in an inner loop (nested CV) so the outer score stays honest.
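One common leak can be shown with numbers alone. This is a toy example, not from the path: computing a scaling statistic over all rows lets one extreme test value shift the training data's representation before the model ever sees the test set.

```python
train = [1.0, 2.0, 3.0, 4.0]   # toy training values
test = [100.0]                 # one extreme held-out value

leaky_mean = sum(train + test) / len(train + test)  # test row leaks in
honest_mean = sum(train) / len(train)               # train-only statistic

# centering the training data with the leaky mean drags every training
# value far negative: the model is shaped by a row it should never see
print("leaky mean:", leaky_mean)    # 22.0
print("honest mean:", honest_mean)  # 2.5
```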