📉 Detect Drift in LLM and ML Apps
Stop confusing 'data drift' and 'concept drift': they need different fixes. Walk one feature through both kinds of drift on a realistically shaped dataset, then design a drift dashboard for an LLM app where ground truth arrives 7 days late.
Phase 1: Three drifts, three fixes. Don't mix them up.
Data drift, concept drift, and label drift are three different bugs (7 min)
Same feature, two stories: when input shift causes the metric drop and when it doesn't (7 min)
LLM apps add prompt drift and judge drift on top of the classic three (8 min)
Ground truth is delayed, and that lag is the central monitoring problem (7 min)
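To make the "same feature, two stories" idea concrete, here is a minimal sketch on a toy pricing feature. Everything in it is an assumption made for illustration: the frozen `model`, the "buys if price < 50" rule, and the specific price ranges are hypothetical and not taken from the course material. It shows one case where the inputs move but the metric holds (data drift) and one where the inputs stay put but the metric drops (concept drift).

```python
from statistics import mean

def model(price: float) -> int:
    """Hypothetical frozen model from the reference period: predicts 1 ("buys") below 50."""
    return 1 if price < 50 else 0

def accuracy(prices, labels):
    """Fraction of rows where the frozen model agrees with the true label."""
    return mean(1.0 if model(p) == y else 0.0 for p, y in zip(prices, labels))

# Reference period: prices 20..79, true rule is "buys if price < 50".
ref_prices = list(range(20, 80))
ref_labels = [1 if p < 50 else 0 for p in ref_prices]

# Data drift: every price shifts up by 15, but the true rule is unchanged.
data_prices = [p + 15 for p in ref_prices]
data_labels = [1 if p < 50 else 0 for p in data_prices]

# Concept drift: same prices as the reference, but the true rule moved to "price < 65".
concept_prices = ref_prices
concept_labels = [1 if p < 65 else 0 for p in concept_prices]

input_shift_data = mean(data_prices) - mean(ref_prices)
input_shift_concept = mean(concept_prices) - mean(ref_prices)
acc_data = accuracy(data_prices, data_labels)
acc_concept = accuracy(concept_prices, concept_labels)

print(f"data drift:    input shift={input_shift_data}, accuracy={acc_data}")
print(f"concept drift: input shift={input_shift_concept}, accuracy={acc_concept}")
```

In this toy setup an input-only monitor would fire on the first case, where nothing is wrong with the predictions, and stay silent on the second, where accuracy falls to 0.75. That asymmetry is why the two drifts call for different fixes.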
Phase 2: Rolling windows on a real-shaped pricing feature
Pick a window size that matches your signal's expected drift rate (7 min)
The reference window is the model your monitor is defending: pick it deliberately (7 min)
Pick a test that matches your data type, not the test you happen to remember (7 min)
Inject a synthetic drift and confirm your monitor catches it (8 min)
Calibrate the threshold to your false-alarm tolerance, not to a textbook p-value (7 min)
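The Phase 2 steps (fix a reference window, score a current window against it, inject a synthetic drift, calibrate the threshold empirically) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the course's implementation: the pricing feature is simulated as Gaussian with mean 100 and standard deviation 10, the window size of 200 and the 99th-percentile cutoff are arbitrary choices, and the two-sample Kolmogorov-Smirnov statistic is hand-rolled so the example needs only the standard library.

```python
import random
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )

rng = random.Random(0)
WINDOW = 200  # assumed window size; tune to the signal's expected drift rate

# Reference window: the period the monitor treats as "normal".
reference = [rng.gauss(100, 10) for _ in range(WINDOW)]

# Calibration: score many no-drift windows and use their 99th percentile as
# the threshold, which targets roughly a 1% false-alarm rate per window.
null_scores = sorted(
    ks_statistic(reference, [rng.gauss(100, 10) for _ in range(WINDOW)])
    for _ in range(200)
)
threshold = null_scores[int(0.99 * len(null_scores))]

# Synthetic drift injection: shift the mean by one standard deviation.
drifted = [rng.gauss(110, 10) for _ in range(WINDOW)]
drift_score = ks_statistic(reference, drifted)
alarm = drift_score > threshold

print(f"threshold={threshold:.3f} drift_score={drift_score:.3f} alarm={alarm}")
```

Calibrating against simulated no-drift windows ties the threshold to a false-alarm budget you chose, rather than to a conventional p-value cutoff. For a categorical feature the same loop would apply with a different statistic, such as a chi-square or population stability score.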
Phase 3: DDM, ADWIN, Page-Hinkley vs metric gates
Your teammate proposes 'just alert on a 10% accuracy drop' (7 min)
DDM, ADWIN, Page-Hinkley: three change-detectors for three problems (8 min)
Statistical test fires but metric is fine: who do you trust? (8 min)
Retrain or re-monitor? Use the drift type to decide, not the alarm severity (8 min)
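Of the three change detectors named in this phase, Page-Hinkley is the shortest to write down. The sketch below is a one-sided variant that watches for an increase in the mean of a stream (for example, an error rate). The class name, the `delta` and `threshold` defaults, and the synthetic stream are assumptions chosen for illustration. A production setup would more likely use a maintained library implementation.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in a stream's mean."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0):
        self.delta = delta          # tolerated drift magnitude per observation
        self.threshold = threshold  # alarm level (often written as lambda)
        self.n = 0
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold

# Synthetic stream: error rate 0.1 for 200 steps, then a jump to 0.9.
stream = [0.1] * 200 + [0.9] * 50

detector = PageHinkley()
first_alarm = next(
    (i for i, x in enumerate(stream) if detector.update(x)), None
)
print("first alarm at index", first_alarm)
```

Unlike a fixed gate such as "alert on a 10% drop", the detector accumulates evidence over consecutive observations, so `delta` and `threshold` trade detection delay against false alarms.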
Phase 4: Design a drift dashboard for delayed ground truth
Design a drift dashboard for an LLM app with 7-day-delayed ground truth (10 min)
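One way to reason about a dashboard under a 7-day label delay is to separate signals that are available immediately (input or prompt drift) from signals that only mature once labels arrive (accuracy). The sketch below is a hypothetical layout, not a prescribed design: `dashboard_rows`, its field names, and the status strings are invented for illustration.

```python
from datetime import date, timedelta

LABEL_DELAY = timedelta(days=7)  # ground truth arrives 7 days after prediction

def dashboard_rows(today: date, days_back: int = 10):
    """Build one row per day, newest first, marking which signals are usable."""
    rows = []
    for offset in range(days_back):
        day = today - timedelta(days=offset)
        rows.append({
            "day": day,
            # Label-free signals can be computed as soon as traffic is logged.
            "input_drift": "live",
            # Label-based metrics are trustworthy only after the delay passes.
            "accuracy": "mature" if day + LABEL_DELAY <= today else "pending",
        })
    return rows

rows = dashboard_rows(date(2024, 1, 20), days_back=10)
for row in rows:
    print(row["day"], row["input_drift"], row["accuracy"])
```

With a 10-day view, the seven most recent days show accuracy as pending, so any alert for that period has to come from label-free signals or proxy metrics.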
Frequently asked questions
- What's the difference between data drift, concept drift, and label drift?
- When should I use a statistical drift test like KS vs a metric-based gate?
- How do you detect drift in an LLM app when ground truth is delayed by days?
- What window size should a rolling drift monitor use?
- When does drift mean 'retrain' vs 'just re-monitor'?
Each of these is covered in the “Detect Drift in LLM and ML Apps” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.