Library
🐛Debug Code with LLMs
Stop chasing the first plausible theory the AI offers. By the end you'll run a real debugging loop — hypothesis, counter-evidence, smallest test — with the LLM as your partner instead of your oracle.
🆚Compare LlamaIndex and LangChain for RAG
Stop picking a RAG framework from a Twitter poll. See LlamaIndex and LangChain side by side on the same pipeline so you can defend your choice for a real workload with real tradeoffs.
🔎Combine BM25 and Semantic Search (Hybrid Search)
Build hybrid search layer by layer — BM25 alone, vectors alone, then RRF fusion — so you can debug retrieval failures and predict which query types each layer fixes before you ship.
🗂️Choose a Vector Database
Build a five-axis scorecard — scale, hybrid search, filtering, ops, cost — that turns vector database selection from hype-driven guesswork into a defensible choice your future self will thank you for.
🔗Build a Mental Model of LangChain
Stop reading LangChain as 200 unrelated classes and start seeing one primitive — the Runnable — wired together with a pipe. By the end you can sketch a chain for any task and name every Runnable in it.
📐Understand Vector Similarity: Cosine, Dot Product, Euclidean
Stop reaching for cosine similarity by reflex. You'll compute all three metrics on the same vectors, see when normalization collapses two of them into one, and pick the right metric for three real retrieval tasks.
🔧Understand Tool Use in AI Agents
Stop debugging agents by shouting 'why did it pick that tool' — separate the contract into schema, selection, and execution so each can fail (and be fixed) independently. By the end you'll design two tools for one of your own products with names, schemas, and descriptions you can defend in review.
🔡Understand Tokenization: How Models See Text
Stop counting characters and start seeing text the way the model does — as subword pieces that vary wildly in cost. By the end you'll eyeball a paragraph's token count and know why emoji, code, and rare words inflate your bill.
📈Understand the Bitter Lesson and Scaling Laws
Stop quoting Sutton like a slogan and start reading scaling-law curves like a forecaster — by the end, you'll know exactly where the bitter lesson predicts the next AI breakthrough and where it quietly fails.
📉Understand the Bias-Variance Tradeoff
Turn the bias-variance formula into a hands-on debug checklist — read any train/val gap or learning curve and prescribe the right fix in minutes.
🎲Understand Temperature, Top-P, and Sampling
See exactly what temperature and top-p do to a model's probability distribution, then justify the sampling settings for your real tasks instead of guessing. Stop tweaking knobs and start engineering output behavior.
🧱Understand Structured Output and Function Calling
Stop bolting regex onto markdown-wrapped near-JSON. Compare prompt-asks, JSON mode, and schema-constrained decoding head-to-head, then write a strict schema for one of your real LLM outputs and test it for compliance.
🤖Understand RLHF: Reinforcement Learning from Human Feedback
Walk a single example through SFT, a reward model, and one PPO update so the RLHF loop stops feeling mythical. By the end, you'll sketch a preference-data pipeline for a real prompt in your own product.
🎯Understand Reranking in RAG Pipelines
See why a vector search alone almost never returns the right top-3, and add a cross-encoder rerank stage to a RAG prototype that measurably lifts precision@3.
🛡️Understand Prompt Injection Attacks
Audit your own LLM features for injection surfaces. Separate direct from indirect attacks with worked examples, then apply structured isolation, output filters, provenance, and least-authority tool design.
⚡Understand Prompt Caching and Why It Changes Economics
See exactly what prompt caching caches, why prefix order is suddenly the most important decision in your template, and how a single header flag can cut a 5k-token system prompt's cost by 80% — then ship a cache-friendly template for one of your hottest endpoints.
🧠Understand Neural Network Fundamentals
Strip neural networks back to arithmetic — weighted sums, a squash function, and stacking. By the end you'll trace a forward pass with a pencil and design a tabular-problem architecture you can defend choice by choice.
🧬Understand Multimodal Models
Crack open the three real fusion patterns — early, late, and joint — so when you face a multimodal task at work, the choice between vision, OCR, or both becomes mechanical instead of guesswork.
🧪Understand Model Distillation
Stop treating model distillation as alchemy. Walk one teacher-student loop with a real loss function, then sketch a distillation plan to take one of your existing prompts to a smaller, cheaper model — by output, by reasoning trace, or by preference.
🧠Understand Mixture-of-Experts (MoE) Architectures
Stop hearing 'experts vote' and start watching a single token route through a sparse layer — by the end you'll predict which inputs land on which expert in a small MoE you design yourself.
📊Understand LLM Benchmarks: MMLU, HumanEval, and Friends
Stop reading LLM benchmark scores like IQ tests. You'll learn what MMLU, HumanEval, GSM8K, MT-Bench, and friends actually measure, where each gets gamed, and how to rate a model release note's claims with calibrated skepticism.
🛡️Understand Jailbreaking and AI Safety
See LLM jailbreaking as four distinct attack families instead of one scary headline, then turn that taxonomy into a one-page risk note for an AI feature you actually ship.
🌀Understand Hallucinations in LLMs
Stop treating LLM hallucinations as one bug — see the three distinct failure modes, force each one on purpose, then add one guardrail to a prompt you actually use and measure whether it worked.
📉Understand Gradient Descent
Stop treating the optimizer as a black box — walk a 2D loss surface by hand, feel why a learning rate that's too big diverges and one that's too small stalls, and learn to read SGD, momentum, and Adam loss curves the way a doctor reads a chart.