Library
🧪Understand Feature Engineering Fundamentals
Stop believing deep learning killed feature engineering — build the discipline of encoding, leakage, and target alignment so you can sketch a feature plan for any tabular problem and consistently beat raw-column baselines.
🪄Understand Emergent Capabilities in LLMs
See past the 'mystical jump' headlines: read the original emergence papers next to the 'mirage' rebuttal, watch the same task switch from jumpy to smooth when you swap the metric, and finish able to predict whether the capability you actually care about will scale in steps or in slopes.
📊Understand Cross-Validation
Stop running k-fold on autopilot — see why a single train-test split lies, watch variance shrink across folds you split by hand, and pick stratified, group, or time-series CV for three real datasets without ever leaking the future into the past.
🪟Understand Context Windows in LLMs
See past the 'context length exceeded' error and pick the right fix every time — trim, summarize, retrieve, or upgrade. By the end you can sketch a memory strategy for a chatbot answering from a 500-page handbook without guessing.
🏷️Supervised vs Unsupervised Learning
Stop memorizing the labels-vs-no-labels split. Learn to classify any ML problem by where its supervision comes from — including the messy self-supervised middle that powers modern AI.
🎛️Fine-Tuning vs In-Context Learning
Stop reaching for fine-tuning every time a prompt fails — diagnose whether you have a capability gap, format gap, or knowledge gap, then pick the cheapest fix that closes it. By the end you'll write a one-page memo your team can defend.
📜Understand Constitutional AI as a Training Principle
See exactly how a written set of principles becomes a training signal — through self-critique, revision, and AI-generated preference labels. By the end you'll draft a 5-principle constitution for one of your own AI applications.
📊Understand Confusion Matrices, Precision, and Recall
Stop reaching for accuracy by reflex — read a confusion matrix in seconds, compute precision and recall by hand, and pick the right metric for spam, fraud, and cancer-screening problems without second-guessing.
🧠Understand Chain-of-Thought Reasoning
Stop pasting 'let's think step by step' on every prompt and learn where chain-of-thought actually changes the answer — math reasoning, multi-step planning, ambiguous reading — and where it just burns tokens. Walk away able to point at three of your own prompts that genuinely need CoT, and three that don't.
🧭Understand Alignment as a Research Problem
Treat AI alignment as a research field with concrete open problems — outer vs inner alignment, deceptive alignment, and scalable oversight — instead of vibes about doom or guardrails. Walk away able to write a one-paragraph map of the alignment landscape that holds up to a skeptical reader.
🕵️Understand AI-Generated Content and Detection
Stop treating AI detection like a true-or-false test. Learn why statistical detectors fail in both directions, where watermarking and C2PA provenance actually help, and walk away with a content-provenance policy you can hand to your team or class.
🧪Master Prompt Engineering Principles
Stop chasing magic phrases. Learn the four principles — specificity, constraints, examples, decomposition — that survive every model upgrade, then ship a prompt spec and A/B it against your old one.
⚖️Learn When to Use a Small Model vs a Large Model
Stop defaulting to GPT-4 for tasks a 7B model handles fine. Build a per-task decision tree across capability, latency, and cost-per-million-tokens — then route your product's tasks accordingly.
🌀Learn Vibe Coding: Agentic Development Workflows
Vibe coding looks like magic until production breaks. This path separates the surface practice — chatting code into being — from the engineering discipline that keeps the result maintainable, ending with a guardrail playbook your team can actually follow.
🧪Learn to Evaluate LLM Outputs Systematically
Move from eyeballing LLM outputs to running a CI eval that blocks regressions on a real prompt. You'll build a 20-item dataset, write a binary rubric, calibrate LLM-as-judge, and ship the harness in your repo.
🔎Learn the Architecture of RAG Systems
Separate RAG into three pipelines — offline ingest, online retrieval, generation grounding — so each can be debugged on its own. By the end, you'll sketch a documentation-chatbot architecture and label every failure mode.
🔍Learn HyDE: Hypothetical Document Embeddings
Stop accepting bad RAG retrievals as a fact of life — see why short queries and long documents land in different regions of embedding space, watch HyDE close the gap by hallucinating a fake answer first, then decide which of your pipelines actually deserve the extra LLM call.
🔀Learn How Transformers Process Sequences
Trace one token through every block of a transformer (embedding, positional encoding, attention, feed-forward network, residual connection) until you can narrate, in plain English, how 'the cat sat' becomes French.
🧭Learn How Embeddings Encode Meaning
Stop treating embeddings as magic vectors. By the end you'll see meaning as geometry and design a duplicate-FAQ detector for a 1,000-question support corpus that you could actually ship.
🎯Learn Few-Shot Prompting
Stop pasting examples and hoping. Curate five that actually shift the model — representative, diverse, ordered, and formatted to match — then measure the lift on a task you ship.
🧠Learn Extended Thinking and Reasoning Modes
Stop guessing whether 'thinking mode' helps and start measuring it. By the end you can run the same prompt with and without extended reasoning, see where it lifts accuracy and where it just burns cost, and make the call per task class instead of by vibe.
🪟Learn Context Engineering
Stop polishing the prompt and start engineering the whole context — system instructions, examples, retrieval, history — as a budget you allocate on purpose. By the end you can refactor one bloated context into a prioritized layout and measure whether quality went up.
🖱️Learn Computer-Use and Browser Agent Patterns
Separate vision, plan, action, and verification so browser-agent failures stop feeling like 'the agent broke' and start being attributable. By the end, you'll map a real workflow you'd hand to a computer-use agent and predict the exact steps that will be brittle.
🧩Learn Chunking Strategies for RAG
Compare fixed, recursive, semantic, and document-aware chunking on the same source so trade-offs become visible — then pick a chunking strategy for one of your own document types and defend the choice.