🧹 Use AI to Refactor Legacy Code
Stop shipping AI-refactored legacy code that subtly breaks behavior. By the end you'll take a 200-line legacy function through explore → characterize → refactor → review and produce a version with provable behavior preservation — using AI on the careful steps, not as a shortcut around them.
Phase 1: Why 'Refactor This' Is The Most Dangerous Prompt
'Refactor this' tells the AI to invent a goal
7 min · Legacy code's behavior is the spec. When you prompt 'refactor this' with no constraints, the AI infers what 'better' means, which means it's free to change behavior it considers ugly. The refactor is unsafe before it runs.
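As a small illustration of "ugly" behavior that an unconstrained refactor might remove, consider the sketch below. Both functions are hypothetical and not taken from any real codebase; the second is one plausible "cleaner" version, not a prediction of what any particular model would write.

```python
def parse_quantity_legacy(raw):
    # Hypothetical legacy behavior: anything unparseable silently becomes 0.
    # Callers may depend on this, even though it looks like a code smell.
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 0


def parse_quantity_cleaned(raw):
    # A plausible "improved" version: shorter and stricter,
    # but bad input now raises instead of returning 0.
    return int(raw)
```

The two agree on well-formed input such as "7" and diverge only on bad input, which is exactly the kind of difference a quick read of the diff tends to miss.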
Refactoring means structure changes, behavior doesn't
6 min · Fowler's definition: 'a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.' Drop either half and it stops being refactoring: change the behavior and it's a rewrite; change no structure and it's a no-op.
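A minimal sketch of both halves of that definition, using a made-up function: the structure changes, while the observable behavior (None entries are skipped, an empty list yields 0) stays the same.

```python
def total_legacy(items):
    # Hypothetical original: manual index loop with an inline None check.
    t = 0
    i = 0
    while i < len(items):
        if items[i] is not None:
            t = t + items[i]
        i = i + 1
    return t


def total_refactored(items):
    # Different internal structure, same results for the same inputs:
    # None entries are still skipped and an empty list still yields 0.
    return sum(x for x in items if x is not None)
```

Writing `sum(items)` instead would look like the same cleanup but would raise on a None entry, so it would be a behavior change rather than a refactoring.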
Five questions to ask before any AI refactor
7 min · Before prompting the AI to change anything, you need answers to five questions about the legacy code: what does it do, what tests exist, who calls it, what's the riskiest edge case, and what's the smallest safe step. If you can't answer them, the AI definitely can't.
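One way to make the five questions hard to skip is to record the answers in a small structure and refuse to proceed while any are blank. The class and field names below are illustrative assumptions, not part of the course material.

```python
from dataclasses import dataclass, fields


@dataclass
class PreRefactorChecklist:
    # One field per question; an empty string means "not answered yet".
    what_it_does: str = ""
    existing_tests: str = ""
    callers: str = ""
    riskiest_edge_case: str = ""
    smallest_safe_step: str = ""

    def unanswered(self):
        # Names of the questions that still have no answer.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A checklist with anything left in `unanswered()` is a signal to keep exploring rather than to start prompting for edits.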
Explore → characterize → small edit → review
7 min · Safe AI-assisted refactoring is a four-step loop you repeat. Each pass is small, tested, and reviewable. The loop is what turns AI from a risky shortcut into a careful collaborator. The discipline is in the loop, not the prompt.
Phase 2: Summarize and Characterize Before Editing
Summarize the module, then characterize before editing
Prompt the AI to summarize, not edit
8 min · Your first AI prompt against legacy code should produce zero edits. It should produce a plain-English summary: what the function does, who calls it, what edge cases it handles, what looks suspicious. Edits come after the summary is correct.
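A summary-only prompt can be assembled as plain text before it goes to whichever assistant you use. The helper below is a hypothetical sketch; the wording of the prompt is an assumption, and no specific AI API is involved.

```python
def build_summary_prompt(source: str, function_name: str) -> str:
    # Builds a prompt that asks for a description only, never a diff.
    return (
        f"Summarize the function `{function_name}` below in plain English.\n"
        "Do not propose or make any edits.\n"
        "Cover: what it does, who calls it, which edge cases it handles, "
        "and anything that looks suspicious.\n\n"
        f"{source}"
    )
```

Keeping the no-edits instruction in the template means it cannot be forgotten on the tenth function of the day.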
Tests that lock behavior, not specify it
9 min · Characterization tests are different from regular unit tests: their job is to pin down whatever the code currently does, including the bugs, so that any refactor that changes the result fails loudly. They lock behavior; they don't specify it.
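A minimal sketch of the idea, assuming a made-up `legacy_discount` function with a boundary quirk. The expected values are recorded from the current code rather than written from a spec, so the quirk gets locked in along with everything else.

```python
def legacy_discount(price, customer_type):
    # Hypothetical legacy code. Quirk: a gold customer paying exactly 100
    # gets no discount because the comparison is strict.
    if customer_type == "gold" and price > 100:
        return price - price // 5
    if customer_type == "silver":
        return price - price // 10
    return price


CASES = [(100, "gold"), (101, "gold"), (50, "silver"), (50, "unknown")]


def record(func, cases):
    # Capture whatever the function returns today for each input.
    return {case: func(*case) for case in cases}


# Golden values come from running the current code, not from a requirements doc.
GOLDEN = record(legacy_discount, CASES)


def check_behavior_locked(func=legacy_discount):
    # Fails if any recorded output changes, including the quirky one.
    assert record(func, CASES) == GOLDEN
```

If a later refactor "fixes" the boundary to `>=`, this check fails, and the fix becomes a deliberate decision instead of a silent side effect.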
Ask AI to map what's tested vs what's not
8 min · Before refactoring, you need a coverage map: which behaviors of the function are tested, which are characterized, and which are bare. The AI can build this map from the code and tests faster than you can, and the bare cells are where you must not refactor blind.
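The map itself can be as simple as a dictionary from behavior to status. The sketch below assumes you (or the AI) have already listed the behaviors by name; the function and the three status labels are illustrative.

```python
def coverage_map(behaviors, tested, characterized):
    # Classify each named behavior; a regular test outranks a characterization test.
    result = {}
    for behavior in behaviors:
        if behavior in tested:
            result[behavior] = "tested"
        elif behavior in characterized:
            result[behavior] = "characterized"
        else:
            result[behavior] = "bare"
    return result


def bare_behaviors(mapping):
    # The entries that need a characterization test before any refactor touches them.
    return [name for name, status in mapping.items() if status == "bare"]
```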
Find every caller and their assumptions
8 min · A refactor that's safe inside a function can still break its callers: they may rely on the return type, the exception thrown, the order of side effects, or the exact null value returned. AI can grep the codebase for callers; humans extract the assumption per caller.
Plan the smallest reversible transform
8 min · Once you have the summary, characterization tests, coverage map, and caller assumptions, you plan the refactor as a sequence of small named transforms. Each transform is one prompt, one diff, one test run. The smallest unit that can be reverted independently is the unit of work.
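The "one transform, one test run" rule can be sketched as a tiny driver. Everything here is a simplification for illustration: `code` stands in for the working tree, each transform for one AI-produced diff, and `passes_tests` for your real test run.

```python
def run_plan(code, transforms, passes_tests):
    # Apply named transforms one at a time. Keep a step only if the tests pass;
    # on the first failure, discard that step and stop so a human can look.
    log = []
    for name, transform in transforms:
        candidate = transform(code)
        if not passes_tests(candidate):
            log.append((name, "reverted"))
            break
        code = candidate
        log.append((name, "kept"))
    return code, log
```

Stopping at the first failure is a design choice in this sketch: later steps in a plan often assume the earlier ones landed, so continuing past a reverted step would test a sequence nobody planned.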
Phase 3: Pattern Catalog: Extract, Replace, Swap
Pattern catalog: extract, replace, swap with AI help
Extract function: the AI's strongest move
8 min · Extract Function is the most mechanical refactoring pattern: pull a coherent block of code into its own named function. AI excels at it because the transform is local and the contract is clear: same inputs, same outputs, same side effects.
Replace nested conditionals with named guards
8 min · Deeply nested if/else trees are hard to read. The classic refactor is to flatten them with early-return guard clauses or replace them with polymorphism. AI is good at the mechanical conversion but loves to also 'simplify' the branch conditions, which is where behavior changes sneak in.
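The guard-clause conversion, sketched on a hypothetical shipping rule. Each guard is the direct negation of the original branch condition and nothing more; the conditions are deliberately not merged or "tidied".

```python
def shipping_nested(order):
    # Hypothetical legacy shape: three levels of nesting.
    if order is not None:
        if order["items"]:
            if order["country"] == "US":
                return 5
            else:
                return 15
        else:
            return 0
    else:
        return None


def shipping_guarded(order):
    # Same decisions in the same order, expressed as early returns.
    if order is None:
        return None
    if not order["items"]:
        return 0
    if order["country"] == "US":
        return 5
    return 15
```

A tempting simplification such as treating a missing order and an empty order the same way would change the None result to 0, which is the kind of drift to watch for in the diff.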
Swap data structure: the riskiest pattern
9 min · Replacing an array with a map, a list with a set, or an enum with a discriminated union is a high-leverage refactor that AI can scaffold well. But every caller may rely on iteration order, duplicate handling, or specific lookup semantics, and changing the underlying structure changes those guarantees silently.
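A short sketch of how those guarantees shift in Python, using a made-up event list. Which replacement is acceptable depends entirely on what the callers assume.

```python
events = ["login", "view", "login"]

# A list keeps insertion order and duplicates.
as_list = list(events)

# A set drops duplicates and makes no promise about iteration order.
as_set = set(events)

# dict.fromkeys drops duplicates but keeps first-seen order
# (dicts preserve insertion order in Python 3.7+).
as_ordered_unique = list(dict.fromkeys(events))
```

A caller that counts logins breaks under both replacements; a caller that only reads the first event breaks under the set but not under the ordered version.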
Pick the smallest pattern that gets you 80%
7 min · You don't refactor a legacy function by applying every pattern at once. You apply the smallest sequence of patterns that gets the function to 80% of the target shape, then stop. The remaining 20% is rarely worth the risk.
Phase 4: Refactor 200 Lines With Proof of Preservation
Refactor a 200-line function with proof of preservation
Refactor a 200-line legacy function with proof of preservation
15 min · The capstone runs the full loop end-to-end on a real legacy function: summarize, characterize, plan, execute transforms one at a time, prove preservation. The deliverable isn't just the refactored code; it's the audit trail showing each transform left behavior identical.
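One entry in such an audit trail could come from a differential check: run the old and new versions on the same inputs and record every disagreement, counting a raised exception as part of the behavior. This is a hypothetical sketch, and it gives evidence over the inputs you chose rather than a formal proof.

```python
def observe(func, args):
    # Capture either the return value or the type of exception raised.
    try:
        return ("returned", func(*args))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def preservation_report(old, new, inputs):
    # List every input where the old and new versions behave differently.
    differences = []
    for args in inputs:
        before = observe(old, args)
        after = observe(new, args)
        if before != after:
            differences.append((args, before, after))
    return differences
```

An empty report for a transform, saved alongside its diff, is the kind of record a reviewer can check without rerunning the whole refactor.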
Frequently asked questions
- Can I use AI to refactor legacy code safely without breaking behavior?
- What characterization tests should I generate before letting AI refactor a legacy module?
- Which refactoring patterns does AI handle well vs poorly on legacy code?
- How do I prompt Claude or ChatGPT to refactor without inventing new behavior?
- How do I review an AI-generated refactor to prove it preserves behavior?
Each of these is covered in the “Use AI to Refactor Legacy Code” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.