
🧹 Use AI to Refactor Legacy Code

Stop shipping AI-refactored legacy code that subtly breaks behavior. By the end you'll take a 200-line legacy function through explore → characterize → refactor → review and produce a version with provable behavior preservation — using AI on the careful steps, not as a shortcut around them.

Applied · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: Why 'Refactor This' Is The Most Dangerous Prompt

Why 'refactor this' is the most dangerous prompt

4 drops
  1. 'Refactor this' tells the AI to invent a goal

    7 min

    Legacy code's behavior is the spec. When you prompt 'refactor this' with no constraints, the AI has to infer what 'better' means, so it is free to change any behavior it considers ugly. The refactor is unsafe before it runs.

  2. Refactoring means structure changes, behavior doesn't

    6 min

    Fowler's definition: 'a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.' Drop either half and it stops being refactoring — it's a rewrite or a no-op.

  3. Five questions to ask before any AI refactor

    7 min

    Before prompting the AI to change anything, you need answers to five questions about the legacy code: what does it do, what tests exist, who calls it, what's the riskiest edge case, and what's the smallest safe step. If you can't answer them, the AI definitely can't.

  4. Explore → characterize → small edit → review

    7 min

    Safe AI-assisted refactoring is a four-step loop you repeat. Each pass is small, tested, and reviewable. The loop is what turns AI from a risky shortcut into a careful collaborator — the discipline is in the loop, not the prompt.
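One way to make the five questions from drop 3 binding is to refuse to build an edit prompt until all of them have answers. The sketch below is illustrative only: `Preflight`, `build_refactor_prompt`, and the prompt wording are hypothetical names and text, not part of any tool or of the course material.

```python
from dataclasses import dataclass, fields


@dataclass
class Preflight:
    """Answers to the five pre-refactor questions (hypothetical structure)."""
    what_it_does: str = ""
    existing_tests: str = ""
    callers: str = ""
    riskiest_edge_case: str = ""
    smallest_safe_step: str = ""

    def unanswered(self) -> list[str]:
        # A question counts as unanswered if its answer is empty or whitespace.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def build_refactor_prompt(pre: Preflight, source: str) -> str:
    """Build a constrained prompt, or refuse if any question is still open."""
    missing = pre.unanswered()
    if missing:
        raise ValueError(f"answer these before prompting: {', '.join(missing)}")
    return (
        "Apply exactly one transform and change nothing else.\n"
        f"Transform: {pre.smallest_safe_step}\n"
        f"Current behavior (must be preserved): {pre.what_it_does}\n"
        f"Tests that must still pass: {pre.existing_tests}\n"
        f"Callers that must keep working: {pre.callers}\n"
        f"Edge case to leave untouched: {pre.riskiest_edge_case}\n"
        "Do not fix bugs, rename public symbols, or alter return types.\n\n"
        f"{source}"
    )
```

Calling `build_refactor_prompt` with a partly filled `Preflight` raises a `ValueError` that names the open questions, which keeps the "explore" step from being skipped. The resulting prompt asks for a single named transform rather than a general "refactor this".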

Phase 2: Summarize and Characterize Before Editing

Summarize the module, then characterize before editing

5 drops
  1. Prompt the AI to summarize, not edit

    8 min

    Your first AI prompt against legacy code should produce zero edits. It should produce a plain-English summary: what the function does, who calls it, what edge cases it handles, what looks suspicious. Edits come after the summary is correct.

  2. Tests that lock behavior, not specify it

    9 min

    Characterization tests are different from regular unit tests: their job is to pin down whatever the code currently does — including the bugs — so that any refactor that changes the result fails loudly. They lock behavior; they don't specify it.

  3. Ask AI to map what's tested vs what's not

    8 min

    Before refactoring, you need a coverage map: which behaviors of the function are tested, which are characterized, and which are bare. The AI can build this map from the code and tests faster than you can — and the bare cells are where you must not refactor blind.

  4. Find every caller and their assumptions

    8 min

    A refactor that's safe inside a function can still break its callers: they may rely on the return type, the exception thrown, the order of side effects, or the exact cases in which null is returned. AI can grep the codebase for callers; humans extract the assumption per caller.

  5. Plan the smallest reversible transform

    8 min

    Once you have the summary, characterization tests, coverage map, and caller assumptions, you plan the refactor as a sequence of small named transforms. Each transform is one prompt, one diff, one test run. The smallest unit that can be reverted independently is the unit of work.
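The characterization idea from drop 2 can be sketched as a small "record, then compare" harness. Everything here is an assumption made for illustration: `legacy_shipping_fee` is an invented legacy function with a deliberate quirk, and `record` and `diff` are hypothetical helpers, not an existing library API.

```python
def legacy_shipping_fee(weight_kg, country):
    """Hypothetical legacy function; its quirks are part of its behavior."""
    if country == "US":
        if weight_kg > 10:
            return 25
        return 10
    if weight_kg > 10:
        return 40
    if weight_kg < 0:
        return 0  # quirk: negative weight outside the US ships for free
    return 20


# Inputs chosen to hit boundaries, odd values, and one failure mode.
CASES = [
    (0, "US"), (10, "US"), (10.5, "US"), (-1, "US"),
    (5, "DE"), (11, "DE"), (-1, "DE"), (5, None), (None, "US"),
]


def record(func, cases):
    """Capture whatever func does today: its return value or exception type."""
    snapshot = {}
    for args in cases:
        try:
            snapshot[args] = ("ok", func(*args))
        except Exception as exc:
            snapshot[args] = ("raised", type(exc).__name__)
    return snapshot


def diff(golden, func):
    """Return the cases where func no longer matches the recorded behavior."""
    actual = record(func, list(golden))
    return {a: (golden[a], actual[a]) for a in golden if golden[a] != actual[a]}


def refactored_shipping_fee(weight_kg, country):
    """Restructured, with the quirk kept on purpose."""
    heavy = weight_kg > 10
    if country == "US":
        return 25 if heavy else 10
    if heavy:
        return 40
    return 0 if weight_kg < 0 else 20


def tidied_shipping_fee(weight_kg, country):
    """A 'cleanup' that rejects negative weights: a behavior change."""
    if weight_kg < 0:
        raise ValueError("negative weight")
    heavy = weight_kg > 10
    if country == "US":
        return 25 if heavy else 10
    return 40 if heavy else 20
```

Recording `golden = record(legacy_shipping_fee, CASES)` once and then calling `diff(golden, candidate)` after every transform gives an empty result for `refactored_shipping_fee` and flags the two negative-weight cases for `tidied_shipping_fee`. The harness does not judge whether the quirk is a bug; it only reports that the behavior moved.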

Phase 3: Pattern Catalog: Extract, Replace, Swap

Pattern catalog: extract, replace, swap with AI help

4 drops
  1. Extract function: the AI's strongest move

    8 min

    Extract Function is the most mechanical refactoring pattern — pull a coherent block of code into its own named function. AI excels at it because the transform is local and the contract is clear: same inputs, same outputs, same side effects.

  2. Replace nested conditionals with named guards

    8 min

    Deeply nested if/else trees are hard to read. The classic refactor is to flatten them with early-return guard clauses or to replace them with polymorphism. AI is good at the mechanical conversion but also tends to 'simplify' the branch conditions, which is where behavior changes sneak in.

  3. Swap data structure — the riskiest pattern

    9 min

    Replacing an array with a map, a list with a set, or an enum with a discriminated union is a high-leverage refactor that AI can scaffold well. But every caller may rely on iteration order, duplicate handling, or specific lookup semantics — and changing the underlying structure changes those guarantees silently.

  4. Pick the smallest pattern that gets you 80%

    7 min

    You don't refactor a legacy function by applying every pattern at once. You apply the smallest sequence of patterns that gets the function to 80% of the target shape, then stop. The remaining 20% is rarely worth the risk.
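For the guard-clause pattern in drop 2, a minimal sketch of how a "simplified" condition can slip in, and how a boundary-focused comparison catches it. The discount rules, function names, and the `mismatches` helper are all invented for this example.

```python
from itertools import product


def discount_nested(is_member, total, has_coupon):
    """Hypothetical legacy shape: nested conditionals."""
    if is_member:
        if total >= 100:
            if has_coupon:
                return 0.20
            else:
                return 0.15
        else:
            return 0.05
    else:
        if has_coupon:
            return 0.10
        else:
            return 0.0


def discount_guards(is_member, total, has_coupon):
    """Flattened with guard clauses; each condition is negated mechanically."""
    if not is_member and has_coupon:
        return 0.10
    if not is_member:
        return 0.0
    if not (total >= 100):
        return 0.05
    if has_coupon:
        return 0.20
    return 0.15


def discount_simplified(is_member, total, has_coupon):
    """Same flattening, but the boundary was 'tidied' from >= 100 to > 100."""
    if not is_member and has_coupon:
        return 0.10
    if not is_member:
        return 0.0
    if total <= 100:
        return 0.05
    if has_coupon:
        return 0.20
    return 0.15


# Small grid that deliberately straddles the 100 boundary.
GRID = list(product([True, False], [0, 99, 100, 101], [True, False]))


def mismatches(reference, candidate, grid):
    """List the inputs on which candidate disagrees with reference."""
    return [args for args in grid if reference(*args) != candidate(*args)]
```

On this grid, `mismatches(discount_nested, discount_guards, GRID)` is empty, while `discount_simplified` disagrees only for members whose total is exactly 100. A grid without the value 100 would miss the change entirely, so the boundary values have to come from reading the original conditions.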

Phase 4: Refactor 200 Lines With Proof of Preservation

Refactor a 200-line function with proof of preservation

1 drop
  1. Refactor a 200-line legacy function with proof of preservation

    15 min

    The capstone runs the full loop end-to-end on a real legacy function: summarize, characterize, plan, execute transforms one at a time, prove preservation. The deliverable isn't just the refactored code — it's the audit trail showing each transform left behavior identical.
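An audit trail of this kind could look roughly like the sketch below: every named transform is fingerprinted against the same recorded cases, and the trail stops at the first drift. The helper names (`fingerprint`, `audit`) and the tiny `total_v*` functions are hypothetical stand-ins for a real 200-line function, and the sketch assumes results are JSON-serializable.

```python
import hashlib
import json


def fingerprint(func, cases):
    """Hash the observable result (value or exception type) of every case."""
    results = []
    for args in cases:
        try:
            results.append(["ok", func(*args)])
        except Exception as exc:
            results.append(["raised", type(exc).__name__])
    # Assumption: return values can be serialized to JSON.
    return hashlib.sha256(json.dumps(results).encode()).hexdigest()


def audit(baseline, steps, cases):
    """Compare each named transform to the baseline; stop at the first drift."""
    expected = fingerprint(baseline, cases)
    trail = []
    for name, candidate in steps:
        actual = fingerprint(candidate, cases)
        preserved = actual == expected
        trail.append(
            {"transform": name, "fingerprint": actual[:12], "preserved": preserved}
        )
        if not preserved:
            break  # revert this step before attempting anything further
    return trail


def total_v0(items):
    """Stand-in for the legacy function: sum of the positive entries."""
    total = 0
    for x in items:
        if x > 0:
            total = total + x
    return total


def total_v1(items):
    """Transform 1: replace the loop with a generator expression."""
    return sum(x for x in items if x > 0)


def total_v2(items):
    """Transform 2: a 'cleanup' that changes behavior for negative entries."""
    return sum(abs(x) for x in items)
```

Running `audit(total_v0, [("replace loop", total_v1), ("use abs", total_v2)], cases)` with cases such as `([1, 2, 3],)`, `([-1, 5],)`, `([],)`, and `([None],)` yields one preserved entry followed by one failed entry. The trail is only as strong as the recorded cases: a matching fingerprint shows preservation on those inputs, not on every possible input.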

Frequently asked questions

Each of the questions below is covered in the “Use AI to Refactor Legacy Code” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.

Can I use AI to refactor legacy code safely without breaking behavior?
What characterization tests should I generate before letting AI refactor a legacy module?
Which refactoring patterns does AI handle well vs poorly on legacy code?
How do I prompt Claude or ChatGPT to refactor without inventing new behavior?
How do I review an AI-generated refactor to prove it preserves behavior?