Inheriting Legacy Code with AI: What to Do with a 4000-Line Function


The Real Fear

Open your IDE.

File name: OrderProcessor.java

Scroll to the bottom. Takes 5 seconds.

Cursor on line 1. Bottom right shows: 4,127 lines

Search for if. 847 results.

This function handles all order logic. Runs 100,000 times a day. One wrong line, and the entire order system explodes.

No comments. No documentation. Original author left three years ago.

Your task: “Just update the discount logic.”

Sounds simple.

But you can’t even find where the discount logic is.

How many times have you lived this?


Why Legacy Code Is So Hard

Surface Problems vs Real Problems

| What you think the problem is | What the problem actually is |
| --- | --- |
| Code is too long | Don't know which parts matter, which are historical garbage |
| No comments | Don't know "why" it was written this way |
| Original author left | No one can answer "will this break if I change it" |
| Can't understand it | No confidence, afraid to touch it |

Three Layers of Fear

Layer 1: Cognitive Overload

4000 lines of code. You need to simultaneously track:

  • What this section does
  • Where variables come from
  • What it affects
  • Edge cases to watch for

Human working memory holds about 7 items. 4000 lines far exceeds capacity.

Layer 2: Uncertainty

You change line 2847.

What does it affect? Don’t know. Are there other places depending on this? Don’t know. Are there tests to verify? No.

Uncertainty amplifies fear.

Layer 3: Responsibility Pressure

If it breaks, it’s on you.

Even though you didn’t write this code. Even though this code was always a minefield. When things go wrong, everyone asks: “Who changed it?”

These three layers of fear stack up to one result:

Don’t touch it.

If you can avoid changing it, avoid it. If you can work around it, work around it. If you can delay it, delay it.

Then the code gets worse, and the next person is even more afraid to touch it.

Vicious cycle.


What AI Can Help With: A Decision Framework

Not everything should go to AI. Not everything should be done by you.

✅ AI Takes Over Completely

Explaining what code does
  • Translation machine: turn 4000 lines into 20 paragraph summaries
  • You don't read line by line; you read summaries to grasp the whole picture

Finding dependencies
  • Which DB tables the function reads
  • Which external APIs it calls
  • What gets affected if you change it

Finding potential issues
  • Null pointer risks
  • Performance bottlenecks
  • Security vulnerabilities

Generating test cases
  • Happy path
  • Edge cases
  • Error handling

Suggesting how to split it
  • Which functions to extract
  • Which one is safest to start with

These have “standard answers.” AI is faster and more thorough than humans.

🤝 Human-AI Collaboration

Confirming AI’s understanding is correct

AI will guess wrong on business logic.

It doesn’t know “why this multiplies by 1.05.” It doesn’t know “this workaround handles a special requirement from a major client.”

You need to verify AI’s explanations and add context it doesn’t have.

Deciding refactoring priorities

AI says you can split into 7 functions. Which first? Which later? Which can’t be touched?

This requires your judgment on business priorities and risk.

Reviewing test cases

AI-generated tests may miss critical scenarios. You review, supplement, and confirm coverage.

❌ Human Decides

Whether to keep or delete logic

Code that looks useless might handle an edge case. Delete or not—that’s a business decision, not a technical one.

When to refactor

Now or next sprint? Is this risk worth taking? Does the team have bandwidth?

Communicating with PM/management

Why spend time refactoring? What benefit to users? What happens if we don’t?

AI can’t help with these.


Step 1: Build a Cognitive Map

Before changing anything, understand what this code does.

Let AI Be Your Translator

Don’t dump 4000 lines at once. You’ll overflow the context window and get garbage back.

Feed it in sections.

First round—give the overview:

This is an e-commerce system's processOrder function.
4000 lines total. I'll give it to you in sections.

Here's the structure:
- Lines 1-400: Order validation
- Lines 401-800: Inventory check
- Lines 801-1200: Shipping calculation
- Lines 1201-1800: Discount processing
- Lines 1801-2500: Payment processing
- Lines 2501-3200: Create shipment
- Lines 3201-4000: Send notifications

Understand this structure first.

Second round—dive into one section:

Now I'll give you lines 1-400 (order validation).

Tell me:
1. Main logic in one sentence
2. List 3-5 key steps
3. Potential issues
4. How this connects to the inventory check that follows

AI will respond:

This section validates orders before processing.

Key steps:
1. Check user login status
2. Verify cart is not empty
3. Check if products are still available
4. Validate shipping address format
5. Check for incomplete pending orders

Potential issues:
- Line 45: user.getCart() has no null check
- Line 120: Address validation uses hardcoded regex
- Line 380: Magic number 1.05, purpose unknown

Connection:
After validation passes, validatedOrder object passes to inventory check.

Repeat this process. You’ll get summaries for 20 sections.

Now you have a “table of contents.”

Let AI Draw a Dependency Map

Analyze this function and list:

1. Which database tables it reads
2. Which external APIs it calls
3. Which global variables or configs it uses
4. Which states it modifies (DB, cache, files)

Why does this matter?

Knowing what it touches tells you what changes will affect.

This is your “blast radius assessment.”

Let AI Find the Landmines

Check this code for:

1. Potential bugs (null pointers, edge cases, race conditions)
2. Performance issues (N+1 queries, redundant calculations, massive loops)
3. Security risks (SQL injection, missing permission checks)
4. Maintainability issues (magic numbers, duplicate logic, deep nesting)

This isn’t about fixing now—it’s about knowing where the landmines are.

Know the landmine locations before you step on them.


Step 2: Write Tests Before Changing Anything

Core Principle: Refactoring Without Tests Is Suicide

You changed line 2847 out of 4000.

How do you know it didn’t break anything?

Run it and see? Manually test a few scenarios? Pray?

This isn’t engineering. This is gambling.

Three Testing Strategies

Strategy A: Golden Master Testing (Most Conservative)

Concept: Record current behavior first, compare after refactoring.

Steps:
1. Prepare 50 sets of real input data (pull from production logs)
2. Run current code once, record all outputs
3. This is your "golden master"
4. After refactoring, run again, compare if outputs match exactly
5. Any difference means you broke something

Ask AI to generate test data:

Based on this function's parameters,
generate 20 test inputs covering different scenarios:

- 5 normal cases (typical orders)
- 5 edge cases (empty cart, single item, many items)
- 5 special cases (discount codes, international shipping, pre-orders)
- 5 error cases (invalid address, out of stock, payment failure)
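
Record once, compare forever. A minimal golden-master harness could look like this (a sketch; OrderProcessor, the String-in/String-out signature, and the file paths are hypothetical stand-ins for your real types):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class GoldenMasterHarness {

    public static void main(String[] args) throws Exception {
        // Inputs pulled from production logs; outputs recorded by running
        // the CURRENT code once, before any refactoring.
        List<String> inputs = Files.readAllLines(Path.of("golden/inputs.jsonl"));
        List<String> golden = Files.readAllLines(Path.of("golden/outputs.jsonl"));

        OrderProcessor processor = new OrderProcessor();
        for (int i = 0; i < inputs.size(); i++) {
            String actual = processor.processOrder(inputs.get(i));
            if (!actual.equals(golden.get(i))) {
                throw new AssertionError("Case " + i + " diverged from golden master:\n"
                        + "expected: " + golden.get(i) + "\n"
                        + "actual:   " + actual);
            }
        }
        System.out.println(inputs.size() + " cases match the golden master.");
    }
}

Run it after every change. Any diff means you broke something.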

Strategy B: Characterization Testing (Describe Current State)

Concept: Regardless of whether code is correct, turn “current behavior” into tests.

This function's current behavior:

- Input X returns Y
- Input A throws exception B
- Input null returns empty array

Write these behaviors as tests.
Doesn't matter if they're bugs—lock in current state first.

Why?

The goal of refactoring is “behavior unchanged.” Lock in the current state first; whether to change the behavior is a separate discussion.
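
In JUnit terms, a characterization test pins down today’s behavior, right or wrong (a sketch; Order, Cart, OrderResult, and OrderProcessor are hypothetical stand-ins, and the expected values come from running the current code, not from a spec):

import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class ProcessOrderCharacterizationTest {

    @Test
    void emptyCartCurrentlyReturnsEmptyItemList() {
        // Not what SHOULD happen - what DOES happen today.
        Order order = new Order();
        order.setCart(new Cart()); // empty cart
        OrderResult result = new OrderProcessor().processOrder(order);
        assertTrue(result.getItems().isEmpty());
    }

    @Test
    void nullCartCurrentlyThrowsNullPointer() {
        // Today this throws. Lock it in first; decide later if it's a bug.
        Order order = new Order(); // cart never set
        assertThrows(NullPointerException.class,
                () -> new OrderProcessor().processOrder(order));
    }
}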

Strategy C: Critical Path Testing (Pragmatic)

If there’s no time for comprehensive tests:

What are the 5 most important use cases for this function?
Write tests only for these 5.

80/20 rule: 20% of tests cover 80% of risk.

Something is better than nothing.

Pitfalls of AI-Generated Tests

AI-generated tests have common problems:

Problem 1: Only tests happy path

AI generates “normal flow” tests but misses edge cases.

Problem 2: Too much mocking

AI mocks all external dependencies, so tests only test “mock behavior,” not real integration.

Problem 3: Weak assertions

// AI-generated test
assertNotNull(result);  // Only confirms not null, doesn't verify content
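
A reviewed version of the same test verifies actual content, not just existence (the field names here are hypothetical):

// Reviewed test: pin down the content, not just non-null
assertNotNull(result);
assertEquals(3, result.getItems().size());
assertEquals(new BigDecimal("1047.00"), result.getTotal());
assertEquals(OrderStatus.CONFIRMED, result.getStatus());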

What you need to do: Review AI’s tests

Ask AI:

Did the tests you just generated miss any important scenarios?

Especially:
- Empty data cases
- Large data volume cases
- Concurrent execution cases
- Insufficient permission cases
- External service timeout cases

Let AI check itself.


Step 3: Splitting Strategy

Why Did It Become 4000 Lines

A 4000-line function didn’t happen overnight.

Cause 1: Feature stacking

Every new requirement gets stuffed in.

“Just add feature A here.” “And add check B there too.”

Three years later, 4000 lines.

Cause 2: Copy-paste

Similar logic copied 10 times.

“This is similar to that, just copy and modify.”

Each copy bloats the line count further.

Cause 3: Fear-driven

Afraid to create new functions, worried about breaking existing calls.

“Just stuff it here, it works anyway.”

Cause 4: No code review

No one said “this is too long.” No one said “should be split.” No one cared about code quality.

Understanding causes tells you how to split.

Let AI Suggest Splitting Approach

This 4000-line function needs splitting.

Consider:
1. Single responsibility principle (one function does one thing)
2. Testability (each split should be independently testable)
3. Minimize change scope (minimize impact on callers)

Tell me:
1. Which independent functions can it split into
2. What's each function's responsibility
3. How do they call each other
4. Which is most independent, safest to extract first

AI might suggest:

Suggest splitting into 7 functions:

1. validateOrder() - Order validation (lines 1-400)
   Responsibility: Ensure order data complete, user authorized
   Dependencies: Lowest, only needs Order object
   Suggested order: Extract first

2. checkInventory() - Inventory check (lines 401-800)
   Responsibility: Confirm all products have sufficient stock
   Dependencies: Needs validateOrder result

3. calculateShipping() - Shipping calculation (lines 801-1200)
   Responsibility: Calculate shipping based on address and products
   Dependencies: Needs validated address data

...

Three Principles for Splitting

Principle 1: Extract the most independent part first

Which section has the least dependencies on others? Extract that first.

Ask AI:

Of these 7 suggested functions,
which has the lowest dependencies on other parts?
I want to start with the safest one.

Usually the safest candidates are:

  • Pure calculation logic (no DB reads, no API calls)
  • Validation logic (only checks, doesn't modify)
  • Notification logic (last step, doesn't affect the main flow)

Principle 2: Gradual replacement (Strangler Fig Pattern)

Don’t change everything at once.

Steps:
1. Create a new validateOrder() function
2. Copy lines 1-400 logic to it
3. In the original location, change to call validateOrder()
4. Run tests, confirm behavior matches
5. Commit
6. Next sprint, extract the next section

Each time, you change only one small piece. Each time, you can roll back. Each time, tests protect you.
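
In code, the first strangler step might look like this toy version (hypothetical types; the point is that the public signature and behavior never change while sections move out one sprint at a time):

public class OrderProcessor {

    // Signature unchanged: callers and tests are untouched.
    public String processOrder(String order) {
        String validated = validateOrder(order); // was lines 1-400, now extracted
        // Inventory check, shipping, payment... still inline, extracted next sprint.
        return validated + ":processed";
    }

    // The original validation logic, moved verbatim.
    private String validateOrder(String order) {
        if (order == null || order.isEmpty()) {
            throw new IllegalArgumentException("order must not be empty");
        }
        return order + ":validated";
    }
}

Next sprint, checkInventory() moves out the same way.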

Principle 3: Keep function signature unchanged

First phase of refactoring: only change internal structure, not inputs/outputs.

Why?

  • Callers don't need changes
  • Tests don't need changes
  • Minimum risk

Once internal structure stabilizes, then consider adjusting the interface.


Step 4: Handling “Only the Person Who Left Knows” Logic

Black Holes Even AI Can’t Understand

Some code, even AI can’t understand:

// Don't know why it's like this, but removing it breaks things
price = price * 1.05 * 0.97 * 1.02;

Ask AI, it’ll say:

“This appears to adjust the price, but I don’t know why these specific numbers. Could be tax rates, discounts, or other business rules.”

This is a business logic black hole.

AI can analyze code structure, but it doesn't know:

  • Which client's special requirement this was
  • Which historical bug this works around
  • Who added this workaround at 3 AM three years ago

Archaeology Strategies

Strategy 1: Git Blame

git blame -L 2847,2847 OrderProcessor.java

Find who added this line and when.

git show abc123  # See the full commit
git log --grep="price adjustment"  # Search related commit messages

If you’re lucky, the commit message explains why. Even luckier, there’s an issue or ticket number.
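
If blame only points at a reformatting or merge commit, git’s pickaxe search lists every commit that added or removed the constant itself:

git log -S "1.05" -- OrderProcessor.java  # commits where occurrences of "1.05" changed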

Strategy 2: Search Related Documentation

Ask AI:

This code mentions 'specialDiscountRate'.

Search the project for related:
1. Config files (application.yml, config.json)
2. Documentation (README, wiki, confluence)
3. Test cases (might have comments explaining purpose)
4. Comments in other code

Strategy 3: Find Senior Employees

Not to explain the code—to explain the history.

“That big client project in 2021, was there any special pricing logic?”

“This 1.05 number, do you remember what it’s for?”

They might not remember the code, but might remember the business context.

Strategy 4: Mark and Isolate

If you really can’t find the answer:

/**
 * WARNING: Unknown pricing adjustment logic
 *
 * Possibly related to 2021 client project (unconfirmed)
 * Original author John has left, cannot ask
 *
 * Do not modify until confirmed
 * If modification needed, confirm business rules with PM first
 *
 * Discovered: 2025-12-08
 * Discovered by: Tim
 * Related ticket: Not found
 */
price = price * 1.05 * 0.97 * 1.02;

At least the next person knows:

  • This is a "known unknown"
  • It's not your oversight
  • Be careful before touching it


Mindset: This Is Not Your Problem

Legacy Code Is Historical Debt

You didn’t write these 4000 lines.

They were written this way for reasons:

  • Tight deadlines
  • Constantly changing requirements
  • No one reviewed
  • No time to refactor

Your responsibility is to “make it a little better,” not “make it perfect.”

Three Refactoring Traps

Trap 1: Wanting to refactor everything at once

“I’ll spend two weeks completely rewriting this!”

Result:

  • Two weeks later, not done
  • New requirements come in, and you have to work on the old code anyway
  • Your refactoring branch is 500 commits behind main
  • It never gets merged

Correct approach: Each time you go in, change one small piece, merge immediately.

Trap 2: Pursuing “clean”

You spend three days making code beautiful.

PM asks: “What does the user experience differently?”

You: “Uh… nothing, but the code is cleaner.”

PM: “…”

Correct approach: Refactoring follows requirements.

When you need to change that functionality, clean up that code. This way refactoring is “part of the requirement,” not “extra work.”

Trap 3: Refactoring becomes rewriting

“This code is too bad, rewriting would be faster.”

90% of the time, this is wrong.

Rewriting means:

  • Losing all the edge case handling (you don't know what it covers)
  • Losing all the bug fixes (you don't know what was fixed)
  • Hitting all the same pitfalls again (your predecessors already paid for them)

The temptation to rewrite is strong, but the cost is higher.

Boy Scout Rule

“Leave the campground cleaner than you found it.”

Every time you change code:

  • Rename one variable to be clearer
  • Extract one small function to clarify logic
  • Add one comment so the next person guesses less

Not big refactoring. Just doing it incidentally.

One year later, 4000 lines becomes 3000. Not from one big refactoring session, but from daily small improvements.
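
A Boy Scout edit really is that small. A sketch, reusing the 1.05 example from earlier:

// Before
double p = b * 1.05;

// After: one rename, one comment - nothing else touched
// 1.05 = special tax rate from the 2021 client contract (see exploration log)
double taxedPrice = basePrice * 1.05;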


Practical Operation: Complete Claude Code Conversation Flow

Opening Move: Give AI Enough Context

Wrong approach: Dump 4000 lines directly

You: Help me explain this code
[paste 4000 lines]

AI will crash or give you garbage.

Correct approach: Feed in layers

You: I'm inheriting a legacy project.
     There's a processOrder function, 4000 lines.
     I'll give it to you in sections.

     Background:
     - This is the e-commerce system's order processing core
     - Runs 100,000 times daily
     - Original author left
     - No tests, no documentation

     My goals:
     1. First understand what it does
     2. Find risk points
     3. Plan how to split and refactor

     Ready?

Five Most Useful Prompt Templates

Template 1: Code Translation

Explain what this code does in one sentence.
Then list 3-5 main steps using bullet points.
Don't explain syntax, focus on business logic.

Template 2: Dependency Analysis

Analyze this function:

1. Input: What parameters needed, where from
2. Output: What it returns, what state it changes
3. Dependencies: Which DB tables read/written, which external services called
4. Side effects: Sends email? Writes logs? Changes global variables?

Template 3: Risk Assessment

If I want to modify the logic on line 2847:

1. What other places might it affect?
2. What edge cases should I watch for?
3. What tests do you suggest I write first to protect this change?

Template 4: Test Case Generation

Design test cases for this function:

1. Happy path: 3 normal flows
2. Edge cases: 3 boundary conditions
3. Error cases: 3 error handling scenarios

For each case: input, expected output, test purpose

Template 5: Refactoring Suggestions

This function is too long. Suggest how to split it:

1. What independent functions can it become?
2. What's each function's responsibility?
3. How do they call each other?
4. Which is most independent, safest to extract first?

Complete Conversation Example

You: I'm inheriting a legacy project with a 4000-line processOrder function.
     I'll give you the first 200 lines. Help me understand.
     [paste code]

AI: This section is order validation logic, doing five things:
    1. Check user login status
    2. Verify cart not empty
    3. Check products still available
    4. Validate shipping address format
    5. Check for incomplete pending orders

    Potential issues:
    - Line 45 null check might miss edge case
    - Line 120 address validation uses hardcoded regex

You: What specifically is the issue with line 45's null check?

AI: Line 45 is user.getCart().getItems().
    If getCart() returns null, it throws NullPointerException.
    Suggest adding null check or using Optional.

You: Good, I want to address this section first. Write test cases.

AI: [generates 5 test cases]

You: Do these tests cover the line 45 issue you mentioned?

AI: No, let me add a test case for when cart is null...

You: Now here's lines 201-400...

Key point: Conversation is iterative, not one-shot.
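
For reference, the null-safe fix from that exchange might look like this (a sketch; Cart and Item are hypothetical types from the codebase):

// Before (line 45): NullPointerException when getCart() returns null
List<Item> items = user.getCart().getItems();

// After: null-safe, with an explicit empty-cart default
List<Item> items = Optional.ofNullable(user.getCart())
        .map(Cart::getItems)
        .orElse(Collections.emptyList());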

What to Do When AI Is Wrong

AI will guess wrong, especially:

  • Business logic: It doesn’t know your company’s rules
  • Misleading names: Function called validate but actually does transform
  • Historical workarounds: Looks like a bug, but it’s intentional

How to handle:

You: You said this section does X, but when I actually run it, it does Y.
     What could cause this?
     Please re-analyze, considering [your observation].

Let AI correct itself rather than switching to a different question.


Team Collaboration: How to Lead Juniors Through Legacy Code

Why Juniors Get Stuck with Legacy Code

They’re stuck not because they’re lazy or dumb.

It’s because:

  • Not enough context: don't know the system overview, don't know why it's designed this way
  • No confidence: afraid of breaking things, afraid of getting blamed, afraid of asking stupid questions
  • No methodology: don't know where to start, don't know how to break down the problem

Pair Programming with AI

New pair programming mode:

Senior + Junior + AI (trio)

Flow:
1. Senior gives context: "This function is order core, affects revenue"
2. Junior talks to AI, Senior listens
3. AI explains code, Junior asks questions
4. Senior intervenes at key moments: "AI is wrong here, actually it's..."
5. Junior learns not just code, but Senior's judgment process

Benefits:

  • Senior doesn’t need to explain basic concepts constantly (AI handles it)
  • Senior can focus on “what AI doesn’t know” (business logic, historical context)
  • Junior has AI as safety net, comfortable asking questions
  • Junior sees how Senior collaborates with AI, learns methodology

Build a Legacy Code Knowledge Base

After each legacy code session, leave records:

## OrderProcessor.java Exploration Log

### 2025-12-08 by Tim
- Understood lines 1-400 (order validation)
- Found potential null pointer at line 45
- Added 3 test cases
- Unsolved mystery: What's the magic number 1.05 on line 380?

### 2025-12-15 by Amy
- Asked senior John, 1.05 is special tax rate from 2021 client
- Added comment explaining it
- Extracted validateOrder() function, 400 lines → 80 lines

### 2025-12-22 by Tim
- Processed lines 401-800 (inventory check)
- Found N+1 query issue, not fixing this time, documented

This way knowledge isn’t just in one person’s head.

Next person who comes in reads this log first, knows what predecessors did and discovered.

New Code Review Standards

Old review standard: “Is this code correct?”

New review standard: “Does this code make the system easier to maintain?”

New review checklist:

  • Are there tests? (at least the happy path)
  • Is readability improved? (variable naming, extracted functions)
  • Are there comments? (especially for non-intuitive logic)
  • Is the change scope minimized? (don't sneak in big refactors)
  • If it's legacy code, is the knowledge base updated?

Managing Up: How to Convince Your Manager to Give You Time

Managers Don’t Care About “Dirty Code”

You say: “This function is 4000 lines, maintainability is poor, needs refactoring.”

Manager hears: “You want to spend time on something that won’t produce new features.”

You need to translate into manager language.

Three Persuasion Frameworks

Framework 1: Risk Language

This function handles all orders, runs 100,000 times daily.
Currently no tests, no docs, original author left.

Last incident took 12 hours to locate the bug.
If something happens on weekend, no one can handle it quickly.

Suggest investing 3 days for basic cleanup:
- Add tests for critical paths
- Write a structure documentation
- Mark known risk points

This way next incident, resolution time drops from 12 hours to 2 hours.

Framework 2: Efficiency Language

Last time changing order discount logic, estimated 2 days, actually took 5 days.

3 days spent on:
- Understanding 4000 lines (1.5 days)
- Manual testing to ensure nothing broke (1 day)
- Fixing issues hit during changes (0.5 days)

Next three months have 5 requirements touching this area.

If we spend 3 days cleaning up:
- Each requirement saves 2 days
- Three months saves 10 days
- ROI: 3 days investment, 10 days return

Framework 3: Talent Language

Amy joined the team 1 month ago.
She still won't touch the order module because code is too complex, no documentation.

In contrast, user module has tests and docs, she could work independently by week 2.

If we don't clean up order module:
- Only I can ever change this area
- Amy's growth is limited
- I can never take vacation (no one can cover for me)

Don’t Ask for Too Much Time at Once

Wrong: “I need two weeks to focus on refactoring this.”

Manager will say: “When we have time.”

Then there’s never time.

Right: “Each sprint I spend half a day cleaning one small piece, following requirements.”

Manager easily agrees, and you actually do it.

Show Progress

After each cleanup, send a brief update:

This week cleaned up processOrder's order validation section:

- Extracted validateOrder(), 350 lines → 80 lines
- Added 5 test cases
- Next time changing this area, estimated 2 hours saved

Cumulative progress: 4000 lines, 400 cleaned (10%)

Let your manager see:

  • What you're doing
  • Concrete results
  • Benefits to the team


Quantifying Technical Debt: Explaining to Non-Technical People

Analogy: Hidden Costs of a House

Imagine you bought a 30-year-old house.

Looks livable, but:
- Old wiring, occasional power trips
- Rusted pipes, unstable water pressure
- No blueprints, every renovation requires guessing where pipes are

You can keep living there, but:
- Every problem costs more to fix
- Contractors are afraid to touch it, worried about cascading effects
- One day the whole system might collapse

Technical debt is like this.

System runs, but maintenance costs keep rising.
Every change brings bigger risks.

Three Quantifiable Metrics

Metric 1: Time to Fix

Last order system bug:

- Locating problem: 8 hours (searching through 4000 lines)
- Fixing problem: 2 hours
- Testing verification: 2 hours (manual, no automated tests)

Total: 12 hours

With tests and documentation, estimated only 4 hours needed.

Metric 2: Cost of Change

Last time changing order discount logic:

- Requirement itself: change 10 lines of code
- Actual time spent: 5 days

Where time went:
- Understanding code: 2 days
- Manual testing: 1 day
- Fixing issues encountered: 1 day
- Actual changes: 0.5 days
- Code review + deployment: 0.5 days

Modules with tests, same requirement takes only 1 day.

Metric 3: Onboarding Time

Amy joined team 1 month ago:

- User module: Independent by week 2 (has tests, has docs)
- Payment module: Independent by week 3 (has tests, lacks docs)
- Order module: 1 month in, still won't touch (no tests, no docs, too complex)

Every new hire steps in the same pits.

Visual Presentation

Draw a simple chart for your manager:

Module Health Dashboard

User Module    ████████░░ 80%  ✓ Has tests, has docs
Payment Module ██████░░░░ 60%  △ Has tests, lacks docs
Order Module   ██░░░░░░░░ 20%  ✗ No tests, no docs ← Highest risk

Manager sees the problem at a glance.

Build a Technical Debt List

Don’t just complain “code is bad.” Build a concrete list:

| Item | Risk Level | Estimated Cleanup Time | Benefit After Cleanup |
| --- | --- | --- | --- |
| processOrder has no tests | High | 3 days | Fix time -8 hours per incident |
| User module has no docs | Medium | 1 day | Onboarding -1 week |
| Payment API has no error handling | High | 2 days | Fewer complaints, lower refund rate |
| Reports module is slow | Low | 5 days | Query time 30s → 3s |

With a list, you can discuss priorities and put it in the roadmap.


Complete Process Summary

Week 1: Build Cognition

Goal: Know what this code does and its risks

  1. Have AI explain each section → Output: Section summaries
  2. Have AI draw dependency map → Output: Dependency map
  3. Have AI find landmines → Output: Risk list

Deliverable: A “system map” document

Week 2: Build Safety Net

Goal: Have enough tests before daring to change

  1. Generate Golden Master test data
  2. Have AI generate tests, you review and supplement
  3. Ensure critical paths have test coverage

Deliverable: Test coverage from 0% → 30%+ (critical paths)

Week 3+: Gradual Improvement

Goal: Each time in, make it a little better

  1. Start with most independent section
  2. Each time change only one small piece, merge immediately
  3. Mark logic you can’t understand, don’t force changes
  4. Update knowledge base for the next person

Deliverable: Each sprint reduces 5-10% of code lines


From Fear to Control

Before, opening 4000 lines:

  • Brain freezes
  • Don’t know where to start
  • Afraid to change anything
  • Want to quit

Now:

  • AI translates, you read summaries
  • AI maps dependencies, you know boundaries
  • AI writes tests, you have safety net
  • AI suggests splits, you decide order

Those 4000 lines are still 4000 lines.

But you’re not afraid anymore.

Because you have a method. Because you have tools. Because you know this isn’t your problem—it’s historical debt.

Your job is to make it a little better, bit by bit.

Not heroic big refactoring. Daily small Boy Scout Rule improvements.

One year later, 4000 lines becomes 2000. Has tests, has documentation, people dare to change it.

That’s your accomplishment.
