The Big Redesign In The Sky
The first rule of holes: If you are in one, stop digging.
Many software developers take this to mean that if you have a huge legacy mess in your software you should stop working on it and rewrite it from the ground up.
This is probably the worst thing you could do…
Everyone wants to work in a green field. In a green field you don’t have to deal with the mess that’s accumulated over the years. In a green field you can be clean. In a green field you can build the perfect system. All would be well if only we could start over with a green field.
What a crock. Remember, the original mess started in a green field. All messes started in a green field! The one irrefutable data point we have about green fields is that they frequently lead to horrible messes!
Why do green fields become messes? Because the cows crap all over the place. The grass tastes really good because it’s virgin clean and fresh. And who cares about the occasional cow pie? Besides, as long as we walk in the same direction, all the cow pies are behind us and we can’t see them.
OK, I’ve taken the metaphor too far. But the reality of green field projects is that they create the illusion that your messes don’t matter.
Your messes do matter. Every single one of your messes matters. And if you don’t clean them up from the very start, you are going to wind up with a horrible messy legacy wad in short order.
But that’s not what this blog is about. This blog is about what you are supposed to do when you actually have a big ugly legacy wad. So Uncle Bob is going to tell you what to do.
You aren’t going to like it.
When you have a big messy legacy wad, what you have to do is…
You really aren’t going to like this.
What you have to do is… is…
42.
No, sorry, wrong story. What you have to do is stop making the messes and start cleaning them up.
This does not mean you call your managers into a conference room and tell them that you aren’t going to be delivering features for the next three months while you refactor the code. Do NOT do this! Rather, it means that you are going to adopt the “Boy Scout Rule” and check each module in a little cleaner than when you checked it out.
From iteration to iteration, and from release to release, you are going to clean this system while continuing to add new features and functionality to it. There is no other way.
When is a redesign the right strategy?
I’m glad you asked that question. Here’s the answer. Never.
Look, you made the mess[1], now clean it up.
I’m not telling you this to punish you for making the mess. I’m telling you this because the green field effort is almost always doomed to fail. Here’s the reason:
Once systems get messy, the people who made the messes start to demand a redesign. Managers don’t want to redesign because they know how expensive it is. But the developers beat the drums of redesign louder and louder. Meanwhile every feature takes longer and longer to add. Bugs accumulate faster than they can be fixed. Bugs that are fixed reappear over and over again. When managers ask why, the developers blame the mess, and beat the drums ever louder… ever louder.
Eventually, out of sheer frustration, management authorizes the redesign. They pick the Tiger Team! The best of the best… The cream of the cream… These are the developers who are going to save the project.
The rest of us hate them because they get to work on a green field while the rest of us are stuck maintaining the old system. And the feature requests and bug fixes continue to pile in.
What does the Tiger Team use for requirements? There may have been a requirements document at one point, but it was never actually accurate, and it’s hopelessly out of date by now! No, the Tiger Team has but one source for requirements. The old system!
But the old system is changing! Those of us not on the Tiger Team are adding new features and bug fixes every day. And that means that we are in a…
Race!
Remember Zeno’s Paradox? Achilles and a tortoise are in a race. The tortoise has a head start. Every time Achilles gets to where the tortoise was, the tortoise has moved to a new point ahead of Achilles. Therefore Achilles can never catch the tortoise.
Fortunately the law of limits allows us to escape this paradox in a mathematical sense. But in software the paradox still holds. The Tiger Team can never catch up to the old system. Every time they get to where the old system was, the old system has added new features.
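To put rough numbers on that race, here is a minimal sketch. The function name, the feature counts, and the weekly rates are all made up for illustration; they are not measurements from any real project. The only assumption is that both teams work at a steady rate, so the gap closes at the difference between the two rates.

```python
def weeks_to_catch_up(backlog, rewrite_rate, legacy_rate):
    """Estimate how long a rewrite needs to reach parity with the old system.

    backlog      -- features the old system has that the rewrite lacks today
    rewrite_rate -- features per week the rewrite team reproduces
    legacy_rate  -- features per week still being added to the old system

    Returns the number of weeks, or None if the rewrite never catches up.
    All three inputs are hypothetical planning numbers.
    """
    closing_rate = rewrite_rate - legacy_rate
    if closing_rate <= 0:
        # The tortoise moves at least as fast as Achilles: no finish line.
        return None
    return backlog / closing_rate


# A frozen old system versus one that keeps changing (illustrative values).
frozen = weeks_to_catch_up(backlog=500, rewrite_rate=10, legacy_rate=0)
moving = weeks_to_catch_up(backlog=500, rewrite_rate=10, legacy_rate=8)
never = weeks_to_catch_up(backlog=500, rewrite_rate=10, legacy_rate=10)
```

With these invented rates the frozen case takes 50 weeks, the moving case takes 250 weeks, and the last case never finishes. The exact figures don't matter; the point is how sharply the estimate grows as the old system's rate of change approaches the rewrite team's speed.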
I have seen that race go on for ten years.
When (If!) the new system eventually gets to the place where it can replace the old system, all the members of the Tiger Team will be long gone, having moved on to other wonderful green field projects. Schedule pressures will have mounted until the new system is finally completed[2] through a Herculean effort. And the developers on the new system will have already started beating the drums for a redesign.
I have seen this happen over and over again. Big Redesigns in the Sky almost always fail horribly. Horribly! HORRIBLY!!
Moral
It’s simple really. But you aren’t going to like it. If you have a mess, the only way to get rid of the mess is to clean it. Remember that mess is currently paying your paycheck. That system represents your family jewels. You may have dragged those jewels through the mud, and dipped them in pig dung, but that’s no reason to abandon them for some shiny baubles that look pretty for the moment, but can’t possibly pay your paychecks. Go get the family jewels and clean them. It may take a long time. It may be hard. It may be disgusting work. But in the end you’ll have your family jewels shined and beautiful, and you’ll know how to keep them that way.
[1] And if you didn’t, you’ve probably made other equivalent messes and left them behind for others to clean up. What goes around comes around.
[2] “Completed” is the wrong word. More likely the system was finally readied through a set of horrible compromises with the product people and customers. Whole feature sets are likely missing, broken, or so different that the customer hates them.

...but Uncle Bob, what about applications that are written on an obsolete platform? Isn’t the only recourse a rewrite/port?
@Darrin If it ain’t broke…..
Joel Spolsky also wrote about this: http://www.joelonsoftware.com/articles/fog0000000069.html
So has Chad Fowler: http://chadfowler.com/the-big-rewrite
“Here’s the answer. Never.”
Bullshido.
This grossly overlooks the fact that sometimes it is cheaper for a company to rewrite the code than it is to refactor it. If your coding team is taking 30 hours every week to understand what some functionality is doing, but they can rewrite the same functionality from scratch in only 15, the business is wasting 15 hours’ worth of wages and employee benefits in irrecoverable overhead.
To extend your analogy with real-world examples, sometimes it’s necessary to till a field over and over again until the soil is rich enough for new grass. Here’s another example: after a house is severely damaged by its tenants (e.g., a bunch of punk teenagers throwing a party big enough to crack drywall and puke all over the carpets; yes, this has happened to my family’s rentals in the past), it’s often cheaper to just tear down the walls and put up new drywall (sure, it’s cheaper to patch the cracks in the short term, but it devalues the property in the long run), rip out the carpets and lay down new ones instead of cleaning spots you know will never come out, and junk the furniture in favor of new amenities.
You’re looking at this from a purely coding point of view, which is actually THE WORST POSSIBLE point of view to take. A business needs to control its costs, and if it’s cheaper to rewrite than to renovate, then by all means, do the rewrite.
It’s about time employees started thinking like company owners; criminy, those stock options you vest are there for a reason. It is in YOUR best interest to control costs; nobody in the upper echelons of the company can do this better than you.
“Bullshido.”
Samuel, why does it take the coding team 30 hours to understand the functionality in the existing system but it’ll only take them 15 hours to recreate in a new system? Are you implying there’s a requirements or design document that explains it to them? If so, then wouldn’t they use that in understanding the existing system?
Samuel,
You have sorely missed the point. It appears to be cheaper to rewrite but seldom if ever is. The code that will take 15 hours to rewrite will again take 30 hours to refactor, and you will have wasted that 15 hours and had to spend the 30 anyway. That means it cost you 45 hours to fix a problem that would have taken you 30 hours initially.
I know (from experience) that the green field development can look more promising and appear to be cheaper. I have advised companies into doing it. Every time it has been the failure that Uncle Bob describes.
The reason is that green field development (when there is an honest reduction in hours) is a short-term patch at best, as it does not solve the long-term problems that led to that redevelopment being needed.
Also, your analogy assumes that all work other than redevelopment is halted, which is not what Uncle Bob was saying. He was saying that you do not take that 30 hours of refactoring as a task in and of itself, but rather refactor as you add new features and fix bugs. In this way you add one or two hours to each change but not more than that. The business should not feel the pain of the refactoring. If you do green field development, the company will always feel the loss of money, time and ability to hit the market.
In short, green field development never takes business needs into account and only takes coder needs into account. If you are honestly concerned with the business needs and cash flow, you do everything you can to not interrupt that cash flow while increasing it (by reducing the cost of maintenance).
Great post, Uncle Bob! I started to think about how to avoid this kind of situation. What if the team cannot see what is waiting for them around the corner (the mess, talk of a redesign, etc.)? What are your remedies for helping developers and teams notice early that “we are falling into a real mess”, instead of only noticing that “we are deep in a mess”?
How about incrementally rewriting the system, one subsystem at a time, so that the application keeps running all the time? For example you move from one obsolete platform to a new platform by temporarily having the program in two parts which cooperate through a network/database/whatever. New functionality is always written on the new platform, and gradually the existing functionality of the old system is replaced by the new system. During the transition period the users need to run both the old and the new systems, until the new one can completely replace the old one.
I think it’s wrong to say we should never redesign. Consider the problem I face. I work mainly on a system implemented with Java/JDBC, but occasionally I need to add features to the legacy code written using C/Oracle ProC. The problem is that the old code is a lot less efficient. It fetches a new connection for each file it processes and it does not take advantage of Oracle batching features, so the database inserts are slow. The best term I have heard for this situation is at http://www.martinfowler.com/bliki/TechnicalDebt.html. Martin refers to the problem as technical inflation, and in this case I think we would be better off rewriting this legacy code in Java. The problem is that the new programmers don’t understand C and Oracle ProC, the code is not exercised very often because the C code is hard to set up and test, and the processing takes more computing resources for each file it processes than its Java counterparts. The best time to work on this redesign is during the slack time that comes with new feature planning. Because the legacy code does not change at the same rate as the newer Java code, I would not expect to be playing catch-up with a changing target baseline. The longer the forces of technical inflation hurt us, the less competitive our position is and the more late nights and weekends we have to spend trying to work with code that is fragile, hard to test, and severely constrained by its aging platform.
I agree with Uncle Bob in a way, but in my opinion it all comes down to system scalability/stability versus resources, time, and money. The team leads and the product team (stakeholders) need to agree on what is important and what we should focus on, so that we can continue deploying new features while working on defects and refactoring, which may include redesigning the system if necessary.
And if need be, and there is no way you can deliver/deploy features unless you redesign the system (which I doubt), then it has to be redesigned in well-tested pieces (blocks/modules/packages) that can be extended and maintained easily (fewer resources -> less money) without the need to redesign your recent design.
I have certainly been the guilty party for drum-banging redesign in my time. I really believed it was the right way forward.
My experience confirms Uncle Bob’s observations: there were many issues with it that Bob describes. There was moving-target pressure from new features in the ‘mess’. There was also a ‘don’t change’ pressure from needing to parse legacy data formats in disk files. Some files contained errors, and the ‘mess’ was cleverly written to overcome the problems. So some ‘clean’ algorithms just could never work. The exact requirements were impossible to reverse engineer.
There was an interesting comment above about a rewrite making business sense if it is cheaper to produce than the cost of maintaining the mess. One thing I have learned since 1982 is that projecting the cost of a software rewrite with any accuracy ranges from very difficult to utterly impossible.
Given that the cost to rewrite cannot be known, this essentially reduces the business case to a gamble. But the other caveats Uncle Bob mentions still apply.
I would urge extreme reticence before deciding to ‘big-bang’ rewrite. It’s tempting, it’s personally satisfying, it seems like it is adding business value, and in theory it does.
But then, who do you think is going to be maintaining and extending the rewrite in your organisation, even supposing it is not canned (along with your credibility) before completion? It could be you – or the ‘mess maker’ team – or the latest graduate hire with limited experience – or some outsource company. The point is, these people will vary in how they approach your code from being ‘in line’ with your ideas … up to making ‘a worse mess’ in your opinion.
Our CTO has a very good take on this, and I’ve learned from him. Even the most intentional coding cannot explain all the motivations, rationale, and aspirations for a piece of software. You can see what was done, how it was done fairly well with clean code. But you can’t see why the alternatives were rejected. And that becomes a significant barrier to other maintainers no matter what you do.
As a result, I can only support this little ‘accept it; clean as you go’ philosophy. What goes around does indeed come around. Maybe you’ll even influence some mess makers along the way!
Not forgetting that there isn’t just one way to write good software … so one person’s ‘mess’ is another person’s ‘excellent algorithm’ ...
It’s the forgotten requirements that are the biggest problem to actually being able to USE the software.
Everybody thinks that green field has all the requirements out there. Easy to see. But this is never true. And if you decide to rewrite rather than migrate, it will cause trouble.
Ironically, I’ve been hit by a much more subtle version of this. Not a rewrite, but development for many months on new stuff. No automated deploy, and every deploy having to be rolled back because of new bugs to existing functionality.
No one wanted to throw away the months of development. Including myself. But it made us unable to react to the business needs (can’t deploy). Finally we realized that we needed to get an automatic deploy of exactly what was in production (roll back the previous months), then move forward. Unfortunately, it took us 2 months to come to this conclusion.
2 months and 1 week later though, we were providing value to the users, something that hadn’t been done in the last 5 months.
Green field development needs to be at least as good as (read: the same as) the previous system before the investment pays off. So now it becomes a different question.
If I spend 30 hours this week on something that will be usable next week, is it cheaper than spending 15 hours on something that can’t be used till next year?
I strongly agree, and I’ve seen several stupid re-implementations myself. As far as “never” is concerned, it may be an acceptable approximation to the correct answer to the question, “When is a redesign the right strategy?” If you’re using a linear scale, that is. I’ll bet less than 5% of suggested redesigns are worthwhile, probably less than 1%, possibly less than .1%.
But it’s easy to find counterexamples on a small scale. It does happen sometimes that there is a new technology or design that cuts the amount of code required by an order of magnitude or more. In the days when Java was unstable and kept crashing frequently, I had responsibility for a program that monitored instances of a Java program and restarted them when they crashed. Once, in a flash of insight, I realized that it could be re-implemented in three lines of shell script: just start the program inside an eternal while loop.
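The original was a shell one-liner, which is not reproduced here. As a hedged illustration of the same idea, a restart loop can be sketched in Python. The names `supervise`, `max_restarts`, `delay_seconds`, and `CRASHING_CHILD` are hypothetical; the restart cap exists only so the sketch can be tried without looping forever.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the crash-prone program: a child process that
# exits immediately with a non-zero status.
CRASHING_CHILD = [sys.executable, "-c", "raise SystemExit(1)"]


def supervise(command, max_restarts=None, delay_seconds=1.0):
    """Run `command`, and start it again every time it exits.

    With max_restarts=None this is the "eternal while loop": it never returns.
    With a number, it stops after the initial run plus that many restarts and
    returns how many times the command was started.
    """
    runs = 0
    while max_restarts is None or runs <= max_restarts:
        subprocess.run(command)    # blocks until the program exits or crashes
        runs += 1
        time.sleep(delay_seconds)  # brief pause so a crash loop doesn't spin
    return runs
```

Calling `supervise(CRASHING_CHILD, max_restarts=2, delay_seconds=0)` starts the child three times and then returns. Dropping `max_restarts` gives the never-ending behaviour described above.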
I could have thought of that before. Or someone else could have. But we didn’t.
I hear what you’re saying, and I do agree that a rewrite is not needed 99% of the time; developers have an urge to write things from the ground up instead of maintaining code.
But, I don’t think that a rewrite should NEVER be done. I’m currently involved in one of those new-SOA-rewrite-of-whole-business-which-would-make-us-much-cooler projects. Leaving aside the whole ‘SOA is dead stuff’, I think the need is quite obvious. We are working on an in-house process server and have hundreds of processes written in an in-house language (don’t ask). That language is, of course, weak, obscure, unknown to most of the staff and is the spawn of the devil. Do you really think that developers should keep learning that language for writing new processes and maintaining old ones instead of introducing a new process server and migrating those processes that are still maintained?
Agree technically, but in terms of finance, it would be tough to ask for funding if you don’t rewrite something :-)
Uncle Bob,
I’m going to have to tentatively suggest that your conclusion of “Here’s the answer. Never.” should in fact be “Almost never.”
Your logic that a race condition will always result is inaccurate because it is possible to freeze a legacy system (or subsystem) long enough to perform a rewrite. However, a rewrite will still fail if the underlying causes of the original mess are still in existence.
e.g. if the cause was a poor conception of the system’s goals and this was the only cause of the mess, a rewrite will fail unless this underlying cause is fixed. Frequently this means that someone other than the person(s) who sponsored the original system’s development must be replaced.
Successful rewrites
A heuristic for how to perform a successful rewrite would be as follows:
Correction:
“Frequently this means that someone other than the person(s) who sponsored the original system’s development must be replaced.”
should read:
“Frequently this means that the person(s) who sponsored the original system’s development must be replaced.”
I completely agree with the point about tidying your mess as you go along but sometimes the requirements of a system shift over time in such a way that a complete redesign ends up being cheaper and more robust in the long run.
I agree with Uncle Bob that rewriting from scratch is not a good idea. I see some comments on funding that I don’t understand. From what Uncle Bob writes, you don’t have to ask for funding. You write new features and you clean up while you are doing it.
We’re about to do a rewrite of a large, enterprise application.
It’s written in .NET 1.1, very poorly, by a handful of developers 6 years ago who were learning ASP.NET and programming (apparently) for the first time. There are bugs everywhere. There is data access in the code-behind (user interface). There are almost no objects. The database is a denormalized mess. Data integration from the client’s office is done using DTS, which makes changes almost impossible to version control or track, and many of those DTS packages are horrendous because they copy data all over the place. There is a giant god object that does some of the data access. There are mountains of duplicated C# code and duplicated stored procedures.
Those developers are gone. We use domain objects and interfaces now. We use NHibernate, Dependency Injection, and domain driven design. We use TDD. We use MVC for our UI. We use layered architecture.
The application hardly ever receives changes or new functionality, other than bug fixes. The client has agreed to freeze functionality on the old site until this one is done.
Does it not make sense to do a rewrite in this case? Does it really make sense to hobble this thing gradually into something sort of coherent over the next 3-5 years, rather than to rewrite it over the next 6 months to a year?
Great article!
What about Mozilla? Following the old structure would have been a mess.
Then again, I believe they still use some parts from the old Mozilla (the XUL engine, maybe), so it’s not entirely a rewrite, though from an architecture point of view, it is.
What if other companies had written another browser then? The thing about browsers back then was that IE was quite stagnant, so you didn’t have to play catch-up in terms of features. If they hadn’t rewritten, Google might have released their own browser in time (which is almost from scratch).
I guess it depends on how much of the old stuff you can reuse, which also means reducing uncertainty, since that code has been field tested and all. Given that, you could rewrite incrementally, cleaning part by part, provided your components are isolated enough.
I guess that is the direction of Mozilla until now. Maybe someone can confirm this.
Fadzlan,
If you read Joel Spolsky’s article against rewrites (more from a business owner perspective) from some years ago, he uses Mozilla as his example of why not to do a rewrite. Look at all the market share they lost while Netscape went dark.
I completely agree; however, I would like to offer up a different approach to possibly solving the same problem.
One idea that has come out of ThoughtWorks is that of a Strangler Application, as documented by Martin Fowler. Instead of rewriting the existing code, you develop new features using new code in such a way that you slowly strangle the legacy code. The new code is created using the underlying integration points and ideas, but is implemented using new code. This allows you to slowly replace parts of the old system without having the problem of parallel development.
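As a rough illustration of that idea (a sketch only, not code from Fowler's write-up), the strangling can be pictured as a thin facade in front of the old system. Every name below (`StranglerFacade`, `legacy_system`, `new_invoicing`, the `feature` key) is hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def legacy_system(request: dict) -> str:
    """Stand-in for the existing system, which still handles everything."""
    return f"legacy handled {request['feature']}"


def new_invoicing(request: dict) -> str:
    """Stand-in for one feature that has been rebuilt in new code."""
    return f"new code handled {request['feature']}"


class StranglerFacade:
    """Routes each request to new code if that feature has been migrated,
    and falls back to the legacy system otherwise."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, feature: str, handler: Handler) -> None:
        # Each migration moves one feature off the legacy path.
        self._migrated[feature] = handler

    def handle(self, request: dict) -> str:
        handler = self._migrated.get(request["feature"], self._legacy)
        return handler(request)


facade = StranglerFacade(legacy_system)
facade.migrate("invoicing", new_invoicing)

invoicing_result = facade.handle({"feature": "invoicing"})
reports_result = facade.handle({"feature": "reports"})
```

Here `invoicing` is served by the new code while `reports` still goes to the legacy system. Users keep talking to one entry point throughout, and the legacy path shrinks one feature at a time until nothing routes to it.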
I’ve personally been involved in a successful greenfield rewrite. Granted, it was a small scale app (two developers, 18 months). This was right at the time .NET and C# first came out, so we had an ulterior motive to play with new dev toys while doing the rewrite. .NET had stuff that we needed already built in, that we would have had to write ourselves (or purchase third party components for) if we had stayed with the existing VB6 app.
So when I hear industry experts such as yourself and Spolsky proclaim that this is “never” a good idea, and will “always” end in “failure”, my BS alarm goes off. Then I get skeptical about other proclaimed “truths”, and wonder if they are also misguided and overstated opinions.
Bob, Microsoft’s (I think green field) development of its NT architecture and subsequent migration of its customer base to it was painful and took several years, but it seems to me it was necessary so that Microsoft didn’t fall behind in the OS market. Do you think Microsoft should have stayed with their Windows 95-based architecture? In my mind, the Win95 to WinNT transition was one of those (rare, maybe) green field projects that succeeded.
The question is not whether or not they should have changed, but how they should have gone about it. And mind you, M$ is an unusual case because it’s not an issue whether they can afford to keep multiple systems running. They quit adding features to the old as they built the new to be feature complete, and it costs them about the amount Bill loses in the couch at night.
“Bullshido.”
If evolution took this tactic, you’d still be an amoeba.