The Bloat at the Edge of Duplication Removal (The Orange Model)

The Bloat at the Edge of Duplication Removal (The Orange Model) 48

Posted by Michael Feathers Wed, 18 Feb 2009 03:51:00 GMT

There are times when I sit down to work on some piece of code and I see a bit of duplication. I remove it. Then I see a bit more, and a bit more. When I’m done with all of my extractions, I look back at the code and notice that it’s increased in size about 20 or 30 percent. If I’m working with someone else, it’s often a shock for them. After all, we are used to thinking that code gets smaller when we remove duplication. Often it does, but not always.

Let’s take a look at an example:


private void adjustPaths() {
  pathNew = pathNew.removeFirstSegments(1).removeFileExtension()
                  .addFileExtension("class");
  pathOld = pathOld.removeFirstSegments(1).removeFileExtension()
                  .addFileExtension("class");
}

That’s six lines of code with obvious duplication. Now let’s remove it.


private void adjustPaths() {
  pathNew = adjustPath(pathNew);
  pathOld = adjustPath(pathOld);
}

private IPath adjustPath(IPath path) {
  return path.removeFirstSegments(1).removeFileExtension()
               .addFileExtension("class");
}

Now we have 9 lines.

What’s going on here? Well, when we extract duplication, we have to put it someplace, and that operation isn’t free. In all languages, there is a cost in space for separate functions. In Java, the cost includes method declaration line and lines for the curly braces. In Python, the cost is (again) the declaration line and some whitespace to separate the function from other functions.

Code is like an Orange

The mental model I have for this is that code can be separated into two parts. One part is structural. It gives form to the code. It consists of all of the cruft that we use to declare a method. The stuff inside is the meat — the pure computational code which is held in place by the structural bits.


/* cruft */ private IPath adjustPath(IPath path) { 
/* meat  */    return path.removeFirstSegments(1).removeFileExtension()
/* meat  */                .addFileExtension("class");
/* cruft */ }

This pattern is rather common. In fact, you’ll see it in most living things. The muscle tissue in your body (the meat) is held in place by a sheath called fascia. The same pattern occurs in fruit and vegetables. The pulp in an orange is held in place by fascia too, it’s the white part between the sections:

P2170002

Here’s what duplication removal does, structurally. It allows you to pull out redundant bits of pulp from big sections, yielding smaller sections, but the side effect is that you end up with more fascia. Duplication removal increases the ratio of fascia to pulp. If the amount of pulp you are able to remove exceeds the size of the fascia you introduce, the net amount of code decreases, otherwise it might increase.

In general, I think that a high fascia to pulp ratio is better for maintenance. It gives us is a higher surface area to volume ratio for our code. This can enhance testability and make it easier to compose new software – we already have smaller more understandable pieces.

The cost of fascia, however, can be disturbing. In the normal formatting convention for C++ a new method will give you at least 5 lines. You need the line for the method declaration in the header, one for the definition in the source file, a line apiece for the beginning and ending braces and a newline to separate the method from next one. In Haskell, you can get away with 2 if the method is a one-liner and you forgo type declarations. The thing to notice is that it is never zero. There’s always some space overhead when we create a new method. The question is whether the amount of code that we remove as duplication swamps the addition overhead or not.

We can model duplication removal this way — the number of lines T’ after removing n stretches of x contiguous lines is:

T’ = T – (n – 1)x + c + n

We start with T, the original number of lines in the program and we remove n – 1 stretches of x lines. We leave in the nth x (as the method we extract) and we add in c which is the number of lines of fascia for a method in our language. We also add in n again since we will need to leave in a line for the call at each of the n sites where the code was originally duplicated.

According to the equation, we can end up with more code if c is large or if n and x are small. Aggressive refactoring with a c of 3 or 4 and and an n of 2 and an x of 1 will definitely increase code size.

Again, is this bad? No, I think it’s just something to be aware of. Removing duplication increases the ratio of fascia to pulp. To me, it’s only annoying when I’m working in a language with a high c value.

Posted in Michaels Musings
Meta 48 comments, permalink, rss, atom

Comments

Leave a response

Tracy Harms about 2 hours later:

I’m disinclined to much attention toward quantitative metrics such as lines of code. Insofar as we do put attention there, I think better comparisons would come from counting tokens rather than lines. Your formula applies independently of such units, though.
tieTYT about 3 hours later:

Hello, I’ve added this blog to my reader. Thanks for all the great articles.

I have an unrelated question about SRP. Some sort of forum would be the best place to ask but it doesn’t seem like objectmentor has one so I’ll ask it here. It’s about a topic I’d called “SRP vs Law of Demeter”. In summary, I find it difficult to avoid violating one of these when I obey the other.

I’m making a game and I have a class called BattleCharacter. This class used to have a List of StatusEffects that other classes (eg: Abilities) would add and remove. Since the BattleCharacter had methods like, findStatusEffectByClass and addStatusEffect and removeStatusEffect along with getHp and useAbilityOn(Ability, BattleCharacter target), I felt like this was too many responsibilities.

So what I tried to do is refactor List into a StatusEffects class and put all the StatusEffect related methods in there. Great, it looks better IMO but now I’m at odds with the Law of Demeter.

Abilities get a reference to a BattleCharacter to modify. If they decide to add/remove a status effect, they need to say, “battleCharacter.getStatusEffects().add(...)”; This violates the Law of Demeter as I understand it. So as a solution, I can add this method to BattleCharacter:

public void addStatusEffect(StatusEffect se) { this.statusEffects.add(se); }

But this, as I understand it, violates SRP again. How can I obey both?

PS: Maybe the latter doesn’t violate SRP. I always have assumed it does because most examples of SRP violations look at a class’ interface rather than implementation. Also, most examples exclude the client code that uses the new classes which I find to be unfortunate.
Tetsuo about 4 hours later:

But I don’t agree that the larger the surface the better. At the same time that each unit is smaller, if you break it too much, you end up with a sea of one-liners, as amorphous as the ‘big ball of mud’.

Few large black boxes, or one big box of sand?

I think the ‘path of the middle’ is always the best approach. Break the thing into pieces, small enough to be understandable, large enough to actually have a meaning.
Sebastian Kübeck about 10 hours later:

I don’t think increasing the code base by removing duplication is bad as in general, duplication causes more problems than more source code.

> To me, it’s only annoying when I’m working in a language > with a high c value.

High c? So you mean C#? :-)
Kim Gräsman about 12 hours later:

I wrote about a similar idea a while back—in Swedish, so I won’t bother you with a link.

My hypothesis is that code with a higher ratio of fascia to pulp is often of higher quality.

I’ve thought about cooking up a little tool that can analyse a code base and give you a fascia/pulp coefficient, and then run it on a number of code bases for which I have a subjective sense of quality, and see how that relates to the resulting numbers.

I’m not saying more fascia == better code unilaterally, but I think there may be a trend that code with more fascia is better.
PJ about 17 hours later:
I’d worry less about LOC and more about:
- does the refactoring make the code semantically clearer? In your example, ‘adjustPath’ to me is a little clearer, but the original subroune is called adjustPaths, so it’s probably a wash.
- does the refactoring ease maintenance by moving to a single common code definition? To me, this is the largest benefit: a single place where bugs can be fixed and changes can be made, without having to chase all over the code looking for where we did the .removeFirstSegments(1).... sequence.
Brett L. Schuchert about 17 hours later:

tieTYT wrote:

I have an unrelated question about SRP. Some sort of forum would be the best place to ask but it doesn’t seem like objectmentor has one so I’ll ask it here. It’s about a topic I’d called “SRP vs Law of Demeter”. In summary, I find it difficult to avoid violating one of these when I obey the other.

First comment, between the SRP and LoD, the SRP is a stronger principle. Martin Fowler and Booch have both lamented that it would have better been called the suggestion of Demeter (paraphrased).

Blind application of the LoD, as you’ve observed, leads to bloated API’s.

Wrapping the List into a class is one of the big things I always mention when teaching a TDD class. MOST of the time, a “raw” collection forces a lot of feature envy and duplication. Moving the list into a light-weight, wrapped collection gives a place to put appropriate responsibility.

Now to your particular question about whether you are or are not in violation the SRP. First off, SRP is about change, and not directly about the interface. The interface is related in that larger interfaces lead to objects that tend to have larger responsibilities, which makes the class more susceptible to change by virtue of its larger size.
Here’s the SRP defined (according to Bob):
A class should have one reason to change.

Can you violate the SRP? Yes, will it always cause problems? No. Question, does adding a pass-through method to update the collection cause problems? Probably not. It is likely to change? No. So if something is stable, then it’s not changing so if you are violating the SRP, it’s not going to cause problems.

Now, if the underlying object was in fact a list, I’d be much more worried. Why? Because as a list, it could be changed and the containing class (BattleCharacter) might not notice it. However, by making it an object that holds a list, its add method could both add to a collection and send an update (if that were necessary).

So by wrapping the list into a collection you’ve probably covered the most likely place where a potential violation of LoD could cause problems. Having done that, the next question is whether to add the methods to battle character to add/remove status effect or to get the status effect. Without seeing the whole class, I cannot really say. However, try one and see if it causes pain. If it does, then try the other thing. You’ll have unit tests to cover your ass, right?

This last point is an important one. Sometimes the principles might be at odds, often they are not. When they are, you need to try something and see if it causes problems. If it does, then try something else. If you want to flexibility to do this, you need a safety net. Unit tests can give that.

A very badly-written class that violates all of the principles, but that works and is not changing is not a problem. So, for example, if I saw an ugly class but it wasn’t being updated/changed in anyway, I’d leave it alone. As much as I’d prefer to have clean code throughout, in practice there’s more work that could be done than what can be done. So I choose to spend my time on code that is changing. If there’s some free time, it’s possible to go back and clean up stuff that’s not changing. However, most places don’t have any free time.

Finally, as Bob mentioned in one of his recent interviews, the principles are part of the overall rules he uses to develop software. They occur naturally to him. When something is causing problems, he’ll consider the principles to diagnose the problem and then come up with a way to refactor it. That you are concerned about the balance between SRP and LoD puts you in a great place. It is one thing to blindly violate something like the SRP. If you do it consciously – which you will/should eventually do – then you are in a much better place. Knowing the rules and when to break them is difference between a journeyman and a master (according to Meilir Page-Jones’ 7-stages).
Michael Feathers about 21 hours later:

Tetsuo wrote:

But I don’t agree that the larger the surface the better. At the same time that each unit is smaller, if you break it too much, you end up with a sea of one-liners, as amorphous as the ‘big ball of mud’.

It’s interesting to think about how much your hand is forced if you really want to get rid of duplication. I really don’t like the idea of a middle path on duplication, and when that notion goes away, you do seem to end up with code with much more surface area.
Michael Feathers about 21 hours later:

Kim wrote:

I wrote about a similar idea a while back—in Swedish, so I won’t bother you with a link. My hypothesis is that code with a higher ratio of fascia to pulp is often of higher quality.

I wish I could read Swedish. :-) Yes, there might be an optimal ratio.

One thing that I didn’t mention above is that I see control structures as fascia-like also. I don’t know where to go with that observation, but, well, there it is, nonetheless.
Michael Feathers about 21 hours later:

PJ: Fair enough. I was just making an observation about code structure.
Kim Gräsman 1 day later:

Michael,

> I wish I could read Swedish. :-)

I tried to Google-translate the page, but it turned into Gibberish :-)

Trust me, there was nothing more groundbreaking than what I summarized in the comment.
Chris Cobb 2 days later:

tieTYT wrote:

Maybe the latter doesn’t violate SRP. I always have assumed it does because most examples of SRP violations look at a class’ interface rather than implementation.

From Working Effectively with Legacy Code, “The SRP violation that we care most about is violation at the implementation level.” (p. 261) and goes on to point out that delegation indicates a facade, not a huge class that manages too many responsibilities. Whether that applies to your BattleCharacter class I’m not sure, but at least it should help in deciding when to break the letter of the law.
http://agileconsulting.blogspot.com 3 days later:

I think increasing the surface area is actually good, up to a point.

As long as you’re following a self describing code approach, and the surface area leads to increased understanding.

Lines of code as the Magic are completely useless in my book. It takes more effort to create a code base that has less LOC, so the measure is completely meaningless to me.

Great post, I’m going to use this with my padawans on my team “think of code as an orange…”
Dave Nicolette 6 days later:

I like the tie-in between oranges and a high C value.

Everyone can read Swedish. The hard part is understanding it.
Simon Richard Clarkstone 13 days later:
Languages with Forth-style syntax make this overhead rather lower. To roughly translate the original into Factor, assuming the class is “foo”):
```
: adjustPaths ( foo -- )
    [ [ 1 removeFirstSegments removeFileExtension "class" 
        addFileExtension ] change-pathNew ]
    [ [ 1 removeFirstSegments removeFileExtension "class" 
        addFileExtension ] change-pathOld ] bi ;
```
The overhead for a simple function is just a colon, the name, and a stack effect at the beginning, and a semicolon at the end:
```
: adjustPath ( path -- path )
    1 removeFirstSegments removeFileExtension "class" addFileExtension ;
: adjustPaths ( foo -- )
    [ [ adjustPath ] change-pathNew ] [ [ adjustPath ] change-pathOld ] bi ;
```
This encourages functions to be decomposed heavily, hence the style of omitting blank lines between closely-related functions. The lack of type declaration helps too.
ajuc 15 days later:

I wold prefer such refactoring:

private void adjustPaths() { Path[] paths = {pathNew,pathOld}; for (Path p : paths ) { p = p.removeFirstSegments(1).removeFileExtension() .addFileExtension(“class”); } }

I factors out structure from data.
L. 5 months later:

This image of meat and cruft is interesting. When we remove suplication, it is just normal that the share of cruft goes up with regards to meat. You’re taking two identical parts of meat and make it one, therefore removing meat.

(I will not be talking about removing structural duplication, in which case I think that the meat/cruft ratio does not change much)

The fact that cruft (i.e. structure) is growing makes me wonder why cruft is so important in languages. Maybe because we have no other way to create structure in a flat file.

Would there not be ways to create structure in, e.g., a database instead of a flat file? Would database-managed code be a good thing? The hard part would be for a human to read the code if it is structured in a DB.
latin shoes about 1 year later:

Most problems happens before testing, nice post anyway!
latin shoes about 1 year later:

would you mind updating your blog with more information?
jewellery about 1 year later:

would you mind updating your blog with more information?
hockey jersey about 1 year later:

would you mind updating your blog with more information?
Steelers jerseys about 1 year later:

would you mind updating your blog with more information?
nfl jersey supplier about 1 year later:

would you mind updating your blog with more information?
Dallas Cowboys shop about 1 year later:

Thanks for posting this. Marion Barber Jerseys Very nice recap of some of the key points in my talk. I hope you and your readers find it useful! Thanks again
JetsShop about 1 year later:

Thanks for posting this. Jets Jerseys Very nice recap of some of the key points in my talk. I hope you and your readers find it useful! Thanks again http://www.nfljetsshop.com/jets-jerseys Jets Jerseys
Get <a href=”http://www.favre-jersey.com”>Brett Favre Jerseys</a>, <a href=”http://www.favre-jersey.com”>Favre Jerseys</a>, Brett Favre Jerseys 60% Discount With Authentic Quality. Our supply selection of <a href=”http://www.favre-jersey.com/brett-favre-m about 1 year later:

Get http://www.favre-jersey.com”>Brett Favre Jerseys, http://www.favre-jersey.com”>Favre Jerseys, Brett Favre Jerseys 60% Discount With Authentic Quality. Our supply selection of http://www.favre-jersey.com/brett-favre-minnesota-vikings-4-black-shadow-jersey-p-9.html”>brett favre vikings jersey and cheap prices on jerseys shop. Brett Favre cheap jersey like http://www.favre-jersey.com/brett-favre-4-authentic-minnesota-vikings-purple-jersey-p-23.html”>Authentic brett favre jersey with replica prices, the player’s number is displayed in tackle sewn on the chest, back and sleeves, along with the team name on the front. if you like http://www.favre-jersey.com/brett-favre-4-minnesota-vikings-purple-jersey-p-8.html”>Brett Favre Purple Jersey, please buy it.
Pandora about 1 year later:

It is my favorite patter name (not favorite pattern, just name).
http://www.blacktowhiteiphone4.com about 1 year later:

iphone 4 white and iphone 4 black, which one will be more suitable for yourself? Thinking of the black iphone 4 is too ordinary, iphone 4 white will be the one to make you feel different.
accounting services about 1 year later:

I wrote about a similar idea a while back—in Swedish, so I won’t bother you with a link. My hypothesis is that code with a higher ratio of fascia to pulp is often of higher quality.accounting services
okey oyunu oyna over 2 years later:

great code.

internette görüntülü olarak okey oyunu oyna, gerçek kisilerle tanis, turnuva heyecanini yasa.
Tiffany Sale over 2 years later:

Thanks for the interesting story, although it did take rather a long time to complete reading. (English isn’t my country’s language) Could I ask in places you received your sources from? Thanks!
beats by dre store over 3 years later:

omplete reading. (English isn’t my country’s language) Could I ask in places you received your sources from? Thanks!high quality headphones new design headphones
african Mango dr oz over 3 years later:

It is not hard to understand why the question of value is still raised when you consider that today?s outsourcing models have some inherent limitations that reduce the overall gains companies can achieve .
best sleep aid over 3 years later:

When I at first left a comment I clicked the “Notify me when new comments are added” checkbox and now each time a comment is added I get three notification emails with the same comment. Is there any way you can take away people from that service? Thanks a lot!
christian louboutin over 3 years later:

When I at first left a comment I clicked the “Notify me when new comments are added” checkbox and now each time a comment is added I get three notification emails with the same comment. Is there any way you can take away people from that service? Thanks a lot!
grad school personal statement over 3 years later:

Very impressive post. Thank you for sharing.
north face khumbu over 3 years later:

This article is impressive,I hope that you will continue doing nice article like this.
iphone contacts backup over 3 years later:

A better way to export the contacts from Iphone to comptuer. try it.
sallyerma over 3 years later:

Thanks for your marvelous posting! I actually enjoyed reading it, you will be a great author.I will ensure that I bookmark your blog and will come back in the foreseeable future. I want to encourage that you continue your great job, have a nice weekend! layered hairstyles
Office 2010 over 3 years later:

This article is GREAT it can be EXCELLENT JOB and what a great tool!
bob cut hairstyles over 3 years later:

ncie website lay out simple and normal look and i like you website layout and many intresting comments on there about on hairstyles
hairstyles over 3 years later:

This image of meat and cruft is interesting. When we remove suplication, it is just normal that the share of cruft goes up with regards to meat.
Cute And easy hairstyles over 3 years later:

nice website lay out simple and normal look and i like you website layout and many intrest comments on there about on Cute And easy hairstyles
funny images over 3 years later:

thank you so much share with me this type or orange juice i love it
celebrity gossip over 3 years later:

please update this blog with more information. I have found it extremely useful.
long straight hairstyles over 3 years later:

This is a nice web site. Good fresh interface and nice informative articles. I will be coming back soon, thanks for the great article.
Wedding Hairstyles 2011 over 3 years later:

This is an intelligent blog. I’m serious. They have so much knowledge on this subject, so much passion. They also know how to do habitancy rally behind him, apparently from the answers. You’ve got here is not to find the obvious.
bladeless fans over 3 years later:

The Bloat at the Edge of Duplication Removal (The Orange Model) 47 good post58

Mentor	twitter id
Uncle Bob	unclebobmartin
Brett Schuchert	schuchert
Michael Feathers	mfeathers
Bob Koss	bob_koss