Size Matters
Contrary to what you may have heard or what you might like to hear, size really does matter. We programmers must take matters into our own hands and become masters of our domains. Unless we take action, things are just going to get bigger and bigger until we have a real mess on our hands.
I travel a lot and I get to visit a lot of different companies. No matter which industry a company is in or which programming language a team is using, there is one commonality in all of the code that I see – classes are just too damn big and methods are just too damn long. (What did you think I was talking about?)
Way back in the olden days when I had hair on my head, I studied Structured Design. This was where I learned the concept of cohesion. A software module with high cohesiveness was considered a good thing. As I transitioned to Object Oriented Design (still with a full head of hair), I learned Bertrand Meyer’s One Responsibility Rule and later Robert Martin’s Single Responsibility Principle. These latter two concepts restate and reinforce what Larry Constantine said back when Structured Design was in vogue – a module should do one thing and do it very well.
The trouble with this idea of a module (class or method) doing one thing is that it is subjective. What I consider one thing you might consider several things. For example, I might see a method as getting a Customer object out of a database, yet you see it as:
- establishing a connection to the database,
- forming the SQL,
- executing the SQL, and
- creating and returning a Customer object from what it found in the results of the SQL execution.
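Here is a minimal sketch of that second view, with each of the four steps pulled into its own small function. It assumes Python's built-in sqlite3 module and a hypothetical customers table with id and name columns; the function names are mine, purely for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: int
    name: str


def open_connection(path: str = ":memory:") -> sqlite3.Connection:
    """Step 1: establish a connection to the database."""
    return sqlite3.connect(path)


def customer_query() -> str:
    """Step 2: form the SQL (parameterized, hypothetical schema)."""
    return "SELECT id, name FROM customers WHERE id = ?"


def run_query(conn: sqlite3.Connection, sql: str, params: tuple) -> Optional[tuple]:
    """Step 3: execute the SQL and return the first row, if any."""
    return conn.execute(sql, params).fetchone()


def to_customer(row: Optional[tuple]) -> Optional[Customer]:
    """Step 4: create a Customer from what the query found."""
    return None if row is None else Customer(id=row[0], name=row[1])


def find_customer(conn: sqlite3.Connection, customer_id: int) -> Optional[Customer]:
    """The 'one thing' view: get a Customer out of the database."""
    return to_customer(run_query(conn, customer_query(), (customer_id,)))
```

Whether find_customer does one thing or four is still a matter of opinion, but each piece now fits on a screen and can be named for its intent.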
A guideline that sometimes works when deciding if a module is doing “too much” is simply to describe what it does. If you find yourself using the word “and” in this description (or working really hard to phrase the description in such a way as to avoid using “and”), it might be doing too much. Of course you have to adhere to the spirit of the guideline. I can describe the National Air Traffic Control System as “Prevents Collisions”. I didn’t have to use the word “and” once. It doesn’t follow that we can write the entire system as one class with one method – let’s call it main. Or if you program in C#, call it Main.
Uncle Bob presents a different view of “responsibility” in his Agile Software Development book. He defines a responsibility as a reason to change. If a class has two reasons to change, it has two responsibilities and it might be wise to split the class into two pieces.
This notion of breaking a class into smaller and smaller pieces is exactly opposite to what I learned when I first started studying OO. Way back when I worried about bad-hair days, people believed that a class should encapsulate everything that concerned it. A Customer class would know the business rules of being a Customer as well as how to retrieve itself from the database and display its data. That’s a fine idea, provided the database schema never changes, the display never changes, and the business rules never change. If any one of those responsibilities changes, we are at a high risk of breaking other things that are coupled to it.
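As a rough sketch of the alternative, here is a Customer split along its three reasons to change: business rules, storage, and display. Everything here is hypothetical and for illustration only: the in-memory repository stands in for a real database, and the "good standing" rule is invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Customer:
    """Business rules only; changes when the rules change."""
    name: str
    balance: float

    def is_in_good_standing(self) -> bool:
        # Invented rule for the example: no negative balance.
        return self.balance >= 0


class InMemoryCustomerRepository:
    """Storage only; changes when the schema or database changes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, float]] = {}

    def save(self, customer: Customer) -> None:
        self._rows[customer.name] = (customer.name, customer.balance)

    def find(self, name: str) -> Optional[Customer]:
        row = self._rows.get(name)
        return Customer(*row) if row else None


def render_customer(customer: Customer) -> str:
    """Display only; changes when the presentation changes."""
    status = "OK" if customer.is_in_good_standing() else "OVERDRAWN"
    return f"{customer.name}: {customer.balance:.2f} ({status})"
```

A change to the display format now touches render_customer and nothing else, which is the point of treating a responsibility as a reason to change.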
So what’s the real problem? What harm does it do if we go around proudly making our big modules even bigger? Well, I can think of a few problems:
1. Comprehensibility
The bigger a method (or class) is, the harder it is to understand without significant study and effort. I believe that as soon as I have to scroll a method in order to read it, I’m wasting valuable time because the method is doing more than my brain can hold. I often find myself scrolling back because I forgot what scrolled off the top of my window. I know I’m in a minority (considering all the code I’ve seen with massive classes and methods) but I like methods short and sweet.
2. Magnets for change
Massive classes with big methods do a lot – they have to because there is a lot of code in them. Unfortunately, that also means there’s a lot that can go wrong in them. When we fix a bug in one of these huge modules, we have to change the code – and changing code often means the code becomes worse. When we have to add additional functionality, the hooks seem to be in these big classes, so they get even bigger, and once again the code deteriorates even more. It’s easier to add code to existing classes and methods than it is to create new classes. Some companies have a heavy hand on the source code repositories and developers would rather make existing classes bigger than deal with the bureaucracy of adding another module to the corporate SCR.
3. Collisions
Because a lot of code is in each of the ever-growing modules, it stands to reason that different team members will be editing the code in these modules for different reasons. You know what that means come check-in time: the dreaded diff and merge. And because it’s so painful, developers put it off as long as possible, which only makes the problem worse.
4. Lack of reuse
The bigger a module is, the less likely it is that you will be able to reuse it in a different context. It does so much that it becomes specialized to the current context.
5. Comments are needed
Large methods can’t be named for their intent, i.e., you can’t tell what a method does from its name because it does so much. Earlier this year I saw a multi-thousand line method named ‘execute’. Yeah, it was obvious what it did – not. Developers tend to write comments to explain what a method does. We’ve all seen them – the next 200 lines calculate a thingamabob, then another comment explaining the next 450 lines. The problem with comments is that in large systems, worked on by numerous developers over a period of years, the code goes one way and the comments tend to stay as originally written. When the code says one thing and the comments say another, which will you believe? Yet more obscurity.
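To make the point concrete, here is a small, hypothetical example (an invented invoice calculation, not taken from any real system). The first version uses comments to mark off its sections; the second replaces each comment with a function named for its intent, so there is no comment left to drift out of date.

```python
def invoice_total_with_comments(prices, discount_rate, tax_rate):
    # calculate the subtotal
    subtotal = 0
    for price in prices:
        subtotal += price
    # apply the discount
    discounted = subtotal - subtotal * discount_rate
    # add the tax
    return discounted + discounted * tax_rate


def subtotal_of(prices):
    """Sum of all line prices."""
    return sum(prices)


def apply_discount(amount, discount_rate):
    """Amount after subtracting the discount."""
    return amount - amount * discount_rate


def add_tax(amount, tax_rate):
    """Amount after adding tax."""
    return amount + amount * tax_rate


def invoice_total(prices, discount_rate, tax_rate):
    """Same calculation as above; the names now do the explaining."""
    return add_tax(apply_discount(subtotal_of(prices), discount_rate), tax_rate)
```

At three short sections the difference is small. Scale each commented section up to 200 or 450 lines and the named version is the only one you can still read.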
6. Decreased code quality
Where do you think you’ll spot a glaring defect quicker, in a 6-line method or in the middle of a 300-line while loop with page-long if-else constructs in it?
7. Increased maintenance costs
Every time you visit a large module, you have to understand the piece that you’re going to work on. Often, that means you have to understand the entire module just to find the area where you are going to work. All of this takes time, and time is money. Ha – I seem to be on a roll with these sayings – “time is money”, “size matters”, hmmmm…. can “your check is in the mail” be far behind?
8. Harder performance profiling
When you are trying to locate performance bottlenecks, and whatever tool or timing mechanism you’re using tells you that the 8,000-line doItToIt() method is taking “too long”, how are you going to find where all the time is spent? It would be much easier to see that the 6-line calculateAmount() function was taking too long because it was hitting the database.
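A minimal sketch of why small functions help here, assuming Python's standard cProfile and pstats modules. The functions are hypothetical stand-ins: load_rate fakes a slow database hit with a short sleep, and do_it_to_it is deliberately tiny so the example stays readable.

```python
import cProfile
import io
import pstats
import time


def load_rate():
    """Stand-in for a slow database hit (hypothetical)."""
    time.sleep(0.05)
    return 1.5


def calculate_amount(quantity):
    """Small function whose cost comes from the 'database' call."""
    return quantity * load_rate()


def format_amount(amount):
    return f"{amount:.2f}"


def do_it_to_it(quantity):
    return format_amount(calculate_amount(quantity))


def profile_report(quantity):
    """Profile one call and return the per-function report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    do_it_to_it(quantity)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(4))
```

Because the work is split into named functions, the report lists calculate_amount and load_rate on their own rows. Had everything been inlined into one giant method, the profiler would only show one row for that method.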
There are many things in life that I wish were bigger – lots bigger (hard disk, thumb drive, monitor, RAM, etc.) – but classes and methods should not be on that list. Understanding at a glance with good, meaningful, intention-revealing names can go a long way toward keeping software costs down and making our lives as developers better. I know, it isn’t easy to spare the time. It’s much easier to use that time to struggle to squeeze another clause into that already overgrown method; it’s much easier to use that time single-stepping through the vastly indented forest of if/else/for/while; it’s much easier to use that time poring through a tangled and twisted rat’s nest of code over and over just to work out what it might be doing. Oh yes, it’s MUCH easier.

A comment on the subjectiveness of single responsibility.
If I remember correctly, Uncle Bob wrote in his Agile Software Development book that responsibilities should be put in their own classes on a need-to basis (at least for the non-trivial cases). That is, you shouldn’t try to define the responsibilities of establishing a connection or forming your SQL until you find (the hard way) that your design suffers from the fact that the class has more than one reason to change.
Defining a responsibility is, as you say, both difficult and subjective, so why not let the tests and refactorings tell you what needs to be treated as a responsibility!
This post really hits home. A manager from another group where I work asked for some help tracking down their performance problems. I start poking through the code and find many copy/paste 7K line classes w/ 3K+ line methods. Just makes you want to run for the hills.
I am with you and Uncle Bob on this one.
Wow – doItToIt …back when I used to work at Brokat, with their horrible Twister software there were 8000 line doItToIt methods, which then took a String called something like “action”....
So about 8000 lines of .equalsIgnoreCase(“blah”)
then have a few layers of this… a real WTF.
I think it’s amusing that an essay titled “Size Matters” is being posted on a Web site in five-point text. Size matters!
Totally agree with your sentiments here. I have been reading through this blog and you make some fantastic comments that many many many more programmers should listen to!
Unfortunately we are the ones picking up the mess!