X Tests are not X Tests
Testing is a slippery subject, and it’s reasonably hard to talk about for one simple reason: the nomenclature is chaotic. Years ago, I went to a summit with some testing gurus. I was one of the few developers there, and I asked about the taxonomy of testing. Cem Kaner, Bret Pettichord, Brian Marick, and James Bach went through it for us on a flipchart, and it was a nightmare. You can name tests after their scope (unit, component, system), their place in the development process (smoke, integration, acceptance, regression), their focus (performance, functional), their visibility (white box, black box), or the role of the people writing them (developer, customer). The list goes on. There are far more than I can remember.
Why is it so confusing? There are a couple of reasons. One is that different communities have developed different nomenclature over time. But, let’s face it, that’s true in most fields. The thing which makes testing nomenclature worse is that the tests themselves aren’t all that different, or at least, they are often not different enough for us to distinguish them without being told. Yes, we can tell the difference between a unit test and an acceptance test in most systems, but really there is no force which prevents tests of different types from bleeding into each other. Often the “type” of a test is more like an attribute: “here I have a black box smoke test, written by a developer for component integration.” In the end, all we have are tests, and each of them can serve purposes beyond the purpose we originally intended.
Earlier today, I read a blog by Stephen Walther: TDD Tests are not Unit Tests. In it, he draws some distinctions between various types of testing. It’s great that he wrote it because it’s nice for us to have mental categories for these things, but we have to remember that they really are just categories. We get to choose how distinct they will be. When I write code, most of my TDD tests end up being the same as my unit tests. I find value in forcing that overlap, and in general, I think overlapping test purposes are great to the degree that the purposes don’t conflict. You get more for less that way.
I don’t see any remedy for the muddle of test types. We will continue to make up terms to distinguish tests. We’ll just have to remember that the types are labels, not bins.
A Brief Collection of Convenient Lies about Functional Programming
- A value is the instantaneous state of an object. – In OO languages, we have objects. In FP languages, we throw out the object and instead manipulate the values it would take on over time.
- Algebraic data types are classes. – Every case in an ADT is a state that an “object” can be in.

data Tree = Empty | Leaf Int | Node Tree Tree

When we write functions over ADTs, we are obliged to cover all of the cases. So, for instance, if we define depth for Empty, we have to define depth for the Leaf and Node cases as well. When we do, we can evaluate depth t for any tree value and have a well-defined result (there is a sketch of the same idea in Java after this list).
- The functions which we define over an ADT can be considered its public interface. – There’s a school of thought which says that encapsulation doesn’t matter in a functional programming language because values are immutable and corruption can’t happen. Nothing could be further from the truth. If we add or remove a case from an ADT, all of the functions which pattern match against it are impacted. While we don’t need an encapsulation boundary as tight as we might have in an OO language, it pays to be conscious of how far ADTs travel in a program. Encapsulation is the act of forming a boundary by transforming an ADT into some other form of data.
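For a rough sketch of what the ADT lie looks like in code, recent Java (21 or later) gets surprisingly close: sealed interfaces and records can stand in for the ADT cases, and an exhaustive switch stands in for pattern matching. The names here are illustrative, not from any particular codebase:

sealed interface Tree permits Empty, Leaf, Node {}
record Empty() implements Tree {}
record Leaf(int value) implements Tree {}
record Node(Tree left, Tree right) implements Tree {}

final class TreeFunctions {
    // The compiler insists that every case of Tree is handled; this is the
    // "cover all of the cases" obligation described above.
    static int depth(Tree t) {
        return switch (t) {
            case Empty e -> 0;
            case Leaf l -> 1;
            case Node n -> 1 + Math.max(depth(n.left()), depth(n.right()));
        };
    }
}

Keeping functions like depth close to the type, and handing the rest of the program some other representation, is the kind of boundary the last point is talking about.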
Each of these statements is a lie – an artful simplification – but together they are a convenient and not entirely false way of thinking about functional programming until it becomes second nature.
The Successor Value Pattern
Functional programming is in the air. It’s nearly unavoidable. Even if you haven’t heard people talk about it or haven’t read a blog about it, you’ve probably seen its influence in your project. People are taking ideas that they learned in functional programming and applying them in straight object-oriented code. In Java, where there are no closures, today you are much more likely to encounter someone’s hand-rolled fold or map abstraction, and even if they haven’t gone that far, you are likely to find attempts to replace mutable data with immutable data. It’s part of the learning process, and there are some interesting patterns which occur along the way.
OO programmers often think in terms of entities. They imagine objects with identity and changeable local state; messages sent to objects alter their local state and trigger further message sends. Classic OO is intrinsically time-oriented. How do we pull this into a functional world? One thing that we can do is choose to see an entity as a series of values over time. Here’s one way to do it. If we have an entity like this:
public class EventDay {
    private LocalDate date;
    ...
    public void advance() {
        do {
            date = RegionalEventCalendar.nextValidDate(date);
        } while (!fitsLocalCalendar(date));
    }
    ...
}
We can change it to this:
public class EventDay {
    private final LocalDate date;
    ...
    public EventDay next() {
        return new EventDay(prospectiveDay(date));
    }
    ...
}
This is the successor value pattern. The notion is that you model state change as a series of value transformations. Is this good? Well, successor value is extremely common in functional programming languages. In Haskell, for instance, you don’t have mutation so you are always constructing new values. When Scala and F# are used in a functional style, you do the same thing; but is this a good idea in Java, C#, and C++? One concern is that the runtimes and libraries of those languages might not be as well tuned for continual reconstruction of value representations of the larger domain objects that we often see in OO designs. On the surface, however, successor value is nice; it gives us a mapping to immutable values and a dose of referential transparency.
Successor Value has an interesting quality from a type perspective. You compute successor values using a function which takes a value of a type to another value of that same type. In the example above, next is a function like that. It maps from ‘this’, which has the type EventDay, to a new EventDay. In category theory, this is called an endomorphism. It’s a relationship with some cool characteristics. One of them is closure. You can chain any number of endomorphic operations and still end up with the type you started with:
LocalDate next = date.nextDay().nextWeek().fridayAfter();
This is really about as encapsulated as you can get. Endomorphic chains don’t betray their representations and they don’t force users to use new types. In a way, you can look at an endomorphic chain as a state machine over an entity, spread out over time.
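If you want to see the endomorphism spelled out rather than hidden in a method chain, java.util.function already has the vocabulary for it. Here’s a small sketch; it assumes the EventDay class above with a public next method, and the helper class is mine:

import java.util.function.Function;
import java.util.function.UnaryOperator;

class SuccessorChains {
    static EventDay advanceTwice(EventDay start) {
        // An endomorphism on EventDay: a function from EventDay to EventDay.
        UnaryOperator<EventDay> step = EventDay::next;
        // Composing two of them is still a function from EventDay to EventDay.
        Function<EventDay, EventDay> twoSteps = step.andThen(step);
        return twoSteps.apply(start);
    }
}

The chained calls in the LocalDate example are doing exactly the same thing; the composition is just written as a method chain instead of with andThen.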
If you are used to thinking in terms of entities, you can mechanically translate entities into values and mapping functions. However, the better thing is to rethink your data types a bit. Sometimes the mapping from entities to values is clean but often with a bit of thought you can end up with representations which are better tuned for functional work.
10 Papers Every Programmer Should Read (At Least Twice)
I spent most of yesterday afternoon working on a paper I’m co-writing. It was one of those days when the writing came easy. I was moving from topic to topic, but then I realized that I was reaching too far backward – I was explaining things which I shouldn’t have had to explain to the audience I was trying to reach.
When I first started writing, one of the pieces of advice that I heard was that you should always imagine that you are writing to a particular person. It gets your juices going – you’re automatically in an explanatory state of mind and you know what you can expect from your audience. I was doing that, but I noticed that I was drifting. I was losing my sense of audience. I started to explain one thing, and then I realized that I would have to explain something else to help it make sense. I couldn’t imagine that person any more. How could I know what they know and what they don’t?
The problem I was experiencing is only getting worse. People come into programming from many different directions. Some started in other fields, and others started programming as teens. Some started with BASIC, others started with Ruby or C. The industry is filled with knowledge, but it isn’t common knowledge. It isn’t knowledge that we all share. We have to dig for it because of a peculiar fact about our industry: we reinvent our languages and notations every ten years. It’s hard to find deeply technical books and articles which stand the test of time in software: they are all Latin within 20 years.
So, I was thinking about this and trying not to get too glum. I realized that instead of complaining, I could help by pointing to some papers which are easily available online and which (to me at least) point to some of the most interesting ideas about software. To me, these are classic papers which contain deep “things you oughta know” about code – the material you work with.
We’ve taken an interesting turn in the industry over the past ten years. We’ve come to value experiential learning much more, and we’ve regained a strong pragmatic focus, but I think it would be a shame if we lost sight of some of the deeper things which people have learned over the past 50 years. Rediscovering them would be painful, and (to me) not knowing them would be a shame.
Here’s the original list. It’s a rather personal list of foundational papers and papers with deep ideas. I wrote it “off the cuff” and threw it into a tumblr blog the other day and I got responses from people who suggested others. I’ll add those in a later blog.
Most are easy to read but some are rough going – they drop off into math after the first few pages. Take the math to tolerance and then move on. The ideas are the important thing.
- On the criteria to be used in decomposing systems into modules – David Parnas
- A Note On Distributed Computing – Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall
- The Next 700 Programming Languages – P. J. Landin
- Can Programming Be Liberated from the von Neumann Style? – John Backus
- Reflections on Trusting Trust – Ken Thompson
- Lisp: Good News, Bad News, How to Win Big – Richard Gabriel
- An experimental evaluation of the assumption of independence in multiversion programming – John Knight and Nancy Leveson
- Arguments and Results – James Noble
- A Laboratory For Teaching Object-Oriented Thinking – Kent Beck, Ward Cunningham
- Programming as an Experience: the inspiration for Self – David Ungar, Randall B. Smith
(edit: Added a brief synopsis of each of them and why I think they are special):
On the criteria to be used in decomposing systems into modules – Parnas
This is a very old paper, but it is more than a classic. In it, Parnas introduces a forerunner to the Single Responsibility Principle. He presents the idea that we should use modularity to hide design decisions – things which could change. People still don’t consider this as often as they should.
Another thing I really like in the paper is his comment on the KWIC system which he used as an example. He mentioned that it would take a good programmer a week or two to code. Today, it would take practically no time at all. Thumbs up for improved skills and better tools. We have made progress.
A Note On Distributed Computing – Waldo, Wyant, Wollrath, Kendall
Abstraction is great but it can only go so far. In this paper, the authors lay to rest what was once a pervasive myth – that we could design a distributed system and make distribution transparent. Ever wonder why you had to implement specific interfaces to do remoting in Java? This is why.
In the aftermath, it might seem hard to believe that people thought this was possible. I think we can partially thank this paper for that.
The Next 700 Programming Languages – Landin
Most of us have spent a lot of time working in traditional programming languages, but functional programming languages are slowly seeing an uptick and many OO languages are gaining functional features. This paper (which reads like a tutorial) makes an argument for an expression-oriented style of programming. It also lays the foundation for lazy evaluation.
One of the other neat things about this paper, from a historical point of view, is that there is a discussion section at the end in which there are a number of questions and comments about whether making indentation significant in a language is a good idea. I was thrown to see people asking whether or not this would be a problem for functions which span several pages(!).
Can Programming Be Liberated from the von Neumann Style? – Backus
John Backus is known for a number of achievements in computer science. He received the ACM Turing Award for his work on Fortran. This paper, which he presented at the award ceremony, was rather shocking at the time because it said, in essence, “we got it wrong.” Backus took the opportunity to make a plea for pure functional programming. His arguments were convincing, and they helped to set a research agenda which is just now starting to make some waves in the mainstream.
Reflections on Trusting Trust – Thompson
I once heard that when this paper was presented, people in attendance rushed back to de-compile their C compilers and look for, er, problems. This paper unveiled a hard problem at the heart of computer security. If you’ve spent any time at all thinking about security, you need to read it.
Lisp: Good News, Bad News, How to Win Big – Gabriel
This paper is a bit atypical in this list. It’s aimed at the Lisp community and it comes off as a bit of a lament. But, hidden deep within it is Gabriel’s description of the ‘Worse is Better’ philosophy – an idea with profound implications for the acceptance and spread of technology.
An experimental evaluation of the assumption of independence in multiversion programming – John Knight and Nancy Leveson
Behind this dry title lies something very interesting. I first heard about this paper from Ralph Johnson in a newsgroup discussion about program correctness. It turns out that one of the avenues that engineers in other disciplines take to make their products stronger – redundancy – doesn’t really work in software. Multi-version programming was the idea that you could decrease faults in critical systems by handing the spec to several teams, having them develop the software independently, and then having the systems run in parallel. A monitoring process verifies their results and, if there is any discrepancy, picks the most common result. Sounds like it should work, right? Well…
Arguments and Results – Noble
I think that all of the other papers in this list are rather well known in some circles. This one isn’t, or if it is, I just haven’t found that circle yet. What I like about this paper is that it takes something which we deal with every day – the set of arguments and results of functions – and it works them through a series of variations which just don’t occur to many people. The fact is, every function that you work with has a number of possible directions it could evolve in. Not all of them are appropriate, but if you know the possible directions, you’re richer for it.
A Laboratory For Teaching Object-Oriented Thinking – Beck, Cunningham
There are an incredible number of papers out there about object orientation. The thing which makes this one great is its directness. OO went through a number of stages. It was once fresh and novel, then it was ornate, and then it became matter-of-fact. This paper hits upon key ideas which many people don’t talk about much any more: anthropomorphism and dropping the top-down perspective. It also shows you how you can design with index cards. It may not sound cool, but it is incredibly effective.
Programming as an Experience: the inspiration for Self – Ungar, Smith
How many people know about the Self Project? Not enough in my opinion. Self was an attempt to take two ideas in computing and push them as far as humanly possible. The first was minimalism: the Self programming language was thoroughly in the Lisp and Smalltalk vein – everything was defined in terms of the smallest number of primitives possible. The other idea was direct manipulation – the idea that the object metaphor could be pushed all the way in the user interface – the programmer and user sit with a mouse in a sea of directly clickable objects and use them for everything. If they could’ve gotten away with dropping the keyboard, I think they would’ve.
The amount of technology which the Self project threw off is also terrifying. Self broke ground in dynamic language optimization and VM design. Chances are, your VM uses technology it pioneered. It’s also one of the more perverse ironies that the most widely distributed programming language today (JavaScript) is a prototype-based programming language which borrowed ideas from the hyper-research-y Self.
What would you add to the list?
The Bloat at the Edge of Duplication Removal (The Orange Model)
There are times when I sit down to work on some piece of code and I see a bit of duplication. I remove it. Then I see a bit more, and a bit more. When I’m done with all of my extractions, I look back at the code and notice that it’s increased in size about 20 or 30 percent. If I’m working with someone else, it’s often a shock for them. After all, we are used to thinking that code gets smaller when we remove duplication. Often it does, but not always.
Let’s take a look at an example:
private void adjustPaths() {
    pathNew = pathNew.removeFirstSegments(1).removeFileExtension()
        .addFileExtension("class");
    pathOld = pathOld.removeFirstSegments(1).removeFileExtension()
        .addFileExtension("class");
}
That’s six lines of code with obvious duplication. Now let’s remove it.
private void adjustPaths() {
    pathNew = adjustPath(pathNew);
    pathOld = adjustPath(pathOld);
}

private IPath adjustPath(IPath path) {
    return path.removeFirstSegments(1).removeFileExtension()
        .addFileExtension("class");
}
Now we have 9 lines.
What’s going on here? Well, when we extract duplication, we have to put it someplace, and that operation isn’t free. In all languages, there is a cost in space for separate functions. In Java, the cost includes the method declaration line and the lines for the curly braces. In Python, the cost is (again) the declaration line and some whitespace to separate the function from other functions.
Code is like an Orange
The mental model I have for this is that code can be separated into two parts. One part is structural. It gives form to the code. It consists of all of the cruft that we use to declare a method. The stuff inside is the meat — the pure computational code which is held in place by the structural bits.
/* cruft */ private IPath adjustPath(IPath path) {
/* meat  */     return path.removeFirstSegments(1).removeFileExtension()
/* meat  */         .addFileExtension("class");
/* cruft */ }
This pattern is rather common. In fact, you’ll see it in most living things. The muscle tissue in your body (the meat) is held in place by a sheath called fascia. The same pattern occurs in fruit and vegetables. The pulp in an orange is held in place by fascia too; it’s the white part between the sections.
Here’s what duplication removal does, structurally. It allows you to pull out redundant bits of pulp from big sections, yielding smaller sections, but the side effect is that you end up with more fascia. Duplication removal increases the ratio of fascia to pulp. If the amount of pulp you are able to remove exceeds the size of the fascia you introduce, the net amount of code decreases; otherwise, it might increase.
In general, I think that a high fascia to pulp ratio is better for maintenance. It gives us a higher surface area to volume ratio for our code. This can enhance testability and make it easier to compose new software – we already have smaller, more understandable pieces.
The cost of fascia, however, can be disturbing. In the normal formatting convention for C++, a new method will give you at least 5 lines. You need the line for the method declaration in the header, one for the definition in the source file, a line apiece for the beginning and ending braces, and a newline to separate the method from the next one. In Haskell, you can get away with 2 if the method is a one-liner and you forgo type declarations. The thing to notice is that it is never zero. There’s always some space overhead when we create a new method. The question is whether the amount of code that we remove as duplication swamps the added overhead or not.
We can model duplication removal this way — the number of lines T’ after removing n stretches of x contiguous lines is:
T’ = T – (n – 1)x + c + n
We start with T, the original number of lines in the program, and we remove n – 1 of the n stretches of x lines. We leave in the nth stretch (as the body of the method we extract) and we add in c, the number of lines of fascia for a method in our language. We also add in n because we need a line for the call at each of the n sites where the code was originally duplicated.
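As a quick check against the example above (one way to count it, at least): the original adjustPaths method had T = 6 lines, with n = 2 duplicated stretches of x = 2 lines each, and the fascia for the extracted Java method comes to about c = 3 – the declaration line, the closing brace, and the blank line separating the two methods. Plugging in, T’ = 6 – (2 – 1)(2) + 3 + 2 = 9, which matches the count above.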
According to the equation, we can end up with more code if c is large or if n and x are small. Aggressive refactoring with a c of 3 or 4, an n of 2, and an x of 1 will definitely increase code size.
Again, is this bad? No, I think it’s just something to be aware of. Removing duplication increases the ratio of fascia to pulp. To me, it’s only annoying when I’m working in a language with a high c value.
A Wish List for the Next Mainstream Programming Language
It’s been fun watching the reactions to new features in C# 4.0. Some people love them. Others wonder, legitimately, where it is all going to end. The argument for feature addition is simple: over time we find better ways of doing things and our languages, as tools, should allow us that flexibility. The only downside to this strategy is that you end up with sprawling, complex languages over time – you never get to revisit the foundations.
Fortunately, however, people design new languages all the time and some of them do eventually enter the mainstream. We get a chance to start over and address foundational problems. And, that’s nice because we can do better than Java and C# for mainstream development and I don’t think there is any way to mutate either language into a better foundation.
Before I launch into the wish list, however, I want to set the context.
When I say “mainstream language” I am talking about languages which are in the Java/C#/VB market space – languages which are light on rocket science, seen as suitable for large-scale development, and don’t scare people. So, I’m not going to suggest dynamic typing or (on the other side of the coin) tight Hindley-Milner type systems and lazy evaluation. I love those approaches and I’m happy (in particular) that Ruby is gaining widespread acceptance, but I’m not going to fight that fight here. In the immediate future, for whatever reason, there will be development shops which feel much more comfortable with traditional static typing – the kind found in Java, C#, and VB. Given that, the question becomes: what can we do to make that sort of language better?
Here’s my list:
- Immutability by Default – Over the past few years, a rather broad consensus has emerged around the idea that code is easier to understand and maintain when it has less mutable state. This isn’t a new idea; it’s been around for as long as functional programming, but our recent concerns with concurrency and our move toward multi-core computing just underscore the state problem. A mainstream language should, at the very least, make mutable data something special. References should be immutable by default and mutable state should be marked by a special keyword so that its use leaps out at you. It’s too late for the current crop of languages to make such a pervasive change, but the next mainstream language could.
- Support for Components – In large-scale development, teams have to manage usage and dependency across an organization. The notions of public, protected, and private are too coarse as protection mechanisms, and we really need scopes larger than class, assembly, and package. In the end, this sort of protection is a social issue, so we should have mechanisms which make use very easy within a team (3-10 people working together) and somewhat more manageable between teams. It’s odd that language-based support for this work stopped with Ada and the Modula family of languages. Java’s recent move toward support for modules seems to be an exception.
- Support for Scoped and Explicit Metaprogramming – In the past, language designers avoided adding meta-programming support to their languages because they were scared it would be abused. However, we’ve learned that without meta-programming support, people create yet another distinctive type of mess. If there is a middle ground for mainstream languages it probably involves scoping the use of metaprogramming and making it far more detectable. If, for instance, all of the code which modifies a given component had to be registered with some easily locatable component-specific construct, maintenance would be much easier.
- Support for Testing – This one is only a matter of time, I think. In the last 10 years we’ve seen an explosion of mocking tools and testing frameworks. It’s not clear to me that we’ve reached any sort of consensus yet, but I suspect that at the very least we could add constructs to a language which make mocking and stubbing much easier. It’s also about time that languages attempt to solve the problems that dependency injection addresses.
- Imposed I/O Separation – This is the controversial one. The more I work with Haskell, the more I notice that there is a beneficial discipline that comes from factoring your program so that its points of contact with the outside world cannot be mixed with the pieces doing the work. When you first start to work that way, it feels like a straitjacket, but I think the benefit is apparent to anyone who has had to go on a spelunking expedition in an application to find and mock parts of the system which touch the outside world. Is that discipline too much for a mainstream language? I hope not.
So, that’s my list. There is no “grand language planning board” which decides these things. We will move forward chaotically like every other industry, but I do hope that some of these features make it into the next mainstream programming language.
Speaking in Russia
The week after next, I’ll be visiting Russia to do some training and some University lectures.
The lectures will be at Ural State University, Ekaterinburg. Each starts at 18:30:
Monday, Dec 8th: Recovering a Code Base
Wednesday, Dec 10th: Error Handling as a First Class Consideration in Design
Thursday, Dec 11th: Design Sense – Reaching Consensus on Excellence
I’ll also be speaking to the Agile Russia group in Moscow Friday, Dec 12th.
The internet tells me it might be colder than Miami. I will probably bring a jacket.
Exploding Link Stubs in C
When you’re working in a batch of legacy C code, it’s often hard to build and test just a part of it. You may want to build some set of functions, but then you discover that they call other functions, and those functions call yet other functions. When link dependencies are very bad, you may discover that you’re linking in the entire system.
Fortunately, there is a way out of this. Pick a set of files that appear to be a “component” and attempt to build them into an executable along with a simple main function.
You will get link errors.
Create a file called <componentname>_stubs.c and then go through each of the link errors. Look up the full declaration of the variable or function the error describes. Create a stub for each of them in the stub file.
Here’s an example of a function stub:
int irc_send_ppr(struct ppr *pprn, int nSize)
{
}
The only problem with this is that you have to place something in the stub function. For functions which return values you need to, at the very least, provide a return value. But what should it be?
int irc_send_ppr(struct ppr *pprn, int nSize)
{
    /* is -1 okay? If it isn't, how will we know? */
    return -1;
}
There is an alternative++:
int irc_send_ppr(struct ppr *pprn, int nSize)
{
    assert(!"irc_send_ppr boom!!");  /* needs <assert.h> at the top of the stubs file */
}
This is an exploding link stub. When you have it in place, you will know when a function you’re testing calls a stub. When you see the boom you go to the code and replace the assert with a better stub implementation.
When you use link stubs, you have to be able to build your component two different ways. Production builds link to the rest of the code. Test builds link to the stubs and a testing main. There’s no need to stub out all of the external functions that a component uses. Many low level functions can remain direct calls, but stubbing out calls to other high level components can give you a decent amount of leverage as you try to get an area under test.
Fortunately, for any given component you only have to go through the massive grunt work once. Once you do, you can reap the reward of easier testing in that component forever.
++ Note: If you are using C99, you can genericize this code by using the __func__ predefined identifier. In any function, __func__ evaluates to the name of the enclosing function.
Data Rattle
Take a look at this code:
public static int findPositionInInterval(int [] content, int start, int end) {
    for(int n = 0; n < content.length; n++) {
        int current = transform(content[n]);
        if (start <= current && current <= end) {
            return n;
        }
    }
    return -1;
}
How does it look? Great? Poor? Is it code with a glaring problem?
Hold that thought for a second and then take a look at this code:
public static int findPositionInInterval(final int [] content, final int start, final int end) {
    for(int n = 0; n < content.length; n++) {
        final int current = transform(content[n]);
        if (start <= current && current <= end) {
            return n;
        }
    }
    return -1;
}
I like it better.
The fact of the matter is: mutable state hurts. Unless you’ve done some functional programming you might not realize that it hurts, but it does. It may feel okay to walk barefoot, but if someone gives you a comfy pair of shoes, you quickly learn that, well, your feet were irritated by pebbles and twigs and there was a better way of walking around; you just didn’t know about it. Fixing state and making it immutable is just like that. When you become used to that style, you notice that it’s easier to reason about your code.
In my mind’s eye, the first snippet is like a box containing loose parts. If you shake it, it rattles. Parts that don’t need to move are fixed in place by final. In code as brief as this, it doesn’t smell mightily, but it smells nonetheless.
So mutable data is a smell and we can fix it. However, there’s a problem. I’d argue that the second snippet is just a little noisier than the first one. Java forces us to do something special to make data immutable. C# and C++ are the same way: mutable is the default and immutability requires special keywords. There are some languages which make immutability the default, or at least don’t make you pay the price of an extra token to make something immutable. Haskell, OCaml, Scala, and F# all fit into this category. In the older languages, however, we’ll continue to have a lot of rattling data.
Listen for it.
The Fact/Intention Gap
The other day, I was browsing some code with a team and we came across a cluster of static methods. I looked at them and asked whether they were used someplace else without an object reference. I could’ve done a find references in the IDE, but one of the team members had the answer: “No, they’re just used here.”
“Okay, so why are they static?”, I asked.
“Well, the IDE pointed out that they don’t refer to any instance data, so I made them static.”
Technically, there’s nothing wrong with that. I do the same thing sometimes. I make a method static to document its independence of instance data. But, this scenario highlights something important about static and a few other keywords: their uses can be seen as statements of fact or statements of intention. Static can be read as “Hey, this function doesn’t use any instance data, and you should know that.” Or, it can be read as “I am making this static so that it can be used easily anyplace without an instance.” When you’re trying to read a pile of unfamiliar code, it’s nice to know whether you can count on one meaning or the other.
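To make the two readings concrete, here’s a small, made-up example; the class and methods are illustrative, not from the code we were browsing:

class Invoice {
    private final double amount;

    Invoice(double amount) {
        this.amount = amount;
    }

    // Fact: the IDE noticed that this helper touches no instance data,
    // so it was marked static. Nothing more is implied.
    private static double roundToCents(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    // Intention: this is static because callers are meant to reach it
    // without an Invoice in hand.
    static Invoice empty() {
        return new Invoice(0.0);
    }

    double total() {
        return roundToCents(amount);
    }
}

Nothing in the language distinguishes the two; a reader has to guess which reading applies.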
I don’t think I’ve ever heard anyone articulate this directly, but there’s a tendency in a lot of best practice literature to close the gap between fact and intention. Joshua Bloch’s advice to make fields final whenever you can in Java is a good example. The traditional advice to make methods private whenever they are not used outside of a class is another.
Part of me feels that closing the gap is a good practice, but the fact is, there are holes in it. If you are developing a library or a framework, you have to write code whose declarations reflect not just the facts of your code, but the facts of your code and the code that people will write to use it or extend it. Beyond that, it might be overly conservative to reduce the visibility of things that aren’t used beyond a particular scope. For instance, imagine a method that is used only by other methods of a class. We understand the invariant of the class and we see that the method could be public. We could make it private now, but that means that anyone who wants to make it public later would have to do some re-analysis to arrive at the answer that we have right now.
The Fact/Intention Gap is a very real thing. Whether we know it or not, we confront it every time we try to understand unfamiliar code. I think there’s only one way to solve it, and that’s to try to separate fact from intention in our languages and tooling. Imagine what it would be like if your IDE gave you a visual indicator for all methods which didn’t use instance data. If it did, you could use static on methods only when you want to indicate that the intention is to use them without an instance.
It seems that IDE developers are moving in this direction. I wonder if any language designers will follow suit.