Exploding Link Stubs in C 4

Posted by Michael Feathers Fri, 07 Nov 2008 19:33:00 GMT

When you’re working in a batch of legacy C code, it’s often hard to build and test just a part of it. You may want to build some set of functions, but then you discover that they call other functions, and those functions call yet other functions. When link dependencies are very bad, you may discover that you’re linking in the entire system.

Fortunately, there is a way out of this. Pick a set of files that appear to be a “component” and attempt to build them into an executable along with a simple main function.

You will get link errors.

Create a file called <componentname>_stubs.c and then go through each of the link errors. Look up the full declaration of the variable or function the error describes. Create a stub for each of them in the stub file.

Here’s an example of a function stub:


int irc_send_ppr(struct ppr *pprn, int nSize)
{
}

The only problem with this is that you have to place something in the stub function. For functions which return values you need to, at the very least, provide a return value. But what should it be?


int irc_send_ppr(struct ppr *pprn, int nSize)
{
  /* is negative -1 okay?   If it isn’t, how will we know? */
  return -1;
}

There is alternative ++:


int irc_send_ppr(struct ppr *pprn, int nSize)
{
  assert(!”irc_send_ppr boom!!”);
}

This is an exploding link stub. When you have it in place, you will know when a function you’re testing calls a stub. When you see the boom you go to the code and replace the assert with a better stub implementation.

When you use link stubs, you have to be able to build your component two different ways. Production builds link to the rest of the code. Test builds link to the stubs and a testing main. There’s no need to stub out all of the external functions that a component uses. Many low level functions can remain direct calls, but stubbing out calls to other high level components can give you a decent amount of leverage as you try to get an area under test.

Fortunately, for any given component you only have to go through the massive grunt work once. Once you do, you can reap the reward of easier testing in that component forever.

++ Note: If you are using C99, you can genericize this code by using the __func__ predefined macro. In any function, __func__ expands to the name of the enclosing function.

Data Rattle 7

Posted by Michael Feathers Tue, 28 Oct 2008 01:01:00 GMT

Take a look at this code:

public static int findPositionInInterval(int [] content, int start, int end) {
  for(int n = 0; n < content.length; n++) {
    int current = transform(content[n]);
    if (start <= current && current <= end) {
      return n;
    }
  }
  return -1;
}

How does it look? Great? Poor? Is it code with a glaring problem?

Hold that thought for a second and then take a look at this code:

public static int findPositionInInterval(final int [] content, final int start, final int end) {
  for(int n = 0; n < content.length; n++) {
    final int current = transform(content[n]);
    if (start <= current && current <= end) {
      return n;
    }
  }
  return -1;
}

I like it better.

The fact of the matter is: mutable state hurts. Unless you’ve done some functional programming you might not realize that it hurts, but it does. It may feel okay to walk barefoot, but if some gives you a comfy pair of shoes you quickly learn that, well, your feet were irritated by pebbles and twigs and there was a better way of walking around, you just didn’t know about it. Fixing state and making it immutable is just like that. When you become used to that style, you notice that it’s easier to reason about your code.

In my mind’s eye, the first snippet is a like a box containing loose parts. If you shake it, it rattles. Parts that don’t need to move are fixed in place by final. In code as brief as this, it doesn’t smell mightily, but it smells nonetheless.

So mutable data is a smell and we can fix it. However, there’s a problem. I’d argue that the second snippet is just a little noisier than the first one. Java forces us to do something special to make data immutable. C# and C++ are the same way: mutable is the default and immutability requires special keywords. There are some languages which make immutability the default, or least make don’t make you pay the price of an extra token to make something immutable. Haskell, OCaml, Scala, and F# all fit into this category. In the older languages, however, we’ll continue to have a lot of rattling data.

Listen for it.

The Fact/Intention Gap 8

Posted by Michael Feathers Tue, 21 Oct 2008 11:57:00 GMT

The other day, I was browsing some code with a team and we came across a cluster of static methods. I looked at them and asked whether they were used someplace else without an object reference. I could’ve done a find references in the IDE, but one of the team members had the answer: “No, they’re just used here.”

“Okay, so why are they static?”, I asked.

“Well, the IDE pointed out that they don’t refer to any instance data, so I made them static.”

Technically, there’s nothing wrong with that. I do the same thing sometimes. I make a method static to document its independence of instance data. But, this scenario highlights something important about static and a few other keywords: their uses can be seen as statements of fact or statements of intention. Static can be read as “Hey, this function doesn’t use any instance data, and you should know that.” Or, it can be read as “I am making this static so that it can be used easily anyplace without an instance.” When you’re trying to read a pile of unfamiliar code, it’s nice to know whether you can count on one meaning or the other.

I don’t think I’ve ever heard anyone articulate this directly, but there’s a tendency in a lot of best practice literature to close the gap between fact and intention. Joshua Bloch’s advice to use make fields final whenever you can in Java is a good example. The traditional advice to make methods private whenever they are not used outside of a class is another.

Part of me feels that closing the gap is a good practice but the fact is, there are holes in it. If you are developing a library or a framework, you do have to write code that may not reflect the facts of your code, but rather will reflect the facts of your code and the code that people write to use it or extend it. Beyond that, it might be overly conservative to reduce visibility of things that aren’t used beyond a particular scope. For instance, imagine a method that is used only by other methods of a class. We understand the invariant of the class and we see that the method could be public. We could make it private now, but that means that anyone who wants to make it public later would have to do some re-analysis to arrive at the answer that we have right now.

The Fact/Intention Gap is a very real thing. Whether we know it or not, we confront it every time we try to understand unfamiliar code. I think there’s only one way to solve it, and that’s to try to separate fact from intention in our languages and tooling. Imagine what it would be like if your IDE gave you a visual indicator for all methods which didn’t use instance data. If it did, you could use static on methods only when you want to indicate that the intention is to use them without an instance.

It seems that IDE developers are moving in this direction. I wonder if any language designers will follow suit.

If-Methods Redux 10

Posted by Michael Feathers Fri, 26 Sep 2008 12:05:00 GMT

So was looking over my blog on If-Methods this morning and I realized that I fell into a trap. I was solving one problem but thinking about another.

There I was working on lexer code and I thought “Hmm… I’d really like to extract these if-blocks.” I mulled it over a bit, and while thinking about the general problem of naming if-blocks, I settled on a naming scheme that I thought would be good in general, but it ended up being a bit odd in particular. The comments on the blog show this. Many of them offer good suggestions for dealing with the particular case I showed, but I was really after something different, a good general-purpose naming scheme for methods extracted around conditional.

I’m going to try this again and see if I can come up with a case which motivates a generic naming scheme a bit better.

Consider this chunk of code.


    ...
    if (sensorTripped) {
        ...
        alarm.sound();
    }
    ...

We could extract the body of the if-block, but in a long method, we might want to extract the entire concern. What should we call it?

One option is to call it soundAlarm(). It’s not quite true, however. The sounding is conditional.

We could, on the other hand, get explicit: soundAlarmIfSensorTripped(). This smacks more of implementation than intention, however. Now, having said that, I do think that in legacy code these names are great reminders as you make your first cut.. slicing and dicing a long method. They help you remember what is going on as you plan your next move.

Here’s another option: handleAlarmActions(). I don’t like this one very much. The word handle is nice, but the name makes it sound like the alarms are an event.

How about this one? handleSensorStateChange(). Much better in my opinion. I’m not quite sure how to justify this but I think that it is better to name these sorts of methods after their condition rather than their action. In the legacy code situation, at least, it helps to accent conditions. In not-so legacy code, it separates concerns at a deeper level.

Are there exceptions? Of course. Null pointer checks, guards.. the conditions that indicate errors rather than problem domain conditionality.. code surrounded by these can be safely extracted into methods named for their primary action.

This:


    ...
    if (alarm != null) {
        ...
        alarm.sound();
    }
    ...

becomes this:


    ...
    soundAlarm();
    ...

with this definition:


    void soundAlarm() {
        if (alarm == null) {
            // handle error, throw exception, etc.
        }
        ...
        alarm.sound();
    }

That code still has problems, but that extraction was a way forward.

All this is good, as far as it goes, but there is still a problem. The handle naming idiom seems to always call for generalization when I use it in real cases.

Imagine a condition like this:


if (sensor != null 
        && sensor.isTripped() 
        && configFlag == X_MODE 
        && usingBasicIO) {
    ...
}

It would be handle.. what?

Sometimes you have a good name for a condition, sometimes you don’t. A convention would be nice, however.

If-Methods 21

Posted by Michael Feathers Tue, 23 Sep 2008 18:26:00 GMT

I was working on some lexical analysis code this morning and I was stumped by a name. Or, rather, the lack of a good name. The code was a long series of if-statements, each of which tested for a lexical condition and then handled it:

if (Character.isLetter(ch) && identCharCount > MIN_IDENT_LEN) {
    currentToken.append(ch);
}

if (Character.isWhitespace(char ch) {
    tokens.add(new Token(tokenBuffer.toString(), TokenType.IDENT));
    currentToken = new StringBuilder();
}
The list of ifs went on and on. Each of them did something distinct and I wanted to break the method down into a set of meaningful sub-methods, but how? The typical pattern to use is to extract the bodies of the if-statements into separate methods, provided you can find good names for them. However, when I imagined what the code would look like after that refactoring, I didn’t really like it. It was still rather wordy:

if (Character.isLetter(ch) && identCharCount > MIN_IDENT_LEN) {
    appendToCurrentToken(ch);
}

if (Character.isWhitespace(char ch) {
    addToken(currentToken.toString(), TokenType.IDENT);
}
So, I thought that I’d extract each of the if-statements into its own method:

void onIdentChar(char ch) {
    if (Character.isLetter(ch) 
            && identCharCount > MIN_IDENT_LEN) {
        currentToken.append(ch);
    }    
}

void onWhitespace(char ch) {
    if (Character.isWhitespace(char ch) {
        tokens.add(new Token(tokenBuffer.toString(), 
                             TokenType.IDENT));
        currentToken = new StringBuilder();
    }
}
The calling code would certainly look cleaner:

    onIdentChar(ch);
    onWhitespace(ch);

The only problem is that the method names are a bit of a lie. When we use an “on” prefix we’re writing in event style. We expect that a method like onWhitespace is called only when we have whitespace. It shouldn’t have to check for it.

I’ve run into this situation before. How do you name a function which consists of a single if-block? The situation crops up frequently in legacy code. You want to break up a large method into several smaller ones and you’re left with a dilemma: do you extract the entire if-statement or just its body?

Here’s another example.


if (replyFlag == RE_SEND) {
    ...
    // code that sends a mail message
    ...
}

I once asked a friend what he’d name a method for that if-block if we extracted it and he said “sendEmailIfNeeded.” We both flinched and moved on to other things, but let’s think about that name a bit. Part of it talks about intention and part of it talks about implementation. Maybe we can pull the implementation bit out and concentrate on the condition:


if (needResend()) {
    ...
    // code that sends a mail message
    ...
}

Better. Now, how about extracting it like this?


ifNeedsResend();

It’s sort of like the event style I wrote about earlier. The method is a handler for a condition and the condition is checked in the method. Violation of the Single-Responsiblity Principle? Perhaps, but I do know that I’m looking at some code now that has been clarified by this idiom.


public List<Token> getTokens() {
    onStart();
    for (int n = 0; n < text.length(); n++) {
        char ch = text.charAt(n);

        ifLetter(ch);
        ifWhitespace(ch);
        ifLeftParen(ch);
        ifRightParen(ch);
        ifVerticalBar(ch);
        ifColon(ch);
        ifDigit(ch);
    }
    onFinish();
    return tokens;
}
What do you think?

Type Scum 7

Posted by Michael Feathers Mon, 08 Sep 2008 17:43:00 GMT

Getting existing code under test is hard work, but it is fruitful. Yes, you get code that is easier to change, but more importantly, you get knowledge – you learn things about programming which make you better at avoiding common traps. Sadly, many of these traps aren’t well recognized yet.

The trap that I am going to write about today is one that I call type scum. It’s most prevalent in C and C++ but it can attack in any of the traditional statically typed languages.

Type Scum is the cruft in a code base which makes it impossible to compile a single file without an entire sub-stratum of defined types. I’m not talking about the primary, or even the secondary abstractions in your system, but rather the 200 or so basic types and structs that your abstractions depend upon.

Again, the problem is worst in C++ and C. At some point, every C or C++ developer feels the urge to isolate him or herself from the basic types of the language. The unsigned int type becomes uint32 and unsigned char * becomes uchar16_ptr. And, if that was all, it would be okay. But no, people define data transfer objects which aggregate these type pseudonyms together into large muddles. No file can compile without bringing in a world of types which cushion the code from dangerous things like the platform and testing harnesses.

No wait, testing harnesses are good. What can we do?

The unfortunate thing is that it is very hard to pull type scum out of a system once it’s been infected, but we can learn how to avoid it or at least manage it better:

  1. If you must provide a sub-stratum of basic types in your system, do it in one place. There should be a single library (and associated headers) that you include whenever you need it. This library (and headers) should contain nothing else.
  2. If you must create DTO (Data Transfer Object) types, minimize them. A good general purpose structure can carry a wide variety of different types of data and simplify testing.
  3. Push the DTOs to the edges. There are some systems where you really do care whether an internal computation happens in unsigned long int or unsigned long long int but they are rare. Basic data types and tolerances matter when two systems need to agree upon them, and that happens at component boundaries. In many systems, the internal code can use platform types directly.

There you go. Type scum bad.

I’m sure that some people reading this will say “Hey, isn’t this the exact opposite of the advice that people give for a system with the primitive obsession code smell?” The answer is “yes.” But, to me, primitive obsession is a different problem. It’s something which is the result of a lack of real behavioral abstractions in a system, not the lack of larger data holders.

Different problem.

Type scum bad.