Type Scum

Posted by Michael Feathers Mon, 08 Sep 2008 22:43:00 GMT

Getting existing code under test is hard work, but it is fruitful. Yes, you get code that is easier to change, but more importantly, you get knowledge – you learn things about programming which make you better at avoiding common traps. Sadly, many of these traps aren’t well recognized yet.

The trap that I am going to write about today is one that I call type scum. It’s most prevalent in C and C++ but it can attack in any of the traditional statically typed languages.

Type Scum is the cruft in a code base which makes it impossible to compile a single file without an entire sub-stratum of defined types. I’m not talking about the primary, or even the secondary abstractions in your system, but rather the 200 or so basic types and structs that your abstractions depend upon.

Again, the problem is worst in C++ and C. At some point, every C or C++ developer feels the urge to isolate him or herself from the basic types of the language. The unsigned int type becomes uint32 and unsigned char * becomes uchar16_ptr. And, if that was all, it would be okay. But no, people define data transfer objects which aggregate these type pseudonyms together into large muddles. No file can compile without bringing in a world of types which cushion the code from dangerous things like the platform and testing harnesses.
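
To make that concrete, here is a minimal sketch of the pattern. Every name in it (`uint32`, `uchar_ptr`, `PacketDTO`, and both functions) is invented for illustration and comes from no real code base. The first function can't be compiled or tested without the pseudonym layer and the DTO; the second does the same work on platform types alone.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pseudonym layer. In a real code base these would live in
// headers that nearly every file ends up including.
typedef unsigned int   uint32;     // note: unsigned int is not guaranteed to be 32 bits
typedef unsigned char* uchar_ptr;

// Hypothetical DTO that aggregates the pseudonyms.
struct PacketDTO {
    uint32    id;
    uint32    length;
    uchar_ptr payload;
};

// Depends on the DTO, so any test has to bring in the whole type layer.
uint32 sum_bytes_scummy(const PacketDTO& packet) {
    assert(packet.payload != nullptr || packet.length == 0);
    uint32 total = 0;
    for (uint32 i = 0; i < packet.length; ++i) total += packet.payload[i];
    return total;
}

// Same computation on platform types: compiles with no project headers.
unsigned sum_bytes_plain(const unsigned char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    unsigned total = 0;
    for (std::size_t i = 0; i < length; ++i) total += data[i];
    return total;
}
```

In a test harness, the second function needs only a byte array. The first needs every header that `PacketDTO` depends on, and in an infected system that is rarely just one.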

No wait, testing harnesses are good. What can we do?

The unfortunate thing is that it is very hard to pull type scum out of a system once it’s been infected, but we can learn how to avoid it or at least manage it better:

  1. If you must provide a sub-stratum of basic types in your system, do it in one place. There should be a single library (and associated headers) that you include whenever you need it. This library (and headers) should contain nothing else.
  2. If you must create DTO (Data Transfer Object) types, minimize them. A good general purpose structure can carry a wide variety of different types of data and simplify testing.
  3. Push the DTOs to the edges. There are some systems where you really do care whether an internal computation happens in unsigned long int or unsigned long long int but they are rare. Basic data types and tolerances matter when two systems need to agree upon them, and that happens at component boundaries. In many systems, the internal code can use platform types directly.
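
The three points above might look something like this in practice. This is only a sketch under assumptions: the wire format, the struct names, and the functions are all hypothetical, and the fixed-width types come from `<cstdint>`, which assumes a C++11 or later compiler. The fixed-width types appear only in the boundary struct; everything past `from_wire` uses platform types.

```cpp
#include <cassert>
#include <cstdint>

// Boundary type (hypothetical wire format). Two systems must agree on these
// widths, so this is where fixed-width types earn their keep.
struct WireReading {
    std::uint16_t sensor_id;
    std::int32_t  millidegrees;
};

// Internal type: platform types only.
struct Reading {
    int    sensor_id;
    double celsius;
};

// The single place where wire types are translated into internal ones.
Reading from_wire(const WireReading& wire) {
    Reading r;
    r.sensor_id = static_cast<int>(wire.sensor_id);
    r.celsius   = wire.millidegrees / 1000.0;
    return r;
}

// Internal logic: testable without including any wire-format header.
double average_celsius(const Reading* readings, int count) {
    assert(readings != nullptr && count > 0);
    double sum = 0.0;
    for (int i = 0; i < count; ++i) sum += readings[i].celsius;
    return sum / count;
}
```

A test for `average_celsius` only has to build a couple of `Reading` values by hand. It never touches `WireReading`.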

There you go. Type scum bad.

I’m sure that some people reading this will say “Hey, isn’t this the exact opposite of the advice that people give for a system with the primitive obsession code smell?” The answer is “yes.” But, to me, primitive obsession is a different problem. It’s something which is the result of a lack of real behavioral abstractions in a system, not the lack of larger data holders.

Different problem.

Type scum bad.

Comments

  1. Phil about 2 hours later:

    You sound like you’ve been programming in C++, where lots of types get defined in a single file. Since C++ dependencies propagate on a file-by-file basis, a sprinkling of typedefs and DTO definitions can cause a huge snarl of interdependencies.

    If that’s the case, then your first suggestion (isolate the supra-primitive types to their own library) would solve the whole problem, wouldn’t it? You’d still have to be very careful about changing that library, of course, because the whole project would have to be rebuilt every time you touched it.

    My personal opinion is that it was a mistake to include polymorphic primitive types in programming languages in the first place. Then you wouldn’t have everyone rolling their own uint32’s all over the place. It would be more appropriate for people to roll their own in the rare cases where they actually want the meaning of “int” to be ambiguous.

  2. Antti Tuppurainen about 14 hours later:

    http://www.opengroup.org/onlinepubs/000095399/basedefs/stdint.h.html

    stdint.h is C99, will be in the next C++ standard (as cstdint), and is supported by most C and C++ implementations (even if they don’t support C99). For those that don’t, portable implementations are available from various sources: some can be found with a Google search, there’s one in Boost, and so on.

    People should always be using stdint.h (or cstdint) when writing new C or C++ code.

  3. Anthony Williams about 15 hours later:

    I agree that a proliferation of “basic” types defined in random places can be a pain to deal with.

    My guideline is that if you need a type in two (or more) places it should be defined somewhere where it is accessible to those places without circular dependencies. This may often mean its own header file or module or unit or whatever. A library of basic types fits the bill nicely in this case. As Phil says, this solves most of the problems with “Type Scum”.

    A whole header just for the definition of uint32_t is probably overkill, but you need to use a bit of thought and avoid a single all-encompassing header for all types.

    It’s all about Coupling and Cohesion.

  4. Michael Feathers 1 day later:

    @Antti Yes, I remember hearing about that. Hopefully new projects will use it. In most existing projects, the madness has already set in.

    @Anthony I agree that consolidation can solve many of the problems, but I think the problem has a couple of faces. It is sort of like the use of what are called “way of life” frameworks in Java. The framework touches everything and it is hard to compile anything without linking to the framework. The Java community reacted with the notion of POJOs (Plain Old Java Objects). Although I’ve seen the term POCO coined for C++, I haven’t seen much of a movement to champion their use.

    Some projects will always have that layer of pseudonyms for primitive types, but beyond that, I think that POCO-hood is a good thing.

  5. Sebastian Kübeck 1 day later:

    What can I do if third party includes force me to have lots of different definitions of primitive types around? Examples would be the Java Native Interface and OpenGL.

  6. Michael Feathers 1 day later:

    @Sebastian I would try to isolate myself as much as possible. Let those types play at the boundary. As soon as you step away from the boundary, use native types if you can. This doesn’t work for all systems, but I’m hard pressed to think of cases where it wouldn’t work for applications that use JNI and OpenGL.

  7. Josh Stone 1 day later:

    You didn’t mention the other property of type-scum—it shows up in all your code, because when you have distinct sets of type-scum, you have to convert them. So you end up casting left and right so you can get rid of all the compiler warnings.
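
The last few comments, on third-party types and on casting, suggest one possible arrangement, sketched below. It is an illustration under assumptions and not a recipe: `vendor_float` and `vendor_count` are hypothetical stand-ins for the typedefs a library's headers would supply, and all three functions are invented. The idea is that vendor types and casts appear only in thin adapter functions, so the rest of the code can be compiled and tested without the vendor headers.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-ins for third-party typedefs (hypothetical names; a real program
// would get types like these from the vendor's headers).
typedef float        vendor_float;
typedef unsigned int vendor_count;

// Internal code: platform types only, no vendor header needed to test it.
float largest(const std::vector<float>& values) {
    assert(!values.empty());
    float best = values[0];
    for (std::size_t i = 1; i < values.size(); ++i)
        if (values[i] > best) best = values[i];
    return best;
}

// Boundary adapter: one of the few places that names vendor types.
float largest_from_vendor(const vendor_float* data, vendor_count count) {
    assert(data != nullptr && count > 0);
    std::vector<float> values(data, data + count);
    return largest(values);
}

// The reverse direction: one range-checked conversion instead of bare casts
// scattered across call sites.
vendor_count to_vendor_count(std::size_t n) {
    assert(n <= std::numeric_limits<vendor_count>::max());
    return static_cast<vendor_count>(n);
}
```

This doesn't remove the conversions, but it gives them one home, and a compiler warning about a narrowing conversion then points at a single function.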
