Type Scum 7
Getting existing code under test is hard work, but it is fruitful. Yes, you get code that is easier to change, but more importantly, you get knowledge – you learn things about programming which make you better at avoiding common traps. Sadly, many of these traps aren’t well recognized yet.
The trap that I am going to write about today is one that I call type scum. It’s most prevalent in C and C++ but it can attack in any of the traditional statically typed languages.
Type Scum is the cruft in a code base which makes it impossible to compile a single file without an entire sub-stratum of defined types. I’m not talking about the primary, or even the secondary abstractions in your system, but rather the 200 or so basic types and structs that your abstractions depend upon.
Again, the problem is worst in C++ and C. At some point, every C or C++ developer feels the urge to isolate him or herself from the basic types of the language. The unsigned int type becomes uint32 and unsigned char * becomes uchar16_ptr. And, if that was all, it would be okay. But no, people define data transfer objects which aggregate these type pseudonyms together into large muddles. No file can compile without bringing in a world of types which cushion the code from dangerous things like the platform and testing harnesses.
No wait, testing harnesses are good. What can we do?
The unfortunate thing is that it is very hard to pull type scum out of a system once it’s been infected, but we can learn how to avoid it or at least manage it better:
- If you must provide a sub-stratum of basic types in your system, do it in one place. There should be a single library (and associated headers) that you include whenever you need it. This library (and headers) should contain nothing else.
- If you must create DTO (Data Transfer Object) types, minimize them. A good general purpose structure can carry a wide variety of different types of data and simplify testing.
- Push the DTOs to the edges. There are some systems where you really do care whether an internal computation happens in unsigned long int or unsigned long long int but they are rare. Basic data types and tolerances matter when two systems need to agree upon them, and that happens at component boundaries. In many systems, the internal code can use platform types directly.
There you go. Type scum bad.
I’m sure that some people reading this will say “Hey, isn’t this the exact opposite of the advice that people give for a system with the primitive obsession code smell?” The answer is “yes.” But, to me, primitive obsession is a different problem. It’s something which is the result of a lack of real behavioral abstractions in a system, not the lack of larger data holders.
Different problem.
Type scum bad.
