Your code is not flat
There are three possibilities for the universe. The first is that it is closed: there’s enough matter that the universe, currently expanding, will eventually stop expanding and collapse in on itself.
The universe might be open. There’s not enough matter for gravitational forces to bring the universe back in on itself.
The final option is that the universe is flat. There’s just enough mass that it will stop expanding but it will not collapse in on itself.
Assuming an infinite number of universes, the final option, the flat option, must surely exist, somewhere. Probably not here, but somewhere. So for all intents and purposes, our universe is likely open or closed but not flat.
While you were reading that, your code rotted just a little. You’ve heard of it, bit rot. The bits in a program sitting on a disk rot. It’s a well-known fact. Sure, the magnetic surface could give out or, if it’s in memory, a stray cosmic ray could flip a bit.
But that’s not what I’m talking about. You know the experience. It was just working and now it is not working. You did not change anything so therefore it is the same as it was and so something else must have changed. This is bit rot.
Typically, bit rot is really just our expectations being crushed by reality. Never let a fact get in the way of a good theory, that’s my motto. This works especially well if you tend to be a better talker than listener (guilty).
At any rate, your code continues to rot as you continue to read this. Why? Someone using it has managed to find another thing programmers did not think about. So what was “working” before is not “working” now. Wait, was it really working? If someone was getting value out of using it then yes it was working. Is it working now? If nothing about the system has changed but it no longer serves a purpose, then it is not working.
Here’s a question. If a tree falls in the forest and nobody hears it, does it make a sound? Answer from a human perspective, no. If there’s nobody to hear it, then it didn’t make a sound. Sure it moved some air about and you could take an infinite sum of sine waves to characterize that air movement but then if you characterize it then something was there to observe it and that seems to violate the spirit of “nobody hears it.”
What about if there’s a problem in the code and nobody hits it, does your code have a defect? Using the tree as an example, then the answer is no. However, unlike the tree’s sound, which quickly dissipates (if for no other reason than the laws of thermodynamics), your code tends to stay around a bit longer so the chance for observation of a sleeping defect is higher (though to be fair, if you wait long enough your code, like the sound in the forest, will go away).
OK, but it is even worse than that. Things change. If your body is not changing then you are dead. That’s a fact. In one study, Dr. Frisen demonstrated the age of cells in a rib bone of a 30-year-old to be just over 15 years, while the cells that line the stomach were closer to 5 days old. Your body is constantly fixing itself. Cells replace themselves. That’s why you can donate platelets every 3 days and blood every 56 days.
What about your code? Is it fixing itself? Is it repairing itself? Does it need to? If your code is not changing, then the project is dead. Sure, it takes some time for all usefulness to finally wear out of the system but at some point the system will essentially bit rot so badly that it serves no useful purpose. (While there are multiple clinical definitions of “dead” for a body, some life remains long after you are clinically dead – your fingernails will keep growing for some time after you expire.)
There are several things that cause your system to decay. Here are just a few:
- What the business needs changed so until the system catches up it has less value
- The act of introducing the system has changed the business so that the system needs to change
- Someone misunderstood something (it’s actually no small miracle that people can communicate at all – my wife would argue she cannot get through to me)
- Someone actually changed some code badly (most 1-line bug fixes introduce new defects – see Weinberg)
- Someone is compelled to hack it in to meet the deadline/demo/beat the lunch crowd/...
Your code is rotting while you read this because the environment around it is changing. Add to that an observation that most changes to code introduce rather than reduce chaos and it’s no wonder your system is either closed (code collapses in on itself, weighed down by its own incidental complexity) or open (nobody can decide what to do and the project eventually sort of fizzles into nothingness).
So what are you doing to actively maintain your system’s integrity? Are you just using antibiotics (patching things quickly hoping that there’s no super bug on the horizon – there’s always a super bug on the horizon no matter how strong an alligator’s immune system might be), or are you doing what the surgeon general has been saying for years: getting enough sleep, working out regularly, eating healthily (there’s a moving target)?
Your code is rotting every day in every way unless you are actively working against that tide: talking to your users – being friendly even, writing tests (acceptance, integration, smoke, unit, load, exploratory, ... mostly automated), integrating often, refactoring, learning to see the beginning of rot rather than the end of it, etc.
There are a lot of things we can do and need to do to make a system. Part of that is keeping the system alive and breathing. Unlike biological systems where nature (and natural selection) has resulted in self-monitoring, self-healing systems, code does not really monitor itself and fix itself (well most code does not do that).
Maybe there’s been a hidden force driving the creation of these things we called code maintainers whose existence is predicated on the need to repair living code to keep it around just a bit longer.
If this is the case, then just like the waistlines of Americans are increasing as our biological systems have not had enough time to adjust to the changing environment, our code bases are bloating and getting to the point where code maintainers cannot keep the systems living long enough.
It seems a shift is in order to make it possible to keep these living, but often very sick, applications alive longer. In a sense, if you are practicing keeping your code clean, you are like a doctor because you are diagnosing sickness, prescribing a path to health and, if necessary, making incisions, using slugs or just waiting for a given sickness to simply follow its natural course before going away (like the requirement that simply must be done right away, just because).
Unfortunately, if we are calling ourselves doctors, then we’re still learning the value of keeping our hands clean between procedures. Unlike doctors of the mid-19th century who thought more blood on clothes = more experience, Ignaz Semmelweis figured out that hand washing saved lives.
As a community we’re not there yet. There’s still pride in hacking something together.
Your code is getting worse. What are you going to do about it?
Revisit: The common subgroups
In cleaning up the code, I simplified the algorithm a very little and improved performance considerably. Amazing how that works, how simpler equals faster for so much code. Adding simple data structures, local explanatory functions, and the like often makes code much faster.
What I’m hoping is that I will use this in a few different and useful ways.
The first way is to look for interfaces where concrete classes are being used from many other classes. You need to add an interface, but don’t know who needs which part. The goal is to figure out a relatively small number of interfaces that satisfy a number of clients in a module.
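To make that first use concrete, here is a minimal sketch in Python 3. Everything in it is made up for illustration: the client names, the method names, and the `shared_method_groups` helper are hypothetical stand-ins, not the tool itself. The idea is simply to intersect what each pair of clients uses and keep any set of two or more methods that a pair has in common.

```python
from itertools import combinations


def shared_method_groups(usage):
    """Map each shared group of methods to the clients whose pairwise
    intersection produced it. `usage` maps client name -> methods it calls."""
    groups = {}
    for client_a, client_b in combinations(sorted(usage), 2):
        common = frozenset(usage[client_a]) & frozenset(usage[client_b])
        if len(common) > 1:
            groups.setdefault(common, set()).update((client_a, client_b))
    return groups


# Hypothetical data: which methods of one concrete class each client calls.
usage = {
    "ReportPrinter": ["name", "total", "line_items"],
    "InvoiceMailer": ["name", "total", "email"],
    "AuditLog": ["name", "email", "id"],
}

groups = shared_method_groups(usage)
for methods, clients in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(methods), "<-", sorted(clients))
```

Each line of output is a candidate interface: a small set of methods plus the clients that would depend on it. Whether a given candidate deserves to become a real interface is still a judgment call.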
The second use would be to look for common clumps of parameters when I’m working in a large code base where the average number of arguments per function call does not remotely approach one. I suspect that there are clumps of similarly-named variables being passed around, and that these are likely “missed classes”. Sometimes these are obvious, but it would be good to see them in a nice list spit out from a nice tool.
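Here is a rough sketch of how that second use might look for Python source, using the standard `ast` module to collect parameter names. The helper names (`parameter_sets`, `parameter_clumps`) and the sample functions are assumptions invented for this example, and the pairwise intersection is a simplified stand-in for the grouping tool.

```python
import ast
from itertools import combinations


def parameter_sets(source):
    """Return {function name: set of parameter names} for each def in source.

    Functions that share a name overwrite one another in this sketch,
    and *args / **kwargs are ignored.
    """
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            every_arg = args.posonlyargs + args.args + args.kwonlyargs
            found[node.name] = {arg.arg for arg in every_arg}
    return found


def parameter_clumps(source, min_size=2):
    """Map each clump of shared parameter names to the functions sharing it."""
    params = parameter_sets(source)
    clumps = {}
    for f, g in combinations(sorted(params), 2):
        common = frozenset(params[f] & params[g])
        if len(common) >= min_size:
            clumps.setdefault(common, set()).update((f, g))
    return clumps


# Hypothetical code base with an obvious "missed class" hiding in it.
SAMPLE = '''
def draw_line(x, y, width, height, color): pass
def draw_box(x, y, width, height, fill): pass
def move_to(x, y): pass
def log(message): pass
'''

clumps = parameter_clumps(SAMPLE)
for names, functions in sorted(clumps.items(), key=lambda kv: -len(kv[0])):
    print(sorted(names), "<-", sorted(functions))
```

In this sample the `x, y, width, height` clump looks like a rectangle and `x, y` looks like a point, which is the sort of list I would want the tool to spit out.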
So this is a hopeful start on a series of useful tools.
Code follows.
import shelve
import sys


def find_groups(input):
    """
    Exhaustively intersects pairs of entries in a map to find groupings
    of items, such that an input map like this:
        "first":  [1, 2, 3, 4],
        "second": [1, 2, 3, 5, 6],
        "third":  [1, 2, 5, 6]
    will result in:
        (1, 2, 3):    {"first", "second"}
        (1, 2):       {"first", "third"}
        (1, 2, 5, 6): {"second", "third"}
    Each group lists the entries whose pairwise intersection was exactly
    that group, so "second" is not listed under (1, 2) even though it
    contains both items.
    Note that the return value dict is a mapping of sorted tuples to sets,
    not lists to lists. Also, being a dict, the results should be treated
    as unordered.
    """
    def tupleize(data):
        "Convert a set or frozenset or list to a tuple with predictable order"
        return tuple(sorted(set(data)))

    def append_values(map, key, *values):
        key = tupleize(key)
        old_value = map.get(key, [])
        new_value = list(old_value) + list(values)
        new_value = tupleize(new_value)
        map[key] = new_value
        return key, new_value

    result = {}
    previously_seen = {}
    for input_identity, signatures in input.items():
        input_signatures = set(signatures)
        # Compare this entry against every entry seen so far.
        for signature_seen, identities_seen in previously_seen.items():
            common_signatures = set(signature_seen).intersection(input_signatures)
            if len(common_signatures) > 1:
                known_users = list(identities_seen) + [input_identity]
                append_values(result, common_signatures, *known_users)
        # Only now record this entry, so it is never compared with itself.
        append_values(previously_seen, signatures, input_identity)
    return filter(result)


def filter(subsets):
    "Keep only groups with more than one item and more than one user"
    filtered = {}
    for key, value in subsets.items():
        if (len(key) > 1) and (len(value) > 1):
            filtered[key] = set(value)
    return filtered


def display_groupings(groupings):
    "Silly helper function to print groupings"
    # Smallest groups first.
    keys = sorted(groupings.keys(), key=len)
    for key in keys:
        print("\n" + "-" * 40)
        for item in key:
            print(item)
        for item in sorted(groupings[key]):
            print(" ", item)
        print()
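One possible follow-up step, which is not part of the code above: because groups come from intersecting pairs of entries, a group's user list can miss an entry that contains the whole group but never produced that exact intersection. The sketch below, written for Python 3, widens each group to every entry that contains it. The `users_containing` helper is hypothetical, and the `groupings` literal is my own working-out of what pairwise intersection yields for the docstring's example input, so treat it as an assumption rather than captured output.

```python
def users_containing(groupings, items_by_user):
    """For each group, list every user whose items include the whole group."""
    widened = {}
    for group in groupings:
        members = set(group)
        widened[group] = {
            user
            for user, items in items_by_user.items()
            if members.issubset(items)
        }
    return widened


# The example input from the docstring.
items_by_user = {
    "first": [1, 2, 3, 4],
    "second": [1, 2, 3, 5, 6],
    "third": [1, 2, 5, 6],
}

# Assumed result of the pairwise pass on that input (worked out by hand).
groupings = {
    (1, 2, 3): {"first", "second"},
    (1, 2): {"first", "third"},
    (1, 2, 5, 6): {"second", "third"},
}

widened = users_containing(groupings, items_by_user)
for group in sorted(widened, key=len):
    print(group, "<-", sorted(widened[group]))
```

After this pass the `(1, 2)` group lists all three entries, which may be closer to what you want when hunting for an interface that every client could share.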