Python's Mutable Default Problem 201
Today I was perusing a very fine python article all professional python programmers should read. I had to grin when I saw one of my favorite python quirks/flaws.
Python only evaluates default parameters when it defines a function, leading to surprising side effects (example from saifoo):def function(item, stuff = []): stuff.append(item) print stuff function(1) # prints '[1]' function(2) # prints '[1,2]' !!!
The grin on my face was because I was bit by this one again last Tuesday. You just don’t want to make a mutable object a default parameter. The solution I used was close to the solution from saifoo:
Saifoo.net:def function(item, stuff=None): if stuff is None: stuff = [] stuff.append(item) print stuffThis, of course, gives the correct behavior because it creates a new empty list each time it is called without a second parameter.
My solution actually combines some advice contained earlier in the article with the solution given for this problem:
Minedef function(item, stuff=None): stuff = stuff or [] stuff.append(item) print stuff
It’s not necessarily prettier, and I struggle with whether it is more obvious (to a Python programmer) or not. I fear it may be clever (a word I only use in the pejorative sense), but it is also clean-looking to me.
The point remains, though, that the mutable default parameter quirk is an ugly corner worth avoiding, and I suspect that any code that purposefully exploits that behavior to populate a structure that persists across calls will run a very real risk of being misunderstood.
We all know what happens when the next programmer to touch a program misunderstands it.
Revisit: The common subgroups 23
In cleaning up the code, I simplified the algorithm a very little and improved performance considerably. Amazing how that works, how simpler equals faster for so much code. Adding simple data structures, local explanatory functions, and the like often make code much faster.
What I’m hoping is that I will use this in a few different and useful ways.
The first way is to look for interfaces where concrete classes are being used from many other classes. You need to add an interface, but don’t know who needs which part. The goal is to figure out a relatively small number of interfaces that satisfy a number of clients in a module.
The second use would be to look for common clumps of parameters when I’m working in a large code base where the average number of arguments per function call does not remotely approach one. I suspect that there are clumps of similarly-named variables being passed around, and that these are likely “missed classes”. Sometimes these are obvious, but it would be good to see them in a nice list spit out from a nice tool.
So this is a hopeful start on a series of useful tools.
Code follows.
import shelve import sys def find_groups(input): """ Exhaustively searches for grouping of items in a map, such that an input map like this: "first":[1, 2, 3, 4], "second":[1,2,3,5,6], "third":[1,2,5,6] will result in: [1,2,3]: ["first","second"] [1,2]: ["first","second","third"] [5,6]: ["second","third"] Note that the return value dict is a mapping of frozensets to sets, not lists to lists as given above. Also, being a dict, the results are effectively unordered. """ def tupleize(data): "Convert a set or frozenset or list to a tuple with predictable order" return tuple(sorted(set(data))) def append_values(map, key, *values): key=tupleize(key) old_value = map.get(key,[]) new_value = list(old_value) + list(values) new_value = tupleize(new_value) map[key] = new_value return key, new_value result = {} previously_seen = {} for input_identity, signatures in input.iteritems(): input_signatures = set(signatures) for signature_seen, identities_seen in previously_seen.iteritems(): common_signatures = set(signature_seen).intersection(input_signatures) if len(common_signatures) > 1: known_users = list(identities_seen) + [input_identity] append_values(result, common_signatures, *known_users) append_values(previously_seen, signatures, input_identity) return filter(result) def filter(subsets): filtered = {} for key,value in subsets.iteritems(): if (len(key) > 1) and (len(value) > 1): filtered[key] = set(value) return filtered def display_groupings(groupings): "Silly helper function to print groupings" keys = sorted(groupings.keys(), cmp=lambda x,y: cmp(len(x),len(y))) for key in keys: print "\n","-"*40 for item in key: print item for item in sorted(groupings[key]): print " ",item print