Python's Mutable Default Problem 201

Posted by tottinger Fri, 23 May 2008 03:16:00 GMT

Today I was perusing a very fine python article all professional python programmers should read. I had to grin when I saw one of my favorite python quirks/flaws.

Python only evaluates default parameters when it defines a function, leading to surprising side effects (example from saifoo):
def function(item, stuff = []):
    stuff.append(item)
    print stuff

function(1)
# prints '[1]'

function(2)
# prints '[1,2]' !!!

The grin on my face was because I was bit by this one again last Tuesday. You just don’t want to make a mutable object a default parameter. The solution I used was close to the solution from saifoo:

Saifoo.net:
def function(item, stuff=None):
   if stuff is None:
      stuff = []
   stuff.append(item)
   print stuff
This, of course, gives the correct behavior because it creates a new empty list each time it is called without a second parameter.

My solution actually combines some advice contained earlier in the article with the solution given for this problem:

Mine
def function(item, stuff=None):
   stuff = stuff or []
   stuff.append(item)
   print stuff

It’s not necessarily prettier, and I struggle with whether it is more obvious (to a Python programmer) or not. I fear it may be clever (a word I only use in the pejorative sense), but it is also clean-looking to me.

The point remains, though, that the mutable default parameter quirk is an ugly corner worth avoiding, and I suspect that any code that purposefully exploits that behavior to populate a structure that persists across calls will run a very real risk of being misunderstood.

We all know what happens when the next programmer to touch a program misunderstands it.

Revisit: The common subgroups 23

Posted by tottinger Tue, 03 Jul 2007 15:43:00 GMT

In cleaning up the code, I simplified the algorithm a very little and improved performance considerably. Amazing how that works, how simpler equals faster for so much code. Adding simple data structures, local explanatory functions, and the like often make code much faster.

What I’m hoping is that I will use this in a few different and useful ways.

The first way is to look for interfaces where concrete classes are being used from many other classes. You need to add an interface, but don’t know who needs which part. The goal is to figure out a relatively small number of interfaces that satisfy a number of clients in a module.

The second use would be to look for common clumps of parameters when I’m working in a large code base where the average number of arguments per function call does not remotely approach one. I suspect that there are clumps of similarly-named variables being passed around, and that these are likely “missed classes”. Sometimes these are obvious, but it would be good to see them in a nice list spit out from a nice tool.

So this is a hopeful start on a series of useful tools.

Code follows.

import shelve
import sys

def find_groups(input):
    """ 
    Exhaustively searches for grouping of items in a map, such
    that an input map like this:
        "first":[1, 2, 3, 4],
        "second":[1,2,3,5,6],
        "third":[1,2,5,6]
    will result in:
        [1,2,3]: ["first","second"]
        [1,2]: ["first","second","third"]
        [5,6]: ["second","third"]

    Note that the return value dict is a mapping of frozensets to sets,
    not lists to lists as given above. Also, being a dict, the results
    are effectively unordered.
    """ 
    def tupleize(data):
        "Convert a set or frozenset or list to a tuple with predictable order" 
        return tuple(sorted(set(data)))

    def append_values(map, key, *values):
        key=tupleize(key)
        old_value = map.get(key,[])
        new_value = list(old_value) + list(values)
        new_value = tupleize(new_value)
        map[key] = new_value
        return key, new_value

    result = {}
    previously_seen = {}
    for input_identity, signatures in input.iteritems():
        input_signatures = set(signatures)
        for signature_seen, identities_seen in previously_seen.iteritems():
            common_signatures = set(signature_seen).intersection(input_signatures)
            if len(common_signatures) > 1:
                known_users = list(identities_seen) + [input_identity]
                append_values(result, common_signatures, *known_users)
        append_values(previously_seen, signatures, input_identity)
    return filter(result)

def filter(subsets):
    filtered = {}
    for key,value in subsets.iteritems():
        if (len(key) > 1) and (len(value) > 1):
            filtered[key] = set(value)
    return filtered

def display_groupings(groupings):
    "Silly helper function to print groupings" 
    keys = sorted(groupings.keys(), cmp=lambda x,y: cmp(len(x),len(y)))
    for key in keys:
        print "\n","-"*40
        for item in key:
            print item
        for item in sorted(groupings[key]):
            print "     ",item
        print