Zero
Clarity of code is a matter of signal-to-noise ratio. We need our code to be as concisely meaningful as possible (not as cramped and crowded as possible, and not unnecessarily spread out over many pages). Whatever size provides the most accurate and quickest reading for troubleshooting and enhancement is the best size.
On one of my past projects, we found ourselves able to remove a lot of dead code and as we added new features we were also able to simplify a lot of existing code. This is a very good thing, since code is a liability and functionality is an asset (I’ll elaborate more on that at a later date).
However, my manager was secretly reporting our progress in terms of lines of code per week. I was called on the carpet after some very productive weeks of producing unusually clean code. My manager’s problem was that we had been averaging 200 fewer lines per week. By making the system cleaner and smaller, we were being reported as making negative progress, and were ordered to stop!
It’s just another “bad metrics” story, but it underscores an unspoken assumption that code should always grow bigger over time.
One non-zero-respecting practice is that of commenting-out code and leaving it. People have to scroll around the code, reading bits above and below it, so it is really just a speed bump for those who are trying to learn. It’s worse if the code is inside some nested level of a function than if an entire function is commented out. This is unnecessary, because a version control system holds the old code and can restore it if you really need it again (if yours cannot, consider a new VCS). Commented-out code is redundant with the VCS, in addition to being a nuisance.
Zero isn’t really just about comments, but about noise in general. Systems often have ‘dead code’, code which is never used but never deleted. When there are insufficient tests, one is unsure of coverage and cannot tell whether a piece of code is being used or not. That means that code with zero uses may have non-zero occurrences. In such systems, it can be very dangerous to remove code at all. Because the rule of zero is not respected, code accumulates.
Zero also will have us remove abstractions and interfaces when we no longer need them.
Zero tells us to remove tests when they’re no longer valuable.
Zero is a good idea.
I just located an old Jeff Atwood quote that lights up a few of the same synapses.
One
I posted earlier about the three numbers of software design (zero, one, many). Zero and Many are numbers we see pretty often, but the rise of TDD has made “one” just a little bit lonelier. I wanted to say that one is the loneliest number, but David Chelimsky beat me to it several months ago, and I would just feel like a copycat.
When we’re test-driving an application into existence, we tend to create mock objects. The presence of one mock object in addition to one real object places us firmly in the design space of “many”, and so interfaces become the rule of the day.
In addition, we have dependency management to deal with. In order to reduce the D (distance from the main sequence) metric on a package, we often find ourselves breaking a class into an interface (increasing A, abstractness) and an implementation that depends upon it (increasing I, instability) in a separate package.
Of course, the “other question” is volatility. If we have one implementation of something small and simple, in a domain that is very stable, then we don’t need any interface and “one” takes center stage. This is true for whole value objects (Fowler) in general. If the class is more complex, or dependent on special hardware or changing algorithms, then we end up creating a mock (driving us back to “many”).
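As a sketch of that stable case (the Money class here is my own example, not from any project mentioned above): a whole value is small and obvious enough to use directly in tests, with no interface and no mock:

```python
class Money(object):
    """A whole value object: small, stable, and safe to use directly."""

    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        # Value semantics: equal when all parts are equal.
        return (self.amount, self.currency) == (other.amount, other.currency)

    def add(self, other):
        assert self.currency == other.currency
        return Money(self.amount + other.amount, self.currency)


# "One" takes center stage: the tests use the real class, not a mock.
price = Money(10, "USD").add(Money(5, "USD"))
```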
But the number one is becoming more and more rare in our design world.
The Three Numbers
There are only three numbers that are truly meaningful in software design. The numbers are the same as the numbers in relational database design: zero, one, and many.
Zero: if a condition, circumstance, or requirement does not exist, then no code should exist (commented or uncommented) to deal with it. This is true for requirements we haven’t seen yet (future-proofing), but is also true for requirements which have come and gone. When a test, function, variable, or class has zero uses, it must be deleted. Recognizing the power of zero helps us keep our code clean.
One: A condition or class, event or circumstance, role or behavior exists, and so we must code for it. There is a limit on how much insulation and abstraction is needed. We can keep things very simple. Moving from zero to one (adding a requirement) is the typical operation, but moving from one to zero also happens.
Many: More than one is “many”. You might as well consider adding a list and/or interfaces. In design, some things repeat and some do not. Those that repeat are likely to continue repeating. Therefore, many is represented first by the value of two. Where there are many, there needs to be some insulation. The open-closed principle tells us that we should open the system for extension. The usual case in design is moving from one to many, but sometimes one will move in the opposite direction by removing interfaces or lists.
The transitions in either direction are important design events.
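A minimal sketch of the one-to-many transition (all names here are hypothetical): when a second listener shows up, the single reference becomes a list, and notification becomes a uniform loop that stays open for extension:

```python
class Button(object):
    """Once a second listener appeared, 'one' became 'many':
    a list of listeners and a notify loop replaced a single field."""

    def __init__(self):
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def click(self):
        # Open for extension: any number of listeners, one loop.
        for listener in self.listeners:
            listener("clicked")


events = []
button = Button()
button.add_listener(events.append)
button.add_listener(lambda msg: events.append(msg.upper()))
button.click()
```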
It's All Iterative
People say “we don’t do iterative”. Sometimes it is a prideful thing, more often a kind of self-effacing confession. However, you and I know that it’s not true. All software development is iterative; it’s only a matter of granularity.
When working on the very first 1.0 release of a product, developers and managers may think that they’re not doing iterative work, but any idiot knows that they’re not writing all the code at once. They couldn’t possibly. They’re not releasing yet, but they’re appending functionality every day.
They try to design all at once, but that’s just for 1.0. Software is like a plan: it never survives contact with the real world unscathed.
After 1.0 do we cancel the project? If so, it’s a failure. After 1.0 is 1.0.1 or 1.1 or 2.0. The project continues and continues to receive new appendages and improvements. Isn’t the whole idea of “next release” an iteration?
Since you’re really iterating anyway, why not capture the data that iterating produces, and why not take advantage of the integration and testing opportunities that iterating will give us? And why not have useful feedback loops? And why not measure our progress as we go? And for that matter, why not go for smaller measured periods so we can judge trends more reliably?
When we say “it isn’t an incremental and iterative effort” we are really saying that we’re denying a basic truth, and choosing not to use the data it produces for us. Not intentionally, but almost certainly.
Outliving The Great Variable Shortage
One of the more annoying problems in code, confounding readability and maintainability and frustrating test-writing, is that of the multidomain variable.
I suppose somebody forgot to clue me in to the Great Variable Shortage that is coming. I have seen people recycling variables to mean different things at different times in the program or different states of the containing object. I’ve witnessed magic negative values in variables that normally would contain a count (sometimes as indicators that there is no count, much as a NULL/nil/None would). I might be willing to tolerate this in programming languages where a null value is not present.
Yet I have seen some code spring from multidomain variables that made the code rather less than obvious. I think that it can be a much worse problem than tuple madness.
I have a rule that I stand by in OO and in database design, and that is that a variable should have a single, reasonable domain. It is either a flag or a counter. It is either an indicator or a measurement. It is never both, depending on the state of something else.
It is sort of like applying Curly’s law (Single Responsibility Principle) to variables. A variable should mean one thing, and one thing only. It should not mean one thing in one circumstance, and carry a different value from a different domain some other time. It should not mean two things at once. It must not be both a floor polish and a dessert topping. It should mean One Thing, and should mean it all of the time.
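A tiny before-and-after sketch (my own example, not from any codebase discussed here): a count that smuggles a “never measured” flag in a magic negative, versus one variable per domain:

```python
# Multidomain: -1 is a magic value meaning "never measured";
# any other value is a count. One variable, two domains.
def describe_bad(error_count):
    if error_count == -1:
        return "not yet measured"
    return "%d errors" % error_count


# Single domain: the flag and the counter are separate values,
# and each variable means one thing all of the time.
def describe_good(measured, error_count):
    if not measured:
        return "not yet measured"
    return "%d errors" % error_count
```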
Surely I’ll take a shot from someone citing the value of a reduced footprint, and I won’t argue very long about the value of using only as much memory as you must. In C++ I was a happy advocate of bitfields. I haven’t been overly vocal about the various data-packing schemes, which can be a reasonable choice for compressing many values into a smaller space, but I will maintain that multipurpose variables are a bad idea and will damage the readability of any body of code.
I suggest, in the case of a constrained footprint where there truly is a Great Variable Shortage (GVS), that if (and that’s a big if) the author absolutely MUST repurpose variables on the fly, it is the lot of that programmer to make sure that users of the class/struct never have to know it is being done. Never. Including those writing unit tests. The class will have to keep data-packing and variable-repurposing as a dirty secret.
Perhaps a strong statement or two in rule-form should be made here:
- A variable should have a SINGLE DOMAIN.
- One must NEVER GET CAUGHT repurposing a variable.
We don’t have to make do with the smallest number of variable names possible. Try to learn to live in plenty: use all the variables you want. For the sake of readability, consider having a single purpose for every variable not only at a given point in time, but for the entire program you’re writing.
PySweeper: Un-refactoring, Tuple Madness, and Injection
I figure that to understand the code before me, I need to start testing. I’ll start with the constructor. The constructor actually is doing a lot of work. In order to test out my refactoring tools, I recklessly broke out two routines without adding any testing. I don’t recommend that on any project that needs to work, but I also don’t recommend using a real product to learn your tools.
Today, the constructor looks something like this:
def __init__( self, rows = 16, cols = 16, mines = 10 ): # 40
    """Initialize the playing field.

    This function creates a playing field of the given size, and
    randomly places mines within it.

    rows and cols are the numbers of rows and columns of the playing
    field, respectively.  mines is the number of mines to be placed
    within the field.
    """
    self._validate_parameters( rows, cols, mines )
    self.rows = rows
    self.cols = cols
    self.mines = mines
    self.cleared = 0
    self.flags = 0
    self.start_time = None
    minelist = []
    self.freecoords = {}
    for col in range( cols ):
        self.freecoords[col] = range( rows )
    self._place_mines( mines, minelist )
    self.board = []
    for col in range( cols ):
        self.board.append( [( -2, 0 )] * rows )
        for row in range( rows ):
            if ( row, col ) in minelist:
                self.board[col][row] = ( -1, 0 )

def _validate_parameters( self, rows, cols, mines ):
    for var in ( rows, cols, mines ):
        if var < 0:
            raise ValueError, "all arguments must be > 0"
    if mines >= ( rows * cols ):
        raise ValueError, "mines must be < (rows * cols)"

def _place_mines( self, mines, minelist ):
    while mines > 0:
        y = random.choice( self.freecoords.keys() )
        x = random.randrange( len( self.freecoords[y] ) )
        minelist.append( ( self.freecoords[y][x], y ) )
        del self.freecoords[y][x]
        if not self.freecoords[y]:
            del self.freecoords[y]
        mines = mines - 1
I feel a little guilty for breaking out functions without testing them. I start to write tests for _validate_parameters, but I cannot call it in isolation. I have to have an instance of Field to call it, and it is also called in Field’s constructor. I can’t test weird values as easily as I’d like. I need something like a static method (C++/Java). I add the @classmethod decorator and I can call it freely from my unit tests.
I remind the reader that the game runs and is playable, so the addition of tests should pretty much be a bit of pedantic formalism in preparation for refactoring. However, look at the code carefully and see if you can see a loophole:
@classmethod
def _validate_parameters( self, rows, cols, mines ):
    for var in ( rows, cols, mines ):
        if var < 0:
            raise ValueError, "all arguments must be > 0"
    if mines >= ( rows * cols ):
        raise ValueError, "mines must be < (rows * cols)"
A perceptive reader will realize that the code is checking that the value is less than zero, but reporting that it must be greater than zero. I found this with tests before it jumped out at me visually.
I write and run several tests and then get a nag from PyDev. I guess somehow I missed the part where PyDev told me it was not free software. The nag message tells me I’m in a trial period and I’m expected to eventually buy or quit. Bummer. PyDev is not expensive; I just have to decide if I want to afford it.
I am also playing with Eric3 which is a python-specific IDE with refactoring, and it seems to have better integration with PyUnit. It is not as nice in terms of automatic completion and seems to have trouble with the context menu for refactoring. I immediately miss the cool way that PyDev would stick the word “self” in so I didn’t have to type it.
In the last installment, I extracted a method called _place_mines, but it’s a bit hard to test. It’s actually a very poor refactoring, done to test a tool and not really to improve the code. As is, it wants to muck about with an unspoken parameter.
def _place_mines( self, mines, minelist ):
    while mines > 0:
        y = random.choice( self.freecoords.keys() )
        x = random.randrange( len( self.freecoords[y] ) )
        minelist.append( ( self.freecoords[y][x], y ) )
        del self.freecoords[y][x]
        if not self.freecoords[y]:
            del self.freecoords[y]
        mines = mines - 1
It’s small, but not obvious. Nor is it tested. It is using freecoords rather heavily. Breaking out that method was a mistake, and I put it back inline. It seems that freecoords is a list of all the positions in the grid that do not contain mines. It is only used in the constructor and one other place. In the other place, it’s used and then deleted from the object.

The names of the variables mines and minelist aren’t helpful, because the names mean the same thing even though the variables do not. Each implies more than one mine, but one is an integer count and the other is a list of the coordinates of the mines. That should change right away. I rename the integer to numberOfMines.
With the __init__ reconstituted, I notice that I don’t even keep the minelist. It’s discarded at the end of the __init__ method. Maybe I can do this more simply.
I realize now that the code suffers from Tuple Madness. The cells are represented by two-tuples whose first and second items are distinguished not by names but by subscripts 0 and 1. There can hardly be anything less descriptive of the content. The initial values are (-2,0), but sometimes they are replaced with (-1,0). The code gives no obvious guidance to the significance of the first element versus the second, and no real guide to the meaning of the integer-coded values. If one were going to document this code, it would be helpful to explain the use of anonymous data structures. Of course, that wish is selfish, because my goal is to obviate any comments I find. I just want it to be easy.
I dig in and discover that subscript zero really is the count of adjacent cells with mines. Since you can’t have less than zero mines, the number -2 tells us that we’ve not counted yet. The number -1 is a code telling us that the cell itself contains a mine. I’ll want to split these out later into named attributes.
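The split might eventually look something like this (my sketch, not actual PySweeper code): named attributes in place of the -2/-1 codes packed into subscript zero:

```python
class Cell(object):
    """A sketch replacing the (count, state) two-tuple with named
    attributes: has_mine replaces the -1 code, and an adjacent_count
    of None replaces the -2 "not counted yet" code."""

    def __init__(self, has_mine=False):
        self.has_mine = has_mine      # was: subscript 0 == -1
        self.adjacent_count = None    # was: -2, meaning "not counted yet"

    def counted(self):
        return self.adjacent_count is not None


mined = Cell(has_mine=True)
empty = Cell()
empty.adjacent_count = 3   # counted three adjacent mines
```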
Tuple Madness is such an easy trap to fall into with Python or Ruby (and I’m told Lisp as well). Each has such great native tuple and list types that we may forget to create real data structures for small data sets. I’ve done it, the PySweeper author fell into it, and so have my coworkers. If we had n-tuples in C, the resulting code would have been so inscrutable that the software practice might never have recovered. Yet two-tuples can be very good things. Just not here. I vow to create a data structure for the cell as soon as I have this body of code sufficiently under test.
As I learn about the two-tuples I write a couple of tests, as if there were more descriptive functions. I write calls to such functions as hasAMine() to check if subscript zero has a -1 value, and cellAt to get a cell at a given location, etc. Explanatory functions are a powerful documentation technique.
def cellAt(self, x, y):
    return self.board[x][y]

def isOpened( self, x, y ):
    return self.cellAt(x, y)[1] == -1

def countAt(self, x,y):
    result = self.cellAt(x,y)[0]
    if result < 0:
        result = None # indicator value
    return result

def hasMine(self, x,y):
    return self.cellAt(x,y)[0] == -1

I have a small-but-growing body of tests. I should soon have enough confidence to do bigger refactorings.
Note the ugly comment in countAt. This is variable reuse. Rather than have a separate indicator (another element in the tuple, maybe), one element is serving double duty. It is too entrenched right now, so I’ll work on making it obvious and easier to separate later. Soon my body of explanatory methods will compose an opaque interface, and there will be little or no upset when I change the implementation of those methods.
Note also that we’re in violation of the SRP. The class has methods related to the field and also to specific cells in the board. Maybe I will eventually have classes for the cells, the game, and the board.
Time to stop reading and do more testing. I did learn enough to know that the game tries to be “fair”. If there is a mine on the very first square you select, it moves it out of the way. My “gaming the system” gene took over. I should be able to fill a three-by-three grid with eight mines, and then click (open) the center square. That should guarantee me that I have this structure:
mine  | mine  | mine
mine  | empty | mine
mine  | mine  | mine
Sure enough, I find that the routine that moves the mine to a new spot is not working quite as expected. Now I have a failing test which allows me to make changes to the randomizer. I fix it, and then decide to pull the random chooser into a separate class. It will behave according to the iterator protocol in Python, so I can pass a simple array of (col, row) pairs to the constructor as a testing stub/mock/thing.
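A stub honoring that next() protocol can be as small as a list and a pop (a sketch; the class and names here are mine, not from PySweeper):

```python
class ScriptedChooser(object):
    """Test double: deals out a fixed, predictable list of
    (col, row) pairs instead of choosing cells at random."""

    def __init__(self, pairs):
        self.pairs = list(pairs)

    def next(self):
        # Same protocol as the real chooser: raise when exhausted.
        if not self.pairs:
            raise StopIteration("Out of scripted cells")
        return self.pairs.pop(0)


chooser = ScriptedChooser([(0, 0), (1, 2)])
first = chooser.next()
second = chooser.next()
```

Because the stub and the real chooser share the same protocol, a test can pass the stub to the constructor and know exactly where every mine will land.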
Now the top of my file has the random cell chooser. There are some named constants, the still-busy __init__ method, the now-tested _validate_parameters method, and then other code we’ve either talked about already or haven’t touched yet.
random.seed()

class random_cell_chooser(object):
    def __init__(self, columns, rows):
        self.choices = [ (x,y) for x in range(columns)
                               for y in range(rows) ]

    def next(self):
        if len(self.choices) == 0:
            raise StopIteration, "Out of free cells"
        item = random.choice(self.choices)
        self.choices.remove(item)
        return item

INITIALLY_EMPTY = ( -2, 0 )
INITIALLY_MINED = ( -1, 0 )

class Field:
    """Provide a playing field for a Minesweeper game.

    This class internally represents a Minesweeper playing field, and
    provides all functions necessary for the basic manipulations used
    in the game.
    """

    def __init__( self, rows = 16, columns = 16, numberOfMines = 40,
                  chooser=None):
        """Initialize the playing field."""
        self._validate_parameters( rows, columns, numberOfMines )
        self.rows = rows
        self.cols = columns
        self.mines = numberOfMines
        self.selector = chooser or random_cell_chooser(columns,rows)
        self.cleared = 0
        self.flags = 0
        self.start_time = None
        self.board = []
        for col in range( columns ):
            self.board.append( [INITIALLY_EMPTY] * rows )
        for counter in range(numberOfMines):
            (col,row) = self.selector.next()
            self.board[col][row] = INITIALLY_MINED

    @classmethod
    def _validate_parameters( self, rows, cols, mines ):
        for var in ( rows, cols, mines ):
            if var < 0:
                raise ValueError, "all arguments must be > 0"
        if mines >= ( rows * cols ):
            raise ValueError, "mines must be < (rows * cols)"
An engineer friend of mine is fond of saying “suspenders and belt”, referring to having redundant safety mechanisms. I’m appreciating the value of both a good version control system and a growing set of tests. I am aware of the quality of the code I’m modifying (whether it passes all the tests) and that I can always abandon a change I’m making if it gets messy.
My little test is building that three-by-three grid and playing a very simple game to completion by opening the middle square. I am testing setup and a bit of gameplay. I’ll put a better test in tomorrow.
It is a simple thing to extract the method for placing the mines and pass it to the constructor for Field. Now I can set the mines just where I want them, giving me total control of the field constructor. I make it act like an iterator so my tests can pass a list iterator to the constructor:
class random_cell_chooser(object):
    def __init__(self, columns, rows):
        self.choices = [ (x,y) for x in range(columns)
                               for y in range(rows) ]

    def next(self):
        if len(self.choices) == 0:
            raise StopIteration, "Out of free cells"
        item = random.choice(self.choices)
        self.choices.remove(item)
        return item
Because the tests are new and few, I’m also playing (and losing) a game of minesweeper every so often so that I know that it still beats me in reasonable ways. I can’t wait until I can depend entirely on automated tests. In retrospect, it might have been better to have some basic acceptance tests in place before working on unit tests. It’s a thought.
PySweeper: Python test collector
I wrote a number of separate test files for the pysweeper refactoring exercise, and very quickly got tired of running them all individually. I created a stupid test composite tool and I run this one instead.
Here’s what it looks like right now.
import os
import unittest
import sys

def collect_tests(directoryName):
    """Gather all tests matching "test*.py" in the current directory"""
    alltests = []
    for filename in os.listdir(directoryName):
        if filename.startswith('test') and filename.endswith('.py'):
            module = __import__(filename.replace(".py",""))
            for item_name in dir(module):
                item = getattr(module,item_name)
                if isinstance(item,type) and issubclass(item,unittest.TestCase):
                    test = unittest.makeSuite(item)
                    alltests.append(test)
    return alltests

def suite():
    "Collect all local tests into a monster suite"
    suite = unittest.TestSuite()
    here = os.path.dirname(sys.argv[0]) or os.getcwd()
    alltests = collect_tests(here)
    suite.addTests(alltests)
    return suite

if __name__ == "__main__":
    tester = unittest.TextTestRunner()
    tester.run(suite())
There may be easier ways or better ways, but right now I don’t know them. I figure it will be good enough until I learn something cooler. I can think of several ways to extend it, but don’t need them yet.
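For what it’s worth, later versions of unittest (Python 2.7 and up) grew built-in discovery that does much the same job; where it’s available, something like this replaces a homemade collector:

```python
import unittest

# Build a suite from every test*.py under the current directory,
# much as the hand-rolled collector does.
loader = unittest.TestLoader()
suite = loader.discover(start_dir=".", pattern="test*.py")
```

The same discovery is also available from the command line as `python -m unittest discover`.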
Seed-And-Harvest Considered Harmful
Quite a while back I complained that a class is not a BASIC program, and more recently I blogged on my personal site about The Unspoken Parameter, but on further consideration I realize that these are both aspects of a larger, well-known problem that vexes developers.
The real problem is the idea of seeding a function’s input in various places in the environment, then calling the function, and then harvesting the function’s results from the surrounding environment. This indirect access to input and output is troublesome and frustrates maintenance.
Seed-and-harvest programming was common in BASIC “gosub” programming, and I see it in some command line programs too. I see it in classes where methods take no parameters and have no return values and you must seed and harvest the object’s fields to make a successful test. I see it happening in poorly extracted test methods. I see it in java programs where you have to have a host of environment variables and files and command line parameters all set up to be harvested by the program.
Seed-and-harvest is daft. Every program is composed of input, process, output. We try to simplify and focus processing, but still I find code that does not take the same degree of care with input and output.
Since seed-and-harvest is bad, the degree to which the input and output are formal and direct is a goodness. It isn’t the only measure of goodness, but it is a goodness.
If I can call a function, passing everything it needs to know, and it returns a value indicating everything it has done for me, then I have an optimally direct and formal interface. Barring foolish and meaningless naming (or personal inability), I should be able to understand what it does fairly easily and without error. It is also far more likely to work in multithreaded environments, by the way.
If I must set environment variables or manipulate global data stores or even local data structures in order to call a function, then I have diluted the interface by relying on input seeding.
After calling a function, if I must collect its output from the environment, global data structures, local data structures, or various fields of the function’s object/class/module, then the interface is diluted by harvesting.
An example of good here is a sensor object that emits a reading. The method to get a reading is obvious, clean, and likely to be thread-safe. A less good solution would leave the reading data in data fields of the sensor to be collected by a series of accessor calls. It defocuses the class re: SRP, while creating risk of update anomalies. It requires more effort from its clients than if it returned a reading.
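A sketch of the two styles, using a hypothetical sensor API (the names and the fixed 21.5°C reading are mine):

```python
class Reading(object):
    """A complete result: value and units travel together."""

    def __init__(self, value, units):
        self.value = value
        self.units = units


# Direct and formal: one call returns everything it produced.
class GoodSensor(object):
    def read(self):
        return Reading(21.5, "C")


# Seed-and-harvest: the caller must trigger sample(), then come
# back and harvest the pieces from the object's fields.
class HarvestSensor(object):
    def sample(self):
        self.value = 21.5   # output left lying in fields...
        self.units = "C"    # ...awaiting a series of accessor calls


reading = GoodSensor().read()

sensor = HarvestSensor()
sensor.sample()                       # the seed/harvest dance begins
harvested = (sensor.value, sensor.units)
```

Notice that nothing stops another thread from calling sample() between the harvesting reads on HarvestSensor; the direct version has no such window.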
This principle leads to other well-known principles. Dependency injection is a solution to a problem of indirectness. The Law of Demeter makes sense in the sense that it reduces the degree a method must reach to collect its input and to place its output.
In OO programs, static functions often have the most direct and formal interfaces since they can’t rely on instance variables. We know that static functions have problems with regard to substitutability in TDD, so we may make them non-static functions by moving them to a class or passing them by reference (closures and blocks). But we can see how much easier it is to read static functions than members, since we seldom have to read their code to understand what they are doing (though we may open them up to see how).
Objects with fields will exhibit some seed-and-harvest behavior to manage state and change of state, but we can preserve an otherwise direct formal interface with reference to callers of the object’s methods. We should not have to call a function to cause an effect and then call a number of accessors to get the results, as this would indicate we’ve diluted the interface with harvesting. We need to understand and bear in mind that we’re trading off some directness for convenience and encapsulation. If we know that we’re making that tradeoff then we are better equipped to do it well.
In end-user experience, this becomes more important. Plenty has been said by others about “direct manipulation”. They were wiser and earlier on the scene than myself; when it comes to UI I am still but an egg.
This concept is nothing new, even to me. I’m not saying it because I think it is something new. I’m saying it because I think people need to keep hearing it. It is simple but also profound.
Tuple Madness
If there is a downside to modern OO languages like Python and Ruby it is the great built-in list types. They’re wonderful, but they’re tempting.
Today I had my fourth or fifth encounter with tuple madness. It seems that the siren song of “free” data structures with unnamed members is just too much to resist. I’ve decided to call it Tuple Madness. An example: x = (2,3,-1). Could one be less descriptive? Probably not.
I came across some code which uses two-tuples where both values are encoded integer values. It’s a pain to figure out.
In a previous job, a coworker would write his python entirely as filters on n-tuples so that figuring out what his code was doing was really painful. A function would take a sequence of n-tuples, and return a sequence of n-tuples where some of the values were preserved (though reordered) from the input, and the rest were transformations on the input + some local data. It was really tough to read a program where the data structures had unnamed elements and only existed between the return of one function and the invocation of the next.
At least one of my colleagues has blogged about suffering from Tuple Madness as well.
I don’t think that “magic tuples” are always evil, only that they’re no substitute for proper objects and meaningful data structures.
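Python itself offers a cheap step up from magic tuples: collections.namedtuple keeps the tuple’s convenience while naming its members (a small sketch; the Position fields are my invention):

```python
from collections import namedtuple

# Anonymous: what are 2, 3, and -1 supposed to mean?
x = (2, 3, -1)

# Named: the same lightweight value, but it documents itself.
Position = namedtuple("Position", ["col", "row", "depth"])
p = Position(col=2, row=3, depth=-1)
```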
PySweeper: Digging In
In the last installment I got the source and started up the development environment. Now I start to play with the tools.
I start to peruse the source. Python programs are written in a mix of paradigms, and this is something I love about the language. Every variable is a first-class object, but not all methods have to be members of a class. You can have global functions (well, module-scoped) which are themselves first-class objects capable of having multiple methods and member variables, all participating in reflection/introspection. As such, some will describe the use of non-class-bound functions as “procedural”, and I suppose it’s technically a fair cop. Some of the program is written in class-free functions, mostly the bits that are used to initialize the application.
Python also has a number of functional programming features, but I suspect that the 2002 copyright predates list comprehensions and generators, so I don’t see them represented here. I will be using some of those techniques to simplify code that had to be written in loops and nested loops before. It should shorten a number of methods.
The PyGame UI and business logic (guts) of the code are separate modules with a protocol for communicating between the two. I was afraid that I would find them all intermingled, and was delighted to see separate modules. The core of the game itself is only a few hundred lines of code if I include comments, docstrings, and whitespace.
I know that the first order of business is to get some testing started, but I want so badly to take Eclipse PyDev for a spin that I’m having an impulse control problem. I think the code is formatted reasonably, but I use the Eclipse feature to reformat the code to see if it works. I’m rewarded with equally attractive code. No damage is done!
The outlining feature works fine, and so does completion. I scribble a little and delete it. It’s looking good so far. I notice that the Run As.. menu has options to run Python, Jython, PyUnit, Python coverage and more. Very promising.
What of refactoring? Sure enough, I find that I can rename, extract and inline variables, and extract and inline functions. Those are the biggies. I scroll into the constructor of the Field class. The Field represents the playing field of the game, the grid of bombs and clues. In the constructor are various paragraphs of code that help set up the class. I pick the section that adds mines, move some of the constants out of the way by extracting local variables, and then extract that portion of the function as _place_mines. In Python no functions are private, so the standard protocol is to use an underscore prefix as a ‘wart’ to tell other authors that it’s not really part of the formal interface, just a convenience function for the class. Love it or hate it, that is just how it’s done.
I do the same with the paragraph that validates the parameters (i.e., that there are more than zero rows, columns, and mines, and that there are not more mines than cells in the grid). This becomes _validate_parameters, and is called from the constructor (in Pythonese, the constructor is a function named __init__).
I find eight instance variables in __init__. That’s quite a few. I pledge to look for violations of the Single Responsibility Principle later.
It dawns on me that Eclipse doesn’t know my current version control selection, bazaar (http://en.wikipedia.org/wiki/Bazaar_(vcs)). This distresses me, because all the Team features of Eclipse are lost to me. I won’t be able to do all my checkin/update/diff functions from within the IDE. I vow to look for a plugin soonest. That’s two promises, two items of debt so far. I will keep going today.
For now, I see a function called _get_adjacent. It has a pretty simple job, but it looks like this:

```python
def _get_adjacent(self, x, y):
    """Provide a list of all tiles adjacent to the given tile.

    This function takes the x and y coordinates of a tile, and returns
    a list of 2-tuples containing the coordinates of all adjacent
    tiles. x and y are the x and y coordinates of the base tile,
    respectively.
    """
    adjlist = [(x - 1, y - 1), (x, y - 1), (x + 1, y - 1),
               (x - 1, y), (x + 1, y),
               (x - 1, y + 1), (x, y + 1), (x + 1, y + 1)]
    rmlist = []
    for adjpair in adjlist:
        for value, index in [(-1, 0), (-1, 1), (self.cols, 0), (self.rows, 1)]:
            if adjpair[index] == value:
                rmlist.append(adjpair)
                break
    for item in rmlist:
        adjlist.remove(item)
    return adjlist
```

It looks pretty heavy to me, for the small amount of work it’s doing. Ah, it had to use nested loops instead of list comprehensions, and it’s adding unnecessary cells only to remove them later. I think I could do better if I had a body of tests to protect me.
I figure it’s time to try PyUnit inside Eclipse. I create a new file called testPysweeper.py. I hate the name, and again go against my better instincts by leaving it that way for a while. That would be yet another promise, and I’m running out of patience for procrastination. I rename it testAdjacency.py. I feel an instant sense of relief and go on to write a series of tests of the existing _get_adjacent method:

```python
class PySweeperTest(unittest.TestCase):
    def parentSetup(self, columns, rows, mines, target):
        target_column, target_row = target
        self.game = pysweeper.Field(columns, rows, mines)
        self.target = (target_column, target_row)
        self.result = self.game._get_adjacent_cells(*self.target)
        self.result.sort()


class CellsAdjacentToCol2Row2(PySweeperTest):
    """Expecting a circle of squares around the target cell"""

    def setUp(self):
        self.parentSetup(5, 5, 0, (2, 2))

    def testShouldFindEightCells(self):
        assert len(self.result) == 8

    def testShouldNotIncludeCol2Row2(self):
        assert self.target not in self.result
```

These are not highly imaginative, and continue with several different scenarios. I test a normal “middle” cell, one against an edge, the top left and bottom right corners, etc.
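For a flavour of the edge and corner scenarios, here is one of them as a self-contained sketch. FakeField is a stand-in I invented so the example runs on its own; the real tests use pysweeper.Field, and I’ve copied the nested-loop adjacency logic from the routine above into it:

```python
import unittest

class FakeField:
    """Hypothetical stand-in for pysweeper.Field: just the grid
    dimensions plus the adjacency routine under test."""

    def __init__(self, cols, rows, mines):
        self.cols, self.rows = cols, rows

    def _get_adjacent_cells(self, x, y):
        # Same build-then-prune logic as the original _get_adjacent.
        adjlist = [(x - 1, y - 1), (x, y - 1), (x + 1, y - 1),
                   (x - 1, y), (x + 1, y),
                   (x - 1, y + 1), (x, y + 1), (x + 1, y + 1)]
        rmlist = []
        for adjpair in adjlist:
            for value, index in [(-1, 0), (-1, 1), (self.cols, 0), (self.rows, 1)]:
                if adjpair[index] == value:
                    rmlist.append(adjpair)
                    break
        for item in rmlist:
            adjlist.remove(item)
        return adjlist


class CellsAdjacentToCol0Row0(unittest.TestCase):
    """A corner cell should have exactly three neighbours."""

    def setUp(self):
        self.result = sorted(FakeField(5, 5, 0)._get_adjacent_cells(0, 0))

    def testShouldFindThreeCells(self):
        assert self.result == [(0, 1), (1, 0), (1, 1)]
```

The corner test is the interesting one: five of the eight candidate neighbours fall off the grid and must be pruned.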
You’ll notice the base class PySweeperTest. I didn’t have that to start with, but added it to reduce duplication, and I named it badly. These are not general PySweeper tests; they are specific cell adjacency tests. Naming is easily fixed.
A cool thing happens. I rename it AdjacencyTest, and suddenly all the test names become redundant and silly. I start renaming them, and

class CellsAdjacentToCol0Row0(PySweeperTest):

becomes

class Column0Row0(AdjacencyTest):

On further reflection, this becomes:

class UpperLeftCorner(AdjacencyTest):

It’s amazing how wrong my first instinct can be. The test is so much more obvious now, and the conditions make so much more sense. I smile in mild self-derision, but I’m happy to have cleaned it up quickly. The PyDev refactoring tools feel pretty comfortable to me now. One goal of the exercise is being met.
The tests run for the old routine, and I write a new routine.
I hit my next snag trying to run the Python unit test. I have always run Python from the command line, and would run my tests as if they were any other Python program. I never ran them as:

python -m unittest filename.testname

I think that this must be what Eclipse wants to do. I can use the option to Run as... / Python run, but I cannot use the option to Run as... / Python unit-test. When I try, I get the following error:
```
Finding files... ['/home/tottinge/Projects/Pysweeper-1.0/testAdjacency.py'] done.
Importing test modules... done.
----------------------------------------------------------------------
Ran 0 tests in 0.000s
```

It found the right test file, but either it didn’t import it, or else it imported it and somehow didn’t recognize any tests. I added a print line to the file so I could see it, and got nothing. Why would it find my file and not import it? This is something that requires a little googling.
It doesn’t work when I use python -m unittest blah either. Maybe the problem isn’t that I’m not used to PyDev, but that I don’t know enough about PyUnit? I put this off, since I have a reasonable workaround for now. This is another bit of debt to work off. I’m unhappy at how fast it piles up.
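For the record, the workaround of running the test file as an ordinary Python program relies on the usual PyUnit boilerplate at the bottom of the file. A minimal sketch (I pass exit=False here only so the interpreter survives the run; plain unittest.main() is the conventional form):

```python
import unittest

class SmokeTest(unittest.TestCase):
    """A trivial test, just to show the wiring."""

    def testTruth(self):
        self.assertTrue(True)

if __name__ == '__main__':
    # Running this file as an ordinary Python program executes every
    # TestCase it defines -- the 'Run as... Python run' workaround.
    unittest.main(exit=False)
```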
The old version of _get_adjacent was about 21 lines, with loops and local variables. Here is a smaller version using list comprehensions:

```python
def _get_adjacent(self, x, y):
    """Provide a list of all tiles adjacent to the given tile."""
    columns = (x - 1, x, x + 1)
    rows = (y - 1, y, y + 1)
    adjacent_pairs = [(col, row) for col in columns for row in rows
                      if 0 <= row < self.rows and 0 <= col < self.cols]
    adjacent_pairs.remove((x, y))
    return adjacent_pairs
```
A list comp is very simple. The one above returns a list of (col, row) pairs by iterating over columns and rows, and only keeps a pair if the row and column are both on the grid. But it doesn’t look that simple above. My spider sense buzzes. I immediately worry that the list comp is too complex. The filtering is too deep in the list comprehension and is too far from the initialization of rows and columns. A quick modification and retest:
```python
def _get_adjacent_cells(self, x, y):
    """Provide a list of all tiles adjacent to the given tile."""
    columns = [col for col in (x - 1, x, x + 1) if 0 <= col < self.cols]
    rows = [row for row in (y - 1, y, y + 1) if 0 <= row < self.rows]
    adjacent_pairs = [(col, row) for col in columns for row in rows]
    adjacent_pairs.remove((x, y))
    return adjacent_pairs
```
The tests all pass, and the code is more obvious to me. I have no teammate to correct me, so I leave it as-is.
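As a sanity check on my reading of the routine, here it is restated as a free function with the grid size passed in explicitly (my restructuring, for illustration only), exercised at a middle, an edge, and a corner cell:

```python
def get_adjacent_cells(cols, rows, x, y):
    """Free-function restatement of _get_adjacent_cells, for illustration."""
    columns = [col for col in (x - 1, x, x + 1) if 0 <= col < cols]
    rows_ = [row for row in (y - 1, y, y + 1) if 0 <= row < rows]
    pairs = [(col, row) for col in columns for row in rows_]
    pairs.remove((x, y))  # a cell is not adjacent to itself
    return pairs

# A middle cell has 8 neighbours, an edge cell 5, a corner cell 3.
print(len(get_adjacent_cells(5, 5, 2, 2)))  # 8
print(len(get_adjacent_cells(5, 5, 0, 2)))  # 5
print(len(get_adjacent_cells(5, 5, 0, 0)))  # 3
```

The filtering now happens right where columns and rows are built, and the comprehension that combines them has no conditions left in it at all.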
One more feature to look at today, and that’s compare with local history. It’s a beautiful thing. Eclipse is becoming a very good friend to me.