Parts 5 and 6 of the 4-part series

Posted by Brett Schuchert Sat, 17 Apr 2010 01:57:00 GMT

The title says it all. I was bothered by a few things in the Shunting Yard Algorithm (actually, by many more than a few), but I felt compelled to fix two of them.

So if you have a look at the album, you’ll notice 2 more videos:
  • Video 5 of 4: Remove the need for spaces between tokens.
  • Video 6 of 4: Remove duplication of operators in algorithm and tokenizer.

Hope these are interesting or at least entertaining.

C# TDD Videos - You asked for them

Posted by Brett Schuchert Thu, 15 Apr 2010 04:29:00 GMT

Several people asked for them, so here is a series of four videos. The first series, on the RPN calculator in Java, was a bit rough; these are even rougher.

Even so, hope you find them valuable.

Shunting Yard Algorithm in C# Video Album

Comments and feedback welcome.

Notes from the OkC Dojo 2009-09-30

Posted by Brett Schuchert Thu, 01 Oct 2009 03:57:00 GMT

Tonight we had a small group of die-hard practitioners working with Ruby and RSpec. We intended to use the Randori style, but it was a small enough group that we were a bit more informal than that.

We tried the Shunting Yard Algorithm again and it worked out fairly well. The level of experience in Ruby was low to moderate (which is why we wanted to give people a chance to practice it), and the RSpec experience was generally low (again, a great reason to give it a try).

We had several interesting (at least to me) side discussions on things such as:
  • Forth
  • Operator precedence
  • Operator associativity
  • L-Values and R-Values
  • Directed Acyclic Graphs (DAGs)
  • In-fix, pre-fix, post-fix binary tree traversal
  • Abstract Syntax Trees (AST)
  • The list goes on. I’m a big-time extrovert, so I told Chad to occasionally tell me to shut the heck up.
The Shunting Yard Algorithm is a means of translating an in-fix expression into a post-fix expression (a.k.a. Reverse Polish Notation – used by the best calculators in the world, HP [I also prefer vi, FYI]). For example:
  • 1 + 3 becomes 1 3 +
  • a = b = 17 becomes a b 17 = =
  • 2 + 3 * 5 becomes 2 3 5 * +
  • 2 * 3 + 5 becomes 2 3 * 5 +
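To make the translation concrete, here is a minimal sketch of the algorithm in Python. This is not the code from the videos or the dojo; the operator tables and function name are my own, and it assumes pre-split tokens with no parentheses or functions, just enough to reproduce the examples above:

```python
# Minimal Shunting Yard sketch. The precedence and associativity tables
# are assumptions chosen to match the examples above.
PRECEDENCE = {'=': 1, '+': 2, '-': 2, '*': 3, '/': 3}
RIGHT_ASSOCIATIVE = {'='}

def to_postfix(tokens):
    output, operators = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop any pending operator that must be applied before this one.
            while operators and (
                    PRECEDENCE[operators[-1]] > PRECEDENCE[token]
                    or (PRECEDENCE[operators[-1]] == PRECEDENCE[token]
                        and token not in RIGHT_ASSOCIATIVE)):
                output.append(operators.pop())
            operators.append(token)
        else:
            output.append(token)  # operands go straight to the output
    while operators:              # flush the pending operators
        output.append(operators.pop())
    return ' '.join(output)

print(to_postfix('2 + 3 * 5'.split()))  # 2 3 5 * +
```

With those two small tables, all four examples above come out exactly as listed, including the right-associative chain of assignments.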

One typical approach to this problem is to build an AST from the in-fix representation and then traverse it recursively in post-fix order.

What I like about the Shunting Yard Algorithm is that it takes a traditionally recursive algorithm (DAG traversal, where a binary tree is a degenerate DAG) and writes it iteratively, using its own stack (a local or instance variable) instead of using the program stack to store activation records (OK, stack frames). Essentially, the local stack is used for pending work.

This is one of those skills I think is useful to learn: writing traditionally recursive algorithms using a stack-based approach. It allows you to step through something (think iteration) versus having to do things whole-hog (recursively, with a block (lambda) passed in). In fact, I bought a used algorithms book 20 years ago because it had a section on this subject. And looking over my left shoulder, I just saw that book. Nice.

To illustrate, here’s the AST for the first example:

Since the group had not done a lot with recursive algorithms (at least not recently), we discussed a shorthand way to remember the various traversal algorithms using three letters: L, R, P

  • L -> Go Left
  • R -> Go Right
  • P -> Print (or process)
Each of the three traditional traversal algorithms (for a binary tree) use just these three letters. And the way to remember each is to put the ‘p’ where the name suggests. For example:
  • in-fix, in -> in between -> L P R
  • pre-fix, pre, before -> P L R
  • post-fix, post, after -> L R P
Then, given the graph above, you can traverse it as follows:
  • in-fix: Go left, you hit the 1, it’s a leaf node so print it, go up to the +, print it, go to the right, print the 3; you end up with 1 + 3
  • post-fix: Go left, you hit the 1, it’s a leaf node, print it, go back to the +, since this is post-fix, don’t print yet, go to the right, you get the 3, it’s a leaf node, print it, then finally print the +, giving: 1 3 +
  • pre-fix: start at + and print it, then go left, it’s a leaf node, print it, go right, it’s a leaf node, print it, so you get: + 1 3 – which looks like a function call (think operator+(1, 3))

It’s not quite this simple – we actually looked at larger examples – but this gets the essence across. And to move from a tree to a DAG, simply iterate over all children, printing before or after the complete iteration; in-fix doesn’t make as much sense in a general DAG. We also discussed tracking visited nodes if you’ve got a cyclic graph rather than an acyclic one.
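To make the recursive-versus-stack point concrete, here is the L R P (post-fix) traversal sketched both ways. The tiny Node class is hypothetical, just enough tree to traverse, and none of this is the dojo’s actual code:

```python
class Node:
    # Hypothetical minimal binary-tree node, just enough to traverse.
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def postfix_recursive(node):
    # L R P, with the pending work kept on the program stack.
    if node is None:
        return []
    return (postfix_recursive(node.left)
            + postfix_recursive(node.right)
            + [node.value])

def postfix_iterative(root):
    # Same traversal, but the pending work lives on our own stack.
    result, stack = [], [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if node is None:
            continue
        if children_done:
            result.append(node.value)        # P, after L and R
        else:
            stack.append((node, True))       # come back to print later
            stack.append((node.right, False))
            stack.append((node.left, False))
    return result

tree = Node('+', Node('1'), Node('3'))       # the AST for 1 + 3
```

Both give ['1', '3', '+'] for the 1 + 3 tree, but the iterative version can be stepped one node at a time, which is exactly the property the Shunting Yard Algorithm exploits.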

After we got a multi-operator expression with same-precedence operators working, e.g., 1 + 3 – 2, which results in: 1 3 + 2 -, we moved on to handling different operator precedence.

Around this time, there was some skepticism that post-fix could represent the same expression as in-fix. This is normal, if you have not seen these kinds of representations. And let’s be frank, how often do most of us deal with these kinds of things? Not often.

Also, there was another question: WHY?

In a nutshell, with post-fix notation you do not need parentheses. As soon as an operator is encountered, you can process it immediately rather than waiting for a later token to complete the expression (no look-ahead required). This also led to HP developing a calculator in 1967 (or ’68) that weighed under 50 pounds, cost around USD $5,000, and could add, subtract, multiply and divide – which was huge at the time (with a stack size of 3 – later models went to a stack size of 4, giving us the x, y, z and t registers).
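Here is a sketch of why no look-ahead is needed: evaluating a post-fix expression takes nothing more than a stack of operands. The operator table is my own, limited to the four operations that early calculator supported:

```python
# Post-fix evaluation sketch: every operator is processed the instant
# it appears, using the operands already sitting on the stack.
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a / b}

def evaluate_postfix(tokens):
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()   # operands arrived left-to-right,
            left = stack.pop()    # so the right-hand one is on top
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()

print(evaluate_postfix('2 3 5 * +'.split()))  # 17.0
```

No parentheses, no precedence table, no look-ahead: the ordering work was all done when the expression was converted to post-fix.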

During this rat-hole, we discussed associativity. For example, a = b = c is really (a = (b = c)).

That’s because the assignment operator is right-associative. This led into r-values and l-values.

Anyway, we’re going to meet again next week. Because we (read: me) were not disciplined in following the Randori style, these side discussions led to taking a long time to fix a problem. We should have “hit the reset button” sooner, so next time around we’re going to add a bit more structure to see what happens:
  • The driver finishes by writing a new failing test.
  • The driver commits the code with the newest failing test (we’ll be using git)
  • Change drivers and give him/her some time-box (5 – 10 minutes)
  • If, at the end of the current time-box, the current driver has all tests passing, go back to the first bullet in this list.
  • If, at the end, the same test that was failing is still failing (first time only), give them a bit more time.
  • However, if any other tests are failing, then we revert back to the last check in and switch drivers.

Here’s an approximation of these rules using yuml.me:

And here’s the syntax to create that diagram:
(start)->(Create New Failing Test)->(Commit Work)->(Change Drivers)
(Change Drivers)->(Driver Working)
(Driver Working)-><d1>[tick]->(Driver Working)
<d1>[alarm]->(Check Results)->([All Tests Passing])->(Create New Failing Test)
(Check Results)->([Driver Broke Stuff])->(git reset --hard)->(Change Drivers)
(Check Results)->([First Time Only and Still Only Newest Test Failing])->(Give Driver A Touch More Time)->(Check Results)

Note that this is not a strict activity diagram: the feature is still in beta, and creating the diagram as I did made the results a bit more readable. Even so, I like this tool, so I wanted to throw another example in there (and try out a diagram type I had not used before – at least not with this tool; I’ve created too many activity diagrams). If you’d like to see an accurate activity diagram, post a comment and I’ll draw one in Visio and post it.

Anyway, we’re going to try to move to a weekly informal practice session with either bi-weekly or monthly “formal” meetings. We’ll keep switching out the language and the tools. I’m even tempted to do design sessions – NO CODING?! What?! Why not. Some people still work that way, so it’s good to be able to work in different modes.

If you’re in Oklahoma City, hope to see you. If not, and I’m in your town, I’d be interested in dropping into your dojos!

Revisit: The common subgroups

Posted by tottinger Tue, 03 Jul 2007 15:43:00 GMT

In cleaning up the code, I simplified the algorithm a little and improved performance considerably. Amazing how that works: simpler equals faster for so much code. Adding simple data structures, local explanatory functions, and the like often makes code much faster.

What I’m hoping is that I will use this in a few different and useful ways.

The first way is to look for interfaces where concrete classes are being used from many other classes. You need to add an interface, but don’t know who needs which part. The goal is to figure out a relatively small number of interfaces that satisfy a number of clients in a module.

The second use would be to look for common clumps of parameters when I’m working in a large code base where the average number of arguments per function call does not remotely approach one. I suspect that there are clumps of similarly-named variables being passed around, and that these are likely “missed classes”. Sometimes these are obvious, but it would be good to see them in a nice list spit out from a nice tool.
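As a hypothetical illustration (the function names and parameter lists below are invented, not from any real code base), those clumps can be found by intersecting parameter lists pairwise, which is essentially the same trick the code below applies to signatures:

```python
# Invented example: map each function to its parameter names. A clump of
# names that travels together through many signatures is a "missed class".
signatures = {
    "draw_line":   ["x1", "y1", "x2", "y2", "color"],
    "move_point":  ["x1", "y1", "dx", "dy"],
    "scale_shape": ["x1", "y1", "factor"],
}

def common_clumps(signatures, minimum_size=2):
    clumps = {}
    names = list(signatures)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = frozenset(signatures[first]) & frozenset(signatures[second])
            if len(shared) >= minimum_size:
                # Record every function that shares this clump of names.
                clumps.setdefault(shared, set()).update({first, second})
    return clumps
```

Here ("x1", "y1") shows up in every signature – a Point waiting to be extracted.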

So this is a hopeful start on a series of useful tools.

Code follows.

def find_groups(input_map):
    """
    Searches for groupings of items in a map, so that an input map like:
        "first": [1, 2, 3, 4],
        "second": [1, 2, 3, 5, 6],
        "third": [1, 2, 5, 6]
    results in:
        (1, 2, 3): {"first", "second"}
        (1, 2): {"first", "third"}
        (1, 2, 5, 6): {"second", "third"}

    Each key is the pairwise intersection of two input signatures; each
    value is the set of identities that share it. Note that the returned
    dict maps tuples to sets and, being a dict, is effectively unordered.
    """
    def tupleize(data):
        "Convert a set, frozenset, or list to a tuple with predictable order"
        return tuple(sorted(set(data)))

    def append_values(mapping, key, *values):
        key = tupleize(key)
        old_value = mapping.get(key, [])
        new_value = tupleize(list(old_value) + list(values))
        mapping[key] = new_value
        return key, new_value

    result = {}
    previously_seen = {}
    for input_identity, signatures in input_map.items():
        input_signatures = set(signatures)
        for signature_seen, identities_seen in previously_seen.items():
            common_signatures = set(signature_seen).intersection(input_signatures)
            if len(common_signatures) > 1:
                known_users = list(identities_seen) + [input_identity]
                append_values(result, common_signatures, *known_users)
        append_values(previously_seen, signatures, input_identity)
    return filter_groups(result)

def filter_groups(subsets):
    "Keep only groups with more than one signature and more than one user"
    filtered = {}
    for key, value in subsets.items():
        if len(key) > 1 and len(value) > 1:
            filtered[key] = set(value)
    return filtered

def display_groupings(groupings):
    "Silly helper function to print groupings"
    for key in sorted(groupings.keys(), key=len):
        print("\n" + "-" * 40)
        for item in key:
            print(item)
        for item in sorted(groupings[key]):
            print("     ", item)
        print()