<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Object Mentor Blog: Python Subgroup Detection and Optimization</title>
    <link>http://blog.objectmentor.com/articles/2007/06/27/python-subgroup-detection-and-optimization</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description></description>
    <item>
      <title>Python Subgroup Detection and Optimization</title>
      <description>&lt;p&gt;I had a moderately interesting customer problem to work on.  I got acquainted with a bit of legacy code that is seriously in need of some interface segregation.  It&amp;#8217;s an entirely concrete class, used from all over the code base. The question is how to segregate, and that depends on what methods are called from which programs. We ran &amp;#8216;nm&amp;#8217; to extract the link table from our object files, saving me the trouble of parsing C++ (a scary thought). All that remained was for me to compare the method prototypes used by the object files and find the common sets.&lt;/p&gt;
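
&lt;p&gt;For anyone who wants to try something similar, here is a minimal sketch of the data-gathering step in present-day Python 3. It assumes the object files were dumped with something like nm -C (demangled names) and that undefined symbols, marked U, are the calls of interest. The class name BigClass, the helper methods_called, and the sample lines are all made up for illustration; none of them come from the customer code.&lt;/p&gt;

&lt;pre&gt;
```python
def methods_called(nm_lines, class_prefix):
    """Collect undefined symbols (type 'U') that belong to one class."""
    called = set()
    for line in nm_lines:
        parts = line.strip().split(None, 1)
        # Undefined symbols have no address column, so the line is "U name".
        if len(parts) == 2 and parts[0] == "U":
            name = parts[1].strip()
            if name.startswith(class_prefix):
                called.add(name)
    return called


# Hypothetical nm -C output for one object file.
sample = [
    "0000000000000000 T main",
    "                 U BigClass::open(char const*)",
    "                 U BigClass::read(int)",
    "                 U printf",
]

used = methods_called(sample, "BigClass::")
print(sorted(used))
```
&lt;/pre&gt;

&lt;p&gt;Running this once per object file gives a map from file name to the set of methods it calls, which is the kind of input the grouping code expects.&lt;/p&gt;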


	&lt;p&gt;Lacking better ideas, I decided to do this exhaustively in a brute-force kind of way.  It is only a few hundred files, so it shouldn&amp;#8217;t take too long. I was very wrong. It took a long time and eventually failed.&lt;/p&gt;
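
&lt;p&gt;The brute-force idea can be sketched in a few lines of Python 3. This is a simplified illustration that only compares pairs of files, not the full code listed at the end of this post, and the name pairwise_common is hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
from itertools import combinations


def pairwise_common(named_groups):
    """Map each shared group of two or more items to the names sharing it."""
    found = {}
    for (name_a, items_a), (name_b, items_b) in combinations(
            sorted(named_groups.items()), 2):
        common = tuple(sorted(set(items_a).intersection(items_b)))
        if len(common) in (0, 1):
            continue  # no overlap, or a single match: not interesting
        found.setdefault(common, set()).update([name_a, name_b])
    return found


usage = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 4, 5, 6, 8],
    "c": [0, 1, 4, 5],
}
print(pairwise_common(usage))
```
&lt;/pre&gt;

&lt;p&gt;The number of pairs grows with the square of the number of files, so a few hundred files means tens of thousands of intersections. That is still a small number, provided each step is cheap in memory.&lt;/p&gt;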


	&lt;p&gt;I had &lt;span class="caps"&gt;TDD&lt;/span&gt;-ed the code, so I had tests of correctness, and I relied on these as I added optimizations, but of course the performance problem occurred only under a real load.&lt;/p&gt;


	&lt;p&gt;I could have run a profiler on it (and probably should have) but instead I simply monitored my computer. I quickly saw that my time was going into memory allocation, which is also the reason it died after many minutes. When I have Python performance problems, this is usually the reason, and almost never &amp;#8220;interpreter drag&amp;#8221;.&lt;/p&gt;
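
&lt;p&gt;In a current Python 3, the standard tracemalloc module can confirm a suspicion like this without watching the system monitor. It did not exist when this post was written, and the workload below is only a stand-in for the real job, so treat this as a sketch of the approach.&lt;/p&gt;

&lt;pre&gt;
```python
import tracemalloc

tracemalloc.start()

# Stand-in workload: many small sets of freshly built strings.
groups = [frozenset("method%d" % n for n in range(i, i + 5))
          for i in range(2000)]

current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print("current bytes:", current, "peak bytes:", peak)
# The top entries show which source lines allocated the most memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```
&lt;/pre&gt;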


	&lt;p&gt;My friend Norbert (a gentleman of many programming languages, including Ruby) suggested that I wasn&amp;#8217;t interning my strings, and of course I was not.  I switched to interning strings, and noticed a little improvement, which meant the program ran &lt;em&gt;longer&lt;/em&gt; before failing from memory problems. Well, the tests still passed, so I knew at least that the logic was still good even if the algorithm was primitive and Occam was spinning in his grave.&lt;/p&gt;
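
&lt;p&gt;As a small illustration of what interning buys: in Python 2 the call was the builtin intern(), and in Python 3 it is sys.intern(). The sketch below uses the Python 3 spelling, and the method name is made up.&lt;/p&gt;

&lt;pre&gt;
```python
import sys

# Build two equal strings at run time so they start out as separate objects.
first = "".join(["BigClass::", "read(int)"])
second = "".join(["BigClass::", "read(int)"])

print(first == second)                          # equal by value
print(sys.intern(first) is sys.intern(second))  # one shared object once interned
```
&lt;/pre&gt;

&lt;p&gt;When the same method name shows up in hundreds of files, interning keeps a single copy of the text instead of one copy per occurrence.&lt;/p&gt;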


	&lt;p&gt;Next I realized that I am dealing with a lot of small groups of strings, and in a leap of optimism/faith/stupidity I decided to intern groups of strings. That helped very little, but it did help since my program was now running for the better part of an hour (!!!) and then failing. &amp;#8220;More of same&amp;#8221; without measurement is hardly a good recipe for optimization.&lt;/p&gt;


	&lt;p&gt;This is when I realized that I shouldn&amp;#8217;t be storing sets or frozensets, which are pretty heavyweight data structures. I had chosen them because I was really working with set intersections, but hadn&amp;#8217;t counted the in-memory storage cost. I converted the data structure used by the algorithm and added a local function to make tuples out of the sorted sets.&lt;/p&gt;
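
&lt;p&gt;The difference in container size is easy to see with sys.getsizeof. The sketch below is Python 3 on CPython, and the exact byte counts vary by version and platform, so none are quoted here. The method names are made up.&lt;/p&gt;

&lt;pre&gt;
```python
import sys

methods = ["open", "read", "write", "close"]

as_frozenset = frozenset(methods)
as_tuple = tuple(sorted(methods))

# getsizeof reports the container only, not the strings it refers to.
print("frozenset:", sys.getsizeof(as_frozenset), "bytes")
print("tuple:    ", sys.getsizeof(as_tuple), "bytes")

# A tuple can be turned back into a set whenever an intersection is needed.
print(frozenset(as_tuple) == as_frozenset)
```
&lt;/pre&gt;

&lt;p&gt;Sorting before building the tuple matters: it gives every group one predictable form, so equal groups produce equal dictionary keys.&lt;/p&gt;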


	&lt;p&gt;I was very glad to have my tests to catch me when I had some typing mistake or sloppy conversion.  My tests had to be edited, but the changes there were very lightweight, and caused me to abstract out some data comparisons that were (admittedly) repeated. It was all good.&lt;/p&gt;


	&lt;p&gt;When I ran the full program it completed so quickly that I was sure I&amp;#8217;d broken it.  It was running with sub-second time, including gathering data from various text files (nm output files).  I did a few spot-checks, and determined that it was indeed doing the right thing (as far as I know).&lt;/p&gt;


	&lt;p&gt;The data is moderately interesting, and I will be able to pick out some useful interfaces.  Better yet, I have a program that can pick out all the uses of my big, fat class and recommend interfaces to me.  This is all good.&lt;/p&gt;


Lessons learned:
&lt;ul&gt;
&lt;li&gt;The tests gave me peace of mind as I worked.  I would so hate to have done this &amp;#8220;naked&amp;#8221;.&lt;/li&gt;
&lt;li&gt;Python&amp;#8217;s speed is fine (even startling) if you aren&amp;#8217;t doing something wasteful and heavy-handed like storing hundreds or thousands of non-interned strings and heavyweight data structures.&lt;/li&gt;
&lt;li&gt;I could have been done sooner if I&amp;#8217;d measured with hotshot instead of guessing. This is very clear to me, and I won&amp;#8217;t again assume that I&amp;#8217;m too clever, or my problem too simple, to bother measuring.&lt;/li&gt;
&lt;li&gt;Keep some Ruby friends on hand. They come in handy.&lt;/li&gt;
&lt;li&gt;I didn&amp;#8217;t really need a cooler algorithm.  Brute force is sometimes enough.&lt;/li&gt;
&lt;/ul&gt;
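
&lt;p&gt;On measuring: hotshot was removed in Python 3, and cProfile from the standard library is the usual replacement. The sketch below shows the general shape of a profiling run, with a stand-in workload rather than the real program.&lt;/p&gt;

&lt;pre&gt;
```python
import cProfile
import io
import pstats


def workload():
    """Stand-in for the real job: intersect many small sets."""
    groups = [frozenset(range(i, i + 6)) for i in range(300)]
    return sum(len(a.intersection(b)) for a in groups for b in groups)


profiler = cProfile.Profile()
profiler.enable()
total = workload()
profiler.disable()

# Collect the report in memory, sorted by cumulative time, top five entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```
&lt;/pre&gt;

&lt;p&gt;A profiler like this reports time, not memory, so for a problem like the one in this post it is best paired with a memory measurement.&lt;/p&gt;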

	&lt;p&gt;I want to build a new wrapper for this code to compare parameter lists, to help find unrecognized classes in these same programs.  It shouldn&amp;#8217;t be too hard, but it will be a larger data set so I will probably need some more optimization or a cooler algorithm later. I think I see some waste in it now, but I guess I&amp;#8217;ll have to fix that after blogging.&lt;/p&gt;


	&lt;p&gt;Of course, I&amp;#8217;m not done learning.  I am sure there are a lot of ways to improve the core code. I also believe in other people, so I should open it up to criticism and suggestions.&lt;/p&gt;


	&lt;p&gt;Here it is:&lt;/p&gt;


&lt;h3&gt;Tests&lt;/h3&gt;
(Which, embarrassingly, could use more refactoring)
&lt;pre&gt;
import unittest
import clumps

def keySet(someMap):
    return set(someMap.keys())

class ClumpFinding(unittest.TestCase):
    def testNoGroupsForSingleItem(self):
        input = { "OneGroup": [1,2] }
        actual = clumps.find_groups(input)
        self.assertEquals({}, actual)

    def testNoOverlapMeansNoGroups(self):
        input = {
            "first": [1,3],
            "second": [2,4]
        }
        actual = clumps.find_groups(input)
        self.assertEquals({}, actual)

    def testIgnoresSingleMatches(self):
        input = {
            "first": [1,3],
            "second": [1,4]
        }
        actual = clumps.find_groups(input)
        self.assertEquals({}, actual)

    def testTwoInterfaceMatches(self):
        group = (1,2)
        names = ("first","second")
        input = dict( [(name,group) for name in names])

        actual = clumps.find_groups(input)

        self.assertEquals(1, len(actual))
        [key] = actual.keys()
        self.contentMatch(group, key)
        self.contentMatch(input.keys(), actual[key])

    def contentMatch(self, left,right):
        left,right = map( frozenset, [left,right])
        self.assertEquals(left,right)

    def testFindsThreeGroupsMatchingExactly(self):
        group = [1,3,8]
        names = "one","two","three" 
        input = dict( [(name,group) for name in names ] )

        actual = clumps.find_groups(input)

        self.assertEquals(1, len(actual))
        [clump_found] = actual.keys()
        self.contentMatch(group, clump_found)
        self.contentMatch(names, actual[clump_found])

    def testFindsPartialMatchInThreeGroups(self):
        input = {
            "a":[1,2,3,4,5],
            "b":[1,4,5,6,8],
            "c":[0,1,4,5]
        }
        target_group = frozenset([1,4,5])
        names = input.keys()

        actual = clumps.find_groups(input)

        [key] = actual.keys()
        self.contentMatch(target_group, key)
        self.contentMatch(names, actual[key])

    def testFindsMultipleMatches(self):
        input = {
            "a":[1,2,3,4,5],
            "b":[1,4,5,6,8],
            "c":[0,1,4,5],
            "d":[1,2,3],
            "e":[1,3]
        }

        actual = clumps.find_groups(input)
        keys = actual.keys()

        self.assertEqual(3, len(actual))

        grouping = (1,2,3)
        referents = set(["a","d"])
        self.assert_(grouping in keys, "expect %s in %s" % (grouping,keys) )
        self.assertEqual(referents, actual[grouping])

        grouping = (1,4,5)
        referents = set(["a","b","c"])
        self.assert_(grouping in keys)
        self.assertEqual(referents, actual[grouping])

        grouping = (1,3)
        referents = set(["a","d","e"])
        self.assert_(grouping in keys)
        self.assertEqual(referents, actual[grouping])

if __name__ == "__main__":
    unittest.main()
&lt;/pre&gt;

&lt;h3&gt;The Code&lt;/h3&gt;

&lt;pre&gt;
import sys

def find_groups(named_groups):
    """ 
    Exhaustively searches for grouping of items in a map, such 
    that an input map like this:
          "first":[1, 2, 3, 4],
          "second":[1,2,3,5,6],
          "third":[1,2,5,6]
    will result in:
        [1,2,3]: ["first","second"]
        [1,2]: ["first","second","third"]
        [5,6]: ["second","third"]

    Note that the return value dict is a mapping of sorted tuples to sets,
    not lists to lists as given above. Also, being a dict, the results
    are effectively unordered.
    """ 
    def tupleize(data):
        "Convert a set or frozenset or list to a tuple with predictable order" 
        return tuple(sorted(list(data)))

    result = {}
    for name, methods_called in named_groups.iteritems():
        methods_group = frozenset(methods_called)
        methods_tuple = tupleize(methods_group)
        for stored_interface in result.keys():
            key_set = frozenset(stored_interface)
            common_methods = tupleize(key_set.intersection(methods_group))
            if common_methods:
                entry_as_list = list(result.get(common_methods,[]))
                entry_as_list.append(name)
                entry_as_list.extend( result[stored_interface] )
                result[common_methods] = tupleize(entry_as_list)

        full_interface_entry = result.setdefault(methods_tuple, [])
        if name not in full_interface_entry:
            full_interface_entry.append(name)
    return filter(result)

def filter(subsets):
    # Apology: I'm betting I can do this in a functional way.
    filtered = {}
    for key,value in subsets.iteritems():
        if (len(key) &amp;gt; 1) and (len(value) &amp;gt; 1):
            filtered[key] = set(value)
    return filtered

def display_groupings(groupings):
    "Silly helper function to print groupings" 
    keys = sorted(groupings.keys(), key=len)
    for key in keys:
        print "\n","-"*40
        for item in key:
            print item
        for item in sorted(groupings[key]):
            print "     ",item
        print

&lt;/pre&gt;</description>
      <pubDate>Wed, 27 Jun 2007 23:23:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:5615a4c9-4a82-447e-92e1-ff331d47a48f</guid>
      <author>tottinger</author>
      <link>http://blog.objectmentor.com/articles/2007/06/27/python-subgroup-detection-and-optimization</link>
      <category>Tim's Tepid Torrent</category>
    </item>
  </channel>
</rss>

