"Big Balls of Mud" and Shanty Towns
Last Thursday, the last day of OOPSLA 2007, Brian Foote gave a retrospective on Big Ball of Mud, which he and Joseph Yoder presented at the Fourth Conference on Patterns Languages of Programs (PLoP ‘97/EuroPLoP ‘97) and which was published as a paper in 1999.
Foote and Yoder argue that the dominant architecture of deployed applications is a Big Ball of Mud, which they define as follows:
A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.
I had to leave mid-way through the talk to catch a plane, but before I left, he said something that caught my attention. He compared such applications to shanty towns, those ad hoc communities that spring up with no planning, no infrastructure, and reflect the bare minimum of resources and expertise available to their builders and inhabitants.
However, as I looked at his picture of a typical shanty town, I noticed that there are paths through the maze of ad hoc homes. There is some structure there. Then it occurred to me that, for all their problems, there is one interesting difference between typical shanty towns and many software applications: shanty towns are subject to frequent “testing” and “refactoring”. Within the extreme limitations of their architectures and the available resources, the inhabitants do what they can to fix “bugs” and adapt to new “requirements”.
Of course, I’m not saying that shanty towns are good. I’m just pointing out that they have a feedback loop that leads to modest improvements. In contrast, although we application developers have more resources at our disposal, we don’t subject our applications to the same scrutiny.
Why is this? A big part of the problem is that we forget just how complex software really is. How many points of variation exist within a home and its community? How many points of variation exist within a software application? Perhaps more than one per line of (uncommented) code?
We interact with the points of variation in our homes on a regular basis, directly or indirectly, and we make adjustments as needed (unless we’re lazy ;). In contrast, most of the corresponding points in software applications are deeply hidden and not evident when we use the applications.
You know where this is going: automated testing is the only practical way to subject the points of variation in applications to the same level of scrutiny and to find what needs fixing.
Applications Should Use Several Languages
Yesterday, I blogged about TDD in C++ and ended with a suggestion for the dilemma of needing optimal performance some of the time and optimal productivity the rest of the time. I suggested that you should use more than one language for your applications.
If you are developing web applications, you are already doing this, of course. Your web tier probably uses several “languages”, e.g., HTML, JavaScript, JSP/ASP, CSS, Java, etc.
However, most people use only one language for the business/mid tier. I think you should consider using several: a high-productivity language environment for most of your work, with the occasional critical functionality implemented in C or C++ to optimize performance, but only after actually measuring where the bottlenecks are located.
This approach is much too rare, but it has historical precedents. One of the most successful and long-lived software projects of all time is Emacs. It consists of a core C-based runtime with most of the functionality implemented in Emacs Lisp “components”. The relative ease of extending Emacs using Lisp has resulted in a rich assortment of support tools for various operating systems, languages, build tools, etc. Even modern IDEs and other graphical editors have not completely displaced Emacs.
Java has embraced the mixed-language philosophy somewhat reluctantly. JNI is the official API for invoking “native” code, but it is somewhat hard to use, so relatively few developers actually use it. In contrast, for example, the Ruby world has always embraced this approach. Ruby has an easy-to-use API for invoking native C code, and good alternatives exist for invoking code in other languages. As a result, many of the third-party Ruby libraries (or gems) contain both Ruby and native C code. The latter is built on the fly when you install the gem. Hence, there are many high-performance Ruby applications. This is not a contradiction in terms, because the performance-critical sections run natively, even though interpreted Ruby is relatively slow.
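To give a feel for how little ceremony a high-level language can require to reach native code, here is a minimal sketch in Python rather than Ruby (a substitution made only for illustration). It uses the standard ctypes module to call strlen from the C library. It assumes a POSIX system where the C library can be loaded by name; the wrapper name native_strlen is hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C library;
# if it returns None, CDLL(None) falls back to the symbols already
# loaded into the current process, which include libc on Linux.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def native_strlen(text: str) -> int:
    """Hypothetical wrapper: measure the UTF-8 byte length using C's strlen."""
    return libc.strlen(text.encode("utf-8"))


print(native_strlen("mixed-language"))
```

The only real work is declaring the argument and return types so that values are converted correctly at the language boundary.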
Of course, you have to be judicious in how you use mixed-language programming. Crossing the language boundary is often somewhat heavyweight, so you should avoid doing such invocations inside tight loops, for example.
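A rough way to see the cost of crossing the boundary is to compare a "chatty" loop that makes one native call per byte with a single "chunky" call over the whole buffer. The sketch below uses Python's standard zlib module, assuming the usual CPython build in which zlib.crc32 is implemented in C. The function names and the buffer size are illustrative choices, and the timings will vary by machine.

```python
import timeit
import zlib

# About 100 KB of sample data.
data = bytes(range(256)) * 400


def checksum_chatty(buf: bytes) -> int:
    """Cross into native code once per byte (the pattern to avoid)."""
    value = 0
    for i in range(len(buf)):
        # Each call feeds one byte plus the running checksum to the C code.
        value = zlib.crc32(buf[i:i + 1], value)
    return value


def checksum_chunky(buf: bytes) -> int:
    """Cross into native code once; the loop runs entirely in C."""
    return zlib.crc32(buf)


# Both approaches compute the same checksum.
assert checksum_chatty(data) == checksum_chunky(data)

chatty = timeit.timeit(lambda: checksum_chatty(data), number=3)
chunky = timeit.timeit(lambda: checksum_chunky(data), number=3)
print(f"chatty: {chatty:.4f}s  chunky: {chunky:.4f}s")
```

The design point is to hand the native side a whole batch of work per call, so the loop itself runs on the fast side of the boundary.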
So, I think the solution to the dilemma of needing high performance sometimes and high productivity the rest of the time is to pick the right tools for each circumstance and make them interoperate. Even constrained embedded devices like cell phones would be easier to implement if most of the code were written in a language like Ruby, Python, Smalltalk, or Java and performance-critical components were written in C or C++.
If I were starting such a greenfield project, I would assume that time-to-money is the top priority and write most of my code in Ruby (my personal current favorite), using TDD of course. I would profile it constantly, as part of the nightly or continuous-integration build. When bottlenecks emerge, I would first determine if a refactoring is sufficient to fix them and if not, I would rewrite the critical sections in C. If the project were for an embedded device, I would also watch the resource usage carefully.
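The "measure first" step can be automated cheaply. Here is a minimal sketch in Python (used for illustration, not the Ruby toolchain mentioned above) that profiles one call with the standard cProfile module and returns a report a nightly build could archive. The functions handle_request and slow_checksum are hypothetical stand-ins for application code, with slow_checksum made deliberately quadratic so it shows up as the hot spot.

```python
import cProfile
import io
import pstats


def slow_checksum(values):
    """Hypothetical hot spot: deliberately quadratic work."""
    total = 0
    for i in range(len(values)):
        for j in range(i + 1):
            total += values[j]
    return total % 65521


def handle_request(values):
    """Hypothetical entry point that a build would profile."""
    cleaned = [v for v in values if v >= 0]
    return slow_checksum(cleaned)


def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(handle_request, list(range(300)))
print(report)
```

If a function like slow_checksum keeps dominating the report after refactoring, it becomes a candidate for a native rewrite.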
For my embedded device, I would test from the beginning whether or not the overhead of the interpreter/VM and the overall performance are acceptable. I would also be sure that I have adequate tool support for the inevitable remote debugging and diagnostics I’ll have to do. If I made the wrong tool choices after all, I would know early on, when it’s still relatively painless to retool.
If you’re an IT or web-site developer, you have fewer performance limitations and more options. You might decide to make the cross-language boundary a cross-process boundary, e.g., by communicating through some sort of lightweight web services. This is one way to leverage legacy C/C++ code while developing new functionality in a more productive language.
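As a sketch of what such a cross-process boundary might look like, the Python example below exposes one operation as a tiny JSON-over-HTTP service and calls it from a client, using only the standard library. Everything here is an assumption made for illustration: the /total path, the payload shape, and the names TotalHandler, start_service, and call_service. To keep the example self-contained, the service runs in a background thread of the same process; in practice it would be a separate process, possibly one wrapping legacy C/C++ code.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class TotalHandler(BaseHTTPRequestHandler):
    """Hypothetical service: sums the numbers posted as {"items": [...]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # A real service might delegate to legacy native code here.
        body = json.dumps({"total": sum(payload["items"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the example output quiet.


def start_service():
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), TotalHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port, items):
    """Client side: one HTTP round trip per call."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/total",
        data=json.dumps({"items": items}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any configured system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


server = start_service()
print(call_service(server.server_port, [1, 2, 3]))
server.shutdown()
server.server_close()
```

Each call is a full network round trip, so the earlier advice about tight loops applies even more strongly here: send a batch of work per request.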
