Unit Testing C and C++ ... with Ruby and RSpec! 11

Posted by Dean Wampler Tue, 05 Feb 2008 04:08:00 GMT

If you’re writing C/C++ code, it’s natural to write your unit tests in the same language (or use C++ for your C test code). All the well-known unit testing tools take this approach.

I think we can agree that neither language offers the best developer productivity among all the language choices out there. Most of us use either language because of perceived performance requirements, institutional and industry tradition, etc.

There’s growing interest, however, in mixing languages, tools, and paradigms to get the best tool for a particular job. <shameless-plug>I’m giving a talk March 7th at SD West on this very topic, called Polyglot and Poly-Paradigm Programming </shameless-plug>.

So, why not use a more productive language for your C or C++ unit tests? You have more freedom in your development chores than what’s required for production. Why not use Ruby’s RSpec, a Behavior-Driven Development tool for acceptance and unit testing? Or, you could use Ruby’s version of JUnit, called Test::Unit. The hard part is integrating Ruby and C/C++. If you’ve been looking for an excuse to bring Ruby (or Tcl or Python or Java or…) into your C/C++ environment, starting with development tasks is usually the path of least resistance.

I did some experimenting over the last few days to integrate RSpec using SWIG (Simplified Wrapper and Interface Generator), a tool for bridging libraries written in C and C++ to other languages, like Ruby. The Ruby section of the SWIG manual was very helpful.

My Proof-of-Concept Code

Here is a zip file of my experiment: rspec_for_cpp.zip

This is far from a complete and working solution, but I think it shows promise. See the Current Limitations section below for details.

Unzip the file into a directory. I’ll assume you named it rspec_for_cpp. You need to have gmake, gcc, SWIG and Ruby installed, along with the RSpec “gem”. Right now, it only builds on OS X and Linux (at least the configurations on my machines running those OS’s – see the discussion below). To run the build, use the following commands:

    
        $ cd rspec_for_cpp/cpp
        $ make 
    

You should see it finish with the lines

    
        ( cd ../spec; spec *_spec.rb )
        .........

        Finished in 0.0***** seconds

        9 examples, 0 failures
    

Congratulations, you’ve just tested some C and C++ code with RSpec! (Or, if you didn’t succeed, see the notes in the Makefile and the discussion below.)

The Details

I’ll briefly walk you through the files in the zip and the key steps required to make it all work.

cexample.h

Here is a simple C header file.

    
        /* cexample.h */
        #ifndef CEXAMPLE_H
        #define CEXAMPLE_H
        #ifdef __cplusplus
         extern "C" {
        #endif
        char* returnString(char* input);
        double returnDouble(int input);
        void  doNothing();

        #ifdef __cplusplus
         }
        #endif
        #endif
    

Of course, in a pure C shop, you won’t need the #ifdef __cplusplus stuff. I found this was essential in my experiment when I mixed C and C++, as you might expect.

cpp/cexample.c

Here is the corresponding C source file.

    
        /* cexample.h */

        char* returnString(char* input) {
            return input;
        }

        double returnDouble(int input) {
            return (double) input;
        }

        void  doNothing() {}
    

cpp/CppExample.h

Here is a C++ header file.

    
        #ifndef CPPEXAMPLE_H
        #define CPPEXAMPLE_H

        #include <string>

        class CppExample 
        {
        public:
            CppExample();
            CppExample(const CppExample& foo);
            CppExample(const char* title, int flag);
            virtual ~CppExample();

            const char* title() const;
            void        title(const char* title);
            int         flag() const;
            void        flag(int value);

            static int countOfCppExamples();
        private:
            std::string _title;
            int         _flag;
        };

        #endif
    

cpp/CppExample.cpp

Here is the corresponding C++ source file.

    
        #include "CppExample.h" 

        CppExample::CppExample() : _title("") {}
        CppExample::CppExample(const CppExample& foo): _title(foo._title) {}
        CppExample::CppExample(const char* title, int flag) : _title(title), _flag(flag) {}
        CppExample::~CppExample() {}

        const char* CppExample::title() const { return _title.c_str(); }
        void        CppExample::title(const char* name) { _title = name; }

        int  CppExample::flag() const { return _flag; }
        void CppExample::flag(int value) { _flag = value; }

        int CppExample::countOfCppExamples() { return 1; }
    

cpp/example.i

Typically in SWIG, you specify a .i file to the swig command to define the module that wraps the classes and global functions, which classes and functions to expose to the target language (usually all in our case), and other assorted customization options, which are discussed in the SWIG manual. I’ll show the swig command in a minute. For now, note that I’m going to generate an example_wrap.cpp file that will function as the bridge between the languages.

Here’s my example.i, where I named the module example.

    
        %module example
        %{
            #include "cexample.h" 
            #include "CppExample.h"    
        %}
        %include "cexample.h" 
        %include "CppExample.h" 
    

It looks odd to have header files appear twice. The code inside the %{...%} (with a ’#’ before each include) are standard C and C++ statements, etc. that will be inserted verbatim into the generated “wrapper” file, example_wrap.cpp, so that file will compile when it references anything declared in the header files. The second case, with a ’%’ before the include statements1, tells SWIG to make all the declarations in those header files available to the target language. (You can be more selective, if you prefer…)

Following Ruby conventions, the Ruby plugin for SWIG automatically names the module with an upper case first letter (Example), but you use require 'example' to include it, as we’ll see shortly.

Building

See the cpp/Makefile for the gory details. In a nutshell, you run the swig command like this.

    
        swig -c++ -ruby -Wall -o example_wrap.cpp example.i
    

Next, you create a dynamically-linked library, as appropriate for your platform, so the Ruby interpreter can load the module dynamically when required. The Makefile can do this for Linux and OS X platforms. See the Ruby section of the SWIG manual for Windows specifics.

If you test-drive your code, which tends to drive you towards minimally-coupled “modules”, then you can keep your libraries and build times small, which will make the build and test cycle very fast!

spec/cexample_spec.rb and spec/cppexample_spec.rb

Finally, here are the RSpec files that exercise the C and C++ code. (Disclaimer: these aren’t the best spec files I’ve ever written. For one thing, they don’t exercise all the CppExample methods! So sue me… :)

    
        require File.dirname(__FILE__) + '/spec_helper'
        require 'example'

        describe "Example (C functions)" do
          it "should be a constant on Module" do
            Module.constants.should include('Example')
          end
          it "should have the methods defined in the C header file" do
            Example.methods.should include('returnString')
            Example.methods.should include('returnDouble')
            Example.methods.should include('doNothing')
          end
        end

        describe Example, ".returnString" do
          it "should return the input char * string as a Ruby string unchanged" do
            Example.returnString("bar!").should == "bar!" 
          end  
        end

        describe Example, ".returnDouble" do
          it "should return the input integer as a double" do
            Example.returnDouble(10).should == 10.0
          end
        end

        describe Example, ".doNothing" do
          it "should exist, but do nothing" do
            lambda { Example.doNothing }.should_not raise_error
          end
        end
    

and

    
    require File.dirname(__FILE__) + '/spec_helper'
    require 'example'

    describe Example::CppExample do
      it "should be a constant on module Example" do
        Example.constants.should include('CppExample')
      end
    end

    describe Example::CppExample, ".new" do
      it "should create a new object of type CppExample" do
        example = Example::CppExample.new("example1", 1)
        example.title.should == "example1" 
        example.flag.should  == 1
      end
    end

    describe Example::CppExample, "#title(value)" do
      it "should set the title" do
        example = Example::CppExample.new("example1", 1)
        example.title("title2")
        example.title.should == "title2" 
      end
    end

    describe Example::CppExample, "#flag(value)" do
      it "should set the flag" do
        example = Example::CppExample.new("example1", 1)
        example.flag(2)
        example.flag.should == 2
      end
    end
    

If you love RSpec like I do, this is a very compelling thing to see!

And now for the small print:

Current Limitations

As I said, this is just an experiment at this point. Volunteers to make this battle-ready would be most welcome!

General

The Example Makefile File

It Must Be Hand Edited for Each New or Renamed Source File

You’ve probably already solved this problem for your own make files. Just merge in the example Makefile to pick up the SWIG- and RSpec-related targets and rules.

It Only Knows How to Build Shared Libraries for Mac OS X and Linux (and Not Very Well)

Some definitions are probably unique to my OS X and Linux machines. Windows is not supported at all. However, this is also easy rectify. Start with the notes in the Makefile itself.

The module.i File Must Be Hand Edited for Each File Change

Since the format is simple, a make task could fill a template file with the changed list of source files during the build.

Better Automation

It should be straightforward to provide scripts, IDE/Editor shortcuts, etc. that automate some of the tasks of adding new methods and classes to your C and C++ code when you introduce them first in your “spec” files. (The true TDD way, of course.)

Specific Issues for C Code Testing

I don’t know of any other C-specific issues, so unit testing with Ruby is most viable today for C code. Although I haven’t experimented extensively, C functions and variables are easily mapped by SWIG to a Ruby module. The Ruby section of the SWIG manual discusses this mapping in some detail.

Specific Issues for C++ Code Testing

More work will be required to make this viable. It’s important to note that SWIG cannot handle all C++ constructs (although there are workarounds for most issues, if you’re committed to this approach…). For example, namespaces, nested classes, some template and some method overloading scenarios are not supported. The SWIG manual has details.

Also, during my experiment, SWIG didn’t seem to map const std::string& objects in C++ method signatures to Ruby strings, as I would have expected (char* worked fine).

Is It a Viable Approach?

Once the General issues listed above are handled, I think this approach would work very well for C code. For C++ code, there are more issues that need to be addressed, and programmers who are committed to this strategy will need to tolerate some issues (or just use C++-language tools for some scenarios).

Conclusions: Making It Development-Team Ready

I’d like to see this approach pushed to its logical limit. I think it has the potential to really improve the productivity of C and C++ developers and the quality of their test coverage, by leveraging the productivity and power of dynamically-typed languages like Ruby. If you prefer, you could use Tcl, Python, even Java instead.

License

This code is complete open and free to use. Of course, use it at your own risk; I offer it without warranty, etc., etc. When I polish it to the point of making it an “official” project, I will probably release under the Apache license.

1 I spent a lot of time debugging problems because I had a ’#’ where I should have had a ’%’! Caveat emptor!

ANN: Talks at the Chicago Ruby Users Group (Oct. 1, Dec. 3)

Posted by Dean Wampler Sat, 29 Sep 2007 14:52:00 GMT

I’m speaking on Aquarium this Monday night (Oct. 1st). Details are here. David Chelimsky will also be talking about new developments in RSpec and RBehave.

At the Dec. 3 Chirb meeting, the two of us are reprising our Agile 2007 tutorial Ruby’s Secret Sauce: Metaprogramming.

Please join us!

Strongly Typed Languages Considered Dangerous 23

Posted by Brett Schuchert Thu, 23 Aug 2007 23:37:00 GMT

Have you ever heard of covariance? Method parameters or returns are said to be covariant if, as you work your way down an inheritance hierarchy, the types of the formal parameters or return type in an overridden method can be sub-types of the formal parameters and return types in the superclass’ version of the method.

Oh, and contravariance is just the opposite.

What?! Why should you care? Answer, you shouldn’t. Yes C++ and Java both support covariant return types, but so what? Have you ever used them? OK, I have, but then I also used and liked C++ for about 7 years, over 10 years ago. We all learn to move on.

You ever notice how something meant to help often (typically?) turns out to do exactly the opposite? Even worse, it directly supports or enables another unfortunate behavior. This is just Weinberg’s first rule of problem solving:

Every solution introduces new problems

Here’s an example I’m guilty of. Several years ago I was working on a project where we had written many unit tests. We had not followed the FIRST principles, specifically R-Reliability. Our tests were hugely affected by the environment. We had copies of real databases that would get overridden without warning. We’d have problems with MQ timeouts at certain times of the day or sometimes for days. And our tests would fail.

We wold generally have “events” that would cause tests to fail for some time. We were using continuous integration and so to “fix” the problem, I created a class we called “TimeBomb” – turns out it was the right name for the wrong reason.

You’d put a TimeBomb in a test and then comment out the test. The TimeBomb would be given a date. Until that date, the test would “pass” – or rater be ignored. At some point in the future, the TimeBomb would expire and tests would start failing again.

I was so proud of it, I even took the time to write something up about it here. I had nothing but the best intentions. I wanted an active mechanism to remind us of work that needed to be done. We had so many todo’s and warnings that anything short of an active reminder would simply be missed. I also wanted CI to “work.” But what eventually happened was that as our tests kept failing, we’d simply keep moving the TimeBomb date out.

I wrote something that enabled our project to collect more design debt. As I said, my intentions were noble. But the road to Hell is paved with good intentions. Luckily I got the name right. The thing that was really blowing up, however, was the project itself.

What has all of this got to do with “strongly typed languages”?

If you’re working with a strongly typed language, you will have discussions about covariance (well I do anyway). You’ll discuss the merits of multiple inheritance (it really is a necessary feature – let the flames rise, I’m already in Hell from my good intentions). You’ll also discuss templates/generics. The list goes on and on.

If you’re working with a dynamically typed language (the first one I used professionally was Smalltalk but I also used Self, Objective-C, which lets you choose, and several other languages whose names I do not recall).

Covariance is not even an issue. The same can be said of generics/templates and yes, even multiple inheritance is less of an issue. In a strongly-typed languages, things like Multiple Inheritance are necessary if you want your type system to be complete. (If you don’t believe me, read a copy of Object Oriented Software Construction, 2nd ed. by Bertrand Meyer – an excellent read.)

Ever created a mock object in a typed language? You either need an interface or at least a non-final class. What about Ruby or Smalltalk? Nope. Neither language cares about a class’ interface until it executes. And neither language cares how a class is able respond to a particular message, just that it does. It’s even possible in both languages to NOT have a method and still work if you mess with the meta-language protocol.

OK, but still, does this make typed languages bad?

Lets go back to that issue of enabling bad things.

Virtual Machines, like the JVM and CLR, have made amazing strides in the past few years. Reflection keeps getting faster. Just in time compilers keep getting smarter. The time required for intrinsic locks has improved and now a Java 5 compiler will support non-blocking, safe, multi-threaded updates. Modern processors support such operations using an optimistic approach. Heck, Java 6 even does some cool stuff with local object references to significantly improve Java’s memory usage. Java runs pretty fast.

Dynamic languages, generally, are not there yet. I hope they get there, but they simply are not there. But so what?! If your system runs “fast enough” then it’s fast enough. People used Smalltalk for real applications years ago – and still do to some extent. Ruby is certainly fast enough for a large class of problems.

How is that? These languages force you to write well. If you do not, then you will write code that is not “fast enough.” I’ve see very poor performing Smalltalk solutions. But it was never because of Smalltalk, it was because of poor programming practices. Are there things that won’t currently perform fast enough in dynamically typed languages? Yes. Are most applications like that? Probably not.

You can’t get away with as much in a dynamically typed language. That sounds ironic. On the one hand you have amazing power with dynamically typed languages. Of course Peter Parker learned that “With great power comes great responsibility.” This is just as true with Ruby and other dynamically typed languages.

You do not have as much to guide you in a dynamically typed language. Badly written, poorly organized code in a typed language is hard to follow but it’s possible. In a language like Ruby or Smalltalk it’s still possible but it’s a lot harder. Such poor approaches will typically fail sooner. And thats GOOD! You’ve wasted less money because you failed sooner. If you’ve got crappy design, you’re going to fail. The issue is how much time/money will you spend to get there.

Another thing that strongly typed languages offer is a false sense of security because of compiler errors. I have heard many people deride the need for unit tests because the compiler “will catch it.”

This misses a significant point that unit tests can serve as a specification of behavior rather than just a validation of as-written code.

You cannot get away with as much in a dynamically typed language. Or put another way, a dynamically typed language does not enable you to be as sloppy. It’s just a fact. In fact you can typically get away with a lot in a dynamically typed language, you just have to do it well.

Does this mean that dynamically typed languages are harder to work in? Maybe. But if you follow TDD, then the language is less important.

Do we need fast, typed languages? Clearly we do. There are applications where having such languages is necessary. Device drivers are not written in Ruby (maybe they are, I’m just trying to be more balanced).

However, how many of you were around when games were written in assembly? C was not fast enough. Then C was fast enough but C++ was still a question mark. C++ and C became mainstream but Java was just too slow. Some games are getting written in Java now. Not many, but it’s certainly possible. It is also possible to use multiple languages and put the time-critical stuff, which is generally a small part of your system, in one language and use another language for the rest of the system.

Back in the very early 90’s I worked just a little bit with Self. At that time, their Just In Time compiler would create machine code from byte code for methods that got executed. They went one step further, however. Not only did they JIT compile a method for an object, they would actually do it for combinations of receivers and type parameters.

There’s a name for this. In general it is called multi-dispatch (some languages support double-dispatch, the visitor pattern is a mechanism to turn standard polymorphism into a weak form of double dispatch, Smalltalk used a similar approach for handling numerical calculations).

Self was doing this not to support multi-dispatch but to improve performance. That means that a given method could have multiple compiled forms to handle commonly used methods on a receiver with commonly-used parameters. Yes it used more memory. But given what modern compilers do with code optimization, it just seems that this kind of technique could have huge benefits in the performance of dynamically typed languages. This is just one way a dynamic language can speed things up. There are others and they are happening NOW.

I’m hoping that in the next few years dynamic languages will get more credit for what they have to offer. I believe they are the future. I still primarily develop in Java but that’s just because I’m waiting for the dust to settle a little bit and for a clear dynamic language to start to assert itself. I like Ruby (though its support for block closures is, IMO, weak). I’m not convinced Ruby is the next big thing. I’m working with it a little bit just in case, however.

What are you currently doing that enables yourself or your co-workers to maintain the status quo?

Announcement: Aquarium v0.1.0 - An Aspect-Oriented Programming Toolkit for Ruby 1

Posted by Dean Wampler Thu, 23 Aug 2007 22:26:00 GMT

I just released the first version of a new Aspect-Oriented Programming toolkit for Ruby called Aquarium. I blogged about the goals of the project here. Briefly, they are
  • Create a powerful pointcut language, the most important part of an AOP toolkit.
  • Provide Robust support for concurrent advice at the same join point..
  • Provide runtime addition and removal of aspects.
  • Provide a test bed for implementation ideas for DSL's.
There is extensive documentation at the Aquarium site. Please give it a try and let me know what you think!

Applications Should Use Several Languages 10

Posted by Dean Wampler Wed, 04 Jul 2007 16:38:31 GMT

Yesterday, I blogged about TDD in C++ and ended with a suggestion for the dilemma of needing optimal performance some of the time and optimal productivity the rest of the time. I suggested that you should use more than one language for your applications.

If you are developing web applications, you are already doing this, of course. Your web tier probably uses several “languages”, e.g., HTML, JavaScript, JSP/ASP, CSS, Java, etc.

However, most people use only one language for the business/mid tier. I think you should consider using several; a high-productivity language environment for most of your work, with the occasional critical functionality implemented in C or C++ to optimize performance, but only after actually measuring where the bottlenecks are located.

This approach is much too rare, but it has historical precedents. One of the most successful and long-lived software projects of all time is Emacs. It consists of a core C-based runtime with most of the functionality implemented in Emacs lisp “components”. The relative ease of extending Emacs using lisp has resulted in a rich assortment of support tools for various operating systems, languages, build tools, etc. Even modern IDEs and and other graphical editors have not completely displaced Emacs.

Java has embraced the mixed language philosophy somewhat reluctantly. JNI is the official and most commonly-used API for invoking “native” code, but it is somewhat hard to use and few people actually use it. In contrast, for example, the Ruby world has always embraced this approach. Ruby has an easy to use API for invoking native C code and good alternatives exist for invoking code in other languages. As a result, many of the 3rd-party Ruby libraries (or gems) contain both Ruby and native C code. The latter is built on the fly when you install the gem. Hence, there are many high-performance Ruby applications. This is not a contradiction in terms, because the performance-critical sections run natively, even though interpreted Ruby is relatively slow.

Of course, you have to be judicious in how you use mixed-language programming. Crossing the language boundary is often somewhat heavyweight, so you should avoid doing such invocations inside tight loops, for example.

So, I think the solution to the dilemma of needing high performance sometimes and high productivity the rest of the time is to pick the right tools for each circumstance and make them interoperate. Even constrained embedded devices like cell phones would be easier to implement if most of the code were written in a language like Ruby, Python, Smalltalk, or Java and performance-critical components were written in C or C++.

If I were starting such a greenfield project, I would assume that time-to-money is the top priority and write most of my code in Ruby (my personal current favorite), using TDD of course. I would profile it constantly, as part of the nightly or continuous-integration build. When bottlenecks emerge, I would first determine if a refactoring is sufficient to fix them and if not, I would rewrite the critical sections in C. If the project were for an embedded device, I would also watch the resource usage carefully.

For my embedded device, I would test from the beginning whether or not the overhead of the interpreter/VM and the overall performance are acceptable. I would also be sure that I have adequate tool support for the inevitable remote debugging and diagnostics I’ll have to do. If I made the wrong tool choices after all, I would know early on, when it’s still relatively painless to retool.

If you’re an IT or web-site developer, you have fewer performance limitations and more options. You might decide to make the cross-language boundary a cross-process boundary, e.g., by communicating through some sort of lightweight web services. This is one way to leverage legacy C/C++ code while developing new functionality in a more productive language.

AOP and Dynamic Languages: Contradiction in Terms or Match Made in Heaven?

Posted by Dean Wampler Wed, 21 Mar 2007 18:21:35 GMT

Consider this quote from Dave Thomas (of the Pragmatic Programmers) on AOP (aspect-oriented programming, a.k.a. aspect-oriented software development - AOSD) :

Once you have decent reflection and metaclasses, AOP is just part of the language.

People who work with dynamic languages don't see any need for AOP-specific facilities in their language. They don't necessarily dispute the value of AOP for Java, where metaprogramming facilities are weaker, but for them, AOP is just a constrained form of metaprogramming. Are they right?

It's easy to see why people feel this way when you consider that most of the applications of AOP have been to solve obvious "cross-cutting concerns" (CCCs) like object-relational mapping, transactions, security, etc. In other words, AOP looks like just one of many tools in your toolbox to solve a particular group of problems.

I'll use Ruby as my example dynamic language, since Ruby is the example I know best. It's interesting to look at Ruby on Rails source code, where you find a lot of "AOP-like" code that addresses the CCCs I just mentioned (and more). This is easy enough to do using Ruby's metaprogramming tools, even though tooling that supports AOP semantics would probably make this code easier to write and maintain.

This is going to be a long blog entry already, so I won't cite detailed examples here, but consider how Rails uses method_missing to effectively "introduce" new methods into classes and modules. For example, in ActiveRecord, the many possible find methods and attribute read/write methods are "implemented" this way.

By the way, another excellent Ruby framework, RSpec used method_missing for similar purposes, but recently refactored its implementation and public API to avoid method_missing, because having multiple frameworks attempt to use the same "hook" proved very fragile!

Also in Rails, method "aliasing" is done approximately 175 times, often to wrap ("advise") methods with new behaviors.

Still, is there really a need for AOP tooling in dynamic languages? First, consider that in the early days of OOP, some of us "faked" OOP using whatever constructs our languages provided. I wrote plenty of C code that used structs as objects and method pointers to simulate method overloading and overriding. However, few people would argue today that such an approach is "good enough". If we're thinking in objects, it sure helps to have a language that matches those semantics.

Similarly, it's true that you can implement AOP using sufficiently powerful metaprogramming facilities, but it's a lot harder than having native AOP semantics in your language (or at least a close approximation thereof in libraries and their DSLs).

Before proceeding, let me remind you what AOP is for in the first place. AOP is essentially a new approach to modularization that complements other approaches, like objects. It tries to solve a group of problems that other modularity approaches can't handle, namely the fine-grained interaction of multiple domain models that is required to implement required functionality.

Take the classic example of security management. Presumably, you have one strategy and implementation for handling authentication and authorization. This is one domain and your application's "core business logic" is another domain.

In a non-AOP system, it is necessary to insert duplicate or nearly duplicate code throughout the system that invokes the security subsystem. This duplication violates DRY, it clutters the logic of the code where it is inserted, it is difficult to test, maintain, replace with a new implementation, etc.

Now you may say that you handle this through a Spring XML configuration file or an EJB deployment configuration file, for example. Congratulations, you are using an AOP or AOP-like system!

What AOP seeks to do is to allow you to specify that repeated behavior in one place, in a modular way.

There are four pieces required for an AOP system:

1. Interception

You need to be able to "intercept" execution points in the program. We call these join points in the AOP jargon and sets of them that the aspect writer wants to work with at once are called pointcuts (yes, no whitespace). At each join point, advice is the executable code that an aspect invokes either before, after or both before and after ("around") the join point.

Note that the most powerful AOP language, AspectJ, let's you advise join points like instance variable reads and writes, class initialization, instance creation, etc. The easiest join points to advise are method calls and many AOP systems limit themselves to this capability.

2. Introduction

Introduction is the ability to add new state and behavior to an existing class, object, etc. For example, if you want to use the Observer pattern with a particular class, you could use an aspect to introduce the logic to maintain the list of observers and to notify them when state changes occur.

3. Inspection

We need to be able to find the join points of interest, either through static or runtime analysis, preferably both! You would also like to specify certain conditions of interest, which I'll discuss shortly.

4. Modularization

If we can't package all this into a "module", then we don't have a new modularization scheme. Note that a part of this modularization is the ability to somehow specify in one place the behavior I want and have it affect the entire system. Hence, AOP is a modularity system with nonlocal effects.


Okay. How does pure Ruby stack up these requirements? If you're a Java programmer, the idea of Interception and Introduction, where you add new state and behavior to a class, may seem radical. In languages with "open classes" like Ruby, it is trivial and common to reopen a class (or Module) and insert new attributes (state) and methods (behavior). You can even change previously defined methods. Hence, Interception and Introduction are trivial in Ruby.

This is why Ruby programmers assume that AOP is nothing special, but what they are missing are the complete picture for Inspection and Modularization, even though both are partially covered.

There is a rich reflection API for finding classes and objects. You can write straightforward code that searches for classes that "respond to" a particular method, for example. What you can't do easily is query based on state. For example, in AspectJ, you can say, "I want to advise method call X.m when it is called in the context flow ('cflow') of method call Y.m2 somewhere up the stack..." Yes, you can figure out how to do this in Ruby, but it's hard. So, we're back to the argument I made earlier that you would really like your language to match the semantics of your ideas.

For modularization, yes you can put all the aspect-like code in a Module or Class. The hard part is encapsulating any complicated "point cut" metaprogramming in one place, should you want to use it again later. That is, once you figure out how to do the cflow pointcuts using metaprogramming, you'll want that tricky bit of code in a library somewhere.

At this point, you might be saying to yourself, "Okay, so it might be nice to have some AOP stuff in Ruby, but the Rails guys seem to be doing okay without it. Is it really worth the trouble having AOP in the language?" Only if AOP is more applicable than for the limited set of problems described previously.

Future Applications of AOP??

Here's what I've been thinking about lately. Ruby is a wonderful language for creating mini-DSLs. The ActiveRecord DSL is a good example. It provides relational semantics, while the library minimizes the coding required by the developer. (AR reads the database schema and builds an in-memory representation of the records as objects.)

Similarly, there is a lot of emphasis these days on development that centers around the domain or features of the project. Recall that I said that AOP is about modularizing the intersection of multiple domains (and recall my previous blog on the AOSD 2007 Conference where Gerald Sussman remarked that successful systems have more than one organizing principle).

I think we'll see AOP become the underlying implementation of powerful DSLs that allow programmers who are not AOP-literate express cross-cutting concerns in domain-specific and intuitive languages. AOP will do the heavy lifting behind the scenes to make the fine-grained interactions work. I really don't expect a majority of developers to become AOP literate any time soon. In my experience, too many so-called developers don't get objects. They'll never master aspects!

Shameless Plug

If you would like to hear more of my ideas about AOP in Ruby and aspect-oriented design (AOD), please come to my talk at SD West, this Friday at 3:30. I'm also giving a full-day tutorial on AOD in Ruby and Java/AspectJ at the ICSE 2007 conference in Minneapolis, May 26th.

Liskov Substitution Principle and the Ruby Core Libraries 7

Posted by Dean Wampler Sat, 17 Feb 2007 20:20:00 GMT

There is a spirited discussion happening now on the ruby-talk list called Oppinions on RCR for dup on immutable classes (sic).

In the core Ruby classes, the Kernel module, which is the root of everything, even Object, defines a method called dup, for duplicating objects. (There is also a clone method with slightly different behavior that I won’t discuss here.)

The problem is that some derived core classes throw an exception when dup is called.

Specifically, as the ruby-talk discussion title says, it’s the immutable classes (NilClass, FalseClass, TrueClass, Fixnum, and Symbol) that do this. Consider, for example, the following irb session:
irb 1:0> 5.respond_to? :dup
=> true
irb 2:0> 5.dup
TypeError: can't dup Fixnum
        from (irb):1:in `dup'
        from (irb):1
irb 3:0> 
If you don’t know Ruby, the first line asks the Fixnum object 5 if it responds to the method dup (with the name expressed as a symbol, hence the ”:”). The answer is true, becuase this method is defined by the module Kernel, which is included by the top-level class Object, an ancestor of Fixnum.

However, when you actually call dup on 5, it raises TypeError, as shown.

So, this looks like a classic Liskov Substitution Principle violation. The term for this code smell is Refused Bequest (e.g., see here) and it’s typically fixed with the refactoring Replace Inheritance with Delegation.

The email thread is about a proposal to change the library in one of several possible ways. One possibility is to remove dup from the immutable classes. This would eliminate the unexpected behavior in the example above, since 5.respond_to?(:dup) would return false, but it would still be an LSP violation, specifically it would still have the Refused Bequest smell.

One scenario where the current behavior causes problems is doing a deep copy of an arbitrary object graph. For immutable objects, you would normally just want dup to return the same object. It’s immutable, right? Well, not exactly, since you can re-open classes and even objects to add and remove methods in Ruby (there are some limitations for the immutables…). So, if you thought you actually duplicated something and started messing with its methods, you would be surprised to find the original was “also” modified.

So, how serious is this LSP issue (one of several)? When I pointed out the problem in the discussion, one respondent, Robert Dober, said the following (edited slightly):

I would say that LSP does not apply here simply because in Ruby we do not have that kind of contract. In order to apply LSP we need to say at a point we have an object of class Base, for example. (let the gods forgive me that I use Java)

void aMethod(final Base b){
   ....
}
and we expect this to work whenever we call aMethod with an object that is a Base. Anyway the compiler would not really allow otherwise.
SubClass sc;  // subclassing Base od course
aMethod( sc ); // this is expected to work (from the type POV).

Such things just do not exist in Ruby, I believe that Ruby has explained something to me:

  • OO Languages are Class oriented languages
  • Dynamic Languages are Object oriented languages.

Replace Class with Type and you see what I mean.

This is all very much IMHO of course but I feel that the Ruby community has made me evolve a lot away from “Class oriented”.

He’s wrong that the compiler protects you in Java; you can still throw exceptions, etc. The JDK Collection classes have Refused Bequests. Besides that, however, he makes some interesting points.

As a long-time Java programmer, I’m instinctively uncomfortable with LSP violations. Yet, the Ruby API is very nice to work with, so maybe a little LSP violation isn’t so bad?

As Robert says, we approach our designs differently in dynamic vs. static languages. In Ruby, you almost never use the is_a? and kind_of? methods to check for type. Instead, you follow the duck typing philosophy (“If it acts like a duck, it must be a duck”); you rely on respond_to? to decide if an object does what you want.

In the case of dup for the immutable classes, I would prefer that they not implement the method, rather than throw an exception. However, that would still violate LSP.

So, can we still satisfy LSP and also have rich base classes and modules?

There are many examples of traits that one object might or should support, but not another. (Those of you Java programmers might ask yourself why all objects support toString, for example. Why not also toXML...?)

Coming from an AOP background, I would rather see an architecture where dup is added only to those classes and modules that can support it. It shouldn’t be part of the standard “signature” of Kernel, but it should be present when code actually needs it.

In fact, Ruby makes this sort of AOP easy to implement. Maybe Kernel, Module, and Object should be refactored into smaller pieces and programmers should declaratively mixin the traits they need. Imagine something like the following:
irb 1:0> my_obj.respond_to? :dup
=> false
irb 2:0> include 'DupableTrait'  
irb 2:0> my_obj.respond_to? :dup
=> true
irb 4:0> def dup_if_possible items
irb 5:1>  items.map {|item| item.respond_to?(:dup) ? item.dup : item}
irb 6:1> end
...
In other words, Kernel no longer “exposes the dup abstraction”, by default, but the DupableTrait module “magically” adds dup to all the classes that can support it. This way, we preserve LSP, streamline the core classes and modules (SRP and ISP anyone?), yet we have the flexibility we need, on demand.

Protecting Developers from Powerful Languages 7

Posted by Dean Wampler Tue, 16 Jan 2007 03:57:00 GMT

Microsoft’s forthcoming C# version 3 has some innovative features, as described in this blog. I give the C# team credit for pushing the boundaries of C#, in part because they have forced the Java community to follow suit. ;)

A common tension in many development shops is how far to trust the developers with languages and tools that are perceived to be “advanced”. It’s tempting to limit developers to “safe” languages and maybe not all the features of those languages. This can be misguided.

Java is usually considered safe, but Java Generics are suspect. Strong typing is safe, but dynamic typing isn’t controlled enough. Closures and continuations sound too advanced and technical to be trusted in the hands of “our team”.

To be fair, larger organizations have more at stake and caution is prudent. Regrettably, it is also true that many people in our profession are … hmm … not that well qualified.

However, I find that I’m far more productive and less likely to make mistakes using Ruby iterators with closures than writing more verbose and inelegant Java.

I used to be a strong believer in static typing, but it has become a distraction, as I have to worry more about the types of method parameters and return values, rather than just worrying about the values themselves. I realized that, on average in a typical section of code, the actual type of a variable is unimportant. The variable is just a “handle” being passed around. The name is always important, as it is a form of documentation. There are places where the type is important, of course, when the variable is read or written in some way.

Finally, static typing offers less security than at first appears. At best, it only confirms that variables of particular types are used consistently. Your unit tests also do this. However, static typing can’t confirm that the usage of the API is correct. This is analogous to testing the syntax but not the semantics of the program. In fact, only unit tests (or alternatives, like rspec ) are effective at testing both.

So, it’s prudent to be reticent about newer languages and features, but make sure the decisions you make about them are backed up by careful evaluation and don’t forget to train your team appropriately!