Premature Optimization?

Posted by Brett Schuchert Tue, 03 Jun 2008 22:46:00 GMT

I have a theory.

It’s one of those theories that I don’t want to get ruined with actual facts, but I suppose I’ll put it out there and see what I learn.

The theory stems from a series of observations about the way some Microsoft development shops write code versus others. I’ve noticed that some shops like to make many things sealed (final). I’m not talking about the occasional class; rather, the default is to use sealed on most things. In addition to classes being sealed, there’s a heavy bias against virtual methods. I’ve not noticed this with Java development groups, though in the spirit of honest disclosure, I’ve only noticed this tendency in about 50% of the MS shops I’ve visited, and it seems like the tendency is decreasing.

Even so, I’ve wondered about this quite a bit. Here are some of the reasons I’ve heard to justify this practice:
  • It expresses the design intent
  • It’s for performance
  • The code as written cannot be safely overridden.

(Before I go any further, I am not arguing against the use of sealed or final. I will say that in my practice, it has to be justified to be used rather than the other way around.)

Back to the reasons. I generally don’t buy these reasons. Each may be perfectly valid but none of them stand on their own; each requires further discussion.

I’ve had those discussions with various people and most of the time they really don’t hold water. Here are some quip-y responses to each of them:
  • It expresses design intent – you don’t know how to design
  • It’s for performance – do you have any idea how a VM works?
  • The code as written cannot be safely overridden – So the code is so badly written that nobody understands it?

Again, let me be clear: there are good reasons to seal things. My point is that while there are good reasons, most of the time I’ve seen it, the habit is a form of cargo culting; monkey see, monkey do.

This observation sat around for some time until I read some stuff in a book called CLR via C# (good reading if you want to better understand the CLR, and if you’re developing code for a virtual machine, I think it’s a good idea to understand the virtual machine at least a little).

Reading that book and my observations led to my theory:
People are sealing as much as they are because virtual table binding is done way too early.

The thing I read in the book relates to when virtual methods are bound. This applies to version 2.0 of the CLR, but I don’t think the latest version of the CLR has changed this, and it seems like a pretty “hard” design decision. The book says that binding of virtual methods is performed at compile time.

I’m not making a distinction between compile time and link time for this discussion, but I’ll be a little more precise: the generated code stored in an assembly “knows” which virtual function it is invoking because the generated byte code contains an index into a virtual table.

How is this a problem?

I write a base class and ship it. You write a derived class (OK, maybe you use delegation, good for you, but the problem still remains). Your coupling to my class involves using a method with the virtual keyword (Java programmers, you don’t have to do this, regular methods [not static, not a constructor, ...] have the potential for dynamic dispatch unless you declare them final). At compile time, your object module (which eventually resides in an assembly that will get loaded and executed some time later) knows that it wants to call the 3rd virtual method in the v-table.

So far, so good.

Next, I add a new virtual method to my class and ship a new assembly.

Question. To use this new assembly, do you need to recompile?

Yes.

What?

Yes. What happens if my newly added method is the first virtual method? Then your compiled code that is happily living in a peaceful assembly somewhere has a stale reference. It thinks it wants to run the 3rd method in the virtual table but now it really should be using the 4th virtual method.

This is similar to a caching issue, or premature optimization. The calling code has a compile-time generated index into the v-table. Why is this? Is it for performance? That’s not a good argument; this pre-calculation is irrelevant with modern JIT compiling techniques. I think the reason it is done this way is that C++ does things this way. C++ needs to because of its execution model (there’s no virtual machine between the code and its execution).

No big deal, you think, I’ll just recompile my code.

That will fix the problem. Or will it?

Say you’re stuck using an old version of some assembly that uses an old version of another assembly that you also use. Do you recompile both your code and that other assembly? You’ll have to, but you might forget. Or you might not be able to.

.NET handles this by having an excellent dependency system to track which versions of which assemblies work with other versions of assemblies. It needs this to address problems introduced by the decision to perform such early binding.

Simply put, this is an unfortunate decision that I believe was based more on precedent or history than on sound design.

This is not what happens in the Java Virtual Machine (JVM).

In the JVM, the binding is done later, much later. It is possible that my class got loaded and JIT’ed so that all of the methods that could be virtually invoked (not final) are in fact not virtually invoked. So far, no subclasses have been loaded, so there’s been no need. After my system has been running for a few weeks, a new class (a subclass) gets loaded and invalidates the JIT’ed version of my class. No problem: the system will dump the old version and JIT it again, on the fly.

Even if there is a need for virtual dispatch, it might be that most of the time I use one version of a method for a particularly heavily used subclass. The JVM might inline the common case and virtually dispatch the others.

Bottom line: what’s being done with JVMs is indistinguishable from black magic for most of us.

I can use new JAR files and not have to worry about recompiling to remove stale v-table references. The method binding will just work. (This does not remove all problems with changing versions of JARs, but it gets rid of one persnickety one.)

But it goes deeper. As I mentioned above, I’ve noticed several Microsoft shops that use a lot of sealed classes or avoid using virtual methods because “they perform poorly”.

OK, let’s talk about method invocation for just a second (or make that a picosecond). On my machine, I timed virtual method dispatch. I should check my numbers, but what I got was that virtual method dispatch was taking around 450 picoseconds. So you can invoke roughly 2.2 billion methods per second.

So:
  • Virtual method dispatch is not that expensive
  • The JVM can address caching issues with code analysis and inlining
  • You only pay the price if your design requires it
  • The people writing the JVM are better at micro-optimization than I am, and, I’d say, than most people
OK, so sealing doesn’t help with performance. Is there a benefit to leaving things unsealed and using more virtual methods?
  • Testability
  • Maintainability
  • Flexibility
  • Extensibility

Look up the percentage of time spent “building” a system versus “maintaining” a system. The percentage has been on the rise, and it’s 80 – 95% typically. 80% of the cost of development is spent maintaining code. (I believe a lot of this percentage has to do with the definition of “done” – but that’s a whole other discussion).

You might think that locking things down makes them more stable and that stability leads to maintainability.

That’s not really the case.

Things change. Requirements change.

Locking down a class to keep it from breaking is like signing off on requirements to keep them from changing. Neither thing is really a good idea.

I CAN lock down my classes in a sense. I can use tests to describe my assumptions so the class is locked down so long as we all agree that we’ll keep the tests passing. That is a fine level of granularity.

But I digress.

Back to the whole sealed thing. I was wondering why so many places that use Microsoft stuff seal their classes and do not use virtual methods (in my experience, it’s about 50%). I then read that thing about the binding of virtual methods at compile time. And then about 2 weeks later it hit me.

If you use virtual methods and you have a messed up build system, then you’ll get strange behavior. Sealing things makes those surprising behaviors go away, so seal stuff.

It makes sense. This is something C++ was notorious for. You hoped you’d get a segmentation violation, but often the wrong method could get called and the system continued to run…

Unfortunately, the “solution” (sealing) attacks a symptom, not the actual problem. This is where Toyota’s 5 Whys would have been a good idea.

“Seal your classes” Why?
  • Because they will be more stable. More safe/it expresses my design. Why?
  • Because my design says that this should not change. Why?
  • Because when things change, sometimes “bad things happen.” Why?
  • I don’t know…

And right there is where you know the solution is a bad idea.

In reality I suspect the habit perpetuates because once something appears to work, it is cut and pasted ad nauseam.

Should you never seal things?

Never say never.

Or better yet, never say “never, never, never.”

There are three levels of never:
  • Never: Don’t do it because you may not know what you are doing. (Once you know the rules, you can do it.)
  • Never, Never: You know what you’re doing most of the time, you are doing a bad thing but you really do know better.
  • Never, Never, Never: DON’T DO IT!

There are few things that are Never, Never, Never. Dereferencing NULL comes to mind (but then maybe I wanted a core dump if I get to this point in the code).

I’d place using sealed/final in the Never, Never camp if I’m using tests to describe the semantics of my classes. OR, if I’m writing concurrent code, using final on fields provides certain guarantees…

However, let’s say that I’ve been told not to use tests (I’ve seen it at more than one place). Then that “never, never” becomes a “sure do it to cover my ass”. Someone can’t subclass and mess things up. Someone can’t forget to compile their code when mine changes and cause their code to break…

Now is this a reason to avoid using Microsoft products? No. I really do like C# (for a statically-typed language, that is). And I’m convinced that Java is a better language because of the work done by and in C#. And let’s face it, having more VMs is a good thing. It improves competition. It raises the level at which we can expect to work.

But to quote Tim Booth/James in a rather trite way, “there’s a chain of consequence within, without.” That one thing, binding to the v-table so early, has caused a chain of events that leads to a platform that seems a little bit more sluggish to develop in to me. (I expect flames for this.)

I think this design decision is a reflection of a mind-set that permeates the development environment. For example, if I have a large solution with several projects, it sure seems that dependency checking in the build environment takes a long time. When I watch Developer Studio build things, it looks like make is running under the covers and performing a bunch of file checks to see what has changed. Where’s the incremental compilation?

I’ve seen very good work in both .NET and Java development efforts. It just seems that unnecessary, basic kinds of environmental design debt occur more frequently in .NET development efforts than in similar Java development efforts.

Maybe it’s a sampling effect. To paraphrase Weinberg, “there’s always a sample, be aware of it, you can’t remove it.” I’m a consultant, I go places where they need consultants. I never, never go places that don’t need consultants, so my sample set is biased (I have one experience, maybe two, where I went and the people did not need consultants because they were doing fine – or maybe they didn’t need technically-oriented consultants).

So what do you think?
  • Is using sealed by default a good design choice?
  • Is the binding of methods early a good thing?
    • Was the decision by design or because of history?
  • Has the CLR changed, or will it?

Unit Testing C and C++ ... with Ruby and RSpec!

Posted by Dean Wampler Mon, 04 Feb 2008 22:08:00 GMT

If you’re writing C/C++ code, it’s natural to write your unit tests in the same language (or use C++ for your C test code). All the well-known unit testing tools take this approach.

I think we can agree that neither language offers the best developer productivity among all the language choices out there. Most of us use either language because of perceived performance requirements, institutional and industry tradition, etc.

There’s growing interest, however, in mixing languages, tools, and paradigms to get the best tool for a particular job. <shameless-plug>I’m giving a talk March 7th at SD West on this very topic, called Polyglot and Poly-Paradigm Programming </shameless-plug>.

So, why not use a more productive language for your C or C++ unit tests? You have more freedom in your development chores than what’s required for production. Why not use Ruby’s RSpec, a Behavior-Driven Development tool for acceptance and unit testing? Or, you could use Ruby’s version of JUnit, called Test::Unit. The hard part is integrating Ruby and C/C++. If you’ve been looking for an excuse to bring Ruby (or Tcl or Python or Java or…) into your C/C++ environment, starting with development tasks is usually the path of least resistance.

I did some experimenting over the last few days to integrate RSpec using SWIG (Simplified Wrapper and Interface Generator), a tool for bridging libraries written in C and C++ to other languages, like Ruby. The Ruby section of the SWIG manual was very helpful.

My Proof-of-Concept Code

Here is a zip file of my experiment: rspec_for_cpp.zip

This is far from a complete and working solution, but I think it shows promise. See the Current Limitations section below for details.

Unzip the file into a directory. I’ll assume you named it rspec_for_cpp. You need to have gmake, gcc, SWIG and Ruby installed, along with the RSpec “gem”. Right now, it only builds on OS X and Linux (at least the configurations on my machines running those OSes; see the discussion below). To run the build, use the following commands:

    
        $ cd rspec_for_cpp/cpp
        $ make 
    

You should see it finish with the lines

    
        ( cd ../spec; spec *_spec.rb )
        .........

        Finished in 0.0***** seconds

        9 examples, 0 failures
    

Congratulations, you’ve just tested some C and C++ code with RSpec! (Or, if you didn’t succeed, see the notes in the Makefile and the discussion below.)

The Details

I’ll briefly walk you through the files in the zip and the key steps required to make it all work.

cexample.h

Here is a simple C header file.

    
        /* cexample.h */
        #ifndef CEXAMPLE_H
        #define CEXAMPLE_H
        #ifdef __cplusplus
         extern "C" {
        #endif
        char* returnString(char* input);
        double returnDouble(int input);
        void  doNothing();

        #ifdef __cplusplus
         }
        #endif
        #endif
    

Of course, in a pure C shop, you won’t need the #ifdef __cplusplus stuff. I found this was essential in my experiment when I mixed C and C++, as you might expect.

cpp/cexample.c

Here is the corresponding C source file.

    
        /* cexample.c */
        #include "cexample.h"

        char* returnString(char* input) {
            return input;
        }

        double returnDouble(int input) {
            return (double) input;
        }

        void  doNothing() {}
    

cpp/CppExample.h

Here is a C++ header file.

    
        #ifndef CPPEXAMPLE_H
        #define CPPEXAMPLE_H

        #include <string>

        class CppExample 
        {
        public:
            CppExample();
            CppExample(const CppExample& foo);
            CppExample(const char* title, int flag);
            virtual ~CppExample();

            const char* title() const;
            void        title(const char* title);
            int         flag() const;
            void        flag(int value);

            static int countOfCppExamples();
        private:
            std::string _title;
            int         _flag;
        };

        #endif
    

cpp/CppExample.cpp

Here is the corresponding C++ source file.

    
        #include "CppExample.h" 

        CppExample::CppExample() : _title(""), _flag(0) {}
        CppExample::CppExample(const CppExample& foo): _title(foo._title), _flag(foo._flag) {}
        CppExample::CppExample(const char* title, int flag) : _title(title), _flag(flag) {}
        CppExample::~CppExample() {}

        const char* CppExample::title() const { return _title.c_str(); }
        void        CppExample::title(const char* name) { _title = name; }

        int  CppExample::flag() const { return _flag; }
        void CppExample::flag(int value) { _flag = value; }

        int CppExample::countOfCppExamples() { return 1; }
    

cpp/example.i

Typically in SWIG, you specify a .i file to the swig command to define the module that wraps the classes and global functions, which classes and functions to expose to the target language (usually all in our case), and other assorted customization options, which are discussed in the SWIG manual. I’ll show the swig command in a minute. For now, note that I’m going to generate an example_wrap.cpp file that will function as the bridge between the languages.

Here’s my example.i, where I named the module example.

    
        %module example
        %{
            #include "cexample.h" 
            #include "CppExample.h"    
        %}
        %include "cexample.h" 
        %include "CppExample.h" 
    

It looks odd to have header files appear twice. The code inside the %{...%} (with a '#' before each include) consists of standard C and C++ statements that will be inserted verbatim into the generated “wrapper” file, example_wrap.cpp, so that file will compile when it references anything declared in the header files. The second case, with a '%' before the include statements (see footnote 1), tells SWIG to make all the declarations in those header files available to the target language. (You can be more selective, if you prefer…)

Following Ruby conventions, the Ruby plugin for SWIG automatically names the module with an upper case first letter (Example), but you use require 'example' to include it, as we’ll see shortly.

Building

See the cpp/Makefile for the gory details. In a nutshell, you run the swig command like this.

    
        swig -c++ -ruby -Wall -o example_wrap.cpp example.i
    

Next, you create a dynamically-linked library, as appropriate for your platform, so the Ruby interpreter can load the module dynamically when required. The Makefile can do this for Linux and OS X platforms. See the Ruby section of the SWIG manual for Windows specifics.

If you test-drive your code, which tends to drive you towards minimally-coupled “modules”, then you can keep your libraries and build times small, which will make the build and test cycle very fast!

spec/cexample_spec.rb and spec/cppexample_spec.rb

Finally, here are the RSpec files that exercise the C and C++ code. (Disclaimer: these aren’t the best spec files I’ve ever written. For one thing, they don’t exercise all the CppExample methods! So sue me… :)

    
        require File.dirname(__FILE__) + '/spec_helper'
        require 'example'

        describe "Example (C functions)" do
          it "should be a constant on Module" do
            Module.constants.should include('Example')
          end
          it "should have the methods defined in the C header file" do
            Example.methods.should include('returnString')
            Example.methods.should include('returnDouble')
            Example.methods.should include('doNothing')
          end
        end

        describe Example, ".returnString" do
          it "should return the input char * string as a Ruby string unchanged" do
            Example.returnString("bar!").should == "bar!" 
          end  
        end

        describe Example, ".returnDouble" do
          it "should return the input integer as a double" do
            Example.returnDouble(10).should == 10.0
          end
        end

        describe Example, ".doNothing" do
          it "should exist, but do nothing" do
            lambda { Example.doNothing }.should_not raise_error
          end
        end
    

and

    
    require File.dirname(__FILE__) + '/spec_helper'
    require 'example'

    describe Example::CppExample do
      it "should be a constant on module Example" do
        Example.constants.should include('CppExample')
      end
    end

    describe Example::CppExample, ".new" do
      it "should create a new object of type CppExample" do
        example = Example::CppExample.new("example1", 1)
        example.title.should == "example1" 
        example.flag.should  == 1
      end
    end

    describe Example::CppExample, "#title(value)" do
      it "should set the title" do
        example = Example::CppExample.new("example1", 1)
        example.title("title2")
        example.title.should == "title2" 
      end
    end

    describe Example::CppExample, "#flag(value)" do
      it "should set the flag" do
        example = Example::CppExample.new("example1", 1)
        example.flag(2)
        example.flag.should == 2
      end
    end
    

If you love RSpec like I do, this is a very compelling thing to see!

And now for the small print:

Current Limitations

As I said, this is just an experiment at this point. Volunteers to make this battle-ready would be most welcome!

General

The Example Makefile File

It Must Be Hand Edited for Each New or Renamed Source File

You’ve probably already solved this problem for your own make files. Just merge in the example Makefile to pick up the SWIG- and RSpec-related targets and rules.

It Only Knows How to Build Shared Libraries for Mac OS X and Linux (and Not Very Well)

Some definitions are probably unique to my OS X and Linux machines. Windows is not supported at all. However, this is also easy to rectify. Start with the notes in the Makefile itself.

The module.i File Must Be Hand Edited for Each File Change

Since the format is simple, a make task could fill a template file with the changed list of source files during the build.

Better Automation

It should be straightforward to provide scripts, IDE/Editor shortcuts, etc. that automate some of the tasks of adding new methods and classes to your C and C++ code when you introduce them first in your “spec” files. (The true TDD way, of course.)

Specific Issues for C Code Testing

I don’t know of any C-specific issues beyond the general ones above, so unit testing with Ruby is most viable today for C code. Although I haven’t experimented extensively, C functions and variables are easily mapped by SWIG to a Ruby module. The Ruby section of the SWIG manual discusses this mapping in some detail.

Specific Issues for C++ Code Testing

More work will be required to make this viable. It’s important to note that SWIG cannot handle all C++ constructs (although there are workarounds for most issues, if you’re committed to this approach…). For example, namespaces, nested classes, some template and some method overloading scenarios are not supported. The SWIG manual has details.

Also, during my experiment, SWIG didn’t seem to map const std::string& objects in C++ method signatures to Ruby strings, as I would have expected (char* worked fine).

Is It a Viable Approach?

Once the General issues listed above are handled, I think this approach would work very well for C code. For C++ code, there are more issues that need to be addressed, and programmers who are committed to this strategy will need to tolerate some issues (or just use C++-language tools for some scenarios).

Conclusions: Making It Development-Team Ready

I’d like to see this approach pushed to its logical limit. I think it has the potential to really improve the productivity of C and C++ developers and the quality of their test coverage, by leveraging the productivity and power of dynamically-typed languages like Ruby. If you prefer, you could use Tcl, Python, even Java instead.

License

This code is completely open and free to use. Of course, use it at your own risk; I offer it without warranty, etc., etc. When I polish it to the point of making it an “official” project, I will probably release it under the Apache license.

1 I spent a lot of time debugging problems because I had a ’#’ where I should have had a ’%’! Caveat emptor!

Applications Should Use Several Languages

Posted by Dean Wampler Wed, 04 Jul 2007 11:38:31 GMT

Yesterday, I blogged about TDD in C++ and ended with a suggestion for the dilemma of needing optimal performance some of the time and optimal productivity the rest of the time. I suggested that you should use more than one language for your applications.

If you are developing web applications, you are already doing this, of course. Your web tier probably uses several “languages”, e.g., HTML, JavaScript, JSP/ASP, CSS, Java, etc.

However, most people use only one language for the business/mid tier. I think you should consider using several; a high-productivity language environment for most of your work, with the occasional critical functionality implemented in C or C++ to optimize performance, but only after actually measuring where the bottlenecks are located.

This approach is much too rare, but it has historical precedents. One of the most successful and long-lived software projects of all time is Emacs. It consists of a core C-based runtime with most of the functionality implemented in Emacs Lisp “components”. The relative ease of extending Emacs using Lisp has resulted in a rich assortment of support tools for various operating systems, languages, build tools, etc. Even modern IDEs and other graphical editors have not completely displaced Emacs.

Java has embraced the mixed-language philosophy somewhat reluctantly. JNI is the official and most commonly used API for invoking “native” code, but it is somewhat hard to use and few people actually use it. In contrast, the Ruby world has always embraced this approach. Ruby has an easy-to-use API for invoking native C code, and good alternatives exist for invoking code in other languages. As a result, many of the 3rd-party Ruby libraries (or gems) contain both Ruby and native C code. The latter is built on the fly when you install the gem. Hence, there are many high-performance Ruby applications. This is not a contradiction in terms, because the performance-critical sections run natively, even though interpreted Ruby is relatively slow.

Of course, you have to be judicious in how you use mixed-language programming. Crossing the language boundary is often somewhat heavyweight, so you should avoid doing such invocations inside tight loops, for example.

So, I think the solution to the dilemma of needing high performance sometimes and high productivity the rest of the time is to pick the right tools for each circumstance and make them interoperate. Even constrained embedded devices like cell phones would be easier to implement if most of the code were written in a language like Ruby, Python, Smalltalk, or Java and performance-critical components were written in C or C++.

If I were starting such a greenfield project, I would assume that time-to-money is the top priority and write most of my code in Ruby (my personal current favorite), using TDD of course. I would profile it constantly, as part of the nightly or continuous-integration build. When bottlenecks emerge, I would first determine if a refactoring is sufficient to fix them and if not, I would rewrite the critical sections in C. If the project were for an embedded device, I would also watch the resource usage carefully.

For my embedded device, I would test from the beginning whether or not the overhead of the interpreter/VM and the overall performance are acceptable. I would also be sure that I have adequate tool support for the inevitable remote debugging and diagnostics I’ll have to do. If I made the wrong tool choices after all, I would know early on, when it’s still relatively painless to retool.

If you’re an IT or web-site developer, you have fewer performance limitations and more options. You might decide to make the cross-language boundary a cross-process boundary, e.g., by communicating through some sort of lightweight web services. This is one way to leverage legacy C/C++ code while developing new functionality in a more productive language.

Observations on TDD in C++ (long)

Posted by Dean Wampler Tue, 03 Jul 2007 23:15:09 GMT

I spent all of June mentoring teams on TDD in C++ with some Java. While C++ was my language of choice through most of the 90’s, I think far too many teams are using it today when there are better options for their particular needs.

During the month, I took notes on all the ways that C++ development is less productive than development in languages like Java, particularly if you try to practice TDD. I’m not trying to start a language flame war. There are times when C++ is the appropriate tool, as we’ll see.

Most of the points below have been discussed before, but it is useful to list them in one place and to highlight a few particular observations.

Based on my observations last month, as well as previous experience, I’ve come to the conclusion that TDD in C++ is about an order of magnitude slower than TDD in Java. Mostly, this is due to poor or non-existent tool support for automated refactorings, no error detection as you type, and the requirement to compile and link an executable test.

So, here is my list of impediments that I encountered last month. I’ll mostly use Java as the comparison language, but the arguments are more or less the same for C# and the popular dynamic languages, like Ruby, Python, and Smalltalk. Note that the dynamic languages tend to have less complete tool support, but they make up for it in other ways (off-topic for this blog).

Getting Started

There is more setup effort involved in configuring your build environment to use your chosen unit testing framework (e.g., CppUnit) and to create small executables, each containing a single test or a few tests. Creating many small test executables, rather than one big one (e.g., a variant of the actual application), is important to minimize the TDD cycle time.

Fortunately, this setup is a one-time “charge”. The harder part, if you have legacy code, is refactoring it to break hard dependencies so you can write unit tests. This is true for legacy code in any language, of course.

Complex Syntax

C++ has a very complex syntax. This makes it hard to parse, limiting the capabilities of automated tools and slowing build times (more below).

The syntax also makes it harder to program in the language and not just for novices. Even for experts, the visual noise of pointer and reference syntax obscures the story the code is trying to tell. That is, C++ code is inherently less clean than code in most other languages in widespread use.

Also, the need for the developer to remember whether each variable is a pointer, a reference, or a “value”, and how to manage its life-cycle, requires mental effort that could be applied to the logic of the code instead.

Obsolete Tool Support

No editor or IDE supports non-trivial, automated refactorings. (Some do simple refactorings like “rename”.) This means you have to resort to tedious, slow, and error-prone manual refactorings. Extract Method is made worse by the fact that you usually have to edit two files, an implementation and a header file.

There are no widely-used tools that provide on-the-fly parsing and error indications. This alone increases the time between typing an error and learning about it by an order of magnitude. Since a build is usually required, you tend to type a lot between builds, thereby learning about many errors at once. Working through them takes time. (There may be some commercial tools with limited support for on-the-fly parsing, but they are not widely used.)

Similarly, none of the common development tools support incremental loading of object code that could be used for faster unit testing and hence a faster TDD cycle. Most teams just build executables. Even when they structure the build process to generate small, focused executables for unit tests, the TDD cycle times remain much longer than for Java.

Finally, while there is at least one mocking framework available for C++, it is much harder to use than comparable frameworks in newer languages.
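Without reflection-based mocking, C++ tests often fall back on hand-written mocks: one class per interface that records calls for later verification. This is a minimal sketch; MailSender, notifyAll, and RecordingMailSender are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical collaborator interface.
class MailSender {
public:
    virtual ~MailSender() = default;
    virtual void send(const std::string& to, const std::string& body) = 0;
};

// Hypothetical code under test: sends the message to every subscriber.
void notifyAll(MailSender& sender, const std::vector<std::string>& subscribers,
               const std::string& message) {
    for (const auto& subscriber : subscribers) sender.send(subscriber, message);
}

// Hand-written mock: records every call so a test can inspect them afterwards.
class RecordingMailSender : public MailSender {
public:
    struct Call {
        std::string to;
        std::string body;
    };

    void send(const std::string& to, const std::string& body) override {
        calls.push_back({to, body});
    }

    std::vector<Call> calls;
};
```

Every change to the interface means updating each such mock by hand, which is part of the overhead described above.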

Manual Memory Management

We all know that manual memory management leads to time spent finding and fixing memory errors and leaks. Avoiding these problems in the first place also consumes a lot of thought and design effort. In Java, you just spend far less time thinking about “who owns this object and is therefore responsible for managing its life-cycle”.
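The ownership question shows up directly in signatures. In this hypothetical sketch, nothing in the type of createReport says that the caller must delete the result; only a comment does, and every caller has to remember it.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class used only for illustration.
class Report {
public:
    explicit Report(std::string title) : title_(std::move(title)) {}
    const std::string& title() const { return title_; }

private:
    std::string title_;
};

// The raw pointer return type does not say who owns the result.
// By convention (stated only in this comment), the caller must delete it.
Report* createReport(const std::string& title) {
    return new Report(title);
}

std::size_t titleLength(const std::string& title) {
    Report* report = createReport(title);
    std::size_t length = report->title().size();
    // Forgetting this leaks; deleting twice is undefined behavior; and if
    // anything between new and delete threw, this line would be skipped.
    delete report;
    return length;
}
```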

Dependency Management

Intelligent handling of include directives is entirely up to the developer. We have all used the following “guard” idiom:

    #ifndef MY_CLASS_H
    #define MY_CLASS_H
    ...
    #endif

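Unfortunately, this isn’t good enough. Unless the compiler specifically recognizes this idiom (the GNU C preprocessor, for example, does and skips rereading the file), the file will still be opened and read in its entirety every time it is included. You could also put guard directives around the include statement itself: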

    #ifndef MY_CLASS_H
    #include "myclass.h"
    #endif

This is tedious and few people do it, but it does avoid the wasted file I/O.

Finally, too few people simply forward-declare a required class, with no body:

    class MyClass;

This is sufficient when one header references another class as a pointer or reference. In our experience with clients, we have often seen build times improve significantly when teams cleaned up their header file usage and dependencies, in general. Still, why is all this necessary in the 21st century?
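A minimal sketch of the forward-declaration case, with a hypothetical header and implementation shown in one listing. Car only stores and returns an Engine pointer, so its header does not need Engine’s full definition; only the implementation file does.

```cpp
// --- car.h (hypothetical) ---
// A forward declaration is enough: Car only holds a pointer to Engine,
// so the compiler does not need Engine's size or members here.
class Engine;

class Car {
public:
    explicit Car(Engine* engine) : engine_(engine) {}
    Engine* engine() const { return engine_; }
    int horsepower() const;  // defined where Engine is a complete type

private:
    Engine* engine_;  // not owned
};

// --- car.cpp (hypothetical) ---
// Only this file needs the full definition, normally via #include "engine.h";
// it is written inline here to keep the sketch in one listing.
class Engine {
public:
    explicit Engine(int hp) : hp_(hp) {}
    int hp() const { return hp_; }

private:
    int hp_;
};

int Car::horsepower() const { return engine_->hp(); }
```

Clients that include car.h no longer recompile when engine.h changes.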

This problem is made worse by the unfortunate inclusion of private and protected declarations in the same header file included by clients of the class. This creates phantom dependencies from the clients to class details that they can’t access directly.
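One widely used mitigation is the so-called pimpl (pointer to implementation) idiom: the header exposes only the public interface plus an opaque pointer, and the private data lives in a struct defined only in the implementation file. This sketch assumes C++11 or later for std::unique_ptr; Account and its members are hypothetical. The trade-off is an extra heap allocation and an indirection on every access.

```cpp
#include <memory>
#include <string>
#include <utility>

// --- account.h (hypothetical) ---
// Clients see no private data members, so changing them does not force
// clients to recompile.
class Account {
public:
    explicit Account(std::string owner);
    ~Account();  // must be defined where Impl is a complete type
    void deposit(long cents);
    long balance() const;
    const std::string& owner() const;

private:
    struct Impl;                  // defined only in account.cpp
    std::unique_ptr<Impl> impl_;
};

// --- account.cpp (hypothetical) ---
struct Account::Impl {
    std::string owner;
    long balanceCents = 0;
};

Account::Account(std::string owner) : impl_(new Impl) {
    impl_->owner = std::move(owner);
}

Account::~Account() = default;  // Impl is complete here, so deletion is valid

void Account::deposit(long cents) { impl_->balanceCents += cents; }
long Account::balance() const { return impl_->balanceCents; }
const std::string& Account::owner() const { return impl_->owner; }
```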

Other Debugging Issues

Limited or non-existent context information when an exception is thrown makes the origin of the exception harder to find. To fill the gap, you tend to spend more time adding this information manually through logging statements in catch blocks, etc.

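The std::exception base class has no constructor that takes a std::string or const char* message. You could just throw a string, but that precludes using an exception class with a meaningful name. The usual way around this is to derive from one of the standard subclasses in <stdexcept>, such as std::runtime_error, whose constructors do accept a message.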
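A minimal sketch, assuming C++11: a named exception that derives from std::runtime_error (which stores a message), plus a hypothetical THROW_WITH_CONTEXT macro that prepends the file and line of the throw site. This is one manual way to add the missing context; ConfigError, parsePort, and the macro are invented for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical named exception: std::runtime_error already stores the
// message, so the subclass only forwards it.
class ConfigError : public std::runtime_error {
public:
    explicit ConfigError(const std::string& message)
        : std::runtime_error(message) {}
};

// Hypothetical helper: records where the exception was thrown, since a
// C++ exception carries no stack trace of its own.
#define THROW_WITH_CONTEXT(ExceptionType, message)                     \
    throw ExceptionType(std::string(__FILE__) + ":" +                  \
                        std::to_string(__LINE__) + ": " + (message))

int parsePort(const std::string& text) {
    int port = std::stoi(text);  // may itself throw std::invalid_argument
    if (port <= 0 || port > 65535) {
        THROW_WITH_CONTEXT(ConfigError, "port out of range: " + text);
    }
    return port;
}
```

A catch block can then handle ConfigError by name, and what() includes the throw site.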

Compiler error messages are hard to read and often misleading. This is partly due to the complexity of the syntax and the parsing problem mentioned previously. Errors involving template usage are particularly hard to debug.

Reflection and Metaprogramming

Many of the productivity gains from using dynamic languages and (to a lesser extent) Java and C# are due to their reflection and metaprogramming facilities. C++ relies more on template metaprogramming than on APIs or other built-in language features that are easier to use and more full-featured. Preprocessor hacks are also used frequently. Better reflection and metaprogramming support would permit more robust proxy or aspect solutions to be used. (However, to be fair, sometimes a preprocessor hack has the virtue of being “the simplest thing that could possibly work.”)
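As an example of the kind of preprocessor hack meant here, the common “X-macro” technique writes a list once and expands it several ways, giving a crude substitute for reflection over an enum. This sketch assumes C++11; COLOR_LIST, Color, colorName, and colorCount are hypothetical names.

```cpp
#include <cstddef>

// The list is written once; each use site defines X to generate different code.
#define COLOR_LIST(X) \
    X(Red)            \
    X(Green)          \
    X(Blue)

// Expands to: Red, Green, Blue,
enum class Color {
#define X(name) name,
    COLOR_LIST(X)
#undef X
};

// Expands to one case per enumerator, returning its name as a string literal.
inline const char* colorName(Color c) {
    switch (c) {
#define X(name) case Color::name: return #name;
        COLOR_LIST(X)
#undef X
    }
    return "unknown";
}

// Expands to: return 0 +1 +1 +1;
constexpr std::size_t colorCount() {
#define X(name) +1
    return 0 COLOR_LIST(X);
#undef X
}
```

Adding a color to COLOR_LIST updates the enum, the names, and the count together, which is exactly the “simplest thing that could possibly work” trade-off mentioned above.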

Library Issues

Speaking of std::string and char*, it is hard to avoid writing two versions of methods, one that takes const std::string& arguments and one that takes const char* arguments. Even when one version simply delegates to the other, this is wasted effort.
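A minimal sketch of the duplicated pair, using a hypothetical countSpaces function. A single const std::string& version would also accept string literals through implicit conversion, but each such call would construct a temporary std::string (possibly allocating), which is usually why the second overload gets written.

```cpp
#include <cstddef>
#include <string>

// The const char* version does the real work without constructing a string.
std::size_t countSpaces(const char* text) {
    std::size_t count = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p == ' ') ++count;
    }
    return count;
}

// The std::string version merely delegates. Note that it stops at the first
// embedded '\0', which a std::string is allowed to contain.
std::size_t countSpaces(const std::string& text) {
    return countSpaces(text.c_str());
}
```

A call with a string literal selects the const char* overload, because array-to-pointer conversion beats the user-defined conversion to std::string.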

Discussion

So, C++ makes it hard for me to work the way that I want to work today, which is test-driven, creating clean code that works. That’s why I rarely choose it for a project.

However, to be fair, there are legitimate reasons for almost all of the perceived “deficiencies” listed above. C++ emphasizes performance and backward compatibility with C over all other considerations, but these priorities come at the expense of other interests, like effective TDD.

It is a good thing that we have languages that were designed with performance as the top design goal, because there are circumstances where performance is the number one requirement. However, most teams that use C++ as their primary language are making a choice that is optimal for, say, 10% of their code but suboptimal for the other 90%. Your numbers will vary; I picked 10% vs. 90% based on the fact that performance bottlenecks are usually localized and they should be found by actual measurements, not guesses!
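A profiler is the usual tool for finding bottlenecks, but even a crude timing helper beats guessing. This is a minimal sketch assuming C++11 std::chrono; elapsedMicros is a hypothetical name.

```cpp
#include <chrono>

// Runs fn once and returns the elapsed wall-clock time in microseconds.
// steady_clock is monotonic, so the result is not affected by system clock
// adjustments during the measurement.
template <typename Fn>
long long elapsedMicros(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A single run is noisy; repeating each measurement several times and comparing candidates under the same conditions gives more trustworthy numbers.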

Workarounds

If it’s true that TDD is an order of magnitude slower for C++, then what do we do? No doubt really good C++ developers have optimized their processes as best they can, but in the end, you will just have to live with longer TDD cycles. Instead of writing just enough of a test to fail, making it pass, and refactoring, the cycle becomes more like this: write a complete test, write the implementation, build it, fix the compilation errors, run it, fix the logic errors to make the test pass, and then refactor.

A Real Resolution?

You could consider switching to the D language, which is link compatible with C and appears to avoid many of the problems described above.

There is another way out of the dilemma of needing optimal performance some of the time and optimal productivity the rest of the time: use more than one language. I’ll discuss this idea in my next blog post.