Contracts and Integration Tests for Component Interfaces
I am mentoring a team that is transitioning to XP, the first team in a planned, corporate-wide transition. Recently we ran into miscommunication problems about an interface we are providing to another team.
The problems didn’t surface until a “big-bang” integration right before a major release, when it was too late to fix them. As a result, the feature was backed out of the release.
There are several lessons to take away from this experience and a few techniques for preventing these problems in the first place.
End-to-end automated integration tests are a well-established way of catching these problems early on. The team I’m mentoring has set up its own continuous-integration (CI) server and the team is getting pretty good at writing acceptance tests using FitNesse. However, these tests only cover the components provided by the team, not the true end-to-end user stories. So, they are imperfect as both acceptance tests and integration tests. Our longer-term goal is to automate true end-to-end acceptance and integration tests, across all components and services.
In this particular case, the other team is following a waterfall style of development, with big design up front. Therefore, my team needed to give them an interface to design against before we were ready to actually implement the service.
There are a couple of problems with this approach. First, the two teams should really “pair” to work out the interface and behavior across their components. As I said, we’re just starting to go Agile, but my goal is to have virtual feature teams, where members of the required component teams come together as needed to implement features. This would help prevent the miscommunication that occurs when one team defines an interface and hands it to another team only through documentation. Getting people to communicate face-to-face and to write code together would minimize these misunderstandings.
Second, defining a service interface without the implementation is risky, because it’s very likely you will miss important details. The best way to work out the details of the interface is to test drive it in some way.
This suggests another technique I want to introduce to the team. When defining an interface for external consumption, don’t just deliver the “static” interface (source files, documentation, etc.), also deliver working Mock Objects that the other team can test against. You should develop these mocks as you test drive the interface, even if you aren’t yet working on the full implementation (for schedule or other reasons).
The mocks encapsulate and enforce the behavioral contract of the interface. Design by Contract is a very effective way of thinking about interface design and automating the enforcement of it. Test-driven development serves much the same practical function, but thinking in “contractual” terms brings a clarity that is often missing from the tests I see.
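For instance, a hand-rolled mock for a service interface might enforce its contract like this (a minimal Java sketch; the interface and names are hypothetical, not taken from the actual project):
import java.util.HashMap;
import java.util.Map;
// Hypothetical service interface shared with the consuming team (illustrative names only).
interface AccountService {
    // Contract: accountId must be non-null and non-empty; unknown accounts are an error.
    long balanceFor(String accountId);
}
// A hand-written mock that returns canned data but still enforces the contract,
// so the other team's tests fail fast when they violate it.
class MockAccountService implements AccountService {
    private final Map<String, Long> cannedBalances = new HashMap<String, Long>();
    void stubBalance(String accountId, long balance) {
        cannedBalances.put(accountId, balance);
    }
    public long balanceFor(String accountId) {
        if (accountId == null || accountId.isEmpty())       // precondition check
            throw new IllegalArgumentException("accountId must be non-null and non-empty");
        Long balance = cannedBalances.get(accountId);
        if (balance == null)                                 // contract: no silent defaults
            throw new IllegalStateException("unknown account: " + accountId);
        return balance;
    }
}
The client team codes and tests against MockAccountService, and the contract checks surface misuse long before the real service exists.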
Many developers already use mocks for components that don’t exist yet and find that the mocks help them design the interfaces to those components, even while the mocks are being used to test clients of the components.
Of course, there is no guarantee that the mocks faithfully represent the actual behavior, but they will minimize surprises. Whether you have mocks or not, there is no substitute for running automated integration tests on real components as soon as possible.
So... You want your code to be maintainable.
We know that maintenance is 90% of the software lifecycle, and 90% of the cost. We know that our systems need to be flexible, reusable, and maintainable. Indeed, that’s why we spend so much of our time trying to get the design and architecture just right. Because we all know that good design and architecture is the key to flexibility, reusability, and maintainability…right?
Of course. Good design and architecture is what makes software easy to change. Good design and architecture separates the things that change for one reason from the things that change for another reason (The Single Responsibility Principle). Good design allows us to add new features without changing a lot of old code (Open Closed Principle). Good design makes sure that high level policy does not depend on low level detail (Dependency Inversion Principle), etc. etc.
So how do we get good design? Well, that’s tricky. Oh it’s not too tricky to get a good design in place at first. The tricky part is to keep the design good. That’s the problem, you see. It’s not that the design starts out so bad (although sometimes…) rather it is that the design degrades over time as the system changes.
Systems change. Often they change in ways that thwart the original intent of the design. Unfortunately, changing the design to align to these changes is hard. So we wind up hacking the new features into the system and thwarting the design. And that’s how even the best designed systems rot.
So how do we keep the design from rotting? How do we make sure we can migrate the design as the system changes? Simple. Tests.
When you have a suite of tests that covers >90% of the code in the system, you are not afraid to make changes. Every time you make a little change you run those tests, and you know that you have not broken anything. This gives you the confidence to make the next change, and the next, and the next. It gives you the confidence to change the design!
Nothing makes a system more flexible than a suite of tests. Nothing. Good architecture and design are important; but the effect of a robust suite of tests is an order of magnitude greater. It’s so much greater because those tests enable you to improve the design.
This can’t be overstated. If you want your systems to be flexible, write tests. If you want your systems to be reusable, write tests. If you want your systems to be maintainable, write tests.
And write your tests using the Three Laws of TDD.
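As a reminder, the Three Laws say: write no production code except to make a failing unit test pass; write no more of a unit test than is sufficient to fail (and failing to compile counts as failing); and write no more production code than is sufficient to pass the one failing test. One turn of that cycle might look like this minimal, hypothetical sketch:
import static org.junit.Assert.assertEquals;
import org.junit.Test;
// Law 2: write just enough of a test to fail (it won't even compile until Stack exists).
public class StackTest {
    @Test
    public void pushedValueIsOnTop() {
        Stack stack = new Stack();
        stack.push(42);
        assertEquals(42, stack.top());
    }
}
// Law 3: write just enough production code to make that one test pass.
class Stack {
    private int top;
    public void push(int value) { top = value; }
    public int top() { return top; }
}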
The Quality of TDD
Keith Braithwaite has made an interesting observation here. The basic idea is that code written with TDD has a lower cyclomatic complexity (CC) per function than code that has not been written with TDD. If this is true, it could imply that TDD code has fewer defects as a result.
Keith’s metric takes in the code for an entire project and boils it down to a single number. His hypothesis is that a system written with TDD will always measure above a certain threshold, indicating very low CC, whereas systems written without TDD may or may not measure above that threshold.
Keith has built a tool, which you can get here, that will generate this metric for most Java projects. He and others have used this tool to measure many different systems. So far the hypothesis seems to hold water.
The metric can’t tell you if TDD was used; but it might just be able to tell you that it wasn’t used.
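As a reminder of what the per-function number counts, cyclomatic complexity is roughly one plus the number of independent branch points in a function. A quick illustration (my own example, not Keith’s tool):
public class ComplexityExamples {
    // Cyclomatic complexity 1: one straight-line path, no branches.
    public int add(int a, int b) {
        return a + b;
    }
    // Cyclomatic complexity 3: the base path plus two decision points (the for and the if).
    public int sumOfPositives(int[] values) {
        int sum = 0;
        for (int v : values) {
            if (v > 0) {
                sum += v;
            }
        }
        return sum;
    }
}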
Unit Testing C and C++ ... with Ruby and RSpec!
If you’re writing C/C++ code, it’s natural to write your unit tests in the same language (or use C++ for your C test code). All the well-known unit testing tools take this approach.
I think we can agree that neither language offers the best developer productivity among all the language choices out there. Most of us use either language because of perceived performance requirements, institutional and industry tradition, etc.
There’s growing interest, however, in mixing languages, tools, and paradigms to get the best tool for a particular job. <shameless-plug>I’m giving a talk March 7th at SD West on this very topic, called Polyglot and Poly-Paradigm Programming </shameless-plug>.
So, why not use a more productive language for your C or C++ unit tests? You have more freedom in your development chores than what’s required for production. Why not use Ruby’s RSpec, a Behavior-Driven Development tool for acceptance and unit testing? Or, you could use Ruby’s version of JUnit, called Test::Unit. The hard part is integrating Ruby and C/C++. If you’ve been looking for an excuse to bring Ruby (or Tcl or Python or Java or…) into your C/C++ environment, starting with development tasks is usually the path of least resistance.
I did some experimenting over the last few days to integrate RSpec using SWIG (Simplified Wrapper and Interface Generator), a tool for bridging libraries written in C and C++ to other languages, like Ruby. The Ruby section of the SWIG manual was very helpful.
My Proof-of-Concept Code
Here is a zip file of my experiment: rspec_for_cpp.zip
This is far from a complete and working solution, but I think it shows promise. See the Current Limitations section below for details.
Unzip the file into a directory; I’ll assume you named it rspec_for_cpp. You need to have gmake, gcc, SWIG, and Ruby installed, along with the RSpec gem. Right now, it only builds on OS X and Linux (at least on the configurations of my machines running those OSs – see the discussion below). To run the build, use the following commands:
$ cd rspec_for_cpp/cpp
$ make
You should see it finish with the lines
( cd ../spec; spec *_spec.rb )
.........
Finished in 0.0***** seconds
9 examples, 0 failures
Congratulations, you’ve just tested some C and C++ code with RSpec! (Or, if you didn’t succeed, see the notes in the Makefile and the discussion below.)
The Details
I’ll briefly walk you through the files in the zip and the key steps required to make it all work.
cexample.h
Here is a simple C header file.
/* cexample.h */
#ifndef CEXAMPLE_H
#define CEXAMPLE_H
#ifdef __cplusplus
extern "C" {
#endif
char* returnString(char* input);
double returnDouble(int input);
void doNothing();
#ifdef __cplusplus
}
#endif
#endif
Of course, in a pure C shop, you won’t need the #ifdef __cplusplus stuff. I found it essential in my experiment, since I mixed C and C++, as you might expect.
cpp/cexample.c
Here is the corresponding C source file.
/* cexample.c */
char* returnString(char* input) {
return input;
}
double returnDouble(int input) {
return (double) input;
}
void doNothing() {}
cpp/CppExample.h
Here is a C++ header file.
#ifndef CPPEXAMPLE_H
#define CPPEXAMPLE_H
#include <string>
class CppExample
{
public:
CppExample();
CppExample(const CppExample& foo);
CppExample(const char* title, int flag);
virtual ~CppExample();
const char* title() const;
void title(const char* title);
int flag() const;
void flag(int value);
static int countOfCppExamples();
private:
std::string _title;
int _flag;
};
#endif
cpp/CppExample.cpp
Here is the corresponding C++ source file.
#include "CppExample.h"
CppExample::CppExample() : _title("") {}
CppExample::CppExample(const CppExample& foo): _title(foo._title) {}
CppExample::CppExample(const char* title, int flag) : _title(title), _flag(flag) {}
CppExample::~CppExample() {}
const char* CppExample::title() const { return _title.c_str(); }
void CppExample::title(const char* name) { _title = name; }
int CppExample::flag() const { return _flag; }
void CppExample::flag(int value) { _flag = value; }
int CppExample::countOfCppExamples() { return 1; }
cpp/example.i
Typically in SWIG, you specify a .i file to the swig command to define the module that wraps the classes and global functions, which classes and functions to expose to the target language (usually all of them, in our case), and other assorted customization options, which are discussed in the SWIG manual. I’ll show the swig command in a minute. For now, note that I’m going to generate an example_wrap.cpp file that will function as the bridge between the languages.
Here’s my example.i, where I named the module example.
%module example
%{
#include "cexample.h"
#include "CppExample.h"
%}
%include "cexample.h"
%include "CppExample.h"
It looks odd to have the header files appear twice. The code inside the %{...%} block (with a ’#’ before each include) consists of standard C and C++ statements that will be inserted verbatim into the generated “wrapper” file, example_wrap.cpp, so that file will compile when it references anything declared in the header files. The second case, with a ’%’ before the include statements [1], tells SWIG to make all the declarations in those header files available to the target language. (You can be more selective, if you prefer…)
Following Ruby conventions, the Ruby plugin for SWIG automatically names the module with an upper-case first letter (Example), but you use require 'example' to include it, as we’ll see shortly.
Building
See the cpp/Makefile for the gory details. In a nutshell, you run the swig command like this:
swig -c++ -ruby -Wall -o example_wrap.cpp example.i
Next, you create a dynamically-linked library, as appropriate for your platform, so the Ruby interpreter can load the module dynamically when required. The Makefile can do this for Linux and OS X platforms. See the Ruby section of the SWIG manual for Windows specifics.
If you test-drive your code, which tends to drive you towards minimally-coupled “modules”, then you can keep your libraries and build times small, which will make the build and test cycle very fast!
spec/cexample_spec.rb and spec/cppexample_spec.rb
Finally, here are the RSpec files that exercise the C and C++ code. (Disclaimer: these aren’t the best spec files I’ve ever written. For one thing, they don’t exercise all the CppExample methods! So sue me… :)
require File.dirname(__FILE__) + '/spec_helper'
require 'example'
describe "Example (C functions)" do
it "should be a constant on Module" do
Module.constants.should include('Example')
end
it "should have the methods defined in the C header file" do
Example.methods.should include('returnString')
Example.methods.should include('returnDouble')
Example.methods.should include('doNothing')
end
end
describe Example, ".returnString" do
it "should return the input char * string as a Ruby string unchanged" do
Example.returnString("bar!").should == "bar!"
end
end
describe Example, ".returnDouble" do
it "should return the input integer as a double" do
Example.returnDouble(10).should == 10.0
end
end
describe Example, ".doNothing" do
it "should exist, but do nothing" do
lambda { Example.doNothing }.should_not raise_error
end
end
and
require File.dirname(__FILE__) + '/spec_helper'
require 'example'
describe Example::CppExample do
it "should be a constant on module Example" do
Example.constants.should include('CppExample')
end
end
describe Example::CppExample, ".new" do
it "should create a new object of type CppExample" do
example = Example::CppExample.new("example1", 1)
example.title.should == "example1"
example.flag.should == 1
end
end
describe Example::CppExample, "#title(value)" do
it "should set the title" do
example = Example::CppExample.new("example1", 1)
example.title("title2")
example.title.should == "title2"
end
end
describe Example::CppExample, "#flag(value)" do
it "should set the flag" do
example = Example::CppExample.new("example1", 1)
example.flag(2)
example.flag.should == 2
end
end
If you love RSpec like I do, this is a very compelling thing to see!
And now for the small print:
Current Limitations
As I said, this is just an experiment at this point. Volunteers to make this battle-ready would be most welcome!
General
The Example Makefile
It Must Be Hand-Edited for Each New or Renamed Source File
You’ve probably already solved this problem for your own make files. Just merge in the example Makefile to pick up the SWIG- and RSpec-related targets and rules.
It Only Knows How to Build Shared Libraries for Mac OS X and Linux (and Not Very Well)
Some definitions are probably unique to my OS X and Linux machines. Windows is not supported at all. However, this is also easy to rectify. Start with the notes in the Makefile itself.
The module.i File Must Be Hand-Edited for Each File Change
Since the format is simple, a make task could fill a template file with the changed list of source files during the build.
Better Automation
It should be straightforward to provide scripts, IDE/Editor shortcuts, etc. that automate some of the tasks of adding new methods and classes to your C and C++ code when you introduce them first in your “spec” files. (The true TDD way, of course.)
Specific Issues for C Code Testing
I don’t know of any other C-specific issues, so unit testing with Ruby is most viable today for C code. Although I haven’t experimented extensively, C functions and variables are easily mapped by SWIG to a Ruby module. The Ruby section of the SWIG manual discusses this mapping in some detail.
Specific Issues for C++ Code Testing
More work will be required to make this viable. It’s important to note that SWIG cannot handle all C++ constructs (although there are workarounds for most issues, if you’re committed to this approach…). For example, namespaces, nested classes, some template and some method overloading scenarios are not supported. The SWIG manual has details.
Also, during my experiment, SWIG didn’t seem to map const std::string& objects in C++ method signatures to Ruby strings, as I would have expected (char* worked fine).
Is It a Viable Approach?
Once the General issues listed above are handled, I think this approach would work very well for C code. For C++ code, there are more issues that need to be addressed, and programmers who are committed to this strategy will need to tolerate some issues (or just use C++-language tools for some scenarios).
Conclusions: Making It Development-Team Ready
I’d like to see this approach pushed to its logical limit. I think it has the potential to really improve the productivity of C and C++ developers and the quality of their test coverage, by leveraging the productivity and power of dynamically-typed languages like Ruby. If you prefer, you could use Tcl, Python, even Java instead.
License
This code is completely open and free to use. Of course, use it at your own risk; I offer it without warranty, etc., etc. When I polish it to the point of making it an “official” project, I will probably release it under the Apache license.
[1] I spent a lot of time debugging problems because I had a ’#’ where I should have had a ’%’! Caveat emptor!
The Post-it® Notes Test for UML Diagrams
A lot of teams require their developers to document their designs in UML, using Visio or another tool, before they can start coding.
Of course, this is not at all Agile. For one thing, the design is likely to change quite a bit as you learn while coding. Hardly anyone returns to the diagrams and updates them. Now they are lies, because they make claims about the designs that aren’t true.
UML still has a place in agile projects, of course. It’s a great tool for brainstorming design ideas. So, how do you decide when a diagram is worth keeping and therefore, worth maintaining? Here’s a little strategy that I recommend.
Draw the diagram during those brainstorming sessions on a white board or a poster-sized Post-it® Note. Drawing it this way means you have invested almost no additional effort, beyond the brainstorming itself, to create the diagram. Also, you won’t feel bad about lost work if you eventually throw it away.
Leave the diagram on the wall for everyone to see while they implement the design.
By the time the note is falling off the wall or the dry-erase marker is wearing off the white board, you’ll know if the ideas are still relevant or completely obsolete.
If they are obsolete, you can erase the board or toss the paper. If they are still relevant, and probably changed somewhat, you now know that the diagram is worth preserving. Go ahead and spend the time to create an updated, more permanent version in your drawing tool (but don’t spend too much time!).
Generated Tests and TDD
TDD has become quite popular, and many companies are attempting to adopt it. However, some folks worry that it takes a long time to write all those unit tests and are looking to test-generation tools as a way to decrease that burden.
The burden is not insignificant. FitNesse, an application created using TDD, comprises 45,000 lines of Java code, 15,000 of which are unit tests. Simple math suggests that TDD increases the coding burden by a full third!
Of course this is a naive analysis. The benefits of using TDD are significant, and far outweigh the burden of writing the extra code. But that 33% still feels “extra” and tempts people to find ways to shrink it without losing any of the benefits.
Test Generators.
Some folks have put their hope in tools that automatically generate tests by inspecting code. These tools are very clever. They generate random calls to methods and remember the results. They can automatically build mocks and stubs to break the dependencies between modules. They use remarkably clever algorithms to choose their random test data. They even provide ways for programmers to write plugins that adjust those algorithms to be a better fit for their applications.
The end result of running such a tool is a set of observations. The tool observes how the instance variables of a class change when calls are made to its methods with certain arguments. It notes the return values, changes to instance variables, and outgoing calls to stubs and mocks. And it presents these observations to the user.
The user must look through these observations and determine which are correct, which are irrelevant, and which are bugs. Once the bugs are fixed, these observations can be checked over and over again by re-running the tests. This is very similar to the record-playback model used by GUI testers. Once you have registered all the correct observations, you can play the tests back and make sure those observations are still being observed.
Some of the tools will even write the observations as JUnit tests, so that you can run them as a standard test suite. Just like TDD, right? Well, not so fast…
Make no mistake, tools like this can be very useful. If you have a wad of untested legacy code, then generating a suite of JUnit tests that verifies some portion of the behavior of that code can be a great boon!
The Periphery Problem
On the other hand, no matter how clever the test generator is, the tests it generates will always be more naive than the tests that a human can write. As a simple example of this, I have tried to generate tests for the bowling game program using two of the better known test generation tools. The interface to the Bowling Game looks like this:
public class BowlingGame {
public void roll(int pins) {...}
public int score() {...}
}
The idea is that you call roll each time a ball is rolled, and you call score at the end of the game to get the score for that game.
The test generators could not randomly generate valid games. It’s not hard to see why. A valid game is a sequence of between 12 and 21 rolls, all of which must be integers between 0 and 10. What’s more, within a given frame, the sum of rolls cannot exceed 10. These constraints are just too tight for a random generator to achieve within the current age of the universe.
I could have written a plugin that guided the generator to create valid games; but such an algorithm would embody much of the logic of the BowlingGame itself, so it’s not clear that the economics are advantageous.
To generalize this, the test generators have trouble getting inside algorithms that have any kind of protocol, calling sequence, or state semantics. They can generate tests around the periphery of the classes; but can’t get into the guts without help.
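For contrast, here is the kind of protocol-aware test a human writes for this interface in a few minutes (my own JUnit sketch, not output from either tool):
import static org.junit.Assert.assertEquals;
import org.junit.Before;
import org.junit.Test;
public class BowlingGameTest {
    private BowlingGame game;
    @Before
    public void setUp() {
        game = new BowlingGame();
    }
    private void rollMany(int rolls, int pins) {
        for (int i = 0; i < rolls; i++)
            game.roll(pins);
    }
    @Test
    public void gutterGameScoresZero() {
        rollMany(20, 0);          // a valid game: 20 rolls, no pins
        assertEquals(0, game.score());
    }
    @Test
    public void spareAddsNextRollAsBonus() {
        game.roll(5);
        game.roll(5);             // spare in the first frame
        game.roll(3);             // this roll is counted twice
        rollMany(17, 0);
        assertEquals(16, game.score());
    }
    @Test
    public void perfectGameScores300() {
        rollMany(12, 10);         // 12 strikes
        assertEquals(300, game.score());
    }
}
Each of these tests encodes the calling protocol and the scoring rules directly, which is exactly the knowledge a random generator cannot stumble into.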
TDD?
The real question is whether or not such generated tests help you with Test Driven Development. TDD is the act of using tests as a way to drive the development of the system. You write unit test code first, and then you write the application code that makes that code pass. Clearly generating tests from existing code violates that simple rule. So in some philosophical sense, using test generators is not TDD. But who cares so long as the tests get written, right? Well, hang on…
One of the reasons that TDD works so well is that it is similar to the accounting practice of dual entry bookkeeping. Accountants make every entry twice; once on the credit side, and once on the debit side. These two entries follow separate mathematical pathways. In the end a magical subtraction yields a zero if all the entries were made correctly.
In TDD, programmers state their intent twice: once in the test code, and again in the production code. These two statements of intent verify each other. The tests test the intent of the code, and the code tests the intent of the tests. This works because it is a human that makes both entries! The human must state the intent twice, but in two complementary forms. This vastly reduces many kinds of errors, as well as providing significant insight into improved design.
Using a test generator breaks this concept because the generator writes the test using the production code as input. The generated test is not a human restatement, it is an automatic translation. The human states intent only once, and therefore does not gain insights from restatement, nor does the generated test check that the intent of the code was achieved. It is true that the human must verify the observations, but compared to TDD that is a far more passive action, providing far less insight into defects, design and intent.
I conclude from this that automated test generation is neither equivalent to TDD, nor is it a way to make TDD more efficient. What you gain by trying to generate the 33% test code, you lose in defect elimination, restatement of intent, and design insight. You also sacrifice depth of test coverage, because of the periphery problem.
This does not mean that test generators aren’t useful. As I said earlier, I think they can help to partially characterize a large base of legacy code. But these tools are not TDD tools. The tests they generate are not equivalent to tests written using TDD. And many of the benefits of TDD are not achieved through test generation.
Active Record vs Objects
Active Record is a well-known data persistence pattern. It has been adopted by Rails, Hibernate, and many other ORM tools. It has proven its usefulness over and over again. And yet I have a philosophical problem with it.
public class Employee extends ActiveRecord {
private String name;
private String address;
...
}
We should be able to fetch a given employee from the database by using a call like:
Employee bob = Employee.findByName("Bob Martin");
We should also be able to modify that employee and save it as follows:
bob.setName("Robert C. Martin");
bob.save();
In short, every column of the Employee table becomes a field of the Employee class. There are static methods (or some magical reflection) on the ActiveRecord class that allow you to find instances. There are also methods that provide CRUD functions.
Even shorter: There is a 1:1 correspondence between tables and classes, columns and fields. (Or very nearly so).
It is this 1:1 correspondence that bothers me. Indeed, it bothers me about all ORM tools. Why? Because this mapping presumes that tables and objects are isomorphic.
The Difference between Objects and Data Structures
From the beginning of OO we learned that the data in an object should be hidden, and the public interface should be methods. In other words: objects export behavior, not data. An object has hidden data and exposed behavior.
Data structures, on the other hand, have exposed data and no behavior. In languages like C++ and C#, the struct keyword is used to describe a data structure with public fields. If there are any methods, they are typically navigational. They don’t contain business rules.
Thus, data structures and objects are diametrically opposed. They are virtual opposites. One exposes behavior and hides data, the other exposes data and has no behavior. But that’s not the only thing that is opposite about them.
Algorithms that deal with objects have the luxury of not needing to know the kind of object they are dealing with. The old example, shape.draw();, makes the point. The caller has no idea what kind of shape is being drawn. Indeed, if I add new types of shapes, the algorithms that call draw() are not aware of the change, and do not need to be rebuilt, retested, or redeployed. In short, algorithms that employ objects are immune to the addition of new types.
By the same token, if I add new methods to the shape class, then all derivatives of shape must be modified. So objects are not immune to the addition of new functions.
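Here is the object-oriented side of that trade-off in a minimal sketch (my own illustration, not from the original post):
// Objects: behavior is exposed, data is hidden.
interface Shape {
    void draw();
}
class Square implements Shape {
    private double side;                      // hidden data
    public void draw() { /* render a square */ }
}
class Circle implements Shape {
    private double radius;                    // hidden data
    public void draw() { /* render a circle */ }
}
class Renderer {
    // Immune to new types: adding a Triangle requires no change here.
    void drawAll(Iterable<Shape> shapes) {
        for (Shape s : shapes)
            s.draw();
    }
}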
Now consider an algorithm that uses a data structure.
switch(s.type) {
case SQUARE: Shape.drawSquare((Square)s); break;
case CIRCLE: Shape.drawCircle((Circle)s); break;
}
We usually sneer at code like this because it is not OO. But that disparagement might be a bit over-confident. Consider what happens if we add a new set of functions, such as Shape.eraseXXX(). None of the existing code is affected. Indeed, it does not need to be recompiled, retested, or redeployed. Algorithms that use data structures are immune to the addition of new functions.
By the same token if I add a new type of shape, I must find every algorithm and add the new shape to the corresponding switch statement. So algorithms that employ data structures are not immune to the addition of new types.
Again, note the almost diametrical opposition. Objects and Data structures convey nearly opposite immunities and vulnerabilities.
Good designers use this opposition to construct systems that are appropriately immune to the various forces that impinge upon them. Those portions of the system that are likely to be subject to new types should be oriented around objects. On the other hand, any part of the system that is likely to need new functions ought to be oriented around data structures. Indeed, much of good design is about how to mix and match the different vulnerabilities and immunities of the different styles.
Active Record Confusion
The problem I have with Active Record is that it creates confusion about these two very different styles of programming. A database table is a data structure. It has exposed data and no behavior. But an Active Record appears to be an object. It has “hidden” data, and exposed behavior. I put the word “hidden” in quotes because the data is, in fact, not hidden. Almost all ActiveRecord derivatives export the database columns through accessors and mutators. Indeed, the Active Record is meant to be used like a data structure.
On the other hand, many people put business rule methods in their Active Record classes; which makes them appear to be objects. This leads to a dilemma. On which side of the line does the Active Record really fall? Is it an object? Or is it a data structure?
This dilemma is the basis for the oft-cited impedance mismatch between relational databases and object oriented languages. Tables are data structures, not classes. Objects are encapsulated behavior, not database rows.
At this point you might be saying: “So what Uncle Bob? Active Record works great. So what’s the problem if I mix data structures and objects?” Good question.
Missed Opportunity
The problem is that Active Records are data structures. Putting business rule methods in them doesn’t turn them into true objects. In the end, the algorithms that employ Active Records are vulnerable to changes in schema, and changes in type. They are not immune to changes in type, the way algorithms that use objects are.
You can prove this to yourself by realizing how difficult it is to implement a polymorphic hierarchy in a relational database. It’s not impossible, of course, but every trick for doing it is a hack. The end result is that few database schemas, and therefore few uses of Active Record, employ the kind of polymorphism that conveys immunity to changes in type.
So applications built around ActiveRecord are applications built around data structures. And applications that are built around data structures are procedural—they are not object oriented. The opportunity we miss when we structure our applications around Active Record is the opportunity to use object oriented design.
No, I haven’t gone off the deep end.
I am not recommending against the use of Active Record. As I said in the first part of this blog I think the pattern is very useful. What I am advocating is a separation between the application and Active Record.
Active Record belongs in the layer that separates the database from the application. It makes a very convenient halfway-house between the hard data structures of database tables, and the behavior exposing objects in the application.
Applications should be designed and structured around objects, not data structures. Those objects should expose business behaviors, and hide any vestige of the database. The fact that we have Employee tables in the database, does not mean that we must have Employee classes in the application proper. We may have Active Records that hold Employee rows in the database interface layer, but by the time that information gets to the application, it may be in very different kinds of objects.
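Concretely, that separation might look something like this (a minimal sketch; the class names and the pay rule are my own illustrations, not from any particular framework):
// Database interface layer: an Active Record style data structure holding an Employee row.
// (Illustrative only; a real Active Record would come from your ORM.)
class EmployeeRecord {
    public long id;
    public String name;
    public int payGrade;
    public int overtimeHours;
}
// Application layer: an object that exposes business behavior and hides the row.
class Employee {
    private static final int[] BASE_PAY_BY_GRADE = {3000, 4000, 5500};
    private final EmployeeRecord record;       // vestige of the database, kept hidden
    Employee(EmployeeRecord record) {
        this.record = record;
    }
    // The business rule lives here, not in the record and not in the schema.
    public int monthlyPay() {
        return BASE_PAY_BY_GRADE[record.payGrade] + record.overtimeHours * 50;
    }
}
The application deals only with Employee and its behavior; the EmployeeRecord stays behind in the database interface layer.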
Conclusion
So, in the end, I am not against the use of Active Record. I just don’t want Active Record to be the organizing principle of the application. It makes a fine transport mechanism between the database and the application; but I don’t want the application knowing about Active Records. I want the application oriented around objects that expose behavior and hide data. I generally want the application immune to type changes; and I want to structure the application so that new features can be added by adding new types. (See: The Open Closed Principle)
TDD with Acceptance Tests and Unit Tests
Test Driven Development is one of the most important tenets of Agile Software Development. It is difficult to claim that you are Agile if you are not writing lots of automated test cases, and writing them before you write the code that makes them pass.
But there are two different kinds of automated tests recommended by the Agile disciplines. Unit tests, which are written by programmers, for programmers, in a programming language. And acceptance tests, which are written by business people (and QA), for business people, in a high level specification language (like FitNesse www.fitnesse.org).
The question is, how should developers treat these two streams of tests? What is the process? Should they write their unit tests and production code first, and then try to get the acceptance tests to pass? Or should they get the acceptance tests to pass and then backfill with unit tests?
And besides, why do we need two streams of tests? Isn’t all that testing awfully redundant?
It’s true that the two streams of tests test the same things. Indeed, that’s the point. Unit tests are written by programmers to ensure that the code does what they intend it to do. Acceptance tests are written by business people (and QA) to make sure the code does what they intend it to do. The two together make sure that the business people and the programmers intend the same thing.
Of course there’s also a difference in level. Unit tests reach deep into the code and test independent units. Indeed, programmers must go to great lengths to decouple the components of the system in order to test them independently. Therefore unit tests seldom exercise large integrated chunks of the system.
Acceptance tests, on the other hand, operate on much larger integrated chunks of the system. They typically drive the system from its inputs (or a point very close to its inputs) and verify operation from its outputs (or, again, very close to its outputs). So, though the acceptance tests may be testing the same things as the unit tests, the execution pathways are very different.
Process
Acceptance tests should be written at the start of each iteration. QA and Business analysts should take the stories chosen during the planning meeting, and turn them into automated acceptance tests written in FitNesse, or Selenium or some other appropriate automation tool.
The first few acceptance tests should arrive within a day of the planning meeting. More should arrive each day thereafter. They should all be complete by the midpoint of the iteration. If they aren’t, then some developers should change hats and help the business people finish writing the acceptance tests.
Using developers in this way is an automatic safety valve. If it happens too often, then we need to add more QA or BA resources. If it never happens, we may need to add more programmers.
Programmers use the acceptance tests as requirements. They read those tests to find out what their stories are really supposed to do.
Programmers start a story by executing the acceptance tests for that story, and noting what fails. Then they write unit tests that force them to write the code that will make some small portion of the acceptance tests pass. They keep running the acceptance tests to see how much of their story is working, and they keep adding unit tests and production code until all the acceptance tests pass.
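For example, a programmer picking up a pricing story might read the acceptance test, watch it fail, and then write a unit test like this to drive out one small slice of it (a hypothetical JUnit sketch; the class and the discount rule are invented for illustration):
import static org.junit.Assert.assertEquals;
import org.junit.Test;
public class DiscountCalculatorTest {
    // Imagined acceptance-test rule: orders of $100 or more get a 5% discount.
    @Test
    public void ordersOfOneHundredDollarsOrMoreGetFivePercentOff() {
        DiscountCalculator calculator = new DiscountCalculator();
        assertEquals(95.00, calculator.discountedTotal(100.00), 0.001);
    }
    @Test
    public void ordersUnderOneHundredDollarsGetNoDiscount() {
        DiscountCalculator calculator = new DiscountCalculator();
        assertEquals(99.00, calculator.discountedTotal(99.00), 0.001);
    }
}
// The small slice of production code that these tests drive into existence.
class DiscountCalculator {
    public double discountedTotal(double orderTotal) {
        return orderTotal >= 100.00 ? orderTotal * 0.95 : orderTotal;
    }
}
Each unit test like this forces a small slice of production code into existence; rerunning the acceptance tests confirms that the slices add up to the story the business asked for.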
At the end of the iteration all the acceptance tests (and unit tests) are passing. There is nothing left for QA to do. There is no hand-off to QA to make sure the system does what it is supposed to. The acceptance tests already prove that the system is working.
This does not mean that QA does not put their hands on the keyboards and their eyes on the screen. They do! But they don’t follow manual test scripts! Rather, they perform exploratory testing. They get creative. They do what QA people are really good at—they find new and interesting ways to break the system. They uncover unspecified, or under-specified areas of the system.
ASIDE: Manual testing is immoral. Not only is it high-stress, tedious, and error-prone; it’s just wrong to turn humans into machines. If you can write a script for a test procedure, then you can write a program to execute that procedure. That program will be cheaper, faster, and more accurate than a human, and will free the human to do what humans do best: create!
So, in short, the business specifies the system with automated acceptance tests. Programmers run those tests to see what unit tests need to be written. The unit tests force them to write production code that passes both tests. In the end, all the tests pass. In the middle of the iteration, QA changes from writing automated tests, to exploratory testing.
TDD for AspectJ Aspects
There was a query on the TDD mailing list about how to test drive aspects. Here is an edited version of my reply to that list.
Just as for regular classes, TDD can drive aspects to a better design.
Assume that I’m testing a logging aspect that logs when certain methods are called. Here’s the JUnit 4 test:
package logging;
import static org.junit.Assert.*;
import org.junit.Test;
import app.TestApp;
public class LoggerTest {
@Test
public void FakeLoggerShouldBeCalledForAllMethodsOnTestClasses() {
String message = "hello!";
new TestApp().doFirst(message);
assertTrue(FakeLogger.messageReceived().contains(message));
String message2 = "World!";
new TestApp().doSecond(message, message2);
assertTrue(FakeLogger.messageReceived().contains(message));
assertTrue(FakeLogger.messageReceived().contains(message2));
}
}
Already, you might guess that FakeLogger will be a test-only version of something, in this case, my logging aspect. Similarly, TestApp is a simple class that I’m using only for testing. You might choose to use one or more production classes, though.
package app;
@Watchable
public class TestApp {
public void doFirst(String message) {}
public void doSecond(String message1, String message2) {}
}
and @Watchable is a marker annotation that allows me to define pointcuts in my logger aspect without fragile coupling to concrete names, etc. You could also use an interface.
package app;
public @interface Watchable {}
I made up @Watchable as a way of marking classes where the public methods might be of “interest” to particular observers of some kind. It’s analogous to the EJB 3 annotations that mark classes as “persistable” without implying too many details of what that might mean.
Now, the actual logging is divided into an abstract base aspect and a test-only concrete sub-aspect:
package logging;
import org.aspectj.lang.JoinPoint;
import app.Watchable;
abstract public aspect AbstractLogger {
// Limit the scope to the packages and types you care about.
public abstract pointcut scope();
// Define how messages are actually logged.
public abstract void logMessage(String message);
// Notice the coupling is to the @Watchable abstraction.
pointcut watch(Object object):
scope() && call(* (@Watchable *).*(..)) && target(object);
before(Object watchable): watch(watchable) {
logMessage(makeLogMessage(thisJoinPoint));
}
public static String makeLogMessage(JoinPoint joinPoint) {
StringBuffer buff = new StringBuffer();
buff.append(joinPoint.toString()).append(", args = ");
for (Object arg: joinPoint.getArgs())
buff.append(arg.toString()).append(", ");
return buff.toString();
}
}
and
package logging;
public aspect FakeLogger extends AbstractLogger {
// Only match on calls from the unit tests.
public pointcut scope(): within(logging.*Test);
public void logMessage(String message) {
lastMessage = message;
}
static String lastMessage = null;
public static String messageReceived() {
return lastMessage;
}
}
Pointcuts in aspects are like most other dependencies, best avoided ;) ... or at least minimized and based on abstractions, just like associations and inheritance relationships.
So, my test “pressure” drove the design in terms of where I needed abstraction in the Logger aspect: (i) how a message is actually logged and (ii) what classes get “advised” with logging behavior.
Just as for TDD of regular classes, the design ends up with minimized dependencies and flexibility (abstraction) where it’s most useful.
I can now implement the real, concrete logger, which will also be a sub-aspect of AbstractLogger. It will define the scope() pointcut to be a larger section of the system, and it will send the message to the real logging subsystem.
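A concrete production aspect might look something like this sketch (my own illustration; the app..* scope and the use of java.util.logging are assumptions, not part of the original example):
package logging;
import java.util.logging.Logger;
public aspect ProductionLogger extends AbstractLogger {
    private static final Logger logger = Logger.getLogger("app");
    // Assumed scope: everything in the application's packages.
    public pointcut scope(): within(app..*);
    public void logMessage(String message) {
        logger.info(message);
    }
}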
Why you have time for TDD (but may not know it yet...)
Note: Updated 9/30/2007 to improve the graphs and to clarify the content.
A common objection to TDD is this: “We don’t have time to write so many tests. We don’t even have enough time to write features!”
Here’s why people who say this probably already have enough time in the (real) schedule; they just don’t know it yet.
Let’s start with an idealized Scrum-style “burn-down chart” for a fictional project run in a “traditional” way (even though traditional projects don’t use burn-down charts…).
We have time increasing on the x axis and the number of “features” remaining to implement on the y axis (it could also be hours or “story points” remaining). During a project, a nice feature of burn-down charts is that you can extend the line to see where it intersects the x axis, which is a rough indicator of when you’ll actually finish.
The optimistic planners for our fictional project plan to give the software to QA near the end of the project. They expect QA to find nothing serious, so the release will occur soon thereafter on date T0.
Of course, it never works out that way:
The red line is the actual effort for our fictional project. It’s quite natural for the planned list of features to change as the team reacts to market changes, etc. This is why the line goes up sometimes (in “good” projects, too!). Since this is a “traditional” project, I’m assuming that there are no automated tests that actually prove that a given feature is really done. We’re effectively running “open loop”, without the feedback of tests.
Inevitably, the project goes over budget and the planned QA drop comes late. Then things get ugly. Without our automated unit tests, there are lots of little bugs in the code. Without our automated integration tests, there are problems when the subsystems are run together. Without our acceptance tests, the implemented features don’t quite match the actual requirements for them.
Hence, a chaotic, end-of-project “birthing” period ensues, where QA reports a list of big and small problems, followed by a frantic effort (usually involving weekends…) by the developers to address the problems, followed by another QA drop, followed by…, and so forth.
Finally, out of exhaustion and because everyone else is angry at the painful schedule slip, the team declares “victory” and ships it, at time T1.
We’ve all lived through projects like this one.
Now, if you remember your calculus classes (sorry to bring up painful memories), you will recall that the area under the curve is the total quantity of whatever the curve represents. So, the actual total feature work required for our project corresponds to the area under the red line, while the planned work corresponds to the area under the black line. So, we really did have more time than we originally thought.
Now consider a Test-Driven Development (TDD) project [1]:
Here, the blue line is similar to the red line, at least early in the project. Now we have frequent “milestones” where we verify the state of the project with the three kinds of automated tests mentioned above. Each milestone is the end of an iteration (usually 1-4 weeks apart). Not shown are the 5-minute TDD cycles and the feedback from the continuous integration process that does our builds and runs all our tests after every block of commits to version control (many times a day).
The graph suggests that the total amount of effort will be higher than the expected effort without tests, which may be true [2]. However, because of the constant feedback during the whole life of the project, we really know where we actually are at any time. By measuring our progress in this way, we will know early whether or not we can meet the target date with the planned feature set. With early warnings, we can adjust accordingly, either dropping features or moving the target date, with relatively little pain. Whereas, without this feedback, we really don’t know what’s done until something, e.g., the QA process, gives us that feedback. Hence, at time T0, just before the big QA drop, the traditional project has little certainty about what features are really completed.
So, we’ll experience less of the traditional end-of-project chaos, because there will be fewer surprises. Without the feedback from automated tests, QA finds lots of problems, causing the chaotic and painful end-of-project experience. Finding and trying to fix major problems late in the game can even kill a project.
So, TDD converts that unknown schedule time at the end into known time early in the project. You really do have time for automated tests and your tests will make your projects more predictable and less painful at the end.
Note: I appreciate the early comments and questions that helped me clarify this post.
[1] As one commenter remarked, this post doesn’t actually make the case for TDD itself vs. alternative “test-heavy” strategies, but I think it’s pretty clear that TDD is the best of the known test-heavy strategies, as argued elsewhere.
[2] There is some evidence that TDD and pair programming lead to smaller applications, because they help avoid unnecessary features. Also, they provide constant feedback to the team, including the stakeholders, on what the feature set should really be and which features are most important to complete first.