Specs vs. Tests


Posted by Uncle Bob Thu, 01 Feb 2007 15:32:07 GMT

There’s something to this BDD kool-aid that people have been drinking lately…

As part of the Rails project I’ve been working on for the last few weeks, I’ve been using RSpec. RSpec is a unit testing tool similar in spirit to JUnit or Test/Unit. However, RSpec uses an alternative syntax that reads more like a specification than like a test. Let me show you what I mean.

In Java, using JUnit, we might write the following unit test:

```
import junit.framework.TestCase;

public class BowlingGameTest extends TestCase {
  private Game g;

  protected void setUp() throws Exception {
    g = new Game();
  }

  private void rollMany(int n, int pins) {
    for (int i = 0; i < n; i++) {
      g.roll(pins);
    }
  }

  public void testGutterGame() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
    assertTrue(g.isComplete());
  }

  public void testAllOnes() throws Exception {
    rollMany(20, 1);
    assertEquals(20, g.score());
    assertTrue(g.isComplete());
  }
}
```
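The post never shows the Game class these tests drive. For readers who want something to run them against, here is a minimal sketch of one, written for illustration and not taken from the post. It assumes standard ten-pin scoring (a strike scores 10 plus the next two rolls, a spare scores 10 plus the next roll) and assumes a game counts as complete once ten frames, plus any bonus rolls earned in the tenth, have been rolled. Only roll, score, and isComplete come from the tests; everything else is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Game: only roll(), score(), and isComplete() are used by the tests.
class Game {
  private final List<Integer> rolls = new ArrayList<>();

  public void roll(int pins) {
    rolls.add(pins);
  }

  // Standard ten-pin scoring; bonus rolls not yet thrown count as 0.
  public int score() {
    int score = 0;
    int i = 0; // index of the first roll of the current frame
    for (int frame = 0; frame < 10 && i < rolls.size(); frame++) {
      if (pins(i) == 10) {                       // strike: 10 + next two rolls
        score += 10 + pins(i + 1) + pins(i + 2);
        i += 1;
      } else if (pins(i) + pins(i + 1) == 10) {  // spare: 10 + next roll
        score += 10 + pins(i + 2);
        i += 2;
      } else {                                   // open frame
        score += pins(i) + pins(i + 1);
        i += 2;
      }
    }
    return score;
  }

  // Assumption: complete once ten frames and any earned tenth-frame bonus rolls exist.
  public boolean isComplete() {
    int i = 0;
    for (int frame = 0; frame < 10; frame++) {
      if (i >= rolls.size()) return false;       // this frame has not started yet
      if (pins(i) == 10) {
        i += 1;
        if (frame == 9) i += 2;                  // tenth-frame strike earns two bonus rolls
      } else {
        boolean spare = pins(i) + pins(i + 1) == 10;
        i += 2;
        if (frame == 9 && spare) i += 1;         // tenth-frame spare earns one bonus roll
      }
    }
    return rolls.size() >= i;
  }

  private int pins(int index) {
    return index < rolls.size() ? rolls.get(index) : 0;
  }
}
```

The tenth-frame handling in isComplete is one reasonable reading of "complete"; the post does not say how the author defined it.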

This is pretty typical for a Java unit test. The setUp method builds the Game object, and then the various test methods make sure that it works in each different scenario. In Ruby, however, this might be expressed using RSpec as:

```
require 'rubygems'
require_gem "rspec" 
require 'game'

context "When a gutter game is rolled" do
  setup do
    @g = Game.new
    20.times {@g.roll 0}
  end

  specify "score should be zero" do
    @g.score.should == 0
  end

  specify "game should be complete" do
    @g.complete?.should_be true
  end
end

context "When all ones are rolled" do
  setup do
    @g = Game.new
    20.times{@g.roll 1}
  end

  specify "score should be 20" do
    @g.score.should == 20
  end

  specify "game should be complete" do
    @g.complete?.should_be true
  end
end
```

At first blush the difference seems small. Indeed, the RSpec code might seem too verbose and fine-grained. At least, that was my impression when I first saw RSpec. However, having used it now for several months, I have a different reaction.

First, let’s look at the semantic differences. In JUnit you have TestCase derivatives and test functions. Each TestCase derivative has setUp and tearDown functions and a suite of test functions. In RSpec you have what appears to be an extra layer. You have the test script, which is composed of context blocks. The contexts have setup, teardown, and specify blocks.

At first you might think that the RSpec context block corresponds to the Java TestCase derivative, since they are semantically equivalent. However, Java throws something of a curve at us by allowing only one public class per file. So from an organizational point of view there is a stronger equivalence between the TestCase derivative and the whole RSpec test script.

This might seem petty. After all, I can write Java code that is semantically equivalent to the RSpec code simply by creating two TestCase derivatives in two different files. But separating those two test cases into two different files makes a big difference to me. It breaks apart things that otherwise want to stay together.

Now it’s true that I could keep the TestCase derivatives in the same file by making them package scope, and manually put them into a public TestSuite class. But who wants to do that? After all, my IDE is nice enough to find and execute all the public TestCase derivatives, which completely eliminates the need for me to build suites—at least at first.

Note: The JDave tool provides BDD syntax for Java.

Again, this might seem petty; and if that were the only benefit to the RSpec syntax I would agree. But it’s not the only benefit.

Strange though it may seem, the next benefit is the strings that describe the context and specify blocks. At first I thought these strings were just noise, like the strings in the JUnit assert functions. I seldom, if ever, use the JUnit assert strings, so why would I use the context and specify strings? But over the last few weeks I have come to find that, unlike the JUnit assert strings, the RSpec strings put a subtle force on me to create better test designs.

Stable State: An Emergent Rule.

When a spec fails, the message that gets printed is the concatenation of the context string and the specify string. For example: 'When a gutter game is rolled game should be complete' FAILED. If you word the context and specify strings properly, these error messages make nice sentences. Since, in TDD, we almost always start out with our tests failing, I see these error messages a lot. So there is pressure on me to word them well.
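To make the mechanics concrete, here is a tiny plain-Java imitation of the context/specify structure. It is not RSpec and not any real library: Context, specify, and run are hypothetical names. Each spec gets a freshly built subject from the context's setup, which is the only code that changes state; specs only query that subject; and a failure is reported as the context string and the specify string joined by a space. A plain list of rolls stands in for the Game, and "complete" is simplified to "20 rolls recorded".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical mini "context": one setup, many read-only specs.
class Context<T> {
  private final String description;
  private final Supplier<T> setup;  // the only code that builds (changes) state
  private final Map<String, Predicate<T>> specs = new LinkedHashMap<>();

  Context(String description, Supplier<T> setup) {
    this.description = description;
    this.setup = setup;
  }

  Context<T> specify(String description, Predicate<T> check) {
    specs.put(description, check);
    return this;
  }

  // Runs each spec against a freshly set-up subject and collects failure messages.
  List<String> run() {
    List<String> failures = new ArrayList<>();
    for (Map.Entry<String, Predicate<T>> spec : specs.entrySet()) {
      T subject = setup.get();
      if (!spec.getValue().test(subject)) {
        // Context string + " " + specify string, as in the RSpec message.
        failures.add("'" + description + " " + spec.getKey() + "' FAILED");
      }
    }
    return failures;
  }
}

class GutterGameSpec {
  // Stand-in subject: the list of rolls; "complete" simplified to 20 rolls.
  static Context<List<Integer>> context() {
    return new Context<List<Integer>>("When a gutter game is rolled", () -> {
      List<Integer> rolls = new ArrayList<>();
      for (int i = 0; i < 20; i++) rolls.add(0);  // 20 gutter balls
      return rolls;
    })
        .specify("score should be zero",
            rolls -> rolls.stream().mapToInt(Integer::intValue).sum() == 0)
        .specify("game should be complete", rolls -> rolls.size() == 20);
  }
}
```

Used as `GutterGameSpec.context().run()`, both specs pass and the result is empty. A context whose setup builds an empty list would instead report `'When a gutter game is rolled game should be complete' FAILED` for the completeness spec, the same sentence-shaped message quoted above.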

But by wording them well, I am constrained to obey a rule that JUnit never put pressure on me to obey. Indeed, I didn’t know it was a rule until I started using RSpec. I call this rule Stable State. It is:

Tests don’t change the state.

In other words, the functions that make assertions about the state of the system do not also change the state of the system. The state of the system is set up once in the setUp function, and then only interrogated by the test functions.

If you look carefully at the specification of the Bowling Game you will see that the state of the Game is changed only by the setup block within the context blocks. The specify blocks simply interrogate and verify state. This is in stark contrast to the JUnit tests in which the test methods both change and verify the state of the Game.

If you don’t follow this rule it is hard to get the strings on the context and specify blocks to create error messages that read well. On the other hand, if you make sure that the specify blocks don’t change the state, then you can find simple sentences that describe each context and specify block. And so the subtle pressure of the strings has a significant impact on the structure of the tests.

I can’t claim to have discovered the pressure of these strings. Indeed, Dan North’s original article on the topic is captivating. However, I felt the pressure and came to the same conclusion he did well before I read his article, simply by using a tool inspired by his work.

The benefit of Stable State is that for each set of assertions there is one, and only one, place where the state of the system is changed. Moreover, the three-level structure provides natural places for groups of states (the script), individual states (the context blocks), and assertions (the specify blocks).

The demise of the One Assert rule.

There have been other rules like this before. One that circulated a few years back was:

One assert per test.

I never bought into this rule, and I still don’t. It seems arbitrary and inefficient. Why should I put each assert statement into its own test method when I can just as well put all the assert statements into a single test method?

In other words, why prefer this:

```
  public void testGutterGameScoreIsZero() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
  }

  public void testGutterGameIsComplete() throws Exception {
    rollMany(20, 0);
    assertTrue(g.isComplete());
  }
```

over this:

```
  public void testGutterGame() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
    assertTrue(g.isComplete());
  }
```

I think the authors of the One Assert rule were trying to achieve the benefits of Stable State, but missed the mark. It’s as though they could smell the rule out there, but couldn’t quite pinpoint it.

The State Machine metaphor

When you follow the Stable State rule, your specifications (tests) become a description of a Finite State Machine. Each context block describes how to drive the system under test (SUT) to a given state, and then the specify blocks describe the attributes of that state.

Dan North calls this the Given-When-Then metaphor. Consider the following triplet:

Given a Bowling Game: When 20 gutter balls are rolled, Then the score should be zero and the game should be complete.

This triplet corresponds nicely to a row in a state transition table. Consider, for example, the subway turnstile state machine:

Current State	Event	New State
Locked	coin	Unlocked
Unlocked	pass	Locked
Locked	pass	Alarm
Unlocked	coin	Unlocked

We can read this as follows:

GIVEN we are in the Locked state, WHEN we get a coin event, THEN we should be in the Unlocked state.
GIVEN we are in the Unlocked state, WHEN we get a pass event, THEN we should be in the Locked state.
And so on.
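Because the table is just data, it can be encoded directly and each row checked as a GIVEN/WHEN/THEN triplet. The following is a hypothetical Java sketch, not code from the post. Each check drives a fresh turnstile to the given state, fires exactly one event, and then only inspects the result, so the checks themselves obey Stable State. Treating an event with no row in the table as an error is an assumption; the table itself does not settle it.

```java
import java.util.EnumMap;
import java.util.Map;

enum State { LOCKED, UNLOCKED, ALARM }
enum Event { COIN, PASS }

// Hypothetical turnstile driven by the transition table from the post.
class Turnstile {
  static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
  static {
    row(State.LOCKED, Event.COIN, State.UNLOCKED);
    row(State.UNLOCKED, Event.PASS, State.LOCKED);
    row(State.LOCKED, Event.PASS, State.ALARM);
    row(State.UNLOCKED, Event.COIN, State.UNLOCKED);
  }

  private static void row(State current, Event event, State next) {
    TABLE.computeIfAbsent(current, s -> new EnumMap<>(Event.class)).put(event, next);
  }

  private State state;

  Turnstile(State initial) { state = initial; }

  void handle(Event event) {
    Map<Event, State> transitions = TABLE.get(state);
    if (transitions == null || !transitions.containsKey(event)) {
      throw new IllegalStateException("no transition for " + state + " + " + event);
    }
    state = transitions.get(event);
  }

  State state() { return state; }

  // GIVEN `given`, WHEN `event`, THEN the turnstile should be in `expected`.
  static boolean givenWhenThen(State given, Event event, State expected) {
    Turnstile t = new Turnstile(given); // GIVEN: start in the row's current state
    t.handle(event);                    // WHEN: the one and only state change
    return t.state() == expected;       // THEN: interrogate, don't modify
  }
}
```

For example, `Turnstile.givenWhenThen(State.LOCKED, Event.COIN, State.UNLOCKED)` checks the first row of the table.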

Describing a system as a finite state machine has certain benefits:

- We can enumerate the states and the events, and then make sure that every combination of state and event is handled properly.
- We can formalize the behavior of the system into a well-known tabular format that can be read and interpreted by machines. (I am, of course, thinking about FitNesse.)
- There are well-known mechanisms for implementing finite state machines.
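The first benefit is easy to mechanize. In this sketch (hypothetical names, with the four turnstile rows entered again as data), every state/event pair is enumerated and checked against the table. For the table as given, the check reports exactly two unspecified combinations: both events in the Alarm state.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum TurnstileState { LOCKED, UNLOCKED, ALARM }
enum TurnstileEvent { COIN, PASS }

class CoverageCheck {
  // The four rows of the turnstile table: current state -> (event -> new state).
  static Map<TurnstileState, Map<TurnstileEvent, TurnstileState>> table() {
    Map<TurnstileState, Map<TurnstileEvent, TurnstileState>> t = new EnumMap<>(TurnstileState.class);
    for (TurnstileState s : TurnstileState.values()) t.put(s, new EnumMap<>(TurnstileEvent.class));
    t.get(TurnstileState.LOCKED).put(TurnstileEvent.COIN, TurnstileState.UNLOCKED);
    t.get(TurnstileState.UNLOCKED).put(TurnstileEvent.PASS, TurnstileState.LOCKED);
    t.get(TurnstileState.LOCKED).put(TurnstileEvent.PASS, TurnstileState.ALARM);
    t.get(TurnstileState.UNLOCKED).put(TurnstileEvent.COIN, TurnstileState.UNLOCKED);
    return t;
  }

  // Lists every (state, event) combination that has no row in the table.
  static List<String> unhandled(Map<TurnstileState, Map<TurnstileEvent, TurnstileState>> table) {
    List<String> missing = new ArrayList<>();
    for (TurnstileState s : TurnstileState.values()) {
      for (TurnstileEvent e : TurnstileEvent.values()) {
        if (!table.get(s).containsKey(e)) missing.add(s + " + " + e);
      }
    }
    return missing;
  }
}
```

`CoverageCheck.unhandled(CoverageCheck.table())` returns `[ALARM + COIN, ALARM + PASS]`. What the turnstile should do in those cases is a design decision the table leaves open; the enumeration just forces the question.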

The point is that organizing the system description in terms of a finite state machine can have a profound impact on the system design and implementation.

The Butterfly Effect.

I find it remarkable that two dumb annoying little strings put a subtle pressure on me to adjust the style of my tests. That change in style eventually caused me to see the design and implementation of the system I was writing in a very new and interesting light.

Posted in Uncle Bob's Blatherings

Comments


Chris Hedgate about 16 hours later:

I really like the Stable State idea, and I think you are probably right that One Assert Per Test was really about this but did not quite get there. However, if you are using BDD and following Stable State, doesn’t that more or less make One Assert Per Test a given as well? In your example above you have several specify blocks, each with a single something.should specification. If they were all lumped together in one specify block, writing the string for it would again be impossible. And I think TDD can work the same way, which is why I have always tried to abide by One Assert Per Test. I recently wrote about this in One Assert Per Test should come natural.
Michael Feathers about 2 hours later:

One nit I have with BDD style is the fact that the test comment string so closely reflects the code in the case. It feels like duplication.

When you’ve worked to be able to say @g.score.should == 0, it feels weird to have to write “score should be zero.” Granted, you write the string before the code, but it still looks odd after the fact. Makes you wonder whether a framework written in fluent style could generate the string.

Chris Hedgate 20 minutes later:

Michael: I do not know about RSpec (I have only tried it once), but Specter (a similar framework for .NET) does that. Here is an example (Specter specs are written in Boo):

```
context "When input is a phrase marked with asterisks":
  output as string

  setup:
    transformer = ContextileTransformer()
    output = transformer.ToHtml("*foo*")

  specify output.Must.Not.BeEmpty()

  specify "Phrase is given strength in output":
    output.Must.Equal("<p><strong>foo</strong></p>")
```

When this is run in a test runner the output is something like:

```
WhenInputIsAPhraseMarkedWithAsterisks
  outputMustNotBeEmpty
  PhraseIsGivenStrength
```

David Chelimsky 4 minutes later:
Chris – that looks pretty interesting.

Michael – while the duplication that you are describing is easy to end up with, there really is plenty of flexibility in what you write. You could, for example, say:
```
A bowling game
- should score an all gutter game correctly
- should score an all ones game correctly
- should score an all spares game correctly
- should score an all strikes game correctly
```
or
```
A bowling game should produce the correct score
- given an all gutter game
- etc
```
One thing we’re working on is some means of nesting contexts and/or specifications. So you could do something like this:
```
Bowling game behaviour
- should score correctly
  - when game is all gutters
  - when game is all ones
  - when game is all spares
  - when game is all strikes
- should consider the game complete
  - after 20 0s
  - after 20 1s
  - after 21 5s
  - after 12 Xs
```
Not saying these are “right”, just that they become possible. Note how the last example begins to feel more like a description of behaviour – not just because of the word, but because of the nesting structure. An x should do y under conditions a, b, c, and d.

Food for thought.
David Chelimsky 1 minute later:

More thoughts – the process through which specs evolve is going to have an impact on what goes in the name and what goes in the code. “should score 0” might have come from a customer who said “this is what the score should be for an all gutter game”. That name then serves as definition up front, documentation later.
Tobias Grimm 20 minutes later:

Recently I started using RSpec for the first time. Being used to Assert.*, I’ve always been sceptical about the “pseudo-natural-language-style” expressions. I still don’t see that much of a difference (from a programmer’s point of view) whether I write “Assert.Equals(expected, actual)” or “actual.should == expected”.

But what I absolutely love about RSpec is the way it makes me think about what I want to code. In the “normal” TDD way I’ve always been more focused on the design of the class currently under test, making me lose focus on what I really wanted to do. BDD forces me to focus more on application behaviour and helps me stay on track. And, like you, I think this is mainly because of the context/specification style of writing tests.

With TDD the test method names often technically describe the class under test and how it is to be used. In BDD style the test methods (=specifications) describe what this part of my code is good for after all. The technical stuff then moves to the body of the specification.

I’ve now even started to write my NUnit tests in BDD style, which works pretty well. Each TestFixture is a context (defined in SetUp) and each Test is a specification. Of course this doesn’t give me the nice failure messages that RSpec produces, but it seems to work too. I think David Chelimsky somewhere said something like “BDD is doing TDD the right way.”

Regarding the “Stable State” thing – I try to follow this rule, but sometimes it just doesn’t seem to fit and I break it. Maybe a sign that I haven’t fully adopted the BDD style yet.
YAChris about 2 hours later:
Hmmmmmmmmmmm…

I’m with you on the “One Assert rule”. Recently, I’ve been working on some code which is state-based, so I wind up with:
```
// JUnit's assertEquals takes (expected, actual); Java strings need double quotes.
State state = State.START;
state = stateMachine.step(state, "Foo");
assertEquals(State.PAST_FOO, state);
state = stateMachine.step(state, "Bar");
assertEquals(State.SKIP, state);
state = stateMachine.step(state, "Baz");
assertEquals(State.PAST_FOO_AND_BAZ, state);
```
where it’s necessary to have changing state, because the interesting bit is that there can be many SKIP situations interleaved, BUT we have to remember where we were before the SKIP, to wind up in the right state at the end.

So, the question is, in Specs-land would this have to become three complete Contexts? That seems unfortunate. Maybe there is a better way to do it?
Michael Feathers about 3 hours later:

State is a slippery thing. Uncle Bob’s bowling game, as presented, is stateful, but imagine a different problem: you need to create a command object which accepts an array of throws and returns the score. The object doesn’t have mutable state, so technically no spec would alter state, but without a need for setup you could easily end up without all of the contexts that Bob found.
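For comparison, here is a sketch of the stateless alternative Michael describes: a hypothetical scoring function that takes the whole array of throws and returns the score, with no mutable object for a setup block to drive. Standard ten-pin scoring is assumed; the class and method names are illustrative only.

```java
// Hypothetical stateless scorer: all throws in, score out, no fields to mutate.
final class BowlingScorer {
  private BowlingScorer() {}

  static int score(int[] rolls) {
    int score = 0;
    int i = 0; // index of the first throw of the current frame
    for (int frame = 0; frame < 10 && i < rolls.length; frame++) {
      if (at(rolls, i) == 10) {                          // strike
        score += 10 + at(rolls, i + 1) + at(rolls, i + 2);
        i += 1;
      } else if (at(rolls, i) + at(rolls, i + 1) == 10) { // spare
        score += 10 + at(rolls, i + 2);
        i += 2;
      } else {                                           // open frame
        score += at(rolls, i) + at(rolls, i + 1);
        i += 2;
      }
    }
    return score;
  }

  // Throws that have not happened yet count as 0.
  private static int at(int[] rolls, int index) {
    return index < rolls.length ? rolls[index] : 0;
  }
}
```

Each spec for such a function is a single call on a literal array (for instance, twelve 10s should score 300), so there is no setup for contexts to form around, which is the effect Michael points out.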

Seems that the benefit that BDD provides in this context lessens to the degree that you move toward less stateful objects, and that seems to happen among people who do a lot of interaction-style TDD.
Paul Holser 1 day later:

I think the intent behind the “One Assert Per Test” rule was to get you thinking of “TestCases” as “fixtures” instead - wherein setUp() puts the system or a slice thereof in a specific state, and the testX() methods each contain one assertion about the state of the system that should hold if the system is working correctly. So in the first code listing under “The Demise of the ‘One-Assert’ Rule”, you’d factor out the multiple rollMany(20,0) calls into a setUp() - just like the RSpec specification for the gutter game does. It may very well be that the “tests” for a particular class, then, get spread out across many fixtures, instead of feeling like you have to place all tests for a given class in the same TestCase derivative. The fixture names corresponding to system states, not classes. At least that’s how I’ve read it.

It’s interesting to see that RSpec facilitates thinking about aspects of the system being developed in terms of those system states, and not so much a one-spec-per-class mindset.
Steven Baker 3 days later:

Michael: RSpec includes a great mocking and stubbing framework (derived from SchMock, but now more closely resembling Mocha).

I, and many of the others working with RSpec, hardly use state-based specifications at all. I work almost exclusively with mocks in RSpec on most of the projects I use it for.
Jason Gorman 7 days later:

Uncle Bob: In an interview I did with you for the (now defunct) objectmonkey.com site, didn’t you tell me that you didn’t care if TDD was like formal specification? Have you changed your mind since then?
Paul Davis 7 days later:

After reading this article, I’m wondering… Could we use JRuby to write RSpec tests against Java code? Damn, Uncle Bob, I’m going to lose another weekend because of you. ;-)
Uncle Bob 8 days later:

RSpec in JRuby… now that’s an interesting thought…
Uncle Bob 8 days later:

Jason, you’ll have to refresh my memory about that interview and the context of it. I’ve been making the “formal document” argument for at least five years.
Jason Gorman 13 days later:

Alas, I don’t have the original manuscript to hand, but from memory I think I asked you if TDD was formal specification by the back door, and I distinctly recall you saying you didn’t care if it was. The title of the interview was “Getting Sh*t Done”, if that helps establish the context :-)
Jason Gorman 13 days later:

That’s not to say I don’t totally agree with what you’re saying now. I think any movement towards higher integrity specs – executable specs – is progress. It all sounds very familiar to me – I think I’ve been doing BDD right from the get go since I started doing TDD – so I’m bound to draw the comparison now.
Jason Gorman 13 days later:

Courtesy of the WaybackWhen web cache (interview from 2003, I think):

ObjectMonkey: Here’s a hot potato for you – is Test-driven Development really Formal Methods in disguise?

Uncle Bob: Test-driven development is the most profound and auspicious thing to happen to the software industry since I’ve been a programmer. I think it’s even more important than OO.

ObjectMonkey: I’m inclined to agree.

Uncle Bob: Nothing has had such a profound effect upon the way I write code than TDD. Nothing. When I write code now, I run tests every few minutes. My stuff is always working. I never have windows all over my screen with modules torn apart, hoping I can one day piece them back together again. Every minute or two I run tests, and get my stuff working. I don’t use debuggers anymore. Debuggers are a drug. You get addicted to them. They drag you down a rat hole. You spin and spin, trying to set your breakpoints, trying to follow the logic, trying to figure out what the hell is going on. With TDD, that all but goes away. I haven’t used a debugger in anger in over three years. And I chide anyone I see who is using one. So I don’t care whether there is a link between TDD and FM. TDD is a great boon to me, and to software in general.
