The Polyglot Tester

Posted by Uncle Bob Sat, 19 Dec 2009 17:22:00 GMT

Behavior Driven Development, and its emphasis on the Given/When/Then structure of specification, has been well accepted by many parts of the software industry. Tools such as JBehave, Cucumber, and GivWenZen have taken a prominent role, and rightly so. After all, it’s hard to argue with the elegance of simple statements such as:

Given that I am a user named Bob with password xyzzy
When I log in with username Bob and password xyzzy
Then I should see “Welcome Bob, you have logged in”.

Yes, it’s hard to argue with it, but argue I shall…

The BDD style is very pretty. Certainly business people can easily read and write it. Moreover, the BDD style provides a conceptual framework within which automated test specifications can be efficiently composed. Better still, the BDD tools provide a powerful parsing mechanism that conveniently translates the natural GWT statements into function calls to be executed as tests.
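One way to picture that parsing mechanism: each step definition pairs a regular expression with a function, and the runner strips the Given/When/Then keyword and hands the rest of the line to the first pattern that matches, passing the captured groups as arguments. This is roughly how Cucumber's regular-expression step definitions behave, but the sketch below is not any tool's real implementation; the `step` decorator, `run_step`, and the `world` dictionary are hypothetical, and the quoted text uses straight quotes instead of curly ones.

```python
import re

# Hypothetical registry of (compiled pattern, handler) pairs.
STEPS = []


def step(pattern):
    """Register a handler for step text matching `pattern` (hypothetical helper)."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"^I am a user named (\w+) with password (\w+)$")
def given_user(world, name, password):
    world["users"][name] = password


@step(r"^I log in with username (\w+) and password (\w+)$")
def when_log_in(world, name, password):
    if world["users"].get(name) == password:
        world["screen"] = f"Welcome {name}, you have logged in"
    else:
        world["screen"] = "Login failed"


@step(r'^I should see "(.*)"$')
def then_see(world, expected):
    assert world["screen"] == expected, world["screen"]


def run_step(world, line):
    """Strip the Given/When/Then keyword, then call the first matching handler."""
    text = re.sub(r"^(Given( that)?|When|Then|And)\s+", "", line.strip())
    for pattern, func in STEPS:
        match = pattern.match(text)
        if match:
            return func(world, *match.groups())
    raise LookupError(f"No step matches: {text!r}")


world = {"users": {}, "screen": None}
for line in [
    "Given that I am a user named Bob with password xyzzy",
    "When I log in with username Bob and password xyzzy",
    'Then I should see "Welcome Bob, you have logged in"',
]:
    run_step(world, line)
print(world["screen"])  # prints Welcome Bob, you have logged in
```

The convenience has a cost: every phrasing a writer might use needs a pattern that accepts it, and a line no pattern matches fails with a lookup error rather than a test result.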

The problem is that the BDD style is simply not appropriate for all, or even most, kinds of tests. Why? Because it’s wordy. Consider:

Given a Juke box that shows 0 credits
When I deposit .25
Then the Juke Box shows 1 credit.

Given a Juke box that shows 0 credits.
When I deposit 1.00
Then the Juke Box shows 5 credits.

Given a Juke Box that shows 0 credits.
When I deposit 5.00
Then the Juke Box shows 30 credits.

How is this better than:

Jukebox Credit Table

| Deposit | Credits |
| .25     | 1       |
| 1.00    | 5       |
| 5.00    | 30      |

  • Which of the two is the more elegant?
  • Which of the two is easier to read?
  • If you were looking for a specification error, in which of the two would you be more likely to find it?

I think the answers are rather obvious.

  • There is value in brevity.
  • There is elegance in sparseness.
  • Elaborate wordy structures are not always the best approach.

OK, I can hear the complaints starting to bubble up already, so hold on. I’m not arguing against BDD. I really like the GWT style. And yes, I realize that many of the BDD testing tools allow you to compress your tests into tables so that you can avoid the wordiness.
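In a general-purpose language, the Jukebox Credit Table maps onto a table-driven test: one (deposit, expected credits) row per example and a single shared check. This is a minimal sketch; the `Jukebox` class and its `PRICE_LIST` are hypothetical stand-ins for the system under test, and because the post gives only three data points, any other deposit is rejected rather than priced by a guessed rule.

```python
from decimal import Decimal

# Hypothetical price list: the post gives only these three data points,
# so any other amount is rejected instead of guessed at.
PRICE_LIST = {Decimal("0.25"): 1, Decimal("1.00"): 5, Decimal("5.00"): 30}


class Jukebox:
    """Hypothetical stand-in for the system under test."""

    def __init__(self) -> None:
        self.credits = 0  # every jukebox starts at 0 credits

    def deposit(self, amount: str) -> None:
        value = Decimal(amount)  # Decimal avoids binary floating-point money errors
        if value not in PRICE_LIST:
            raise ValueError(f"unsupported deposit: {amount}")
        self.credits += PRICE_LIST[value]


# The Jukebox Credit Table: one (deposit, expected credits) row per line.
CREDIT_TABLE = [
    (".25", 1),
    ("1.00", 5),
    ("5.00", 30),
]


def failing_rows(table):
    """Run each row on a fresh jukebox; return (deposit, expected, actual) for mismatches."""
    failures = []
    for deposit, expected in table:
        box = Jukebox()              # Given a jukebox that shows 0 credits
        box.deposit(deposit)         # When I deposit the row's amount
        if box.credits != expected:  # Then it shows the row's credits
            failures.append((deposit, expected, box.credits))
    return failures


print(failing_rows(CREDIT_TABLE))  # prints []
```

Adding a case means adding one row; the Given/When/Then wording survives only as comments on the single loop body.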

My issue is not with the tools. My issue is with the idea that BDD is the only true way, and that all tests should be expressed in GWT format forever and ever amen.

Is this a strawman? Unfortunately not. In my travels over the last several years, I have seen this attitude showing up over and over again. Someone will write a suite of complex tests using GWT style, without realizing that they could shorten it by a factor of 10 or so by recomposing it into a simple table.

Actually, my issue is with the tools. Cucumber, for example, is written around Given, When, Then. If you use Cucumber, JBehave, or any of the other BDD tools, you think in GWT and find it hard to even express something as simple as the table above.

Consider, for example, the following strange test. It measures how well the juke box selects random songs based upon their popularity. More popular songs should be played more often than less popular songs. You might specify it this way in GWT style.

Given a Juke box with 1 credit
When the user presses “You Decide”
Then the juke box will randomly choose a song based on its ranking.

Now this is a perfectly good statement of intent, and I’d certainly want it to turn green if I ran the test. However, if this turns green, what have we really learned? The test result does not really tell us whether or not the selection algorithm is accurate. The GWT statements above are really just hand-waving statements that loudly say Trust Me without necessarily verifying anything.

Now look at the following tables.

Repeat You Decide 10000 times and count results
| Average credits should | ~=1.3 |

| Song                   | Times played    |
| Stairway to Heaven     | 880 < _ < 1120  |
| In-a-godda-da-vida     | 1880 < _ < 2120 |
| Viva La Vida           | 2880 < _ < 3120 |
| Incense and Peppermint | 880 < _ < 1120  |
| Comfortably Numb       | 2880 < _ < 3120 |

Perhaps you don’t like the magic numbers. There are ways to deal with that, but they are beyond the scope of this blog post. So let’s forget about the magic-ness of the numbers for the moment. When you run this test you see something like this:

Repeat You Decide 10000 times and count results
| Average credits should | 1.295 ~=1.3 |

| Song                   | Times played       |
| Stairway to Heaven     | 880 < 998 < 1120   |
| In-a-godda-da-vida     | 1880 < 2007 < 2120 |
| Viva La Vida           | 2880 < 2950 < 3120 |
| Incense and Peppermint | 880 < 1019 < 1120  |
| Comfortably Numb       | 2880 < 3026 < 3120 |

It’s hard to argue that the green-ness is hand-waving. Nothing here says: Trust Me. You can see that the selection algorithm is random, and is weighted properly.
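The song table is a statistical test: press the button many times, count the plays, and require each count to fall strictly inside its band. The sketch below assumes hypothetical popularity weights of 1:2:3:1:3, inferred from the band centres (1000, 2000, 3000, 1000, 3000 out of 10,000 plays), and uses Python's `random.choices` as the weighted selector; it is not the jukebox's actual algorithm, and the average-credits row is left out because the post does not define it.

```python
import random
from collections import Counter

# Hypothetical popularity weights, inferred from the band centres in the table.
SONGS = {
    "Stairway to Heaven": 1,
    "In-a-godda-da-vida": 2,
    "Viva La Vida": 3,
    "Incense and Peppermint": 1,
    "Comfortably Numb": 3,
}

# Acceptance bands copied from the table: low < count < high.
BANDS = {
    "Stairway to Heaven": (880, 1120),
    "In-a-godda-da-vida": (1880, 2120),
    "Viva La Vida": (2880, 3120),
    "Incense and Peppermint": (880, 1120),
    "Comfortably Numb": (2880, 3120),
}


def you_decide(rng: random.Random) -> str:
    """Hypothetical "You Decide" button: pick one song, weighted by popularity."""
    titles = list(SONGS)
    return rng.choices(titles, weights=[SONGS[t] for t in titles], k=1)[0]


def play_counts(trials: int, seed: int = 42) -> Counter:
    """Press "You Decide" `trials` times and count how often each song plays."""
    rng = random.Random(seed)  # fixed seed so any failing run can be reproduced
    return Counter(you_decide(rng) for _ in range(trials))


def out_of_band(counts, bands):
    """Return the songs whose count is not strictly inside its (low, high) band."""
    return [song for song, (low, high) in bands.items()
            if not low < counts.get(song, 0) < high]


counts = play_counts(10_000)
for song, (low, high) in BANDS.items():
    print(f"{song:24} {low} < {counts[song]} < {high}")
print("out of band:", out_of_band(counts, BANDS))
```

Band width is a trade-off the table leaves implicit. For a binomial count over 10,000 trials, a band of plus or minus 120 is about 4 standard deviations for a 10% song but only about 2.6 for a 30% song, so even a correct selector will occasionally land outside a band; the fixed seed at least makes such a run repeatable.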

Now I’m certain that we could construct a set of GWT statements that captures the above semantics perfectly; but why would we? The simple tables express the intent in a format that is simpler and more accessible than GWT.

Conclusion

As programmers we have already learned that we must be polyglots. If you want to write a Rails system, you’d better know Ruby, Haml, CSS, ERB, XML, etc. If you want to write a J2EE system, you’d better know Java, JSP, HTML, XML, CSS, etc.

Testers also need to be polyglots. Writing in a single style such as GWT is not going to cut it. GWT has its place, but it’s not sufficient. Other testing languages are also useful. Therefore, testing tools need to be polyglot tools.

Comments

  1. BenAlabaster@live.com 21 minutes later:

    As a programmer, I tend to agree that sparse is usually elegant: with every line of code I add complexity and room for bugs to creep in, so keeping it as sparse as possible makes for elegant and concise code.

    However, from a business standpoint what makes sense to us in its elegance loses meaning in translation. Your table is very elegant and sparse and perhaps wouldn’t be mistranslated, but the fact that it requires interpretation means that there is room for human error.

    Your wordy instructions, like legalese, have been written for brevity and removal of ambiguity; hence, so long as the reader reads English and can follow logical instructions, there is no room for error in translation.

    So the two sides of the argument are both valid, but don’t hold water when viewed from the opposing team’s side.

  2. Markus Gärtner 33 minutes later:

    I’d put it this way:
      - BDD-style Given/When/Then is one tool to use in my tool-belt
      - Table style is one tool to use in my tool-belt
      - Flow style is one tool to use in my tool-belt
      - JUnit is one tool to use in my tool-belt

    I’m happy that FitNesse/Slim provides a great variety of styles I can use (besides JUnit style). It’s up to me to know what’s in my tool-belt and decide what to use for the application at hand. But I need to know that not everything is suitable in every circumstance (and that I should refuse to use copy&paste testing).

  3. mike@mikedoel.com 39 minutes later:

    Have you really eliminated GWT in your example? Sure, you don’t use the specific words, but aren’t they implied? Actually, I guess I don’t see the Given implication in your example. And without it, how am I supposed to know if the Times Played specification in your test captures expected behavior (maybe it’s randomly choosing by song length instead of popularity)?

    I think the basic point should be rephrased to something like: if you’re using GWT, aim for conciseness.

    Given “Viva La Vida” is twice as popular as “Comfortably Numb”
    When I repeat You Decide 100000 times
    Then the play count for each song should fall in the range
      | Song             | Times Played |
      | Viva La Vida     | 5500 – 7500  |
      | Comfortably Numb | 2500 – 4500  |

    And FWIW, the average credit thing is a bit confusing in your example. What does it have to do with picking songs based on popularity?

  4. Colin Jones about 1 hour later:

    Great points! The tabular style definitely cuts down on the repetition, and I like it a lot better than the repetitive Given/When/Then examples above.

    I think the Jukebox Credit Table you have is easier to understand, as long as you have the context that the first column is the setup and the second column is the one being tested. I’m not sure I agree that Cucumber, in particular, makes it hard to express something akin to the table examples.

    An analogous Cucumber test might look like this:

    Scenario Outline: jukebox credits
      Given there are 0 credits
      When I deposit <deposit>
      Then I should have <credits> credits
    Examples:
      | deposit | credits |
      | 0.25    | 1       |
      | 1.00    | 5       |
      | 5.00    | 30      |

    Granted, it’s a bit wordier (though not as much so as the Given/When/Then examples above), but I also think it’s a bit clearer because the context is all there. That said, all programming is based on conventions, and we don’t want to build everything from the ground up every time, so I can also understand wanting to eliminate unnecessary context from the tests. I guess a lot may depend on who’s reading them?

  5. Colin Jones about 1 hour later:

    Shoot, I screwed up the Textile above. The scenario outline should read

    When I deposit <deposit>

    and

    Then I should have <credits> credits

  6. msuarz about 1 hour later:

    This is represented in cucumber with Scenario Outlines http://wiki.github.com/aslakhellesoy/cucumber/scenario-outlines.

    I have my own implementation in fitnesse that will become a framework. I am keeping an eye on what you do with Given/When/Thens.

    Here is an example of Scenario Outlines in fitnesse http://aprogblog.com:8888/job/zunzun/ws/src/Test/fitnesse/FitNesseRoot/SpecsSuite/UpdateStatusSuite/ShorteningUrls/content.txt

    cheers mike

  7. Johannes Brodwall about 1 hour later:

    Duplication in given/when/then steps is something that I see a lot, too. There’s actually a little-used feature in Cucumber to deal with it called Scenario outlines. Your example with scenario outlines:

    Scenario Outline: Purchase credits
      Given a Juke box that shows 0 credits 
      When I deposit <deposit>
      Then the Juke Box shows <credits> credit.
    
    Examples:
      | deposit  | credits   |
      | .25      |  1        |
      | 1.00     |  5        |
      | 5.00     |  30       |
    

    The issue of test quality is important in general. Another problem I would like to see addressed is the 30 column FitNesse tables that people use to “put in all the test data before every test”.

    In general, just as given/when/then style testing tends to get a lot of duplication, table-style testing tends to get a lot of superfluous data that makes tests hard to understand.

  8. KevDog about 2 hours later:

    You had me until the last line about polyglot testing tools. Sooner or later, a tool that attempts to do everything will end up doing a poor job on something.

  9. VitaminJeff™ about 5 hours later:

    Woot woot for polyglots! :D

  10. Ted M. Young about 7 hours later:

    This is exactly why I’ve argued against GivWhTh-style testing: English Is Bad For Your Tests. Maybe I’m old-school, but I always thought the problem with huge requirements documents is not necessarily that they’re huge, but that they try to be precise in a language that is, by its nature, ambiguous.

    In the GivWhTh style, who is the “I”? What does it mean that “I Deposit”? How does the Jukebox “show” its credits? Merely by using English, you end up endowing the test with a lot of assumptions. By, instead, using a bit more formality (yes, the FitNesse table format is a formality), you imply less and therefore become more precise.

    The other issue, and I’ve got a pending blog entry on it, is the discoverability of the GivWhTh style. How do I, as a test-writer, know that the scenario should be written using “I” rather than “the user”, or 25 cents or $0.25, e.g.:

    Scenario: Purchase credits
    Given a Jukebox ...
    When the customer deposits 25 cents

    or

    Scenario...
    Given...
    When I deposit $0.25
    Then ...

    They’re both proper English. They’re both precise. However, if you only handle the “25 cents” and “I deposit”, you’ve already limited the way the test can be expressed, and the test writer will find out that they didn’t express their intent in the proper way when the test fails. If you want to allow more flexibility in the input, you’d have to write more test scaffolding. By implying that you accept any English, you’ve made the test that much harder to write since the choices of words are wide open.

  11. Andras Hatvani about 18 hours later:

    Every developer writing clean code surely also wants to write clean tests/specifications. Therefore, the developers of BDD tools most probably want to provide tools best supporting the writing of clean tests. This is the case with JBehave as well, since you can formulate table examples in the same style as with Cucumber; see http://jbehave.org/reference/stable/table-examples.html. Moreover, I’m sure that the developers of such tools also welcome improvement suggestions, which help them and the users of frameworks to write unambiguous, concise, and for everyone understandable tests.

    @Uncle Bob: As the lack of tables is by no means a problem, since both Cucumber and JBehave support tables, what other kind of behavior specification style/language/syntax would you still add?

  12. Mark Nijhof 1 day later:

    Kent Beck said in his interview with Industry Misinterpretations #164 that you should also be DRY in your tests, meaning that if you can verify some behavior in one test versus two tests then you should use one. I guess this somewhat follows that path as well.

    Link: http://www.cincomsmalltalk.com/blog/blogView?showComments=true&;printTitle=Industry_Misinterpretations_164:_Going_for_the_Longball&entry=3436948975

  13. Steve Py 1 day later:

    BDD is no more the “only true way” than TDD is. The GWT syntax of the test is a means of expressing the requirement and the test by the desired behaviour.

    The significance of this is that the test now directly correlates to a business requirement and can take over the role of a requirement in some vague document, or a story card sitting on some white-board.

    Personally, I don’t use the GWT syntax, but I definitely grok the rationale behind expressing tests in this manner.

    BDD tests are not just a more verbose version of a unit test; a unit test is geared at testing a unit of work where the BDD test is an expression of the satisfaction of the underlying requirement.

    A better example of a BDD test might be for a vending machine: Given I select a $1.50 candy bar and the machine has sufficient change, when I insert $2.00 and select the candy bar, then I should receive the candy bar and $0.50 change. [Test] EnsureWhenSufficientChangeAndSufficientFundsProductIsProvidedAndChangeGiven()

    Given I want to buy a $1.50 candy bar and the machine does not have sufficient change, when I insert $2.00 and select the candy bar, then I should receive my $2.00 back with a message to use exact change. [Test] EnsureWhenInsufficientChangeAndSufficientFundsProductIsNotProvidedCustomerNotifiedAndMoneyRefunded()

    etc.

    Granted, by doctrine, giving product, giving change, giving refund, etc. can be considered behaviours in their own right and written as separate tests.

    My $0.02 change.

  14. Liz Keogh 4 days later:

    “Therefore testing tools need to be polyglot tools.”

    JBehave and RBehave (which evolved to become Cucumber) were originally designed to elicit conversation and help developers learn more about the domain and the requirements. I’d call them learning tools first, testing tools second.

    BDD was founded on the confusion caused by the word “test”, and I see more of that confusion here. Nobody I know in the BDD movement suggests that G/W/T is the only way to test, or even to capture requirements. Wordy English is, however, a great way to learn.

  15. Real estate in india 5 days later:

    Great description. I really enjoy your experience. I agree with “The tabular style definitely cuts down on the repetition, and I like it a lot better than the repetitive Given/When/Then examples above.”

  17. Voyance paris 9 days later:

    very important.

  18. tarot 9 days later:

    this article is very interesting.

  19. dialogue sexe 9 days later:

    good luck.

  20. ittay 11 days later:

    GWT is to verify that your requirements are met. They are meant to be seen by non-developer stakeholders. In the random selection example, if there’s no requirement on distribution, then it’s fine not to verify it. If there is a requirement on distribution, then a GWT can be written (Given that You Decide is selected 100 times…).

    GWT is no more than an English-to-computer-language bridge. When implementing the GWT statements, you can choose whatever tool suits you.

  21. tester_guy 16 days later:

    I think your jukebox example is too rudimentary. When you are dealing with complex systems the added benefit of using words is helpful for both readability and collaboration.

  22. derek smyth 17 days later:

    Hi, I’m reasonably new to BDD and I luv how it’s about doing TDD correctly (the outer acceptance test loop and the inner unit testing / class specification loops)

    I’m not really in a position to say whether GWT limits thinking. At the moment for me it doesn’t; it actually opens my mind up a bit to new possibilities. I also know that it will help the adoption of agile-like development by a number of non-developers in the place I work, the GWT being very similar to user stories.

    I take on board what Uncle Bob says, though, and will keep it in mind.

    What I would like to comment on is the polyglot tester.

    Unfortunately not everyone who tests (or codes) on a project is a trained software developer / tester (hey it’s easy!). I know from experience that introducing multiple technologies that do the same thing differently can meet with a lot of resistance.

    If I told my team that we were going to use both Cucumber and Fitnesse to do acceptance testing on the next project I know I would be asked ‘why use both?’ and ‘when should I use one over the other?’

    I’m all for being a polyglot, but not everyone is… the principle of least effort.

  25. François about 1 month later:

    GreenPepper supports multiple table types (i.e., business rules, collections, etc.), including a BDD-style scenario interpreter (http://www.greenpeppersoftware.com/confluence/display/GPWODOC/5.+Scenario+Fixture), giving you the flexibility to use GWT or not.

  26. Endep 2 months later:

    Interesting analysis

  27. Matt 3 months later:

    Thanks for sharing this great article! That is very interesting. I love reading and I am always searching for informative information like this.

  29. card 3 months later:

    Yes, you’re right. We must be polyglots!

  30. secured 3 months later:

    Testing tools are important.

  33. parça TL kontör 4 months later:

    Very nice art, thank you for this site!

  38. dentist acton 7 months later:

    Nice to be visiting your blog again; it has been months for me. Well, this is the article I’ve been waiting for for so long.

  39. Youtube converter 7 months later:

    Really awesome article
