Generated Tests and TDD

Posted by Uncle Bob Thu, 10 Jan 2008 19:59:30 GMT

TDD has become quite popular, and many companies are attempting to adopt it. However, some folks worry that it takes a long time to write all those unit tests and are looking to test-generation tools as a way to decrease that burden.

The burden is not insignificant. FitNesse, an application created using TDD, consists of 45,000 lines of Java code, 15,000 of which are unit tests. Simple math suggests that TDD increases the coding burden by a full third!

Of course, this is a naive analysis. The benefits of using TDD are significant and far outweigh the burden of writing the extra code. But that 33% still feels “extra” and tempts people to find ways to shrink it without losing any of the benefits.

Test Generators

Some folks have put their hope in tools that automatically generate tests by inspecting code. These tools are very clever. They generate random calls to methods and remember the results. They can automatically build mocks and stubs to break the dependencies between modules. They use sophisticated algorithms to choose their random test data. They even provide ways for programmers to write plugins that tune those algorithms to better fit their applications.

The end result of running such a tool is a set of observations. The tool observes how the instance variables of a class change when calls are made to its methods with certain arguments. It notes the return values, changes to instance variables, and outgoing calls to stubs and mocks. And it presents these observations to the user.

The user must look through these observations and determine which are correct, which are irrelevant, and which are bugs. Once the bugs are fixed, these observations can be checked over and over again by re-running the tests. This is very similar to the record-playback model used by GUI testers. Once you have registered all the correct observations, you can play the tests back and make sure those observations are still being observed.
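To make the record-playback idea concrete, here is a toy sketch in Java. It assumes the code under test is a side-effect-free method from `int` to `int`; the `ObservationRecorder` class and its `record`/`replay` methods are hypothetical, not the API of any real tool, and real generators also track instance variables, stubs, and mocks. Notice that the recorder never knows what the method should return, only what it did return, so a buggy result is recorded as faithfully as a correct one until a human reviews it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Hypothetical toy "test generator" for a side-effect-free int -> int method.
// Real tools also observe instance variables, stubs, and mocks; this sketch does not.
class ObservationRecorder {
    // Recorded observations: argument -> return value seen at record time.
    private final Map<Integer, Integer> observations = new LinkedHashMap<>();

    // Record phase: call the code with random arguments and remember what it returned.
    // Nothing here knows what the method *should* return, only what it *did* return.
    void record(IntUnaryOperator codeUnderTest, int samples, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < samples; i++) {
            int argument = random.nextInt(1001) - 500; // arbitrary range: -500..500
            observations.put(argument, codeUnderTest.applyAsInt(argument));
        }
    }

    // Playback phase: re-run every recorded call; return the arguments whose result changed.
    List<Integer> replay(IntUnaryOperator codeUnderTest) {
        List<Integer> changed = new ArrayList<>();
        for (Map.Entry<Integer, Integer> observation : observations.entrySet()) {
            int argument = observation.getKey();
            if (codeUnderTest.applyAsInt(argument) != observation.getValue()) {
                changed.add(argument);
            }
        }
        return changed;
    }

    // Number of distinct arguments observed (random duplicates are recorded once).
    int size() {
        return observations.size();
    }
}
```

Recording `x -> x * 2` and replaying against the same lambda reports nothing; replaying against `x -> x * 2 + 1` reports every recorded argument. Either way, the check is only as good as the human review of the recorded values.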

Some of the tools will even write the observations as JUnit tests, so that you can run them as a standard test suite. Just like TDD, right? Well, not so fast…

Make no mistake, tools like this can be very useful. If you have a wad of untested legacy code, then generating a suite of JUnit tests that verifies some portion of the behavior of that code can be a great boon!

The Periphery Problem

On the other hand, no matter how clever the test generator is, the tests it generates will always be more naive than the tests that a human can write. As a simple example, I tried to generate tests for the bowling game program using two of the better-known test-generation tools. The interface to the BowlingGame class looks like this:

  public class BowlingGame {
    public void roll(int pins) {...}
    public int score() {...}
  }

The idea is that you call roll each time the ball is rolled, and you call score at the end of the game to get the score for that game.

The test generators could not randomly generate valid games. It’s not hard to see why. A valid game is a sequence of between 12 and 21 rolls, all of which must be integers between 0 and 10. What’s more, within a given frame, the sum of rolls cannot exceed 10. These constraints are just too tight for a random generator to achieve within the current age of the universe.

I could have written a plugin that guided the generator to create valid games; but such an algorithm would embody much of the logic of the BowlingGame itself, so it’s not clear that the economics are advantageous.
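To see why such a plugin would duplicate the game's own logic, consider a sketch of a validity check for a roll sequence. It encodes the constraints listed above plus two rules that summary glosses over: a strike ends a frame after one roll, and the tenth frame earns bonus rolls after a strike or a spare. The `GameValidator` class is hypothetical. To decide whether a sequence is valid, it must walk the rolls frame by frame, handling strikes, spares, and the tenth frame specially, which is essentially the traversal a scoring algorithm performs.

```java
// Hypothetical helper: is this sequence of rolls exactly one complete, valid game?
final class GameValidator {
    private GameValidator() {}

    static boolean isValidGame(int[] rolls) {
        int i = 0; // index of the first roll of the current frame
        // Frames 1-9: a strike is a single roll; otherwise two rolls knocking down at most 10 pins.
        for (int frame = 1; frame <= 9; frame++) {
            if (i >= rolls.length || !isPinCount(rolls[i])) return false;
            if (rolls[i] == 10) {
                i += 1;
            } else {
                if (i + 1 >= rolls.length || !isPinCount(rolls[i + 1])) return false;
                if (rolls[i] + rolls[i + 1] > 10) return false;
                i += 2;
            }
        }
        // Frame 10: the same rules, plus bonus rolls after a strike or a spare.
        if (i >= rolls.length || !isPinCount(rolls[i])) return false;
        if (rolls[i] == 10) return bonusRollsAreValid(rolls, i + 1, 2);
        if (i + 1 >= rolls.length || !isPinCount(rolls[i + 1])) return false;
        int tenthFrame = rolls[i] + rolls[i + 1];
        if (tenthFrame > 10) return false;
        if (tenthFrame == 10) return bonusRollsAreValid(rolls, i + 2, 1);
        return rolls.length == i + 2; // open tenth frame: no rolls may follow
    }

    // The game must end with exactly `count` bonus rolls, each a legal pin count.
    private static boolean bonusRollsAreValid(int[] rolls, int start, int count) {
        if (rolls.length != start + count) return false;
        for (int k = start; k < rolls.length; k++) {
            if (!isPinCount(rolls[k])) return false;
        }
        // After a tenth-frame strike, the two bonus rolls share one rack of pins
        // unless the first bonus roll is itself a strike.
        return count != 2 || rolls[start] == 10 || rolls[start] + rolls[start + 1] <= 10;
    }

    private static boolean isPinCount(int pins) {
        return pins >= 0 && pins <= 10;
    }
}
```

A plugin could filter or construct random sequences with `isValidGame`, but it would then carry a second copy of the frame-walking logic it was meant to exercise.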

To generalize, the test generators have trouble getting inside algorithms that have any kind of protocol, calling sequence, or state semantics. They can generate tests around the periphery of the classes, but they can’t get into the guts without help.

TDD?

The real question is whether such generated tests help you with Test Driven Development. TDD is the act of using tests to drive the development of the system. You write unit test code first, and then you write the application code that makes those tests pass. Clearly, generating tests from existing code violates that simple rule. So in some philosophical sense, using test generators is not TDD. But who cares, so long as the tests get written, right? Well, hang on…

One of the reasons that TDD works so well is that it is similar to the accounting practice of double-entry bookkeeping. Accountants make every entry twice: once on the credit side and once on the debit side. These two entries follow separate mathematical pathways. In the end, a magical subtraction yields zero if all the entries were made correctly.

In TDD, programmers state their intent twice: once in the test code and again in the production code. These two statements of intent verify each other. The tests test the intent of the code, and the code tests the intent of the tests. This works because it is a human who makes both entries! The human must state the intent twice, but in two complementary forms. This vastly reduces many kinds of errors, and it also provides significant insight into improved design.
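A minimal sketch of what stating intent twice might look like for the bowling game: one possible `BowlingGame` implementation (the production entry) next to a handful of games whose scores were worked out by hand (the test entry). The implementation and the `BowlingGameExamples` helper are illustrative assumptions, not FitNesse code or a definitive kata solution, and plain Java stands in for JUnit to keep the file self-contained. The implementation assumes it is only ever fed valid, complete games.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Production entry: one possible BowlingGame, assuming it only receives valid, complete games.
class BowlingGame {
    private final List<Integer> rolls = new ArrayList<>();

    public void roll(int pins) {
        rolls.add(pins);
    }

    public int score() {
        int score = 0;
        int i = 0; // index of the first roll of the current frame
        for (int frame = 0; frame < 10; frame++) {
            if (rolls.get(i) == 10) {                           // strike: 10 plus the next two rolls
                score += 10 + rolls.get(i + 1) + rolls.get(i + 2);
                i += 1;
            } else if (rolls.get(i) + rolls.get(i + 1) == 10) { // spare: 10 plus the next roll
                score += 10 + rolls.get(i + 2);
                i += 2;
            } else {                                            // open frame: pins knocked down
                score += rolls.get(i) + rolls.get(i + 1);
                i += 2;
            }
        }
        return score;
    }
}

// Test entry: games whose expected scores were worked out by hand, not observed from the code.
final class BowlingGameExamples {
    private BowlingGameExamples() {}

    // Returns the names of the examples on which the two entries disagree.
    static List<String> failures() {
        List<String> failed = new ArrayList<>();
        check(failed, "gutter game", 0, repeat(0, 20));
        check(failed, "all ones", 20, repeat(1, 20));
        check(failed, "one spare", 16, concat(new int[] {5, 5, 3}, repeat(0, 17)));   // (5+5+3) + 3
        check(failed, "one strike", 24, concat(new int[] {10, 3, 4}, repeat(0, 16))); // (10+3+4) + 7
        check(failed, "perfect game", 300, repeat(10, 12));
        return failed;
    }

    static int play(int... rolls) {
        BowlingGame game = new BowlingGame();
        for (int pins : rolls) {
            game.roll(pins);
        }
        return game.score();
    }

    private static void check(List<String> failed, String name, int expected, int[] rolls) {
        if (play(rolls) != expected) {
            failed.add(name);
        }
    }

    private static int[] repeat(int pins, int times) {
        int[] rolls = new int[times];
        Arrays.fill(rolls, pins);
        return rolls;
    }

    private static int[] concat(int[] first, int[] second) {
        int[] all = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, all, first.length, second.length);
        return all;
    }
}
```

If the strike bonus is miscoded, or an expected score is mis-added by hand, the two entries disagree and `failures()` names the game, much like a ledger that fails to balance. A recorded observation, by contrast, can only ever agree with the code that produced it.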

Using a test generator breaks this concept because the generator writes the test using the production code as input. The generated test is not a human restatement; it is an automatic translation. The human states intent only once, and therefore gains no insight from restatement, nor does the generated test check that the intent of the code was achieved. It is true that the human must verify the observations, but compared to TDD, that is a far more passive action, providing far less insight into defects, design, and intent.

I conclude from this that automated test generation is neither equivalent to TDD, nor is it a way to make TDD more efficient. What you gain by trying to generate the 33% test code, you lose in defect elimination, restatement of intent, and design insight. You also sacrifice depth of test coverage, because of the periphery problem.

This does not mean that test generators aren’t useful. As I said earlier, I think they can help to partially characterize a large base of legacy code. But these tools are not TDD tools. The tests they generate are not equivalent to tests written using TDD. And many of the benefits of TDD are not achieved through test generation.

Comments

  1. swombat 29 minutes later:

    Very true. As a dedicated RSpec BDDer, it is a bit shocking to hear of people trying to take the shortcut of generating their tests. If I was in a smart-arse mood, I might comment that they should take an even quicker shortcut and just generate the test reports directly! Why bother running the tests at all? They’ve already cut out 90% or so of the benefits of testing, why not go all the way? :-)

    One element which is missing from your article is the use of TDD as a design process. This is especially the case in BDD, but as BDD is supposed to simply be “TDD done right”, with a better adapted vocabulary, what’s true of BDD tends to hold for TDD as well. When you write tests first, it makes you think about the design of the item you’re writing in a way that’s immensely helpful.

    Another important use of TDD is to ensure that you let user stories drive the requirements. In this case, you’d write a user story (e.g. using FIT or the RSpec Story Runner) first, then write a view spec, then a controller spec, then finally a model spec if required. Thus, every line of new code that you write is driven by a clear user benefit, and you waste no time implementing features that You Ain’t Gonna Need (It). In my experience, this goes a long way towards reducing cruft and keeping your codebase tight and focused on user benefits.

    Daniel

  2. Johan Samyn about 1 hour later:

    A comment from a somewhat unusual angle:
    Two things I like most about this post: the comparison with the accounting practice (great for advocacy), and the reference to the human factor, the second one most of all. Indeed, we humans are the most important factor, and will remain so for quite some time, I believe. This helps me understand why you run a successful company: you seem to value people, the single most important asset there is. And that’s a great thing. You consider humans an important factor in the process of writing software. Not just the languages, tools, methodologies and so on, but the people using all of that. They are the binding glue, the commanding factor in the process. The importance of us, as valuable human beings, can’t be stressed enough. That’s why new tools and the like can’t beat the fact that you can get more out of a team by educating them, because that helps make those good people (the best factor in the game) even better. Which is not always understood.

  3. Pavel Tcholakov about 18 hours later:

    Great post, and the double entry bookkeeping analogy is excellent!

  4. Jeff Langr about 19 hours later:

    I’m wondering if FitNesse would be 75,000 total lines, no tests, were it not written test-first.

  5. Steve Meuse about 22 hours later:

    Very nice article, insightful and clear. One tiny glitch: Uncle Bob wrote, “The burden is not insignificant. FitNesse, an application created using TDD, is comprised of 45,000 lines of Java code, 15,000 of which are unit tests. Simple math suggests that TDD increases the coding burden by a full third!”

    The portion of TDD code may be 1/3, but the coding burden is increased by 1/2.

    Given: FitNesseWithTDD = 45,000 LOC TDDPortion = 15,000 LOC We know: FitNesseWithoutTDD = 30,000 LOC RatioOfExtraTDDCode = TDD / FitNesseWithoutTDD = 15,000 / 30,000 = 1/2

    Of course, this doesn’t account for the time saved finding bugs before and during the coding phase, rather than retrospectively. A hypothetical FitNesse developed with traditional testing methods could well be bigger than 30,000 LOC. Still, the delta is a bit higher than a third, just so the folks who write the checks and set the schedules know what to expect and when.

    I agree with Pavel. The bookkeeping analogy is brilliant. Thanks!

  6. Steve Meuse about 22 hours later:

    Arghh. The “Given”/“We Know” block lost its formatting. It should be eight lines, which Preview displays correctly:

    Given:
         FitNesseWithTDD = 45,000 LOC
         TDDPortion = 15,000 LOC
    We know:
         FitNesseWithoutTDD = 30,000 LOC
         RatioOfExtraTDDCode = TDD / FitNesseWithoutTDD
         = 15,000 / 30,000
         = 1/2

  7. unclebob about 23 hours later:

    The portion of TDD code may be 1/3, but the coding burden is increased by 1/2.

    Damn! How do you TDD a blog?

  8. DAR about 24 hours later:

    Actually, the tests increased the size of the code base by 50%.

    “45,000 lines of Java code, 15,000 of which are unit tests”

    So that means 30K LOC without tests. 15K/30K = .5. So +15K means +50%.

    I don’t have a problem with that (I’m a strong TDD advocate myself). But it doesn’t serve anybody well to have the numbers wrong.

  9. Amund 3 days later:

    quote: “The burden is not insignificant. FitNesse, an application created using TDD, is comprised of 45,000 lines of Java code, 15,000 of which are unit tests. Simple math suggests that TDD increases the coding burden by a full third!”

    That is not simple math, it is likely to be advanced-alternative-history-math. How do you know that it would end up with 30k lines (and still work) if it was written without TDD?

    TDD also drives design and my guess is that you are likely to end with quite a different application when using other development approaches.

  10. Eric Landes 3 days later:

    Bob, a great point here. I’ve written some more thoughts, at the URL I’ve posted, on TDD and on using it with Visual Studio Testing (for those new to TDD).

  11. choy 3 days later:

    the double-entry bookkeeping analogy is insightful. i think it’d be interesting to take this analogy too far.

  12. Ross MacGregor 19 days later:

    Bob, I’m not sure the Periphery Problem as you’ve stated it is really a good argument.

    “The test generators could not randomly generate valid games. It’s not hard to see why. A valid game is a sequence of between 12 and 21 rolls, all of which must be integers between 0 and 10. What’s more, within a given frame, the sum of rolls cannot exceed 10. These constraints are just too tight for a random generator to achieve within the current age of the universe.”

    Here you assume these constraints cannot be captured by the system, but why couldn’t they be? Perhaps we need languages with more powerful constraint systems, like Eiffel, which championed Design by Contract programming.

    For example the problem of the number of pins is easily solved by creating an integer value type that only has the range of 0-10. So when it comes time to generate a random value it can only generate valid numbers.

    DBC, like TDD, is a methodology for designing software that promises to increase the quality of the design. Perhaps with enough practice with DBC programming, one would be able to capture most of these elusive constraints you speak of. You want to capture these constraints programmatically so that you can verify that your software is not operating in an invalid state.

    If DBC programming languages were more common, perhaps these tools would actually work well and be able to auto-generate most of the unit tests.

  13. unclebob 26 days later:

    For example the problem of the number of pins is easily solved by creating an integer value type that only has the range of 0-10. So when it comes time to generate a random value it can only generate valid numbers.

    The problem is that a valid game is not just a sequence of rolls between 0 and 10. Within any frame (usually two rolls) the sum of the rolls cannot exceed 10; but if the first roll is 10, then that rule doesn’t count.

    Trying to capture all the constraints for valid rolls is tantamount to writing the scoring algorithm.

  14. Yet another Bob about 1 month later:

    Bob,

    What about code generation? Some people suggest that for certain solutions where metaprogramming is being applied you should generate tests and the corresponding code you want to test – but then you are not really expressing the intent twice (weeellll, the generating code consists of two parts, with one part generating the tests and another one generating the code).

    Of course you can TDD the generation code and let it generate simple examples that you test automatically as well.

    What’s your take on this?

  15. Yazid about 1 month later:

    Hello,

    I love TDD (or test-first). Recently, a Microsoft Research team created a tool called Pex.

    This is a description of Pex:

    Pex (Program EXploration) is an intelligent assistant to the programmer. By automatically generating unit tests, it helps to find bugs early. In addition, it suggests to the programmer how to fix the bugs. Here is a link:

    http://research.microsoft.com/Pex/

    Can this tool be useful or does it defeat the goal of TDD?

    Thx Dr Y Arezki

  16. mmorpg about 1 month later:

    funny how something with 33% more code can be that much more efficient…some of these algorithms people are developing have me in severe coder envy.

  17. 88250 2 months later:

    Hmm, I am a Chinese newbie in TDD. I’d like to translate this entry into Chinese; may I?

  19. Curtis Cooley 6 months later:

    The double entry accounting analogy is pure brilliance!

  20. Charles 11 months later:

    OK, I am willing to accept the premise. Having a third more code may be more efficient overall, but has anyone done a true test to confirm this? I am all for increasing efficiency, but too often we programmers seem to be in an “add more code” mindset.

  21. Peli 12 months later:

    Hi Bob,

    The kind of test generator that you describe is indeed detrimental to TDD, because it is missing the oracle that tells whether the code is behaving as it should (the oracle sits in your head).

    However, if you can provide an oracle (i.e. assertions, invariants, etc.), test generators will help the TDD process because they will cover a lot of corner cases that otherwise get forgotten by the developer.

    With Pex, an automated white-box test generator, we use parameterized unit tests (which have been around for a while) to start the code exploration. If this test contains assertions, the tool will try to fail them. If it does not, then it is a bad test, parameterized or not.

    - using two of the better known test generation tools. Can you share the names?

  22. Peli 12 months later:

    Hi Bob,

    The kind of test generator that you describe is indeed detrimental to TDD, because it is missing the oracle that tells whether the code is behaving as it should (the oracle sits in your head).

    However, if you can provide an oracle (i.e. assertions, invariants, etc.), test generators will help the TDD process because they will cover a lot of corner cases that otherwise get forgotten by the developer.

    With Pex, an automated white-box test generator, we explore the code from user-written parameterized unit tests (i.e. unit tests with parameters). If this test contains assertions, the tool will try to fail them. If it does not, then it is a bad test, parameterized or not. Whenever you write a unit test and hard-code a value that does not matter (e.g. you hard-code “Marc” as a field name), you should refactor out that value and let a tool ‘explore’ it. Note that parameterized unit tests can be written in a test-first fashion. More in this article: http://dspace.mit.edu/bitstream/handle/1721.1/40090/MIT-CSAIL-TR-2008-002.pdf

    > using two of the better known test generation tools. Can you share the names?
