Dependency Management: HtmlUnit
If you are planning on building an API, please, please, think about dependency management. Don’t make me know more about your world view than necessary. Consider what happened to me as I explored HtmlUnit…
I’m using HtmlUnit to parse and interpret HTML web pages. I’ve been very impressed with this library so far. And I appreciate the hard work and dedication of people who give their software away for free. So, although this blog is a complaint, it should not be misconstrued as anything more than constructive criticism. Besides, what I am complaining about here is so universal that it really wouldn’t matter whose software I chose to scrutinize. The HtmlUnit authors just got lucky in this case.
What I want to do with HtmlUnit is quite simple. Given a string containing HTML, I’d like to query that HTML for certain tags and attributes. For example, I’d like to do this:
HtmlPage page = HTMLParser.parse(htmlString);
HtmlElement html = page.getDocumentElement();
HtmlElement listForm = html.getHtmlElementById("list_form");
assertEquals("/Library/books/manage.do", listForm.getAttributeValue("Action"));
Sweet, simple, uncomplicated. Just create the DOM from an HTML String, and then query that DOM. Unfortunately, HtmlUnit does not appear to be that simple. What you have to do instead looks like this:
StringWebResponse stringWebResponse = new StringWebResponse(htmlString);
WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(false);
HtmlPage page = HTMLParser.parse(stringWebResponse, new TopLevelWindow("", webClient));
HtmlElement html = page.getDocumentElement();
HtmlElement listForm = html.getHtmlElementById("list_form");
assertEquals("/Library/books/manage.do", listForm.getAttributeValue("Action"));
The extra stuff in here is apparently due to the fact that the authors wanted to be able to simulate browsers, frames, and javascript. I think their goal was laudable. However, I wish they had done this without forcing those frames, browsers, and script engines down my throat.
Given my simple needs, why do I care about WebClient and Window? Why do I have to turn off the javascript engine? It may seem a small thing, but it bothers me nonetheless. It’s the principle of the matter that gets under my skin. The pragmatic programmers called it The Principle of Least Surprise. I call it, simply, dependency management. Don’t make people depend on more than they need.
The cost, to me, was an hour of rooting around in the documentation, example code, and my own trial-and-error experiments. (The benefit to me was another blog topic ;-) That cost may not seem great; but it must be paid again and again by everyone who wants to use the package in a way that doesn’t quite fit the authors’ world view.
There may, in fact, be a simpler way to do what I want to do with HtmlUnit. If there is, I haven’t been able to find it, and I’d be grateful if anyone out there, including the authors, could guide me in the right direction.
Jasper: Problem resolved?
After digging around in the Jasper source code, and fiddling hither and yon with various build.xml configurations, I finally (and quite by accident) hit on the solution to my trouble…
Grumble!
I don’t know why this works, but it does. If any of you out there are having trouble precompiling your jsps that use custom tags this might help.
<target name="jsp" depends="dist">
  <delete dir="${basedir}/testjsp"/>
  <java classname="org.apache.jasper.JspC" fork="true">
    <arg line="-v -d ${basedir}/testjsp -p com.objectmentor.library.jsp -mapped -compile -webapp ${build.war.home}"/>
    <arg line="WEB-INF/pages/books/manage.jsp"/>
    <classpath>
      <fileset dir="${catalina.home}/common/lib">
        <include name="*.jar"/>
      </fileset>
      <fileset dir="${catalina.home}/server/lib">
        <include name="*.jar"/>
      </fileset>
      <fileset dir="${catalina.home}/bin">
        <include name="*.jar"/>
      </fileset>
      <fileset dir="${build.war.home}/WEB-INF/lib">
        <include name="*.jar"/>
      </fileset>
      <pathelement location="/Developer/Java/Ant/lib/ant.jar"/>
    </classpath>
  </java>
  <jar jarfile="${build.jar.home}/jsp.jar" basedir="${basedir}/testjsp"
       includes="**/jsp/**/*.class"
  />
</target>
Notice the second <arg> tag. If you put the file name of the jsp you want to compile on the command line, it compiles the jsp correctly. If you leave it off, then even though all the documentation says that it will scan for all the jsps in the web app and compile them correctly, it will do the former, but not the latter. It will find all the jsps, but it won’t compile them correctly. It will fail to statically initialize the _jspx_dependants variable in the generated code.
I am not at all sure why the compiler behaves this way. I looked at the Jasper code, but I didn’t feel like working my way through it to debug it. There is some funny business in the JspC.locateUriRoot function where it writes the file path of the file argument on top of the uribase command line argument. That might be the problem. But I’m not at all sure.
Anyway, there’s a new unit test for someone to write. (sigh).
Now I can write my unit test!
BTW I am using Tomcat 5.5.20
Jasper: The Saga Continues.
It’s Sunday. Justin came in second in his wrestling tournament. And I downloaded the source code for Tomcat to figure out how the Jasper compiler works. sigh
I have an hour or so before Ann Marie and I drive to Angela’s house to visit our three granddaughters, then pick up some things at Sam’s club, and then I fly to Pennsylvania to kick off an Agile transition for a new customer.
Last night, after the tournament, I started studying how Jasper works. At the high level, Jasper is a fairly standard compiler that parses its input and creates an intermediate representation of nodes. It generates java output by walking those nodes with code-generating visitors.
At the low level Jasper was written by slinging a lot of brute force code around. There are long if/else chains, and funky variables like t1, t2, t3, and t4. For some reason the authors felt they should do their own parsing rather than using antlr or javacc; but that’s fine with me. JSP is a pretty simple syntax after all.
What I’ve learned so far is that there are quite a few ways for the compiler to ignore the taglib directive. I don’t understand yet what all those reasons are; I can just see the if statements that silently bypass that processing. Some are based on a variable named isDirective. This variable is passed down a long chain of calls from the upper levels of the compiler, down to the lower reaches of the generators. I still haven’t figured out why it gets set, but I think it might have something to do with a special compiler pass or something.
Anyway, I won’t bore you with the details right now. I’ll just see if I can learn anything more about how this thing works.
But I will say that it’s a shame that I have to dig through compiler code to figure out why Jasper generates code differently in the tomcat environment and the command-line environment.
Two hours later…
While digging through the compiler code, I noticed that there are lots of nice debug logging messages. If I could turn on debug logging, I could get a sense about what’s happening inside the compiler.
So now all I have to do is turn on debug logging. That ought to be easy, right?...
Ant, JspC, and Other Horrors.
I’ve been trying to precompile JSPs today. They reference custom tag libraries. What a joy…
You’d think it would be simple to translate a JSP file into Java. Tomcat does it all the time, automatically.
I’d very much like to precompile my JSP files, because I want to write tests for them; and I don’t want to use Cactus, or have the server up for my tests.
I found the Jasper compiler, and the ant task that invokes it, but I thought it might be nice to run it from the command line just to see how it works. You know:
java org.apache.jasper.JspC myFile.jsp
You’d think that would be easy, wouldn’t you?
But NO. You have to include a zillion jars just to get the JspC compiler to run. Jars from apache.common, and apache.server, and commons.logging, etc. You even have to include the ant.jar file.
I consider that last to be completely sick. What the H___ does JspC have to do with Ant? Why in the begrosian devalent is there a dependency on ant, of all things??!!? Haven’t these people heard of Dependency Management?
sigh
Anyway, I gave up on the command line idea because apparently the Jasper compiler wants to see the web.xml file and so the whole web app has to be put together before you can run the compiler. I understand why they did this, but it sure is frustrating for someone who just wants to run the frickin compiler.
So I fell back on the ant task. Now I have to tell you that I hate ant. Ant was born during those sick sick days when people thought that XML was a cool grammar. XML is not a cool grammar. XML is a markup language. It works fine to encode data that machines can read, and humans can barely read, but it is by no means a natural syntax. Ant suffers from terrible keyword-itis, and nasty inflexibility.
Inflexible you say? Why, can’t you write any ant-task you please?
Sure, so what. You think I want to read a bunch of java code to figure out how to invoke the JspC command line? Hell, all I want to do is compile one frickin JSP file into a JAVA file. If ant let me pass in command lines, like make or rake or some other reasonable build tool, I just might be able to do that.
But NO. The JspC ant task wants to compile the WHOLE web app for me. And it wants there to be a web.xml file that describes that web app.
sigh
So, like a good little ant slave, I put the ant build script for JspC into my build.xml file and fired it up. It took a little fiddling and cajoling. But eventually I got the JspC compiler to run. And what did it say?
/Users/unclebob/projects/Library/Library/build.xml:147: org.apache.jasper.JasperException: file:/Users/unclebob/projects/Library/Library/web/WEB-INF/pages/template.jsp(25,21) Unable to load tag handler class "com.objectmentor.library.web.framework.tags.ActionPathTag" for tag "library:actionPath"
(what is it about error messages nowadays that they have to be 253 characters in length? Why can’t we have nice little error messages?)
The problem is that the JSP file that I am trying to compile is using a custom tag, and apparently there is no way to get the JspC ant task to tell the JspC compiler the classpath of my tag handler. I’ve tried everything I could think of, and searched high and low on the net, but I can’t seem to figure out how to make this work.
So I’m done for the weekend. I was hoping to write a blog on testing JSPs this weekend while attending my son’s wrestling match (he’s likely to make it to state this year), but it’ll have to wait because there’s no internet at the match.
I guess I’ll play with Ruby instead.
UPDATE
While eating dinner, it hit me. If I set the CLASSPATH variable to include the jar with the tag in it, and if I invoke JspC from the command line (or with a java ant task) it might work. So I tried it, and…whaddyano? it worked just fine.
Here is the ant target I eventually used. Note the hideous dependencies required by JspC. You might find this target useful if you ever want to precompile jsps that use custom tags.
<target name="jsp" depends="jar">
  <delete dir="${basedir}/test/testjsp"/>
  <java classname="org.apache.jasper.JspC" fork="true">
    <arg line="-d ${basedir}/test/testjsp -p com.objectmentor.library.testjsp -webapp ${web.home}"/>
    <classpath>
      <fileset dir="${catalina.home}/common/lib">
        <include name="*.jar"/>
      </fileset>
      <fileset dir="${catalina.home}/server/lib">
        <include name="*.jar"/>
      </fileset>
      <fileset dir="${catalina.home}/bin">
        <include name="*.jar"/>
      </fileset>
      <pathelement location="/Developer/Java/Ant/lib/ant.jar"/>
      <pathelement location="${build.jar.home}/library-framework.jar"/>
    </classpath>
  </java>
</target>
Now on to the wrestling match!
Not so fast.
Some of the java files generated from the JSPs don’t compile. There is some mismatch between the tag processing libraries or something. The generated code is hideous to read, and the problem is probably subtle. So I’m going to go to bed.
urg!.
WRESTLING and SUCCESS
So, here I am at the wrestling match. The thing about these matches is that I need to pay attention for about 6 minutes every two hours. That’s how often Justin wrestles. So, of course, I bring my laptop and sit on the bleachers working on code while the whistles are blowing and the parents are screeching and my butt slowly gets numb.
Justin won his first match. It was a slaughter. He’s strong and smart, and has a good chance to get into the state tournament.
I won my first match with JSP too. Apparently the compile environment of my IDE is not exactly the same as the compile environment that Jasper uses. But I was able to resolve that by simply setting the -compile switch on the JspC command line. This compiles the generated java files in place using what seems to be the correct compile environment. At least there are no compiler errors.
I also set the -mapped command line argument. This causes jasper to create a new Out(...) call for each line in the input jsp. This makes the java file a bit easier to read. Apparently this command line argument is set when tomcat automatically compiles the jsps, and so comparing my generated files and tomcat’s generated files is easier with this argument set.
I set up a simple unit test to see if I could create an instance of one of the generated servlets. It looks like this:
public class ManageJspTest extends TestCase {
  public void testCreate() throws Exception {
    HttpJspBase manage = new com.objectmentor.library.jsp.WEB_002dINF.pages.books.manage_jsp();
  }
}
At first this didn’t compile because the servlet apparently uses the apache.commons.logging stuff. So I had to put that in the classpath of the unit tests. Grumble! Dependency management, guys! Dependency Management. Why do my unit tests need to know about logging? sigh
Anyway: I can now compile a unit test that creates a servlet generated by Jasper!!!
This is very good news. I have my doubts about whether it will work however. The code generated by my ant script, and the code generated by tomcat are not exact matches. There is one segment that is markedly different. The following snippet is in the tomcat generated java file, but not in the ant-script generated java file.
static {
  _jspx_dependants = new java.util.ArrayList(1);
  _jspx_dependants.add("/WEB-INF/tld/LibraryTags.tld");
}
I don’t understand exactly why, but I think it must have something to do with a difference in the way the two environments view the tag libraries.
Anyway, I imagine that when I try to execute the servlets generated by the ant script, they will fail because this code is missing.
Anyway, I think I’ll go watch Justin wrestle some more.
Justin won his second match at Regionals. That puts him in the finals for the tournament, and ensures that he has a spot in the Sectionals next week.
Justin doesn’t wrestle again until 4pm, so Ann Marie and I went home for a couple of hours. While there, I was able to set up a quick test to invoke the generated servlet. It looks like this:
public class ManageJspTest extends TestCase {
  private MockPageContext pageContext;
  private MockJspWriter jspWriter;
  private JspFactory mockFactory;

  public void testCreate() throws Exception {
    jspWriter = new MockJspWriter(10000, false);
    pageContext = new MockPageContext(jspWriter);
    // Anonymous factory that hands the generated servlet our mock page context.
    mockFactory = new JspFactory() {
      public PageContext getPageContext(Servlet servlet, ServletRequest servletRequest, ServletResponse servletResponse, String string, boolean b, int i, boolean b1) {
        return pageContext;
      }
      public void releasePageContext(PageContext pageContext) {
      }
      public JspEngineInfo getEngineInfo() {
        return null;
      }
    };
    JspFactory.setDefaultFactory(mockFactory);
    HttpJspBase manage = new com.objectmentor.library.jsp.WEB_002dINF.pages.books.manage_jsp();
    MockHttpServletRequest request = new MockHttpServletRequest();
    MockHttpServletResponse response = new MockHttpServletResponse();
    manage._jspService(request, response);
  }
}
Note all the mocks! Here is where the tomcat guys did a nice bit of work. The JspFactory is used within the generated servlet to gain access to all the objects it uses. It turns out that you can override the default implementation of the JspFactory by simply calling setDefaultFactory. Nice! That’s Dependency Management!
MockJspWriter! Unfortunately, it’s nowhere near enough. The output stops as soon as the first tag is invoked. Indeed, there is a wonderfully silent return right in the generated code. No exception, no error message, just silent failure. sigh. Here’s the generated code:
out.write("<form action=\"");
if (_jspx_meth_library_actionPath_0(_jspx_page_context))
  return;
It was generated from this portion of the jsp.
<form action="<library:actionPath actionName="books/manage"/>" method="post" id="list_form">
The actionPath tag is one of our custom tags. All it does is convert a controller path into an appropriate url format. (If you don’t understand that, don’t worry about it.) Clearly our custom tag is not being found. This is probably because the jasper compiler is not setting the _jspx_dependants variable properly. So now I need to figure out why…
For some reason, Jasper is not paying attention to the following line in my jsp file:
<%@ taglib uri="/WEB-INF/tld/LibraryTags.tld" prefix="library" %>
It’s not clear why, though I have to say that this pattern of silent errors is really starting to bother me. An error message would certainly be nice.
Why is Jasper ignoring the taglib directive? The LibraryTags.tld file is there, and in the right place. The ActionPathTag.class file is in the classpath used to invoke jasper. I’ve played with the -uribase argument of the jasper compiler (for fun, read the description of this argument and see if YOU understand what it’s saying.) It’s a mystery…
Specs vs. Tests
There’s something to this BDD kool-aid that people have been drinking lately…
As part of the Rails project I’ve been working on for the last few weeks, I’ve been using RSpec. RSpec is a unit testing tool similar in spirit to JUnit or Test/Unit. However RSpec uses an alternative syntax that reads more like a specification than like a test. Let me show you what I mean.
In Java, using JUnit, we might write the following unit test:
public class BowlingGameTest extends TestCase {
  private Game g;

  protected void setUp() throws Exception {
    g = new Game();
  }

  private void rollMany(int n, int pins) {
    for (int i = 0; i < n; i++) {
      g.roll(pins);
    }
  }

  public void testGutterGame() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
    assertTrue(g.isComplete());
  }

  public void testAllOnes() throws Exception {
    rollMany(20, 1);
    assertEquals(20, g.score());
    assertTrue(g.isComplete());
  }
}
This is pretty typical for a Java unit test. The setup function builds the Game object, and then the various test functions make sure that it works in each different scenario. In Ruby however, this might be expressed using RSpec as:
require 'rubygems'
require_gem "rspec"
require 'game'

context "When a gutter game is rolled" do
  setup do
    @g = Game.new
    20.times {@g.roll 0}
  end

  specify "score should be zero" do
    @g.score.should == 0
  end

  specify "game should be complete" do
    @g.complete?.should_be true
  end
end

context "When all ones are rolled" do
  setup do
    @g = Game.new
    20.times {@g.roll 1}
  end

  specify "score should be 20" do
    @g.score.should == 20
  end

  specify "game should be complete" do
    @g.complete?.should_be true
  end
end
At first blush the difference seems small. Indeed, the RSpec code might seem too verbose and fine-grained. At least that was my first impression when I first saw RSpec. However, having used it now for several months I have a different reaction.
First, let’s look at the semantic differences. In JUnit you have TestCase derivatives, and test functions. Each TestCase derivative has a setUp and tearDown function, and a suite of test functions. In RSpec you have what appears to be an extra layer. You have the test script, which is composed of context blocks. The contexts have setup, teardown, and specify blocks.
At first you might think that the RSpec context block corresponds to the Java TestCase derivative since they are semantically equivalent. However Java throws something of a curve at us by only allowing one public class per file. So from an organizational point of view there is a stronger equivalence between the TestCase derivative and the whole RSpec test script.
This might seem petty. After all, I can write Java code that is semantically equivalent to the RSpec code simply by creating two TestCase derivatives in two different files. But separating those two test cases into two different files makes a big difference to me. It breaks apart things that otherwise want to stay together.
Now it’s true that I could keep the TestCase derivatives in the same file by making them package scope, and manually put them into a public TestSuite class. But who wants to do that? After all, my IDE is nice enough to find and execute all the public TestCase derivatives, which completely eliminates the need for me to build suites, at least at first.
Note: The JDave tool provides BDD syntax for Java.
Again, this might seem petty; and if that were the only benefit to the RSpec syntax I would agree. But it’s not the only benefit.
Strange though it may seem, the next benefit is the strings that describe the context and specify blocks. At first I thought these strings were just noise, like the strings in the JUnit assert functions. I seldom, if ever, use the JUnit assert strings, so why would I use the context and specify strings? But over the last few weeks I have come to find that, unlike the JUnit assert strings, the RSpec strings put a subtle force on me to create better test designs.
Stable State: An Emergent Rule.
When a spec fails, the message that gets printed is the concatenation of the context string and the specify string. For example: 'When a gutter game is rolled game should be complete' FAILED. If you word the context and specify strings properly, these error messages make nice sentences. Since, in TDD, we almost always start out with our tests failing, I see these error messages a lot. So there is a pressure on me to word them well.
But by wording them well, I am constrained to obey a rule that JUnit never put pressure on me to obey. Indeed, I didn’t know it was a rule until I started using RSpec. I call this rule Stable State. It is:
Tests don’t change the state.
In other words, the functions that make assertions about the state of the system do not also change the state of the system. The state of the system is set up once in the setUp function, and then only interrogated by the test functions.
If you look carefully at the specification of the Bowling Game you will see that the state of the Game is changed only by the setup block within the context blocks. The specify blocks simply interrogate and verify state. This is in stark contrast to the JUnit tests in which the test methods both change and verify the state of the Game.
If you don’t follow this rule it is hard to get the strings on the context and specify blocks to create error messages that read well. On the other hand, if you make sure that the specify blocks don’t change the state, then you can find simple sentences that describe each context and specify block. And so the subtle pressure of the strings has a significant impact on the structure of the tests.
I can’t claim to have discovered the pressure of these strings. Indeed, Dan North’s original article on the topic is captivating. However, I felt the pressure and came to the same conclusion he did, well before I read his article; simply by using a tool inspired by his work.
The benefit of Stable State is that for each set of assertions there is one, and only one, place where the state of the system is changed. Moreover the three-level structure provides natural places for groups of states, states, and asserts.
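A minimal sketch of what Stable State could look like in plain Java, with no test framework. The Game here is a simplified stand-in assumed for illustration (it only sums pins and counts rolls, with no strikes or spares), and the method names are hypothetical, chosen to echo the context and specify strings above.

```java
// Sketch only: Game is a simplified, hypothetical stand-in, not a real bowling scorer.
final class StableStateSketch {

    // Minimal Game: sums pins and counts rolls; ignores strikes and spares.
    static final class Game {
        private int score;
        private int rolls;

        void roll(int pins) {
            score += pins;
            rolls++;
        }

        int score() { return score; }

        boolean isComplete() { return rolls >= 20; }
    }

    // The "context": the one and only place where the state of the Game is changed.
    static Game whenAGutterGameIsRolled() {
        Game g = new Game();
        for (int i = 0; i < 20; i++) {
            g.roll(0);
        }
        return g;
    }

    // The "specifications": each one interrogates the state and changes nothing.
    static boolean scoreShouldBeZero(Game g) { return g.score() == 0; }

    static boolean gameShouldBeComplete(Game g) { return g.isComplete(); }

    public static void main(String[] args) {
        Game g = whenAGutterGameIsRolled();
        System.out.println("When a gutter game is rolled score should be zero: " + scoreShouldBeZero(g));
        System.out.println("When a gutter game is rolled game should be complete: " + gameShouldBeComplete(g));
    }
}
```

Because the checks never call roll, they can run in any order, and each one reads as the tail end of a sentence that starts with the name of the context.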
The demise of the One Assert rule.
There have been other rules like this before. One that circulated a few years back was:
One assert per test.
I never bought into this rule, and I still don’t. It seems arbitrary and inefficient. Why should I put each assert statement into its own test method when I can just as well put the assert statements into a single test method? Why would I choose this:
public void testGutterGameScoreIsZero() throws Exception {
  rollMany(20, 0);
  assertEquals(0, g.score());
}

public void testGutterGameIsComplete() throws Exception {
  rollMany(20, 0);
  assertTrue(g.isComplete());
}
over this:
public void testGutterGame() throws Exception {
  rollMany(20, 0);
  assertEquals(0, g.score());
  assertTrue(g.isComplete());
}
I think the authors of the One Assert rule were trying to achieve the benefits of Stable State, but missed the mark. It’s as though they could smell the rule out there, but couldn’t quite pinpoint it.
The State Machine metaphor
When you follow the Stable State rule your specifications (tests) become a description of a Finite State Machine. Each context block describes how to drive the SUT to a given state, and then the specify blocks describe the attributes of that state.
Dan North calls this the Given-When-Then metaphor. Consider the following triplet:
Given a Bowling Game: When 20 gutter balls are rolled, Then the score should be zero and the game should be complete.
This triplet corresponds nicely to a row in a state transition table. Consider, for example, the subway turnstile state machine:
Current State | Event | New State |
---|---|---|
Locked | coin | Unlocked |
Unlocked | pass | Locked |
Locked | pass | Alarm |
Unlocked | coin | Unlocked |
We can read this as follows:
GIVEN we are in the Locked state, WHEN we get a coin event, THEN we should be in the Unlocked state.
GIVEN we are in the Unlocked state, WHEN we get a pass event, THEN we should be in the Locked state.
etc.
Describing a system as a finite state machine has certain benefits.
- We can enumerate the states and the events, and then make sure that every combination of state and event is handled properly.
- We can formalize the behavior of the system into a well known tabular format that can be read and interpreted by machines.
- I am, of course, thinking about FitNesse
- There are well known mechanisms for implementing finite state machines.
The point is that organizing the system description in terms of a finite state machine can have a profound impact on the system design and implementation.
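As a rough illustration of the first two benefits, the turnstile table can be written down as data and then checked for gaps. This is only a sketch: the class and method names (TurnstileSketch, next, unhandled) are hypothetical, and the rows are exactly the four from the table above, so the Alarm state deliberately has no outgoing transitions.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch only: a table-driven turnstile; names are illustrative assumptions.
final class TurnstileSketch {
    enum State { LOCKED, UNLOCKED, ALARM }
    enum Event { COIN, PASS }

    // (current state, event) -> new state, one entry per row of the transition table.
    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    private static void row(State current, Event event, State next) {
        TABLE.computeIfAbsent(current, s -> new EnumMap<Event, State>(Event.class)).put(event, next);
    }

    static {
        row(State.LOCKED, Event.COIN, State.UNLOCKED);
        row(State.UNLOCKED, Event.PASS, State.LOCKED);
        row(State.LOCKED, Event.PASS, State.ALARM);
        row(State.UNLOCKED, Event.COIN, State.UNLOCKED);
    }

    // GIVEN a current state, WHEN an event arrives, THEN return the new state.
    static State next(State current, Event event) {
        Map<Event, State> rows = TABLE.get(current);
        if (rows == null || !rows.containsKey(event)) {
            throw new IllegalStateException("No transition for " + current + " on " + event);
        }
        return rows.get(event);
    }

    // Enumerate every state/event combination that the table does not handle.
    static List<String> unhandled() {
        List<String> missing = new ArrayList<>();
        for (State s : State.values()) {
            for (Event e : Event.values()) {
                Map<Event, State> rows = TABLE.get(s);
                if (rows == null || !rows.containsKey(e)) {
                    missing.add(s + "/" + e);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(next(State.LOCKED, Event.COIN));
        System.out.println("Unhandled combinations: " + unhandled());
    }
}
```

Running unhandled() against the four rows reports the two Alarm combinations, which is the kind of question the tabular form makes easy to ask.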
The Butterfly Effect.
I find it remarkable that two dumb annoying little strings put a subtle pressure on me to adjust the style of my tests. That change in style eventually caused me to see the design and implementation of the system I was writing in a very new and interesting light.
Going Fast.
Over the last 35 years I have learned something about software.
The only way to go fast, is to go well.
Brian Marick recently said it differently. He said:
When it comes to software it never pays to rush.
But this seems to fly in the face of common behavior. Most developers respond to schedule pressure by cutting corners, rushing, or falling back on quick and dirty solutions. The common belief is that these behaviors provide a short term increase in speed, even though they slow you down in the long run. Indeed, this behavior is so prevalent that there are many different anti-patterns named for the outcome. One is called Big Ball of Mud.
Every one of us has been slowed down by messy code. Often it is the same messy code that slows us down over and over again. Yet in spite of the fact that we are repeatedly slowed down by the messes we write, we still write them. Despite the impediment we clearly feel, we still believe that making the mess helped us go faster. This is about as close to the classical definition of insanity as you can get!
I subscribe to a basic professional ethic: Making messes never makes you go faster, not even in the short term. I gained this ethic after years of proving it to myself the hard way. Quick and dirty solutions feel faster, but simply are not.
Actually, this is easy to prove with simple logic.
Assume that quick and dirty solutions help you go faster in the short term, but cause delays in the long term. We know that the latter is true, because we have all felt the impediment of messes.
Now, a long term is really just a series of short terms. The first short term can be finished more quickly by using a quick and dirty solution. But what about the second? Well, clearly, if our assumption is true then the second short term can also be finished more quickly by using a quick and dirty solution. And so can the third. And so can the fourth. And therefore the long term can be done more quickly as a series of quick and dirty solutions.
But this is counter to our observation that messes slow you down in the long term. Therefore the initial assumption is incorrect. Reductio ad absurdum. Q.E.D. Making messes slows you down even in the short term, no matter how short a term you choose!
But this is more than a simple play on logic. Teams actually behave this way! You've probably seen it yourself. A team makes a mess in order to get done "quicker", and tells themselves they will go back and clean it up later. But later never comes, and at the next schedule crunch the team repeats the behavior. They treat the long term as a series of short term quick and dirty solutions.
It comes down to a matter of professional ethics. Software developers are often disparaged for being sloppy, slow, and unreliable. Whole IT departments and product development teams carry the weight of appearing incompetent and moribund. Why? Because they succumbed to the seductive idea that quick and dirty solutions are faster in the short term.
True professionals don't work that way. True professionals keep their code clean at all times. True professionals write unit tests that cover close to 100% of their code. True professionals know their code works, and know it can be maintained. True professionals are quick and clean.
Private vs Protected
Someone on comp.object recently asked why anyone would make a field private since privacy ruins extensibility.
I recently read an article on comp.object that asked the following question:
While I can see that the ‘private’ modifier has its uses, I’m puzzled as to why it’s advocated so much given that one of the strong points of OO is extensibility.
I responded with:
The Open-Closed Principle of OOD (See article) says that objects should be open for extension but closed for modification. In other words, you should be able to change what a module does without changing the module. Extensibility, in OO, is best achieved when you keep the code you are extending safe from modification.
How do you protect a module from the forces that would try to modify it? One technique is to keep the variables that module depends upon private. If a variable is not private, then it is open to be used in a way that the module that owns that variable does not intend. Indeed, using a variable in an unintended way is the only reason to make the variable public or protected. But when you use a variable in an unintended way you likely force modifications into the owner. If, on the other hand, all the variables of a module are private, then no modification can be caused through unintended usage.
Privacy does not preclude extensibility. You can create public or protected accessor methods that: 1) provide extenders access to certain variables, and 2) ensure that the extenders don’t use the variable in an unintended way.
For example, consider a variable v used by a module m, such that v should never be negative. If you make v public or protected, someone could set it to a negative number, breaking the code in m and possibly forcing modification to m. However, if v is private but is accessible through getV and setV methods, and if the setV method throws an exception if you pass it a negative number, then m is safe, and extenders are forced to follow the rules that m expects.
To be fair, while I am a big proponent of keeping variables private, I have also come to rely much more on my unit tests to enforce the appropriate use of variables. When the code enjoys 90+% unit test coverage those tests will uncover and prevent variable misuse. This softens the need for the compiler to enforce privacy. This is not to say that you should not make your variables private; you should. It is to say that if you use TDD, the cost/benefit ratio changes, and you may find that you can soften access to some variables.
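The v and m example above can be sketched in Java roughly as follows. This is only an illustration: the class names ModuleM and ExtendedM and the choice of IllegalArgumentException are assumptions, not anything from the original comp.object thread.

```java
// Hypothetical module "m" that owns a value v which must never be negative.
class ModuleM {
    private int v; // private: nothing outside ModuleM can assign it directly

    public int getV() {
        return v;
    }

    // The only way to change v. Extenders get access, but on m's terms.
    public void setV(int newV) {
        if (newV < 0) {
            throw new IllegalArgumentException("v must not be negative: " + newV);
        }
        v = newV;
    }
}

// A hypothetical extension: it adds behavior without modifying ModuleM,
// and it cannot push v below zero because it has to go through setV.
class ExtendedM extends ModuleM {
    public void doubleV() {
        setV(getV() * 2);
    }
}
```

An extender who tries setV(-1) gets an exception instead of silently corrupting the state that ModuleM depends on, so ModuleM itself never has to change to defend against the misuse.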
Testing GUIs Part I: RoR.
Testing GUIs is one of the holy grails of Test Driven Development (TDD). Many teams who have adopted TDD for other parts of their projects have, for one reason or another, been unable to adequately test the GUI portion of their code.
In this series of articles I will show that GUI testing is a solved problem. Over the years the TDD community has produced and accumulated tools, frameworks, libraries, and techniques that allow any team to test their GUI code as fully as any other part of their code.
Testing GUIs Part I: Ruby on Rails
In the world of web development, no community has solved the problem of GUI testing better than the Ruby on Rails community. When you work on a rails project, testing the GUI is simply de rigueur. The rails framework provides all the necessary tools and access points for testing all aspects of the application, including the generation of HTML and the structure of the resulting web pages.
Web pages in rails are specified by .rhtml files that contain a mixture of HTML and ruby code similar to the way Java and HTML are mixed in .jsp files. The difference is that .rhtml files are translated at runtime rather than being compiled into servlets the way .jsp pages are. This makes it very easy for the rails environment to generate the HTML for a web page outside of the web container. Indeed, the web server does not need to be running.
This ease and portability of generating HTML means that the rails test framework merely needs to set up the variables needed by the ruby scriptlets within the .rhtml files, generate the HTML, and then parse that HTML into a form that the tests can query.
A typical example.
The tests query the HTML using an xpath-like syntax coupled with a suite of very powerful assertion functions. The best way to understand this is to see it. So here is a simple file named: autocomplete_teacher.rhtml.
<ul class="autocomplete_list">
  <% @autocompleted_teachers.each do |t| %>
    <li class="autocomplete_item"><%= "#{create_name_adornment(t)} #{t.last_name}, #{t.first_name}"%></li>
  <% end %>
</ul>
You don’t have to be a ruby programmer to understand this. All it is doing is building an HTML list. The Ruby scriptlet between the <% and %> tokens simply loops over each teacher, creating an <li> tag from an “adornment” and the last and first name. (The adornment happens to be the database id of the teacher in parentheses.) A simple test for this .rhtml file is:
def test_autocomplete_teacher_finds_one_in_first_name
  post :autocomplete_teacher, :request=>{:teacher=>"B"}
  assert_template "autocomplete_teacher"
  assert_response :success
  assert_select "ul.autocomplete_list" do
    assert_select "li.autocomplete_item", :count => 1
    assert_select "li", "(1) Martin, Bob"
  end
end
- The post statement simply invokes the controller that would normally be invoked by a POST url of the form: POST /teachers/autocomplete_teacher with the teacher parameter set to "B".
- The first assertion makes sure that the controller rendered the autocomplete_teacher.rhtml template.
- The next makes sure that the controller returned success.
- The third is a compound assertion that starts by finding the <ul> tag with a class="autocomplete_list" attribute. (Notice the use of css syntax.)
  - Within this tag there should be an <li> tag with a class="autocomplete_item" attribute,
  - and containing the text (1) Martin, Bob.
It should not come as any surprise that this test runs in a test environment in which the database has been pre-loaded with very specific data. For example, this test database always has “Bob Martin” being the first row (id=1) in the Teacher table.
The assert_select function is very powerful, and allows you to query large and complex HTML documents with surgical precision. Although this example gives you just a glimpse of that power, you should be able to see that the rails testing scheme allows you to test that all the scriptlets in an .rhtml file are behaving correctly, and are correctly extracting data from the variables set by the controller.
An example using RSpec and Behavior Driven Design.
What follows is a more significant rails example that uses an alternate testing syntax known as Behavior Driven Design (BDD). The tool that accepts this syntax is called RSpec.
Imagine that we have a page that records telephone messages taken from teachers at different schools. Part of that page might have an .rhtml syntax that looks like this:
<h1>Message List</h1>
<table id="list">
  <tr class="list_header_row">
    <th class="list_header">Time</th>
    <th class="list_header">Caller</th>
    <th class="list_header">School</th>
    <th class="list_header">IEP</th>
  </tr>
  <%time_chooser = TimeChooser.new%>
  <% for message in @messages %>
    <%cell_class = cycle("list_content_even", "list_content_odd")%>
    <tr id="list_content_row">
      <td id="time" class="<%=cell_class%>"><%=h(time_chooser.format_time(message.time)) %></td>
      <td id="caller" class="<%=cell_class%>"><%=h person_name(message.caller) %></td>
      <td id="school" class="<%=cell_class%>"><%=h message.school.name %></td>
      <td id="iep" class="<%=cell_class%>"><%=h (message.iep ? "X" : "") %></td>
    </tr>
  <% end %>
</table>
Clearly each message has a time, caller, school, and some kind of boolean field named “IEP”. We can test this .rhtml file with the following RSpec specification:
context "Given a request to render message/list with one message the page" do
  setup do
    m = mock "message"
    caller = mock "person",:null_object=>true
    school = mock "school"
    m.should_receive(:school).and_return(school)
    m.should_receive(:time).and_return(Time.parse("1/1/06"))
    m.should_receive(:caller).any_number_of_times.and_return(caller)
    m.should_receive(:iep).and_return(true)
    caller.should_receive(:first_name).and_return("Bob")
    caller.should_receive(:last_name).and_return("Martin")
    school.should_receive(:name).and_return("Jefferson")
    assigns[:messages]=[m]
    assigns[:message_pages] = mock "message_pages", :null_object=>true
    render 'message/list'
  end
  specify "should show the time" do
    response.should_have_tag :td, :content=>"12:00 AM 1/1", :attributes=>{:id=>"time"}
  end
  specify "should show caller first and last name" do
    response.should_have_tag :td, :content=>"Bob Martin", :attributes=>{:id=>"caller"}
  end
  specify "should show school name" do
    response.should_have_tag :td, :content=>"Jefferson", :attributes=>{:id=>"school"}
  end
  specify "should show the IEP field" do
    response.should_have_tag :td, :content=>"X",:attributes=>{:id=>"iep"}
  end
end
I’m not going to explain the setup function containing all that mock stuff you see at the start. Let me just say that the mocking facilities of RSpec are both powerful and convenient. Actually you shouldn’t have too much trouble understanding the setup if you try; but understanding it is not essential for this example. The interesting testing is in the specify blocks.
You shouldn’t have too much trouble reading the specify blocks. You can understand all of them if you understand the first. Here is what it does:
- The first spec ensures that <td id="time">12:00 AM 1/1</td> exists in the HTML document. This is not a string compare. Rather it is a semantic equivalence. Whitespace, and other attributes and complications are ignored. This spec will pass as long as there is a td tag with the appropriate id and contents.
HTML Testing Discipline and Strategy
One of the reasons that GUI testing has been so problematic in the .jsp world is that the java scriptlets in those files often reach out into the overall application domain and touch code that ties them to the web container and the application server. For example, if you make a call from a .jsp page to a database gateway, or an entity bean, or some other structure that is tied to the database, then in order to test the .jsp you have to have the full enabling context running. Rails gets away with this because the enabling context is lightweight, portable, and disconnected from the web container and the live database. Even so, rails applications are not always as decoupled as they should be.
In Rails, Java, or any other web context, the discipline should be to make sure that none of the scriptlets in the .jsp, .rhtml, etc. files know anything at all about the rest of the application. Rather, the controller code should load up data into simple objects and pass them to the scriptlets (typically in the attributes field of the HttpServletRequest object or its equivalent). The scriptlets can fiddle with the format of this data (e.g. date formats, money formats, etc.) but should not do any calculation, querying, or other business rule or database processing. Nor should the scriptlets navigate through the model objects or entities. Rather the controller should do all the navigating, gathering, and calculating and present the data to the scriptlets in a nice little package.
If you follow this simple design discipline, then your web pages can be generated completely outside of the web environment, and your tests can parse and inspect the html in a simple and friendly environment.
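To make the discipline concrete on the Java side, here is a minimal sketch. It is not JSP and not any particular framework: MessageRow and MessageListView are hypothetical names, and the view is a plain static method standing in for a template. The point is only that when the "nice little package" is made of simple, pre-formatted values, the HTML can be generated and inspected with no container, server, or database.

```java
import java.util.List;

// Hypothetical "nice little package": values the controller has already
// gathered and formatted. It knows nothing about entities or the database.
final class MessageRow {
    final String time;
    final String caller;
    final String school;
    final boolean iep;

    MessageRow(String time, String caller, String school, boolean iep) {
        this.time = time;
        this.caller = caller;
        this.school = school;
        this.iep = iep;
    }
}

// Stand-in for a template: it only lays out the data it is handed.
final class MessageListView {
    static String render(List<MessageRow> rows) {
        StringBuilder html = new StringBuilder("<table id=\"list\">");
        for (MessageRow row : rows) {
            html.append("<tr>")
                .append(cell("time", row.time))
                .append(cell("caller", row.caller))
                .append(cell("school", row.school))
                .append(cell("iep", row.iep ? "X" : ""))
                .append("</tr>");
        }
        return html.append("</table>").toString();
    }

    private static String cell(String cssClass, String text) {
        return "<td class=\"" + cssClass + "\">" + escape(text) + "</td>";
    }

    // Minimal HTML escaping, enough for this sketch.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A unit test can build one MessageRow by hand, call MessageListView.render, and check the resulting string (or parse it) without starting anything else, which is the same shape as the Rails tests shown earlier.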
Conclusion
I’ll have more to say about RSpec in a future blog. BDD is an exciting twist on the syntax of testing that has an effect far greater than the simple syntax shift would imply.
I hope this article has convinced you that the rails community has solved the problem of testing HTML generation. This solution can be easily extrapolated back to Java and .NET as future blogs in this series will show.
Clearly the problem of testing Javascript, and the ever more complex issues of Web2.0 and GTK are not addressed by this scheme. However, there are solutions for Javascript that we will investigate in future blogs in this series.
Finally, this technique does not test the integration and workflow of a whole application. Again, those are topics for later blogs.
I hope this kickoff blog has been informative. If you have a comment, question, or even a rant, please don’t hesitate to add a comment to this blog.
Money Format WTF
The reason the DailyWTF is so funny is that we all secretly identify with it. Here’s my latest WTF.
It happened last night at about 7pm. My wife wanted me to run to the store with her. I wanted to get my tests to pass so I could check in my code. I knew that if I left the code checked out until morning, one of my compatriots would wake up at 3am and change something, and I’d have to do a merge. I hate doing merges!
I needed to write the toString() method for my Money object. I had my tests already. Here they are:
public void testToString() throws Exception {
    assertEquals("$3.50", new Money(350).toString());
    assertEquals("$75.02", new Money(7502).toString());
    assertEquals("$0.01", new Money(1).toString());
    assertEquals("$0.00", new Money(0).toString());
}
I quickly wrote the function I knew would work:
public String toString() {
    return String.format("$%d.%02d", pennies/100, pennies%100);
}
What can I say. I’m an old C programmer. When the format method showed up in Java 5 I jumped for joy.
Even as I typed this code, something was nagging at the back of my brain. Something was telling me there was a better way. But then, I was interrupted by a huge disappointment.
It didn’t compile. Damn! I forgot I was writing in a Java 1.4 environment. No String.format!
What to do? What to do?
(Wife: “Bob, are you ready to leave yet? It’s getting late! The store is going to close!”)
uncleBob.changeMode(CODE_MONKEY);
I wrote this:
public String toString() {
    int cents = pennies % 100;
    int dollars = pennies / 100;
    return "$" + dollars + "." + ((cents < 10) ? "0" : "") + cents;
}
The tests passed, and I checked in my code and went to the store with my lovely wife.
——-
This morning I woke up, finished reading a book on Quantum Mechanics, read a few blogs, and in general pursued my joyous life of study and work. But something was nagging at the back of my brain. Something told me to look in the Java Docs for NumberFormat.
(sigh). On one screen was the code my monkey brain had written last night. On the other screen was the JavaDoc for NumberFormat. (sigh).
So I sheepishly changed my code to:
public String toString() {
    NumberFormat nf = NumberFormat.getCurrencyInstance();
    return nf.format(pennies/100.0);
}
Of course I know Beck’s rule: Never let the sun set on bad code.
I really need to find that code monkey and kill it.
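One caveat about the NumberFormat version above: getCurrencyInstance() with no argument uses the machine's default locale, so the "$3.50" tests would presumably fail on a build box set to, say, a European locale. A hedged variant that pins the locale is sketched below. The surrounding Money class is an assumption reconstructed from the tests (only the pennies field and toString() appear in the post).

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical minimal Money class; only pennies and toString() come from
// the post, the constructor and field declaration are assumed.
final class Money {
    private final int pennies;

    Money(int pennies) {
        this.pennies = pennies;
    }

    public String toString() {
        // Pinning Locale.US keeps the "$" expectations in the tests stable
        // regardless of the default locale of the machine running them.
        NumberFormat nf = NumberFormat.getCurrencyInstance(Locale.US);
        return nf.format(pennies / 100.0);
    }
}
```

Note that the US currency format also inserts grouping separators for large amounts, which the hand-rolled string concatenation did not, so the two versions are not strictly equivalent outside the range the tests cover.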
Web Death by Strings
Communication between web clients and servers is dominated by strings. This leads to complex and horrific problems of coupling, and fragility. Where are the rules?
I am in the enviable position of working on two web systems at the same time. One is a ruby-on-rails system for tracking substitute teachers. The other is a JEE system for managing the contents of a library. The point-counter-point of this happy coincidence has illuminated something that has tickled my subconscious for years. The world of Web programming is a world of pathological string manipulation.
Take, for instance, the library system I am working on. One of the pages in this system manages the books in the library by their ISBN, and by their copy ids. Let’s say we had 3 copies of ISBN 0131857258. The page would have a table row for the ISBN that contained a check box for each of the three copies. If the user checks the checkbox, the copy will be deleted from the library. Another checkbox in that row is named “Delete all”. When the user clicks that check box, all the other check boxes in that row are automatically checked, and all copies of that book are eliminated.
Now, think about this from an HTML point of view. How does the server know which copies should be deleted? That’s easy, the server builds the HTML for the page, so it simply gives a special name to each checkbox. When the form is submitted the names of the checked checkboxes are sent back to the server. So all the server has to do is to give each checkbox a name that identifies the copy it represents. We chose a syntax similar to: “delete_432”, which would be the name of the checkbox that represents the deletion of the copy whose id is 432.
Notice the string manipulation? We have encoded server side information in a string that is sent to the client, and we expect that information to come back to the server unchanged. While this makes perfect sense, any good software designer should feel a bit queasy about it. Depending on strings to encode information like this feels just a little bit reckless. It’s manageable, but it’s icky.
Today that ickiness got a lot worse for me. Dean Wampler is working with me on the library project. He was working on the JavaScript to make the “delete all” checkbox work. Now copy ids are globally unique. No two copies, regardless of ISBN, share the same copy id. So when the ‘delete_nnn’ comes back to the server, the server does not need to know which ISBN the book belongs to. It just happily deletes copy ‘nnn’. However, Dean needed to get his client side JavaScript to set only those checkboxes that correspond to the ISBN of the ‘delete all’ button. The client does not know which copies correspond to which ISBNs. To solve this problem he changed the format of the checkbox name to ‘delete_ssss_nnnn’ where ssss is the ISBN, and nnnn is the copy id. This allowed him to write the JavaScript to look for all the delete buttons that corresponded to the appropriate ISBN.
Of course when he made that change, he broke my server code which was looking for ‘delete_nnnn’. Fortunately I had unit tests that detected the problem instantly. (I truly pity those poor programmers whose only means to stumble across errors like this is to deploy the system to test and work through the pages manually!) This would have been easy for me to repair on the server side; and I was tempted to do so, simply in the name of efficiency; but my conscience wouldn’t let me.
Why should a client-side JavaScript issue have any impact on the server code? Answer: It shouldn’t! This is software design 101. Don’t couple different domains!
So I talked it over with Dean and we quickly realized that he could change the JavaScript to use the ‘id’ attribute of the checkbox tag. The server would construct the page with the ids set correctly, and the checkboxes would retain their normal name of ‘delete_nnn’.
There is a general rule here somewhere. It’s something like: use names to communicate with the server, and use ‘id’ attributes to communicate with the client. Or, rather, don’t break server code to make client side javascript work.
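For illustration, the server side of the ‘delete_nnn’ convention might look something like the sketch below. The class and method names are hypothetical, and a real servlet would feed it the parameter names from the request; here it just takes any collection of strings so that it can be unit tested on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper. Only the "delete_nnn" name format comes from the
// post; everything else is an assumption made for this sketch.
final class DeleteRequestParser {
    private static final String PREFIX = "delete_";

    // Returns the copy ids named by the submitted checkbox parameters,
    // ignoring any parameter that does not follow the delete_nnn form.
    static List<Integer> copyIdsToDelete(Iterable<String> parameterNames) {
        List<Integer> ids = new ArrayList<Integer>();
        for (String name : parameterNames) {
            if (!name.startsWith(PREFIX)) {
                continue;
            }
            String suffix = name.substring(PREFIX.length());
            try {
                ids.add(Integer.valueOf(suffix));
            } catch (NumberFormatException e) {
                // Not a plain copy id (for example "delete_all"); skip it.
            }
        }
        return ids;
    }
}
```

Under this sketch a name in the changed ‘delete_ssss_nnnn’ format simply fails to parse as a copy id and is skipped, which is the kind of silent breakage the unit tests caught. Keeping the ISBN in the ‘id’ attribute leaves this code untouched.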
I’ve had similar string issues with the ‘Substitute’ system I’ve been working on in Rails. In this case I am using Ajax to allow users to type the names of substitute teachers and quickly pop up a list of possible teachers. So if you type “B” into the “Substitute” field, you quickly see a menu of all substitutes whose name begins with “B”. As you type more letters the list gets smaller. You can pick a name from the list when it’s convenient for you.
This works great, but has one gaping flaw. The server is looking these names up using SQL statements and is then populating the list in a convenient format. So, for example, it will put “Bob Martin” into the popup list, constructing the name from the first_name and last_name fields of the Substitute record. It is this constructed name that comes back to the server in the form when the submit button is pressed. But the constructed name is not the key of the Substitute record! So how does the server know which substitute has been selected? It could break apart the string “Bob Martin” into “Bob” and “Martin” and then do a query against first_name and last_name, but I hope you share my disgust with that solution! Not only is it inefficient, there are just loads of opportunities for error and fragility. (Just think of honorifics, suffixes, prefixes, middle names, etc.)
My solution, which I dislike almost as much, is to encode the id of the substitute along with the name. So the string that actually pops up in the menu is “(384) Bob Martin”. OK, OK, I know this is bad, and I intend to fix it once I learn how to get the JavaScript that pops up the menu to load a hidden field. But I don’t know how to do that yet, and I am aghast that I need to learn it! It seems to me that being able to couple a pretty name to an unambiguous ID is such a common thing to do that I should not have to resort to the deep mysticism of javascript to achieve it.
Ah well, the web is hell. That’s all I can really say about this. Web programming is probably the worst programming environment I have ever worked in; and I’ve worked in a lot of programming environments. Not only is it flogged by commercial hype that tries to make it seem much more complicated than it is; but it’s so poorly conceived, and so sloppily put together that it is, frankly, embarrassing.