Dependency Management: HtmlUnit

Posted by Uncle Bob on Sun, 11 Feb 2007 18:55:14 GMT

If you are planning on building an API, please, please, think about dependency management. Don’t make me know more about your world view than necessary. Consider what happened to me as I explored HtmlUnit…

I’m using HtmlUnit to parse and interpret HTML web pages. I’ve been very impressed with this library so far. And I appreciate the hard work and dedication of people who give their software away for free. So, although this blog is a complaint, it should not be misconstrued as anything more than constructive criticism. Besides, what I am complaining about here is so universal that it really wouldn’t matter whose software I chose to scrutinize. The HtmlUnit authors just got lucky in this case.

What I want to do with HtmlUnit is quite simple. Given a string containing HTML, I’d like to query that HTML for certain tags and attributes. For example, I’d like to do this:

    HtmlPage page = HTMLParser.parse(htmlString);
    HtmlElement html = page.getDocumentElement();
    HtmlElement listForm = html.getHtmlElementById("list_form");
    assertEquals("/Library/books/manage.do", listForm.getAttributeValue("Action"));

Sweet, simple, uncomplicated. Just create the DOM from an HTML String, and then query that DOM. Unfortunately, HtmlUnit does not appear to be that simple. What you have to do instead looks like this:

    StringWebResponse stringWebResponse = new StringWebResponse(htmlString);
    WebClient webClient = new WebClient();
    webClient.setJavaScriptEnabled(false);
    HtmlPage page = HTMLParser.parse(stringWebResponse, new TopLevelWindow("", webClient));
    HtmlElement html = page.getDocumentElement();
    HtmlElement listForm = html.getHtmlElementById("list_form");
    assertEquals("/Library/books/manage.do", listForm.getAttributeValue("Action"));

The extra stuff in here is apparently due to the fact that the authors wanted to be able to simulate browsers, frames, and JavaScript. I think their goal was laudable. However, I wish they had done this without forcing those frames, browsers, and script engines down my throat.

Given my simple needs, why do I care about WebClient and Window? Why do I have to turn off the JavaScript engine? It may seem a small thing, but it bothers me nonetheless. It’s the principle of the matter that gets under my skin. The pragmatic programmers called it The Principle of Least Surprise. I call it, simply, dependency management. Don’t make people depend on more than they need.
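
To make the complaint concrete, here is a minimal sketch of the kind of narrow entry point described above. It is not HtmlUnit code: `SimpleHtml` and `attributeOf` are hypothetical names, and the JDK's built-in XML parser stands in for the real parsing machinery, so it only copes with well-formed XHTML. The point is only that the caller depends on a string going in and a string coming out, while the parser setup stays hidden behind the facade.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical facade (an assumption, not part of HtmlUnit): strings in, strings out. */
final class SimpleHtml {
    private final Document document;

    private SimpleHtml(Document document) {
        this.document = document;
    }

    /** Parses a well-formed XHTML string; all parser setup stays inside this method. */
    static SimpleHtml parse(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return new SimpleHtml(builder.parse(new InputSource(new StringReader(xhtml))));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    /**
     * Returns the named attribute of the first element whose id matches,
     * an empty string if that element lacks the attribute,
     * or null if no element has the given id.
     */
    String attributeOf(String id, String attributeName) {
        NodeList all = document.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (id.equals(element.getAttribute("id"))) {
                return element.getAttribute(attributeName);
            }
        }
        return null;
    }
}
```

A caller would then write `SimpleHtml.parse(xhtml).attributeOf("list_form", "action")` and never see a factory, a builder, or an input source. Whatever heavier machinery exists can still be offered to the people who need it, through a separate entry point.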

The cost, to me, was an hour of rooting around in the documentation, example code, and my own trial-and-error experiments. (The benefit to me was another blog topic ;-) That cost may not seem great; but it must be paid again and again by everyone who wants to use the package in a way that doesn’t quite fit the authors’ world view.

There may, in fact, be a simpler way to do what I want to do with HtmlUnit. If there is, I haven’t been able to find it, and I’d be grateful if anyone out there, including the authors, could guide me in the right direction.

Comments

  1. Paul King about 6 hours later:

    HtmlUnit is streamlined for accessing sites (perhaps the String case is not so well handled). Here is the normal thing you would do – coded in Groovy:

    import com.gargoylesoftware.htmlunit.WebClient
    
    def webClient = new WebClient()
    def page = webClient.getPage(some_url)
    def listForm = page.getFormByName('list_form')
    assert '/Library/books/manage.do' == listForm.getAttributeValue("Action")
    
  2. dtolbert about 1 year later:

    I can’t thank you enough, you saved me a couple hours of bumbling around with HtmlUnit. I’ve run into quite an issue involving a JavaScript routine that returns a bit of JSON that I can massage and decode into HTML. I then wanted to take that HTML and create an HtmlPage out of it, which I would then in turn parse.

    I think I was on the right path. What I believe I was doing wrong was using my existing WebClient object to create the HtmlPage with a StringWebResponse.

    I can’t give enough praise to the HtmlUnit library. It truly is a gem and “just works” in most cases.

  3. Fletch over 3 years later:

    Thanks for posting this. It saved me a lot of trouble. Some things are a pain in the ass with HtmlUnit, but generally it’s fantastic for web testing and automation.

  4. uselectit.com over 3 years later:

    I think HtmlUnit has been very successful so far, and web developers all over the world are grateful to the people who give away, for free, software they have worked hard on for years. The WebClient, the Window, and the JavaScript engine cause problems for some. Everything has some disadvantages, and this software may have them too, but the point to note is how many people benefit from it. It definitely deserves admiration, doesn’t it?
