<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Object Mentor Blog: Dependency Management: HtmlUnit</title>
    <link>http://blog.objectmentor.com/articles/2007/02/11/dependency-management-httpunit</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description></description>
    <item>
      <title>Dependency Management: HtmlUnit</title>
      <description>&lt;p&gt;If you are planning on building an &lt;span class="caps"&gt;API&lt;/span&gt;, please, please, think about dependency management.  Don&amp;#8217;t make me know more about your world view than necessary.  Consider what happened to me as I explored HtmlUnit&amp;#8230;&lt;/p&gt;


	&lt;p&gt;I&amp;#8217;m using HtmlUnit to parse and interpret &lt;span class="caps"&gt;HTML&lt;/span&gt; web pages.  I&amp;#8217;ve been very impressed with this library so far.  And I appreciate the hard work and dedication of people who give their software away for free.  So, although this blog post is a complaint, it should not be misconstrued as anything more than constructive criticism.  Besides, what I am complaining about here is so universal that it really wouldn&amp;#8217;t matter whose software I chose to scrutinize.  The HtmlUnit authors just got lucky in this case.&lt;/p&gt;


	&lt;p&gt;What I want to do with HtmlUnit is quite simple.  Given a string containing &lt;span class="caps"&gt;HTML&lt;/span&gt;, I&amp;#8217;d like to query that &lt;span class="caps"&gt;HTML&lt;/span&gt; for certain tags and attributes.  For example, I&amp;#8217;d like to do this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
    HtmlPage page = HTMLParser.parse(htmlString);
    HtmlElement html = page.getDocumentElement();
    HtmlElement listForm = html.getHtmlElementById("list_form");
    assertEquals("/Library/books/manage.do", listForm.getAttributeValue("Action"));&lt;/code&gt;&lt;/pre&gt;
	&lt;p&gt;Sweet, simple, uncomplicated.  Just create the &lt;span class="caps"&gt;DOM&lt;/span&gt; from an &lt;span class="caps"&gt;HTML&lt;/span&gt; String, and then query that &lt;span class="caps"&gt;DOM&lt;/span&gt;.&lt;/p&gt;


	&lt;p&gt;Unfortunately, HtmlUnit does not appear to be that simple.  What you have to do instead looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    StringWebResponse stringWebResponse = new StringWebResponse(htmlString);
    WebClient webClient = new WebClient();
    webClient.setJavaScriptEnabled(false);
    HtmlPage page = HTMLParser.parse(stringWebResponse, new TopLevelWindow("", webClient));
    HtmlElement html = page.getDocumentElement();
    HtmlElement listForm = html.getHtmlElementById("list_form");
    assertEquals("/Library/books/manage.do", listForm.getAttributeValue("Action"));&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;The extra &lt;em&gt;stuff&lt;/em&gt; is apparently needed because the authors wanted to be able to simulate browsers, frames, and JavaScript.  I think their goal was laudable.  However, I wish they had done this without forcing those frames, browsers, and script engines down my throat.&lt;/p&gt;


	&lt;p&gt;Given my simple needs, why should I care about WebClient and Window?  Why do I have to turn off the JavaScript engine?  It may seem a small thing, but it bothers me nonetheless.  It&amp;#8217;s the principle of the matter that gets under my skin.  The Pragmatic Programmers called it &lt;em&gt;The Principle of Least Surprise&lt;/em&gt;.  I call it, simply, &lt;em&gt;dependency management&lt;/em&gt;.  &lt;em&gt;Don&amp;#8217;t make people depend on more than they need.&lt;/em&gt;&lt;/p&gt;
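

	&lt;p&gt;To make the principle concrete, here is a minimal sketch in Java of one way an &lt;span class="caps"&gt;API&lt;/span&gt; might keep browser simulation available without forcing it on everyone.  Every name in it (NarrowEntryPoint, Page, BrowserContext, parse) is hypothetical; none of them are HtmlUnit&amp;#8217;s classes, and the actual parsing is stubbed out.  The idea is simply two entry points: a full one that takes the browser machinery, and a narrow overload that supplies a script-free default, so that a caller who only wants a &lt;span class="caps"&gt;DOM&lt;/span&gt; depends on nothing but a String and a Page.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
// Hypothetical sketch: none of these names belong to HtmlUnit.
public class NarrowEntryPoint {

    // Collaborator that only browser-simulating callers care about.
    static final class BrowserContext {
        final boolean javaScriptEnabled;

        BrowserContext(boolean javaScriptEnabled) {
            this.javaScriptEnabled = javaScriptEnabled;
        }
    }

    // The result every caller wants: something to query.
    static final class Page {
        final String source;
        final boolean scriptsEnabled;

        Page(String source, boolean scriptsEnabled) {
            this.source = source;
            this.scriptsEnabled = scriptsEnabled;
        }
    }

    // Full entry point, for callers who really do simulate a browser.
    static Page parse(String html, BrowserContext context) {
        // Stub: a real library would build the DOM here and,
        // if the context allows it, run the page scripts.
        return new Page(html, context.javaScriptEnabled);
    }

    // Narrow entry point: supplies a script-free context itself, so the
    // caller never has to know that BrowserContext exists.
    static Page parse(String html) {
        return parse(html, new BrowserContext(false));
    }

    public static void main(String[] args) {
        Page page = parse("(an HTML string)");
        System.out.println(page.scriptsEnabled); // prints false
    }
}&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;The design choice is nothing more than an overload with a sensible default: the browser, window, and script-engine machinery stay available to callers who pass a BrowserContext, while callers with simple needs, like the string-parsing case above, never have to learn that it exists.&lt;/p&gt;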


	&lt;p&gt;The cost, to me, was an hour of rooting around in the documentation, example code, and my own trial-and-error experiments.  (The benefit to me was another blog topic ;-)  That cost may not seem great, but it must be paid again and again by everyone who wants to use the package in a way that doesn&amp;#8217;t quite fit the authors&amp;#8217; world view.&lt;/p&gt;


	&lt;p&gt;There may, in fact, be a simpler way to do what I want to do with HtmlUnit.  If there is, I haven&amp;#8217;t been able to find it, and I&amp;#8217;d be grateful if anyone out there, including the authors, could guide me in the right direction.&lt;/p&gt;</description>
      <pubDate>Sun, 11 Feb 2007 12:55:14 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:30d33c2e-6b0e-4c4c-8f17-e8664251611b</guid>
      <author>Uncle Bob</author>
      <link>http://blog.objectmentor.com/articles/2007/02/11/dependency-management-httpunit</link>
      <category>Uncle Bob's Blatherings</category>
      <trackback:ping>http://blog.objectmentor.com/articles/trackback/159</trackback:ping>
    </item>
    <item>
      <title>"Dependency Management: HtmlUnit" by Fletch</title>
      <description>&lt;p&gt;Thanks for posting this.  It saved me a lot of trouble.  Some things are a pain in the ass with HtmlUnit, but generally it&amp;#8217;s fantastic for web testing and automation.&lt;/p&gt;</description>
      <pubDate>Tue, 15 Dec 2009 18:19:50 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:f37b9bdb-b58c-4fba-aff9-f8c2a3e8efc5</guid>
      <link>http://blog.objectmentor.com/articles/2007/02/11/dependency-management-httpunit#comment-5703</link>
    </item>
    <item>
      <title>"Dependency Management: HtmlUnit" by dtolbert</title>
      <description>&lt;p&gt;I can&amp;#8217;t thank you enough; you saved me a couple of hours of bumbling around with HtmlUnit.  I&amp;#8217;ve run into quite an issue involving a JavaScript routine that returns a bit of JSON that I can play with a bit to decode into &lt;span class="caps"&gt;HTML&lt;/span&gt;.  I then wanted to take that &lt;span class="caps"&gt;HTML&lt;/span&gt; and create an HtmlPage out of it, which I would then in turn parse.&lt;/p&gt;


	&lt;p&gt;I think I was on the right path.  What I believe I was doing wrong was using my existing WebClient object to create the HtmlPage with a StringWebResponse.&lt;/p&gt;


	&lt;p&gt;I can&amp;#8217;t give enough praise to the HtmlUnit library. It truly is a gem and &amp;#8220;just works&amp;#8221; in most cases.&lt;/p&gt;</description>
      <pubDate>Mon, 28 Apr 2008 00:04:19 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:fe2418ce-25f2-41a8-a5c3-bea9415aa41c</guid>
      <link>http://blog.objectmentor.com/articles/2007/02/11/dependency-management-httpunit#comment-1726</link>
    </item>
    <item>
      <title>"Dependency Management: HtmlUnit" by Paul King</title>
      <description>&lt;p&gt;HtmlUnit is streamlined for accessing sites (perhaps the String case is not so well handled). Here is the normal thing you would do &amp;#8211; coded in Groovy:&lt;/p&gt;


&lt;pre&gt;
import com.gargoylesoftware.htmlunit.WebClient

def webClient = new WebClient()
def page = webClient.getPage(some_url)
def listForm = page.getFormByName('list_form')
assert '/Library/books/manage.do' == listForm.getAttributeValue("Action")
&lt;/pre&gt;</description>
      <pubDate>Sun, 11 Feb 2007 19:08:01 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:d496fa3a-dfdf-4f83-99fb-6817ade07dda</guid>
      <link>http://blog.objectmentor.com/articles/2007/02/11/dependency-management-httpunit#comment-100</link>
    </item>
  </channel>
</rss>
