Wednesday, December 30, 2009

A Couple Of GroovyConsole Jiras

I find myself relying more and more on the GroovyConsole for testing out Java and Groovy stuff (especially regular expressions and xml processing); for writing scripts it's basically my IDE. There are a few irritations, though, for which I opened a couple of Jiras today, as I didn't see existing issues for them. All pretty minor: the first two deal with the open/save dialogs, the third deals with process management and output. We'll see what happens. If they're open to the changes but don't have the time, a couple might be something even I could do. Anyway, here they are:

Monday, December 21, 2009

Gettin' Groovy with xml validation

So, I wanted to learn some things about the validity of a particular xml document I was working with a few days ago. If you use something like XMLSpy or the xml plugin for Notepad++, they will display only the first validation error. So, I wrote a Groovy script that lets me control the number of errors reported and collects some statistics about the document. I call it ValidateMe. And now, I'm sharing it with you. You can get the script off its Google Code page. As usual, it's MIT licensed, so you can do pretty much whatever you want with it. Its usage is described by running 'groovy validateMe.groovy help'. It's pretty straightforward, about 130 lines or so, but I think it's pretty slick.

As a result of doing this, I also learned that you can have multiple classes in the same .groovy script file, as long as the class with the main method is first.
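For the curious, the trick at the heart of collecting every validation error (instead of bailing on the first one, like those editors do) is just a custom ErrorHandler that keeps accumulating. Here's a stripped-down Java sketch of the idea -- the inline schema, document, and names are made up for the demo, this is not the actual ValidateMe code:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ValidateAll {
    // hypothetical inline schema and document, just for the demo
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' type='xs:int' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";
    static final String XML =
        "<root><item>1</item><item>oops</item><item>not-a-number</item></root>";

    // collect every non-fatal validation error instead of stopping at the first
    public static List<String> collectErrors() {
        final List<String> errors = new ArrayList<>();
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator v = sf.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
            v.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { errors.add("warning: " + e.getMessage()); }
                // not re-throwing here is what lets validation continue past the error
                public void error(SAXParseException e) { errors.add("error: " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            v.validate(new StreamSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> errors = collectErrors();
        errors.forEach(System.out::println);
        System.out.println(errors.size() + " error(s)");
    }
}
```

The two bogus `<item>` values each produce one error, and validation keeps going because `error()` doesn't re-throw. Capping the number of errors reported is then just a matter of checking the list size inside the handler.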

Wednesday, December 16, 2009

Launching a GroovyConsole Without a cmd window

I run Groovy 1.7 from the .zip files (which don't yet have native launchers built for them), and I love the line numbering and many other things about it. The one irritation was that a new cmd window would be opened every time I launched groovyConsole.bat. I now have a workaround. Create a new .vbs file in the bin folder of your Groovy install with the following contents:
' Run groovyconsole.bat with a hidden window (the 0 means: hide the cmd window)
Set WshShell = WScript.CreateObject("WScript.Shell")
WshShell.Run "groovyconsole.bat", 0
Set WshShell = Nothing
You can then put a shortcut to this wherever, and even make it pretty by setting the icon to this. Oh, and make sure you have your GROOVY_HOME set up.

Actually, this can be used to run any batch script in the background, as long as you don't need to be able to let the user pass in arguments. My thanks to Koushik Biswas from Yahoo Answers for the tip.

Friday, December 11, 2009

Which Child Am I?

I don't know if this is worth blogging about, but I've needed this solution a couple of times, and I found myself referencing this blog entry draft, so I'll put it out there and maybe someone else will find it useful. The problem: I have an xml node, and I want to know its position (as an integer index) relative to its siblings. You may have to modify this, of course, if the children you are comparing are deeper than one level down.

def CAR_RECORDS = '''
  <records>
    <car name='HSV Maloo' make='Holden' year='2006'>
      <record type='speed'>Production Pickup Truck with speed of 271kph</record>
    </car>
    <car name='P50' make='Peel' year='1962'>
      <country>Isle of Man</country>
      <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
    </car>
    <car name='Royale' make='Bugatti' year='1931'>
      <record type='price'>Most Valuable Car at $15 million</record>
    </car>
  </records>
'''

def records = new XmlSlurper().parseText(CAR_RECORDS)
def record =[2].record

int index = 0
Boolean found = false
record.parent().parent().children().each {
  if (it == record.parent()) {
    found = true
  } else if (!found) {
    index += it.children().size()
  }
}
assert index == 3
return index
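The same question comes up in plain Java too. With DOM you can skip the whole parent/children dance and just walk backwards over previous siblings. A rough sketch of the idea (hypothetical class and a toy document, not the script above):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildIndex {
    // 0-based index of an element among its element siblings
    public static int indexOf(Element e) {
        int i = 0;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE) i++;  // skip text/comment nodes
        }
        return i;
    }

    public static int demo() {
        try {
            String xml = "<records><car/><car/><car><record/></car></records>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // find the <car> that owns the <record>, then ask where it sits
            Element owner = (Element) doc.getDocumentElement()
                    .getElementsByTagName("record").item(0).getParentNode();
            return indexOf(owner);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note the ELEMENT_NODE check: DOM counts whitespace text nodes as siblings too, which is a classic gotcha.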

Monday, December 7, 2009

Using Groovy from the .zip file

I had always had trouble launching groovysh or groovyConsole from the .zip releases of Groovy, and always waited until they released the installer for Windows, never knowing why. There is a simple fix, but one that didn't occur to me right away. The cause is that the GROOVY_HOME variable needs to be set before startGroovy.bat tries to add it to the classpath, so just add

to startGroovy.bat (anywhere before the classpath gets set), and let the good times roll...
This also, of course, overrides whatever you have set as system or user variables, so you can safely play with other versions from the .zips without needing to change anything (or needing admin rights).

Thursday, December 3, 2009

My Second Wave Bot

Inspired by Piratify, I decided to make a bot that makes everyone talk like Yoda (Yodaspeak, as I call it). The bot lives at, and can be added by adding to your wave. The source code is available here. It works, but needs more work to improve its results.

Monday, November 23, 2009

My First Wave Bot

I finally got my Wave invite, and I immediately got interested in coding with it. I recently finished creating a bot for it that allows you to search WorldCat from inside Wave. The main servlet is in Groovy, the profile is in Java. Their tutorial pretty much works, though one of the methods was renamed:
getRobotProfileUrl() should be called getRobotProfilePageUrl() (there's a request filed to fix this). Google's plugin for Eclipse also works very well. JetBrains also has a plugin for IDEA, though I didn't test this. In their sample project SpringSource has a build script that uses AntBuilder, which I modified to use the folder structure that was already created by the Eclipse plugin. I successfully built with this and deployed the app using appcfg.cmd from the SDK. I used straight html for the bot's profile page, but in their sample SpringSource shows how you could use the MarkupBuilder in groovlets.

Anyway, the app is deployed on the Google App Engine here; its homepage (where the source code is also available) is here. You can add it to your waves by adding For even more bots, check out the list. Mine's listed too. There is another list out there, but it doesn't seem used as much.

Some gotchas:
  1. You cannot test robots without deploying them to the Google App Engine, this makes you waste some of the number of deployments you have on the free account (currently you get 1000).
  2. Wave caches its bots, so you have to change the version in appengine-web.xml (I don't think you have to change the version in capabilities.xml too unless the capabilities have also changed, but I've been changing both to match). They tell you this, but be careful: even though you may have deleted the old version, if you reuse a version descriptor Wave may still try to call the old version, and you will get a ServletUnavailable exception and waste another deployment. You have to wait for the re-caching to occur.
  3. If you are using Groovy, you have to upload the groovy-all jar in the war/WEB-INF/lib directory (the war folder can be a different name, but that is the convention used by the Eclipse plugin).
For those who haven't heard about Wave, it's basically a kind of collaborative IM (I once jokingly described it as an MMIM - Massively Multiperson Instant Messaging). But it isn't quite accurate to describe it as a kind of IM. The text is live, you can see it as the person is typing it, but whether it's treated as an IM or more like an email is fluid; it depends only on whether other people are there at the same time. So, it can be viewed as a kind of mashup between email, IM, and collaborative documents, and is more concurrent than traditional email. LifeHacker has some use cases, which make for good propaganda.
Anyone in the conversation (called a Wave) can edit any of the messages (blips). Side conversations can occur in the same stream; these side conversations are called wavelets. APIs exist for bots in a conversation that automate tasks (such as links for searches, posting the conversation to a blog, bringing in text from a feed, or converting everyone's text to pirate talk) and for gadgets that let you put different kinds of content inside the conversation, such as documents, polls, etc. Google plans to open source most of Wave once it's finished, allowing other 'federated' servers to become Wave providers, and they plan to make the protocol they use the predominant protocol on the internet. The Google Wave Federation Protocol is itself built on XMPP (the same protocol Jabber and GoogleTalk use). They have bindings for Python and Java currently (I've heard the Python API is not as polished as the Java one, but I don't know that for sure, and I'm sure it will improve), and will probably be adding more languages in the future to support their goal of making their protocol #1. Wikipedia has a pretty good article on Wave as well.

I still have Wave invites left for my friends, if you're interested.

Wednesday, November 11, 2009

Functional Programming

I know this is old news, but it's new to me. I've been thinking about an article by Paul Graham which is basically about how functional languages are superior to OOP and other imperative languages, and how languages are shifting towards functional programming in general, and Lisp in particular. I disagree on both counts. Ignoring the fact that, right or not, OOP isn't going away any time soon, I think that there are certainly problems better suited to the functional paradigm, but there are many problems that are not. This is a good part of the reason that we see more multi-paradigm languages than pure languages.

I specifically object to the example he chose to demonstrate the 'power' of Lisp. He chose a problem that Lisp and similar languages are naturally going to win: returning a method that is an incrementer. It's going to be ugly in any language where methods are not first-class objects.

He also claims that design patterns exist to make up for shortcomings in a language. There is a related question on StackOverflow. The study referenced there states that 7 of the 23 patterns in the GoF book still applied in functional languages. And even a purely functional language like Haskell needs what can be regarded as a design pattern (Monad) to allow for something stateful (like IO). (When was the last time you wrote a program that didn't use some kind of IO?)

Languages are not becoming more and more like Lisp, though many of them did borrow some concepts from it. One of the main reasons for this is that Lisp was one of the first high-level languages (even predating C). Yes, things like garbage collection, dynamic typing, and recursion were initially eschewed by the programming community, only to be later made mainstream. But this happened for reasons of computing power rather than a sudden realization that Lisp had it right all along. Paul Prescod talks about how Python isn't moving towards Lisp and how many former Lispers have adopted Python. I think Graham's statements stem more from sour grapes over the lack of success of his pet language than from historical fact.

That being said, Java certainly isn't perfect and has been constrained by Sun's unwillingness to break compatibility to add new features. Wouldn't it be better to say languages like Java could benefit from some additional Lisp-like options, such as lambdas (available as closures in Groovy) and first-class methods (available in Scala), rather than making immature statements like Lisp >> Java? While I agree that there might be a place for a language that keeps things simple to help curb mistakes, I would rather see greater flexibility and choice given to empower the programmer, allowing for whatever style best suits the problem at hand. Some problems lend themselves to functional programming, and some problems are stateful in their very nature. I find OOP easier to think in, because we fairly naturally think of things as interactions of systems. Though this is perhaps currently overused in our industry for problems it shouldn't be used to tackle.
For this reason, I would like to study and work with functional languages more than I currently am to recognize and take advantage of those situations in which functional languages are the right choice (assuming the decision is up to me).

On a related note, the Computer Language Benchmarks Game is also good fun, for those who find religious wars amusing. 99 Bottles of Beer is also an interesting reference: it collects programs that print the lyrics to the song in many languages (check out an Erlang version, a Lisp version, and a Java version).

Related links

Something unrelated, but funny

Groovy - Sometimes you still need a semicolon

I was ploughing through my overly-large blogroll the other night when this article caught my eye.
I gave it a read, because I wanted to know all the cases where you had to have a semicolon.

He gives two examples, the first of which he's wrong about: it works fine in 1.6.5, and I believe it works in anything >= 1.6.
def list = [1,2,3] as List<Integer>
println list

This second one (not the exact same example he used) does need a semicolon:
{-> assert true == true }()
{-> assert false == false }()

Should be
{-> assert true == true }();
{-> assert false == false }()

This is only the case if you have two closure calls next to each other with nothing in between (without the semicolon, the parser treats the second closure as a trailing-closure argument to the first call).
{-> assert true == true }()
println ""
{-> assert false == false }()
works fine.

Are there other times when you need a semicolon at the end of the line (assuming only one statement per line)?

Friday, November 6, 2009

Groovy 1.7 Beta 2 Available

It’s a bit of old news, but I don’t remember seeing anything about it until today.
It was available 12 Oct, and Windows installer binaries were available 18 Oct. Get yours here.

A draft of the current features is available at  This release is mostly bug fixes, with two notable improvements.  They have added the ability to alter the meaning of Groovy Truth.  This lets you add some truth to your own class by adding the asBoolean method:

class Foo {
  String value
  boolean asBoolean() { value == "something true" }
}
assert new Foo(value: "something true")
assert !new Foo(value: "teh cake is a lie")

And of course, using ExpandoMetaClass, you can alter the behavior for Groovy classes that already have a Groovy Truth defined:
Integer.metaClass.asBoolean = { ->
  delegate > 0
}
Integer foo = 7
assert foo.asBoolean()

They’ve also added stylish outputs for assertion failures (I guess taken from the Spock testing framework):
int foo = 0
int bar = 1
assert foo == bar
will display:
Assertion failed:

assert foo == bar
       |   |  |
       0   |  1
           false

Saturday, October 31, 2009

AHCI in Windows

A recent discovery I made because of my mobo's odd controller:
While it's true that AHCI works out of the box with Vista/Windows 7 (that is, there is no need for third-party drivers), if you did not have it enabled when you first installed Windows, it will be disabled to save some boot time. Makes sense. What's odd is what you have to do to enable it: you have to change HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\msahci\Start from 0x3 to 0x0. Otherwise, you get a blue screen when you try to boot. The fancy startup repair that comes with Vista/Windows 7 will be launched on the next boot (if you let it), but won't be able to figure this out for you.

So yea, I tried moving the cables around on the inside (thinking it might be a bad sata port), and changed to AHCI, and my cd drive was still the one that disappeared, not the drive in that port.  I thought it might be some incompatibility with the firmware, but the updater fails to run. :'( I suppose it's also possible it's a bad cable.

Friday, October 16, 2009

Java vs Groovy: Polymorphism

While doing some studying for the SCJP, I was tinkering in the Groovy console as a way of testing out Java behavior, and I found that the two languages actually behave differently.

class Person {
  protected String name
  protected int age
  public Person() {
    name = "secret"
    age = -1
  }
}

class John extends Person {
  int favoriteNumber
  public John() {
    name = "nobody special"
    age = 0
    favoriteNumber = 7
  }
  public String doNothing() {
    return "junk"
  }
}

Person p = new John()
println p.doNothing()

If you do this in Java (splitting the classes out to their own files, of course), it won't compile: Java looks at the reference type for available methods, so the compiler can't find doNothing() on Person. In Groovy, however, method resolution looks at the type of the object, not the type of the reference, so the method is found at runtime. And this is probably what you would want, so I can refer to John or Mary as a generic Person, but they still do the things they do in a John or Mary way.
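To see the Java side of it, here's a minimal sketch (hypothetical classes, pared down from the example above). The only way to get the call past the Java compiler is to tell it about the runtime type yourself:

```java
public class DispatchDemo {
    static class Person { }
    static class John extends Person {
        String doNothing() { return "junk"; }
    }

    public static String run() {
        Person p = new John();
        // p.doNothing();            // would not compile: Person declares no doNothing()
        return ((John) p).doNothing(); // a cast tells the compiler about the runtime type
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The cast is the compile-time escape hatch; Groovy just does the equivalent lookup for you at runtime.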

Java vs Groovy: Overriding static methods

The Java equivalent of the code below will not compile, but in Groovy it works fine: Groovy lets an instance method override an inherited static method, whereas Java rejects that combination with a compile error.

class Person {
  protected String name
  protected int age
  public Person() {
    name = "secret"
    age = -1
  }
  public static String doNothing() {
    return "junk"
  }
}

class John extends Person {
  int favoriteNumber
  public John() {
    name = "nobody special"
    age = 0
    favoriteNumber = 7
  }
  public String doNothing() {
    return "more junk"
  }
}

Person p = new John()
println p.doNothing()

I don't think this would be a problem. You might want each subclass to have its own (maybe static) version of a method, while every instance of the class still shares a single copy in memory. If you don't want subclasses overriding the method, you can just make it final.
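For contrast, Java does allow a subclass static method to hide an inherited static method; it's only the static/instance mix that it rejects. A quick sketch (made-up classes):

```java
public class HidingDemo {
    public static class Base {
        public static String who() { return "base"; }
    }
    public static class Derived extends Base {
        // legal in Java: a static method may *hide* an inherited static method
        public static String who() { return "derived"; }
        // public String who() { return "?"; }  // illegal: instance method can't override a static one
    }

    public static void main(String[] args) {
        // resolved at compile time against the class name: no polymorphism here
        System.out.println(Base.who());
        System.out.println(Derived.who());
    }
}
```

Hiding is resolved at compile time by the declared class, which is exactly why mixing it with runtime-dispatched instance methods is forbidden.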

Thursday, October 15, 2009

Skip The Tests

Skipping tests is generally a bad idea. But if you are changing things just to tinker and see results, without particularly caring about the tests, they are easily skipped from the command line:
mvn install -DskipTests
This is something I'm sure you've seen. What I didn't know is that you can also skip the compiling of the tests:
mvn install -Dmaven.test.skip=true

Code in Blogger

So, I've been looking for a way to make my code examples in blog posts more readable. I had tried to use the hosted version of SyntaxHighlighter, but couldn't seem to get it working right (I'm sure it was something simple), and it needed several lines added to the Blogger html template. Instead, I've opted to use google-code-prettify. Though it doesn't support as many languages (in particular, no Groovy), all I needed was 2 lines pasted into the template, plus making it load when the body element does. Couldn't be simpler. I followed the instructions (slightly tweaked) from here. Thanks Luka.

These are the lines you need (anywhere in head tag):
<link href="" type="text/css" rel="stylesheet" />
<script type="text/javascript" src=""></script>
Then modify body tag to load the script:
<body onload='prettyPrint()'>
<pre class="prettyprint" style="overflow:auto;">
  <!-- Your code goes here -->
</pre>
You don't need to specify the language since the script will guess, but you can if you like:
<pre class="prettyprint lang-html">
  <!-- HTML code here -->
</pre>
The lang-* class names the language by its file extension. Extensions supported by default include "bsh", "c", "cc", "cpp", "cs", "csh", "cyc", "cv", "htm", "html", "java", "js", "m", "mxml", "perl", "pl", "pm", "py", "rb", "sh", "xhtml", "xml", "xsl".

I've gone back and added the syntax highlighting to all my code examples in previous posts.


Recently I needed a Boolean in a Groovy project I was working on to alternate back and forth (for row highlighting with Apache POI). I thought it would be slick to metaclass a flip() method (something I've always thought should be there) into Boolean. It turns out, this is not possible. I browsed the Java source and learned that all primitive wrapper classes are immutable. I'm not completely sure why they did this. My guess is to protect programmers from hurting themselves by mutating objects in a collection and possibly creating unexpected behavior or a race condition. I could have created my own boolean wrapper class, of course, or used Apache Commons' MutableBoolean. In the end, I decided not to be fancy and just reference a new object like

Boolean foo = false
foo = !foo
But boy, it would have been pretty & nice to have
Boolean foo = true
foo.flip()
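If I had wanted to be fancy, a hand-rolled wrapper only takes a few lines. A Java sketch (hypothetical names; the row-striping method is just to show the use case):

```java
public class Flip {
    // a minimal hand-rolled mutable wrapper (Apache Commons has MutableBoolean for real use)
    static final class MutableBool {
        private boolean value;
        MutableBool(boolean value) { this.value = value; }
        boolean get() { return value; }
        void flip() { value = !value; }
    }

    // e.g. alternating row highlighting: '.' for plain rows, 'X' for highlighted ones
    public static String stripes(int rows) {
        MutableBool highlight = new MutableBool(false);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append(highlight.get() ? 'X' : '.');
            highlight.flip();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripes(6));
    }
}
```

Because the wrapper is mutable, the same object can be shared and flipped in place, which is exactly what Boolean's immutability rules out.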

Tuesday, October 6, 2009

Yay! Someone else agrees with me!

A couple days ago, I blogged about some of my frustrations with task estimation. This morning, I saw an article posted on AgileZone that seemed to echo my words (though in a more concise and articulated fashion than I managed to).

Monday, October 5, 2009

Things About This Universe That Amuse Me

We live in a weird universe.
  1. There are more species of beetles than of any other order in the animal kingdom; they comprise 25% of all known life-forms.
  2. Humans are the only species that drinks milk from a species other than its own.
  3. There are significantly fewer than a googol of atoms in the known universe.
  4. Onomatopoeias
  5. Time is relative.
  6. More people have died than are currently alive (6% of all people who had ever existed were alive in 2002).
  7. The chimpanzee genome is 95% identical to the human genome. And up to 95% of our genome may be junk left behind by retroviruses and evolutionary artifacts.
  8. Humans are the only species known to use the smile to express something other than fear and aggression.
  9. ∞ - ∞ != 0.
  10. Human childbirth. How does this not kill us? The infant's skull actually changes shape as it passes through the birth canal.
  11. Human flora. The womb is sterile, but after the family has finished kissing and caressing, 500 to 1000 kinds of bacteria live in the gut, and just as many on the skin. Cool.

Sunday, October 4, 2009

Project Estimation

I've been thinking about project estimation the last several days. What initially got me thinking about it was the links off of this article. I find the notion of using story points instead of hours for sprint estimates appealing. I like the idea of having a set number of story points per sprint, it keeps everyone apprised of the complexity of the issues at hand. It also encourages the use of stories instead of use cases for requirements, which has some advantages. I buy the idea that estimating story points is easier than estimating time. Scrum-Breakfast likens it to a train
Traditional estimates attempt to answer the question, "how long will it take to develop X?" I could ask you a similar question, "How long does it take to get to the nearest train station?" The answer, measured in time, depends on two things, the distance and the speed. Depending on whether I plan to go by car, by foot, by bicycle [...], the answer can vary dramatically. So it is with software development. The productivity of a developer can vary dramatically, both as a function of innate ability and whether the task at hand plays to his strong points, so the time to produce a piece of software can vary dramatically. But the complexity of the problem doesn't depend on the person solving it, just as the distance to the train station does not depend on how I get there.
However, I think this ignores a fundamental aspect of human nature (at least of modern day humans): our need for instant gratification. We don't really care how far away something is (except for concerns about fuel). When we ask someone how far away they live from some particular point, they usually give an answer in hours and minutes, not miles. They view it as a question of how long they have to wait before they get what they want. Similarly, product people and especially the customer don't particularly care how complex a problem is to solve. They just want it solved, and want to know how long they have to wait before they get what they want.

It seems to me that eventually someone will have to make the translation to hours (even if only done implicitly). Users care about release dates -- "when are the features I want going to be available?". Even if you schedule sprints with story points, someone is going to have to figure out how long those will take to complete in order to set a reasonable release date. When you make this conversion, you will have to account for variances in velocity (the speed at which people work). Even if you do decide to use story points, this correlation will be necessary at first to establish how many story points are possible in a sprint.

Still, there are some advantages to this approach that come to mind. When you estimate story points, you don't have to estimate both complexity and your ability to solve the complexity. You only have to worry about the complexity. The hours for big things are often pulled out of the air and aren't very accurate until they are broken down into tasks. Story points provide a good way of looking at the big picture without fleshing out all the details. One additional cool thing about story points is that they have built in variances, as Scrum Breakfast points out
The thing to realize about estimates is that they are very imprecise: +/- 50%. One technique for dealing with a cost ceiling is to define an estimate such that the actual effort needed will be <= the estimate in 90% of the cases, regardless of whether they are measured in days, hours or points. So Story Points are usually estimated on the Cohn Scale (named for Mike Cohn, who popularized the concept): 0, 1, 2, 3, 5, 8, 13, 20, 40, 100. Why is there no 4? Well, a 3 is a 3 +/- 50%, so a three actually extends from 2 to 5, a 5 from 3 to 8, etc. The difference between 3 and 4 is nowhere near as significant as between 1 and 2, so we don't give estimates that give us a false sense of precision. Summed together, many imprecise estimates give a total that is a remarkably accurate estimate of the work to be performed (the errors tend to cancel each other out, rather than accumulate).
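That errors-cancel claim is easy to sanity-check with a quick simulation (made-up task sizes, each off by up to +/- 50% uniformly -- just a sketch, not real sprint data):

```java
import java.util.Random;

public class EstimateSim {
    // sum n task estimates whose actual effort is uniformly off by up to +/- 50%
    public static double totalRelativeError(int n, long seed) {
        Random rnd = new Random(seed);
        double estimated = 0, actual = 0;
        for (int i = 0; i < n; i++) {
            double est = 1 + rnd.nextInt(8);             // made-up Cohn-ish task size
            actual    += est * (0.5 + rnd.nextDouble()); // lands anywhere in +/- 50% of the estimate
            estimated += est;
        }
        return Math.abs(actual - estimated) / estimated;
    }

    public static void main(String[] args) {
        // each task can be off by 50%, but the sprint total lands much closer
        System.out.printf("total off by %.1f%%%n", 100 * totalRelativeError(50, 42L));
    }
}
```

With 50 tasks the total typically comes in within a few percent, because independent errors grow like the square root of the task count while the total grows linearly.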
Mule also had an interesting approach to creating a product backlog, where they used a sort of bucket sorting rather than choosing actual numbers for story points. This is what Chris Sterling calls affinity estimating.

Mike Cohn says that sprint backlogs and product backlogs should have different units to prevent confusion: if you use hours for both, it doesn't show that hours on a sprint backlog have been thought about a lot more than hours on the product backlog. Speaking of Mike Cohn, he has an interesting notion for the role of story points. He suggests they are a good long-term indicator, but the short term should focus on the product backlog: prioritize stories, then break them into tasks and estimate those using hours. Some have suggested using task points in a similar way as story points, but for individual tasks. This might provide some room for variance if you really suck at estimating. Additionally, you will only have to update the estimates when complexity changes, not when velocity changes. This also might be a more lean approach, since it doesn't waste time on an artifact that is not needed. However, there are some downsides to this approach. One is that there is no easy way to track the progress of the task (e.g. in Jira) unless a conversion to hours is first made. It also presents a challenge to HR, which may wish to account for hours for financial reporting purposes.

I think it makes a bit of sense to use hours to estimate at the task level, since you should be able to give more detail at that point (as opposed to the bigger items, like stories, where any hours ascribed would basically be pulled out of thin air). This is what I'm interested in: how can I make my task estimates more accurate, to make sure I can deliver what I think I can deliver? To me, it seems the issue with both of these methods of task estimation is that they do not address the real reason tasks get off schedule in the first place. Actually, there are two, but both deal with an unknown. The first is other tasks competing for the same resource (e.g. your time) that weren't initially accounted for: either they came up after the estimation or were a result of oversight. The second is misjudging the complexity of a task. You look at a problem, it seems pretty simple, you ascribe a few hours/points/whatever to it, but once you start digging in you realize it's going to take longer than you thought. Then you are left with the question of whether or not to re-estimate. I think that while the original estimate should be retained for posterity's sake, it's useful for the product people to have a new estimate for planning purposes... perhaps pushing the work back to another sprint.

The only certainty is the certainty of uncertainty, but that's probably why you're doing agile in the first place. This is especially true when new technology is involved. While there are some who advocate task points for breaking down tasks (pieces of a story), most advocate using hours at this level. And this is what I do not understand. Why are we going to all the bother to make all these estimates about minutia? Get a commitment from developers, keep a backlog, and get to work! I can't believe some agile teams are willing to spend several hours (some say 8 hours) on sprint planning. That doesn't sound agile to me at all. A practice that seems to work pretty well for some of my colleagues here at OCLC is to use units that are a bit fuzzier than an hour. They tend to estimate in 1/2 days, which is a bit less precise, but allows for greater accuracy because it is easier to ballpark. But unless you are continuously updating estimates (as the Scrum Primer suggests) so that volunteers can move around on tasks (perhaps in a paired situation), I don't see much point in having exact hours.

Honestly, I'd rather do away with task estimates altogether. Maybe it's because I don't like submitting something that I know I can't do a good job on (at least not yet). But it also seems to me that you are spending time creating an artifact that doesn't necessarily help you finish the sprint. There are some out there who have suggested this, including Jeff Sutherland here. Jurgen De Smet has an interesting blog post on it here. I feel that as long as I'm reasonably certain I can get tasks X, Y, Z done in sprint N, I don't really need to make up numbers for how long they'll take. Maybe I'd feel differently if I were on multiple teams and close to being overworked. But developing this sense is developing a skill that isn't really useful for anything else; I can't even get good at estimating and apply it to other areas to improve productivity, since this 'gut' sense is a rather domain-specific intuition. What's the point? Am I completely missing something?

P.S. This didn't really fit with anything else, but I wanted to pass it along. It is about building a common definition of done:

Friday, September 25, 2009

Vacuum Your Firefox

This has been blogged about in the usual places already, but for those who haven't seen it:
There's an addon for Firefox that defragments its sqlite database, called Vacuum Places. I recently gave this a whirl and was pleased with the results. The responsiveness of my address bar was noticeably improved. Now, bear in mind that I use Firefox a lot. I have over 4000 bookmarks -- it's kinda getting out of control. Those who don't use it as much probably won't notice much of a difference, but give it a whirl and see what it does for you. Note that there is a disclaimer saying to back up your profile (easily done with MozBackup), but I haven't had any issues with it.

Some other addons you might want to check out:
Adblock Plus - never see another ad
SkipScreen - automatically waits out the timers on file-hosting sites
DownloadHelper - download youtube videos, a page of images at once, and more

DownThemAll! - download everything on a page, based on filters (all pdfs, all images, etc)
Greasemonkey - run custom javascript in websites to do all sorts of nifty things (use Greasefire to light up when scripts are available on the current site or get some here)
Stylish - loads custom css into sites (get some here)

Tab Mix Plus - one of the biggest reasons I use it is for the 'duplicate tab' option, but the 'close other tabs', 'close right tabs', and 'close left tabs' are pretty nifty too.

Duct Tape Programmers

On Wednesday, Joel blogged about the first chapter of Coders at Work, where he admired 'duct tape programmers' who are willing to skip unit tests and code quality in order to ship on time. I don't really have much to say about it that Uncle Bob and his commenters haven't already said. I agree with just about everything Bob said except for
I found myself annoyed at Joel’s notion that most programmers aren’t smart enough to use templates, design patterns, multi-threading, COM, etc. I don’t think that’s the case. I think that any programmer that’s not smart enough to use tools like that is probably not smart enough to be a programmer period.
In an ideal world, this would be true. And as the industry matures, I think it is becoming increasingly true. However, there are still some programmers out there who aren't very talented and probably shouldn't be programming, and yet they still are. Now, I'm not an advocate of making your life difficult to weed out the less talented, but our ever-increasing toolbox is lowering the barriers to entry. This isn't necessarily a bad thing. I have a fair amount of faith in the invisible hand, and if this allows businesses to get their needs met, then so be it. Just be aware that there are people programming who aren't the sharpest knives in the drawer, and they may even be getting away with it, at least for now.

Wednesday, September 23, 2009

Don't Bother Testing for Null?

This appeared on Rainsberger's blog a couple of days ago
Unless I’m publishing an API for general use, I don’t worry about testing method parameters against null. It’s too much code and too many duplicate tests. Besides, I would be testing the wrong thing.
When a method receives null as a parameter, the invoker—and not the receiver—is missing a test
I think I've got to disagree with him on this one, unless I'm misunderstanding 'general use'. It might be OK not to test for null if it's just your code calling your code (though what happens when this gets pushed out, maybe serviceized, and suddenly others are calling it?). Even if you do not guarantee the right results when the method is passed a null, I would think you should at least make sure that it doesn't blow up. If it fails, it should fail gracefully. It's true that it's not the responsibility of the receiver to make sure the invoker is calling it correctly, and I wouldn't necessarily expect a test for every conceivable possibility. But nulls happen all the time; it's not unreasonable to expect a quick check. There wouldn't even be a lot of code. Something like
if (someVar == null) {
    Log.error("it was null")
}

would do nicely. I do agree there is test duplication between the receiver and invoker, in that both are testing for the null condition on the same variable, but this doesn't really bother me. It seems to me there are lots of tests that have overlap or dependencies. Maybe I'm just crazy. I left a comment on the post.
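To make that concrete, here is a minimal sketch in Java. The class, method, and log message are all hypothetical, not from the post: the receiver doesn't promise a meaningful answer for null, it just refuses to blow up.

```java
public class GuardDemo {
    // Hypothetical receiver: it doesn't promise a useful answer for null,
    // but it fails gracefully instead of blowing up with an NPE later.
    static int wordCount(String text) {
        if (text == null) {
            System.err.println("wordCount: text was null"); // log, then bail out
            return 0;
        }
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("one two three")); // 3
        System.out.println(wordCount(null));            // 0, plus a log line
    }
}
```

The invoker's own test for the non-null path still exists elsewhere; this guard only covers the "fail gracefully" case.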

Monday, September 21, 2009

Extending NTFS with Bad Sectors

I wanted to grow an NTFS partition into unallocated space this weekend on an XP machine. (If it were Vista or Win7, I would have used the built-in resizing ability in Disk Management.) I found I could not do so with my usual tool of choice (GParted Live CD). The error message said that it could not complete the operation because of two bad sectors, and to run chkdsk (which I did) and then use ntfsresize with the --bad-sectors option. When I tried this, it said it couldn't grow the partition unless I made it bigger with fdisk. The only way I know to do that would be to create an entirely new partition (which would mean all the hassle of reinstalling Windows and the needed apps). I was finally able to do it with EASEUS Partition Master Personal. Unfortunately, it is not open source, but it is free, and it did what I was trying to do in minutes, without restarting.

Friday, September 18, 2009

Types of Reviews

An interesting discussion was started in this month's section meeting about code reviews. People brought up the point that catching flaws in design implementation requires knowledge of the requirements, and that without such knowledge the reviewer can only look for bad programming practices in general. Of course, I'm new to all this, but it seemed logical to me to have three kinds of reviews.

- Design Review
- Implementation Review
- Code Review

The purpose of a design review would be to review the design of the system. This is a higher-level view and shouldn't involve any code. The purpose of an implementation review is to review how well the code meets the requirements (somewhat similar to acceptance testing); this review is probably the most time-intensive of the three. A code review is a review of the code itself (perhaps with some context, but not at the level of an implementation review). This should focus on errors in logic and adherence to accepted coding standards (not style). Am I missing something that should be reviewed?

I think when I prepared code for review, it helped when I attached an overview of the classes, their purpose, and their general relation to each other. If a class is very large (and probably needs refactoring), it may also help to attach an overview of it (at least for the methods most important to its job).

It was also pointed out that having people read over requirements and code would take a fair amount of time, which is time not spent on other tasks. An unanswered question was whether pair programming should complement standard reviews or replace them altogether, and how time could be used most efficiently. Everyone agreed that in some way the feedback loop needed to be shortened, whether that be through pair programming or reviewing more often over smaller amounts of code or some other means.

Monday, September 14, 2009

Why Cowon is Awesome

The COWON S9 is infinitely more awesome than the iPod and the Zune. Here are a few reasons why: AMOLED touch display, video playback, FLAC and OGG support (though unfortunately no Apple Lossless), a completely customizable interface (Flash/ActionScript), themes and more themes, games and apps, drag-and-drop file syncing (no iTunes or special software required), and a highly customizable equalizer. And they regularly provide free updates. On August 17, they updated the firmware with even more features (the previous update was July 24). It also has a microphone and a radio. I also hear they are working on porting Rockbox to COWON.

I was a bit nervous when I purchased this. I carefully read reviews and so forth about it, but it's a Korean brand I had not heard of before. I've owned it for about three months now and I have absolutely no regrets. It doesn't have quite the storage capacity of an iPod or Zune (30GB), but I don't think I could fit my entire collection on them anyway. I keep my music in three different places on my computer: some mp3, some flac, some ogg. With my S9, I can just drag and drop whatever in there, and it maintains the same folder structure so you can easily find things (ever try making sense of the organization an iPod uses?). I'm using my music collection in ways that were simply not possible with my old iPod. If your iPod or Zune has recently bit the dust, I highly recommend it.

The Not So Answered Variables in GPathResult

A while ago, I posted that I had gotten the answer on how to get the number of records in an xml file using XmlSlurper (or XmlParser) by passing a variable. It turns out it was not so answered. That approach only worked for elements that are immediate children of the root. It seems XmlSlurper and XmlParser can't take a longer path as a single String; it needs to be broken up. (I'm not sure why it was implemented this way.) Fortunately, our own brilliant Jonathan Baker noticed this and suggested the solution below. It's not complicated once you figure out that the String has to be broken up:

def xmltxt = """<file>
  <something>
    <record name="some record" />
    <record name="some other record" />
  </something>
</file>"""

def xml = new XmlSlurper().parseText(xmltxt)
def fullPath = 'file.something.record'

def pathElements = fullPath.tokenize('.')
pathElements -= 'file'  // parseText already returns the root, so drop its name
def root = xml
pathElements.each { node ->
    root = root."$node"
}
return root.size()
You could also use split, as John Wagenleitner suggested in response to my comment on his answer on StackOverflow:
def xmltxt = """<file>
  <something>
    <record name="some record" />
    <record name="some other record" />
  </something>
</file>"""

def xml = new XmlSlurper().parseText(xmltxt)
String foo = "something.record"

def aNode = xml
foo.split("\\.").each {
  aNode = aNode."${it}"
}
return aNode.size()

My thanks to you both.

After talking with Josh, I understand why the Groovy people had to do what they did. Dots (.) are legal in XML element names, as long as the name doesn't start with one. (Actually, other punctuation is allowed as well, though it is not recommended.) I don't think they could use forward slashes as XPath does because of how Groovy has overloaded the operators, and dots had to be allowable, so this is what we're stuck with. Maybe they could have used spaces instead, since those aren't legal in element names, but I don't know the impact that would have on other classes.
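A quick way to see the ambiguity is to parse a document whose root element has a dot in its name; standard DOM handles it fine. This is just a sketch, and the element names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

public class DotNames {
    // Parse the document and return the root element's name.
    static String rootName(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
                .getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // 'file.list' is one well-formed element name, which is exactly why a
        // dotted GPath string like 'file.something.record' is ambiguous:
        // the dot might be a path separator, or part of a name.
        System.out.println(rootName("<file.list><record/></file.list>"));
    }
}
```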

Friday, September 11, 2009

So, Why The Name?

Thought I would give credit where credit is due. I'm not a terribly creative person; if someone else has something creative I can use, I'll most likely rip it off. 'Witty Keegan' is one of the nicknames bestowed upon me by my friend and college roommate, Vitus Pelsey ('Keegasaurus' being another). I thought a pun would make for a good blog title, while giving a false impression of wit and intelligence.

Bad GString! Bad! --- Wait, My Bad

I was trying to write a test for a method that inserts a GString into a StringBuilder, and was frustrated to find that my stubbed method was never called!

Initially I got mad and blamed Groovy for making my life more difficult with its ridiculous automagical boxing. Then Josh pointed out that a GString is actually a CharSequence, so StringBuilder's insert(int, CharSequence) overload is the one that gets called. It's not a Groovy thing; I'm just dumb. Fire up your GroovyConsole and observe:

StringBuilder.metaClass.insert = { int arg0, Object arg1 ->
  println "FOOO"
}
StringBuilder.metaClass.insert = { int arg0, String arg1 ->
  println "BARR"
}
StringBuilder.metaClass.insert = { int arg0, CharSequence arg1 ->
  println "I'M HERE!!!"
}

StringBuilder sb = new StringBuilder()
def foo = "ROGER"
def bar = "$foo"
sb.insert 0, bar
The result is "I'M HERE!!!". All I had to do was stub the right method.

If you're experiencing String vs GString issues, like I thought I was, you may find this helpful.
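The Java overload-resolution behavior underneath this is easy to demonstrate without Groovy at all. In this sketch (class and method names are mine), a StringBuilder stands in for the GString: both are CharSequences but not Strings, so the CharSequence overload beats Object.

```java
public class OverloadDemo {
    static String which(Object o)       { return "Object"; }
    static String which(String s)       { return "String"; }
    static String which(CharSequence c) { return "CharSequence"; }

    public static void main(String[] args) {
        // StringBuilder implements CharSequence but is not a String, so the
        // CharSequence overload is more specific than Object and wins.
        System.out.println(which(new StringBuilder("ROGER"))); // CharSequence
        System.out.println(which("ROGER"));                    // String
    }
}
```

So stubbing the (int, String) or (int, Object) insert and passing a GString means the stub is simply never selected.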

Integration Tests Are a Scam?

I recently listened to a talk given by J. B. Rainsberger (author of JUnit Recipes) with the title Integration Tests Are a Scam (summary notes here). If the idea seems crazy, blame the fact that he's from Canada ;) These are some quick thoughts I had, I may expand on them later.

Here's some definitions he gives:
Basic Correctness
"Given the myth of perfect technology, do we compute the right answer?"
Myth of Perfect Technology
"Assuming we can use an arbitrary large amount of memory, for an arbitrary amount of time, on a Turing machine for spherical people[...]"
Integration Tests
"...any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior."
"You should never need to write an integration test to show basic correctness." He believes our largest problems lie in basic correctness; after we get this right, we can worry about issues of performance, security, etc. The question of basic correctness is where he focuses his efforts. (He paraphrases a quote that I believe is based on the Pareto Principle.)

Downsides of integration testing:
  • Integration tests are slow
  • Integration tests don't tell you where the failure occurred (may be difficult to find even with debugger, assuming TDD hasn't caused you to forget how to use one)
  • In order to have enough tests at the integration level to test thoroughly, the number of tests that need to be written increases combinatorially, based on code paths
  • There is a lot of duplication in test setup

Now, it should be noted that he is not talking about acceptance tests. He says that acceptance tests tend to be end-to-end, and that is OK. But end-to-end tests should not be used for developer tests. He is also not altogether against integration tests for finding bugs, he just doesn't want them permanently added to the project. Bugs found through an integration test should create new object tests. "I don’t doubt the necessity of integration tests. I depend on them to solve difficult system-level problems. By contrast, I routinely see teams using them to detect unexpected consequences, and I don’t think we need them for that purpose. I prefer to use them to confirm an uneasy feeling that an unintended consequence lurks."

Instead, he recommends 'collaboration tests' (commonly called 'interaction tests') and 'contract tests'. By collaboration tests, he means stubbing out or mocking the collaborators to isolate functionality, making sure every way the class can interact with its collaborators behaves as expected. This is half of the work (and actually the easier half): you've checked that you ask the right questions and can handle all the responses.
The missing piece (the one that commonly drives people to rely on integration tests) is a misunderstanding of the interaction between the piece in question and its collaborators.

The second half is 'contract tests'. The first of the two checks on the other side of the interface is whether the collaborator is able to provide a response when "the star" (the Class in Test, CIT) asks for it (is it implemented? can it handle the request in the first place?). The second is whether the collaborator responds in the way the CIT is expecting. "A contract test is a test that verifies whether the implementation respects the contract of the interface it implements." There should be a contract test for every case we send the collaborator and every case the collaborator might send back. Again, this will use stubbing and mocking. The advantage of this approach is that you know when you have enough tests (two for each behavior). I've tried to diagram the idea thusly:

He claims that if you ask these questions between every two services and focus on basic correctness, we can be "arbitrarily confident" in the correctness. The number of tests increases additively instead of combinatorially, and the suite is easier to maintain, has less duplication, and is faster to run. If something goes wrong, you are either missing a collaboration test, missing a contract test, or the two do not agree. This makes troubleshooting easier. As yet, there is no automated way of checking that every collaboration test has a matching contract test.
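Here's my attempt at a minimal sketch of the two halves in plain Java, with no mocking framework. The interface and class names are mine, not Rainsberger's: the collaboration test stubs the collaborator and checks the star's side of the conversation, and the matching contract test checks that a real implementation can actually hold up its end.

```java
// The collaborator's contract, as seen by "the star" (the class in test).
interface PriceSource {
    int priceFor(String sku);
}

// The star: it talks to PriceSource, but its test never touches a real one.
class Cart {
    private final PriceSource prices;
    Cart(PriceSource prices) { this.prices = prices; }
    int total(String sku, int qty) { return prices.priceFor(sku) * qty; }
}

// A real implementation, exercised only by the contract test.
class RealPriceSource implements PriceSource {
    public int priceFor(String sku) { return "WIDGET".equals(sku) ? 100 : 0; }
}

public class TwoHalves {
    public static void main(String[] args) {
        // Collaboration test: stub the collaborator ("if B returns 100...")
        // and check that the star uses the answer correctly.
        PriceSource stub = sku -> 100;
        if (new Cart(stub).total("WIDGET", 3) != 300) throw new AssertionError("collaboration");

        // Contract test: "...then B must actually be able to return 100."
        if (new RealPriceSource().priceFor("WIDGET") != 100) throw new AssertionError("contract");

        System.out.println("both halves pass");
    }
}
```

The pairing is the point: every value the collaboration test stubs must have a contract test showing the real collaborator can produce it.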

When I saw the title of the talk, I initially reacted rather violently against the notion. I'm still not sure if I'm 100% behind it, but I think there are some good points raised about integration tests and their utility. However, as Dan Fabulich points out in a reply to a response Rainsberger gave to a comment about a Mars rover failure, figuring out that you are missing a test may not come easily.
"The ability to notice things" is high magic. If you have that, you can find/fix any bug without any tests... why don't we all just "notice" our mistakes when writing production code? In this case you're just using intuition to notice a missing test, but that's no easier than noticing bugs.

As you know, I share your view that integration tests are tricky, in the sense that writing one tempts you into writing two, where instead you should be writing more isolated unit tests. But unit tests have the opposite problem: once you have some unit tests, it's too easy to assume that no more testing is necessary, because your unit tests have covered everything. By exaggerating the power of unit tests and the weakness of integration tests, you may be doing more harm than good.

Imagine you're actually coding this. You just finished writing testDetachingWhileLanded and testDetachingWhileNotLanded. (It was at this point in your story that you first began to "notice" that a test was missing.) You go back over the code and find you have 100% branch coverage of the example application. Your unit tests LOOK good enough, to a superficial eye, to an ordinary mortal. But you're still missing a critical test. How are you supposed to just "notice" this?

More generally, how are you supposed to build a habit or process that notices missing tests *in general*?

I've got just the habit: write all the unit tests you can think of, and then, if you're not sure you've got enough unit tests, do an integration test. You don't even necessarily have to automate it; just try it out once, in real life, to see if it works. If your code doesn't work, that will help you find more unit tests to write. If it does work, don't integration-test every path; you were just double-checking the quality of your unit tests, after all."
While I wouldn't go so far as calling it 'magic', finding all the edge cases can be difficult and may require a fair amount of knowledge about the collaborator. Rainsberger later commented that his method of ensuring every condition is tested is
Every time I stub a method, I say, "I have to write a test that expects the return value I've just stubbed." I use only basic logic there: if A depends on B returning x, then I have to know that B can return x, so I have to write a test for that.

Every time I mock a method, I say, "I have to write a test that tries to invoke that method with the parameters I just expected." Again, I use only basic logic there: if A causes B to invoke c(d, e, f) then I have to know that I've tested what happens when B invokes c(d, e, f), so I have to write a test for that.

Dan Fabulich suggests adding either "Every time I stub a method that can raise an exception, I have to stub it again with a test that expects the exception" or "Every time I stub a method to return X, I also have to write a test where the stub returns Y. And Z. For all possible return values of the method." Of course, it's impossible (or at least very difficult) to be sure you've gotten all edge cases.

My takeaway from all this is that integration tests are overused, often as a half-baked attempt to remedy poor unit tests (even though the two kinds of tests solve different problems). While I'm not quite ready to do away with integration tests entirely (I think they provide useful documentation of examples of use without going into the nitty-gritty details of a unit test, and make a nice supplement to unit tests), I think one should recognize their place: performance testing and general review, NOT finding bugs, ensuring changes didn't break anything, or locating where a failure occurred. One should add them as a separate module that is only built when requested, or use something like the Failsafe plugin for Maven.
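For that last option, here is a hedged sketch of the Maven configuration (plugin version omitted; check the Failsafe documentation): binding the Failsafe plugin runs integration-test classes in the integration-test and verify phases, keeping them out of the unit-test (Surefire) run.

```xml
<!-- Sketch only; verify against the Failsafe plugin docs.
     By default it picks up classes matching *IT.java and runs them
     in the integration-test phase, separate from unit tests. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```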


One idea that he mentions early on in the talk is having only one assert per test. Piling several asserts into a single test is something I'm occasionally guilty of (especially if the method being tested does several things). This should be treated as a testing smell that may indicate the need for some refactoring.

He also mentions what first got him interested in TDD, which I thought was one of the most compelling reasons I've heard so far to use it. When you don't use TDD, you have a seemingly endless, depressing cycle of writing tests, fixing bugs, and writing more tests; how do you know when you're finished? TDD has a more definitive ending point:
  • Think about what you want to do
  • Think about how to test it
  • Write a small test. Think about the desired API
  • Write just enough code to fail the test
  • Run and watch the test fail. (The test-runner, if you're using something like JUnit, shows the "Red Bar"). Now you know that your test is going to be executed
  • Write just enough code to pass the test (and pass all your previous tests)
  • Run and watch all of the tests pass. (The test-runner, if you're using JUnit, etc., shows the "Green Bar"). If it doesn't pass, you did something wrong, fix it now since it's got to be something you just wrote
  • If you have any duplicate logic, or inexpressive code, refactor to remove duplication and increase expressiveness -- this includes reducing coupling and increasing cohesion
  • Run the tests again, you should still have the Green Bar. If you get the Red Bar, then you made a mistake in your refactoring. Fix it now and re-run
  • Repeat the steps above until you can't find any more tests that drive writing new code
(from the C2 wiki)
I like that. This would help address my previously mentioned fear of knowing when you've tested everything. (Though I'm sure it's not foolproof).
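As a toy illustration of one trip around that loop, in plain Java instead of JUnit (the class and method names are mine), with the red/green steps narrated in comments:

```java
public class TddLoopDemo {
    // Step "write just enough code to pass": this method didn't exist when
    // the test below was first written, which is what produced the Red Bar.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        // The test came first. With add() missing (or stubbed to return 0),
        // running this fails: the Red Bar proves the test actually executes.
        if (add(2, 2) != 4) throw new AssertionError("Red Bar");
        // With the real implementation in place, we get the Green Bar.
        System.out.println("Green Bar");
    }
}
```

In real use, the failing run happens before the implementation exists; a single file can only narrate that ordering in comments.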

Thursday, September 10, 2009

iTunes Pains (Again)

This time the upgrade to iTunes 9 broke my beloved MediaMonkey. I really should just use something else to get my podcasts and blow away iTunes. I don't even have an iPod anymore and have no plans of getting an iPhone; there's really no reason to keep this piece of trash on my hard drive.
A workaround was posted here:
Granted, it's not iTunes' fault per se, and I'm sure MediaMonkey will get it fixed soon; it's just damned irritating. I didn't follow his instructions exactly: what I did was rename d_iPhone.dll to d_iPhone.dll.disabled in the MediaMonkey plugins folder in Program Files.

MediaMonkey has pushed out a beta release (a bit earlier than they were planning) that fixes compatibility with iTunes 9, only one day after it came out: It might be better to delete the file rather than rename it (if you're comfortable with that); then it will be replaced when the new installer is run.

Beware of openStream

Yesterday we discovered a bug where one of our projects hung and had to be CTRL-C'd. The culprit ended up being one line:
URL url = new URL("http://someurl")
InputStream is = url.openStream() // <-- this one
The openStream method is actually shorthand for openConnection().getInputStream(): it returns a newly instantiated URLConnection and calls getInputStream() on it. The problem, which the API docs won't tell you (but which is visible in the source), is that the default values for connectTimeout and readTimeout are 0. This means that if the connection or read stalls, it will wait forever. We tested that the connection was good before we began processing, but reading from the URL hung when the service went down in the middle of processing. The solution was mentioned in this StackOverflow question: don't create the InputStream from the URL, but from a URLConnection with the timeouts set:
int timeoutMS = 5000 // 5 secs
try {
  URL url = new URL("http://someurl")
  URLConnection conn = url.openConnection()
  conn.setConnectTimeout(timeoutMS)
  conn.setReadTimeout(timeoutMS)
  InputStream is = conn.getInputStream()
} catch (java.net.UnknownHostException uhe) {
    // something useful here
} catch (java.net.SocketTimeoutException ste) {
    // something useful here
}
Eric has posted about this as well (it was his project we learned this from).

Friday, September 4, 2009

Unit Testing: Mein Kampf

So, over the last week and a half or so I've been having a bit of an identity crisis over unit tests. Sure, I got the basic idea of writing tests for each method in college, but like most people I never used it much while in school. Now that I'm at OCLC and unit testing is expected, I'm trying to develop my own philosophy on the matter and get a feel for it.

I felt like I should bring code coverage up on the project I was working on, but for purely religious reasons. Now, granted, this project is heavy in IO (it's the same project I mentioned previously), so perhaps I'd see less benefit from unit tests here than elsewhere. But despite the fact that I now have >70% coverage (up from 0%), I haven't found a single bug using unit tests, only with integration tests. The results might also have been different had I used TDD or my home-brewed syncretic TOD approach.

Despite these facts, I don't dispute that unit testing used with CI can help as a tool for preventing software regressions and as up-to-date documentation for the code (though I still don't find it a very natural read, except for BDD approaches like easyb). It's also useful for finding the root cause of a problem: whereas an integration test might only be able to say "something blew up", a unit test might be able to tell you "here is what blew up". However, the value of the unit tests I've created remains to be seen. Meanwhile, they did deliver value as a learning platform.

Groovy has served as an excellent testing platform for me. This particular project was written in Groovy, but I think this would work well with Java projects too (there is, however, the slight overhead of the additional dependency). I was able to do everything I wanted (still a few kinks to work out) with stubbing using the wonderful, magical ExpandoMetaClass. There are a few tests I have yet to write where I may have to use Groovy's mocking framework.

A couple gotchas:
// getters cannot be overridden using just the property name, even though they can be called that way
class Foo {
  int bar
}

Foo.metaClass.getBar = {->
  return 44
}
foo1 = new Foo()
assert foo1.bar == 44  // this passes

GroovySystem.metaClassRegistry.removeMetaClass Foo
Foo.metaClass.bar = {->
  return 42
}
Foo foo1 = new Foo()
assert foo1.bar == 42  // this does not

I wanted to send some precanned input to a method that uses a BufferedReader to get its input. The construction chain eventually creates a File to get the data. I can't extend File or create a new interface with all the File stuff for a test, because that would require modifying the BufferedReader and Reader classes to match. I've not found a way around this.

Another problem was a method that takes the String path of a file containing the filepaths of the input files to process, and adds each path, along with a BufferedReader for that file, to a collection (I'm not sure why that decision was made). So, I tried to mock out eachLine(). But there is a problem...
// You cannot metaclass constructors, so this code doesn't work. I've still got to figure out
// a way of faking a file, since File can't use map coercion (it has no default constructor).
String aPath = "a/path/to/file"
String fakeData = "some\nfake\nstuff\n"

File.metaClass.init = { String filePath ->
  def mock = [eachLine: { return "${fakeData}" }, exists: { return true }] as File
  return mock
}

File f = new File(aPath)  // doesn't work

There are still some larger questions I have that maybe I'll never get THE ANSWER to. One question in particular I've been struggling with is "How simple is too simple to break?" The JUnit authors suggest this is a never-ending source of pain (their FAQ answer, in pseudocode):
becomeTimidAndTestEverything
while writingTheSameThingOverAndOverAgain
    becomeMoreAggressive
    writeFewerTests
    writeTestsForMoreInterestingCases
    if getBurnedByStupidDefect
        feelStupid
        becomeTimidAndTestEverything
end
And it's still very easy for me to lose sight of what I'm actually testing in the midst of all the mocking, stubbing, and so forth. More than a few times this last week I've looked down and realized that what I've written is so paranoid that it is really testing stuff that can only fail if the compiler or JVM fails or cosmic rays come down and change my data.

Just as important as making sure your tests pass is making sure they fail. I struggled with this the most when I started this process. I thought, "wonderful, everything works," when it turned out that the code didn't work quite the way I thought it did, and my tests were actually written in such a way that they would NEVER fail. All those green bars might not actually mean much. That's not to say they're worthless, just maybe not as valuable as you might initially think.

And this leads to my greatest fear: how do you know when something is thoroughly tested? And can some sort of confidence be associated with your tests? Clearly, code coverage doesn't cut it. I'm still new to all this, but I'm not taking much comfort from unit tests. I feel a bit better when integration tests return exactly the result I'm expecting and I test several possible scenarios. Still, even with this, you cannot test all possible scenarios, so when do you know you've got enough? I guess when something blows up that you didn't catch. (>_<)

P.S. by 'Mein Kampf' I just meant the literal 'my struggle' it has nothing to do with Hitler or his work.

iTunes Blows

iTunes blows, but we all knew that. The latest chapter in the suckage occurred when I deleted UpperFilters from HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E965-E325-11CE-BFC1-08002BE10318} in an attempt to solve my problem of the disappearing CD drive (it didn't). Instead, iTunes lost the ability to write CDs, not something I really care about since I only use iTunes for podcasts now anyway, but the stupid error was kind of annoying. The message said to reinstall iTunes, yet neither a repair install nor uninstalling and reinstalling fixed the issue, even after I manually recreated the key. Apparently there are some magic drivers in use by iTunes that their support site will tell you nothing about. I finally got it fixed. My thanks to Ralph and Google.

The Obvious

Read a quote yesterday I rather enjoyed:

by C.A.R. Hoare
There are two ways of creating a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.

Friday, August 28, 2009

Why My GroovyConsole Wouldn't Open

For a few weeks now, I've been wondering why I couldn't launch the pretty version of the GroovyConsole on my home computer, but could on my work computer. It turns out this is a bug: Groovy's native launcher needs the 32-bit JDK. No more batch hack; I can now launch GroovyConsole.exe directly. (Though they still haven't built the native launchers for the updated versions -- I assume they're holding off for the 1.7 release.)

Problem was, at work I need to use HermesJMS, which needs JAVA_HOME to be the 64-bit version (doesn't make sense, but it works). What to do? Fortunately, HermesJMS uses a batch file for its invocation, so just add
to line 23 (or anywhere in the beginning, really). Done.

Thursday, August 27, 2009

Answered: Using variables in XmlSlurper’s GPath

Yesterday I asked this question about using variables with XmlSlurper. I got an answer that works (although I don't see it anywhere in the documentation or the book Groovy in Action). Below is the example the answerer gave.
def xmltxt = """<file>
<record name="some record" />
<record name="some other record" />
</file>"""

def xml = new XmlSlurper().parseText(xmltxt)
String foo = "record"
return xml."${foo}".size()

Wednesday, August 26, 2009

Test Oriented Development

The other day I had an idea. Probably a bad idea, but it's something I feel more comfortable with than diving straight into TDD. The rules are:
  1. You are encouraged to write the test first, but are not forced to.
  2. You may not write more than one method without a test.
This way, you write tests as you are writing code. You might write all your tests first, but you will never write too much code that isn't tested (assuming you don't cram it all into one gigantic, godlike method). I dub it TOD (Test Oriented Development).

Friday, August 21, 2009

Integration Testing

My colleagues and I have been learning a lot about testing this week. One of my friends' thoughts are here. He asks whether it is still a unit test if the method in question calls other methods. I agree with him and would argue that no, it is not, since the culprit is unclear if the test fails (until you go into the debugger). Arguably, this could be worse for code that is properly refactored. This is something I don't think code coverage tools will help you recognize.
StackOverflow has some good definitions of unit and integration testing here, and here. I like Michael Feathers' definition:
A test is not a unit test if:
  • It talks to the database
  • It communicates across the network
  • It touches the file system
  • It can’t run at the same time as any of your other unit tests
  • You have to do special things to your environment (such as editing config files) to run it
Josh Brown, a colleague of mine, also suggests adding:
  • the code under test uses an external framework or library (Hibernate, Spring, some internal library, etc.)
  • the code under test calls other methods (that you didn't mock)
I wholeheartedly agree with both of these.

I had a case that was interesting to me. I was working on some code that I've become the caretaker of, to get it better tested (a good thing too, as a bug was discovered). The project takes big files and chunks them into smaller files; it handles any xml or delimited file. The way I was testing it was to take different types of files, chunk them, then merge them, and make sure the record counts stayed the same. This isn't really a unit test (even if I had more granular asserts). It relies on an external resource (the sample files) and involves multiple methods. Someone else suggested unit tests using files loaded through the classpath, which I disagreed with since they're not really unit tests either. Those File objects should be mocked (or possibly stubbed, if using Groovy).

I did attempt to make these an integration test with Failsafe. I decided against this when I figured out that I could not override the behavior to not run the integration tests on deploys. In my opinion, these should not be run on deploy; integration-test should come later in the Maven lifecycle. The reason is that deploys are often done across different environments, and the whole idea with integration tests is that they depend on external resources. An example of this is a project my friend is working on, which works with the dev database: it would be fine to run the integration tests on deploy to dev, but when it is deployed to prod, we definitely don't want to modify our production database as part of running tests. Continuing to access the QA or dev environment would mean adding special firewall rules. What's more, when I set the configuration to skip the tests, it ran them anyway (maybe this is why it's alpha?). According to the documentation, it shouldn't have even run the unit tests. I just don't feel comfortable deploying something so young and apparently unstable into production code. Maybe someday this will change. There are some other suggestions here. The page he links to from Codehaus states that there are rumors of a future version of Maven supporting integration tests with a src/it/java directory and its own integration-test phase. It's kind of surprising, given how many organizations use continuous integration and that integration testing is nothing new.

In the end, I decided to do what others have done, which is to have the integration tests in a separate module that is only built if the argument is passed for it.
The parent pom should have a profile that pulls in the integration module only when a property is set. Also, do not list it in the modules section. This way, the integration module will only be built (and the associated surefire tests run) when the -Dit argument is present. I think this makes sense for most projects, though I'm still a little torn on the issue. While Failsafe lets you keep building even if the integration tests fail, doesn't this defeat the purpose of continuous integration? This is especially the case if your integration tests depend on resources that may be going up and down all the time; it doesn't make much sense to run something every time if half the time you just ignore the results anyway. I also wonder how practical this is for organizations that have resources on multiple subnets, where a deploy from one environment to another can result in failed integration tests not because of any problem with the code, but because of a technical failure.
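A sketch of what that parent-pom snippet could look like (the original snippet was lost; the profile id and module name here are placeholders):

```xml
<!-- In the parent pom: build the integration module only when -Dit is passed -->
<profiles>
  <profile>
    <id>it</id>
    <activation>
      <property>
        <name>it</name>
      </property>
    </activation>
    <modules>
      <module>integration-tests</module>
    </modules>
  </profile>
</profiles>
```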

The next part (and for me, the harder part) will be mocking out (and maybe stubbing, with Groovy's metaclass) the pieces needed so I can isolate the methods in the classes for unit tests, as there currently aren't any for this project. I'll post any interesting results I get from that. For other initiates, such as myself, I've found this article helpful:

Tuesday, August 18, 2009

GW2 Update (well, sorta)

Yesterday ArenaNet put the Test Krewe page back up and gave some small hints at forthcoming GW2 news. They've posted a shiny swf on their homepage, created a Facebook page, and a Twitter account, all indicators that they're preparing to release some news. Probably nothing as dramatic as a date yet, but we should at least get a trailer and some concept art. Maybe even some discussion about the professions (my prediction is that it will be the core professions from GW1, or very similar, based on the fact that the shaping of the world has cut off Tyria from Cantha and Elona).
The related GWGuru page is here. Mostly people are excited (those that haven't left the game, anyway), though some other pages lurking on the internets weren't as generous.
Am I peeing my pants excited? No. But as I've said, I do think little tidbits, even if they're contrived and only give the illusion of progress (which I don't think these will be) are an important part of keeping a feeling of aliveness in the community.

Monday, August 17, 2009

The Taste of Cha

I've resumed my habit of regular consumption of green tea. Every time I do, I think of the legend of how green tea came about:

Bodhidharma set about meditating for nine years facing a wall. After five years he was so sleepy that he could not keep his eyelids open and fell asleep. When he woke up, he was so angry about this that he cut off his eyelids and threw them on the ground. The eyelids grew into the first green tea plants. And henceforth, tea has ever been the companion of monks for meditation (and apparently for geeks and programming as well).

It is said that "the taste of tea (cha) and the taste of Zen (Chan) are the same". In Japanese, the characters for 'tea' and 'eyelids' are both 'cha'.

mmmmmm.....cha tastes good.

Massive EMP

I've often wondered: If a massive EMP (from a nuke or whatever) were to go off, wiping out all our technological achievements (silicon chip designs, computer languages, etc.), how long would it take for us to rebuild? Is the knowledge mostly inside us? Or are we standing on the shoulders of giants to such an extent that we would have to invent the wheel all over again?

Sunday, August 16, 2009

Substitution in Java .properties

Though I ended up not using it for a project I'm working on, I was interested in using java.util.Properties with substituted strings, as can be done in Ant and Maven. I came across this link to XProperties:
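The basic idea is simple enough to sketch by hand; this is a minimal single-pass resolver (class and method names are my own, not from XProperties), which expands ${key} references against the same Properties object:

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertySubstitution {

    // Matches ${anything-but-closing-brace}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Resolve ${key} references in the named property's value against the same
    // Properties object. One pass only, so nested references are not expanded.
    static String resolve(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            return null;
        }
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Fall back to the literal ${key} text if the property is undefined
            String replacement = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("app.home", "/opt/myapp");
        p.setProperty("app.logs", "${app.home}/logs");
        System.out.println(resolve(p, "app.logs")); // prints /opt/myapp/logs
    }
}
```

Ant and Maven do more (recursive expansion, system property fallback), but for a flat .properties file this covers the common case.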

Friday, August 14, 2009

ArenaNet Fail

This is something that has been bugging me, and I'm sure several others over the past couple months. ArenaNet seems to be unable to meet deadlines or even keep up with routine maintenance.

The latest of ArenaNet's offenses has been the Test Krewe page. They enlist players to play around with updates on a test server before they're pushed out to the live one. Cool. The problem? Apparently they don't know how to write PHP! Basic form validation was completely absent; people who used hard returns, too many characters, or certain punctuation broke the page. Come on, people, this is basic stuff. They took it down until it could be fixed the same day it was up (Aug 11), and it's still down. Yet again, ANet kept the community in the dark. While there was a post by Regina Buenaobra buried in Guild Wars Guru, if you navigate to the page today, you will just get the message "We're sorry, the page you requested could not be found." How hard would it have been to put up a message saying "We know this page is broken, we're fixing it"? As usual, there is no time estimate whatsoever for when the page will be back up.

Another example of their apparent complete lack of even basic web dev skills was the whole Xunlai Tournament House fiasco. The page was taken down June 23 because of problems with the distribution of the points for May. It's now been 2 months and there is no word on its status. There are players who still haven't gotten their points. Another of those 'It'll be done when it's done' deals.

GW2 news, plz. I know, I know: everyone who plays Guild Wars would love some tasty morsel of information regarding Guild Wars 2, and ArenaNet doesn't want to jump the gun (though I think they already did by suggesting a beta in 2008 and then reneging on it). Especially in light of the fact that there won't be a beta, players would much appreciate a blog or something where once or twice a month SOMETHING is posted about GW2. Maybe it's an idea they have for a profession. Maybe it's some kind of rough sense of progress (1/2, 2/3, etc.). It doesn't have to be much. It doesn't have to be written in stone. Just something to let us know we aren't forgotten, and that GW2 is still making progress.

While this problem is not unique to Guild Wars, I'd really like to see community involvement. Take a look at GuildWarsGuru: there are several threads full of ideas people have for making the sequel kickass. Why not find a way to incorporate these in some way? Maybe even respond to some of them?

It seems to me that ANet is short-staffed and/or over-tasked, and anachronistic in its communication. I know they're probably working on a tighter budget than some other game companies (though their quarterly report shows increased sales (up 51%) and growth in Guild Wars), but would it really be that expensive to be more open with your customers?

I'm kinda mad.

---Oh, and $20 is too much for character renames.

Thursday, August 13, 2009

My First iGoogle Theme

I experimented with making my own iGoogle theme.
The interface they have is rather limiting (I couldn't choose black as the background color) compared to their API; I may experiment with that next.

Friday, August 7, 2009

Groovy & CXF JAXB

I was working on making an existing application a Mule service when I ran into a snag: when a Groovy class is exposed as a parameter to the service method, JAXB throws an error when it attempts to serialize the object to xml (for creating the wsdl). Every Groovy object possesses a property called metaClass that cannot be serialized because it is an interface. At the time, I solved this problem by wrapping the Groovy stuff with Java, to expose only what I wanted. Another way I could have done it would have been to have the Groovy class implement a Java interface.

I later found out about a third method when I attended a brown bag session where a colleague gave a talk about using JAXB. Today, I found this link, which describes exactly that: annotations can be used to override the default JAXB behavior to exclude that messy Groovy stuff. I haven't experimented with this yet (since I've already got my service working and there was no compelling reason to do it in Groovy instead of Java), but I wonder which of these three solutions would be the cleanest. I don't have a really strong opinion on this, but it's probably preferable to have extra annotations rather than extra classes. Extra wrapping classes just result in more boilerplate code for the next guy down the road to wade through; annotations would only add a few lines (one line, I believe, in the case of CXF). Allegedly, just adding the annotation
to your class will resolve the issue. Has anyone else done this? How did you resolve the issue?

Thursday, August 6, 2009

@version JavaDoc Tags

Another comment that came up in my code review yesterday was why I had an @version 1.0 in my JavaDocs for the class I wrote. My reasoning was that, according to Sun's documentation, the author and version tags are required (though obviously it won't throw a compiler error or anything like that; it may throw an error in Sun's Doc Check tool). This launched a discussion about whether the version tag should indicate revisions in the version control system (in our case svn) or release tags of the software. The practice varies from shop to shop, but my initial reaction (despite Sun's practice) was that release tags were the logical choice, since they would indicate API changes. However, I suppose @since is what really should be used for that, and a colleague pointed out that changing all the classes to match whatever release is current hides the fact that some classes (particularly abstract data classes) may not have changed at all.

My own conclusion out of this was to use svn keyword substitution to set the tags. I used my favorite svn client, TortoiseSVN, to set the properties. You can either set it as a global setting (which will add the property to new and imported files) or on particular files (needed for files that already exist). This matches Sun's practice of tying @version to their source control system.

While on the subject of JavaDoc, I was mocked for pointing out a spelling mistake in a JavaDoc comment. In my view, JavaDoc is a special class of comment. Whereas internal comments can contain all the mistakes and profanity you want, JavaDocs are something your client might actually see, and thus should be treated just the same as a mistake like
System.out.println("Pleez enter your name: ");
String name = bufferedReader.readLine();
Am I wrong?
As a quick way of setting Subversion properties for use in Javadoc, I usually create a script with this in it:
for file in $(find ${1}/*/src -maxdepth 10 -name '*.java'); do
  if [ -z "$(svn proplist ${file} | grep -o 'svn:keywords')" ]; then
    svn propset svn:keywords "Author Date Revision Id" ${file}
  fi
done

Pieces Parts

The term 'pieces parts' is popular here at OCLC, as a way of describing components, modules, and the like. Is this the case anywhere else? Or is this one of our unique quirks?

Wednesday, August 5, 2009

Things I Learned From My First Code Review

Here are some interesting ideas I saw in my first code review at OCLC.

instead of setting a field to a non-null value in the constructor to prevent null from being returned, check before it is returned by
doing this
public String getSomeString() {
  return (headerTag == null) ? new String() : headerTag;
}
instead of this
private String headerTag = new String();
public String getSomeString() {
  return headerTag;
}

for String concatenation, StringBuffer is faster than String. Unlike Strings, which are immutable, StringBuffers are mutable.
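A minimal illustration of the two approaches (loop concatenation is where the difference matters, since each += allocates a brand-new String while the StringBuffer mutates in place):

```java
public class ConcatDemo {
    public static void main(String[] args) {
        // String: every += builds a new String object
        String s = "";
        for (int i = 0; i < 5; i++) {
            s += i;
        }

        // StringBuffer: one object, appended to in place
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 5; i++) {
            sb.append(i);
        }

        System.out.println(s);             // prints 01234
        System.out.println(sb.toString()); // prints 01234
    }
}
```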

surround your log4j debug log statements in an
if (log.isDebugEnabled()) {
  log.debug("message here");
}
so that the string isn't needlessly constructed

to avoid NPEs in comparisons, call equals on the value you know is non-null:
if ("Some_String".equals(someString)) { ... }
instead of:
if (someString.equals("Some_String")) { ... }
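A quick demonstration of why the order matters (equals is dispatched on the left-hand side, so a null on the left throws):

```java
public class NullSafeEquals {
    public static void main(String[] args) {
        String someString = null;

        // Literal on the left: null simply compares unequal, no exception
        System.out.println("Some_String".equals(someString)); // prints false

        // The reversed form, someString.equals("Some_String"), would throw a
        // NullPointerException here, because someString is null.
    }
}
```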

Thursday, July 30, 2009

Error in mule-cxf.xsd?

I'm reposting this, as I've reached new conclusions. Originally, I was convinced that there was an error in the xsd for the cxf transport in Mule. A colleague of mine posted a question on this to MuleSource, but we haven't gotten a response back yet (except from me). Examples on their wiki feature using the inbound-endpoint in the cxf namespace with the address attribute, such as:
<cxf:inbound-endpoint address="http://localhost:63081/hello" wsdlLocation="foo/bar/hello.wsdl" />
And this style is correctly interpreted by Mule and runs fine. However, the address attribute shows up in red in IntelliJ IDEA as an invalid attribute in the cxf namespace. Has anyone else run into this? I originally attributed this to the attribute not being present on the inbound-endpoint element in the cxf namespace, while being present in core. But the cxf inbound-endpoint is actually an extension of the one in core; this is visible in XMLSpy, and it validates fine in Eclipse. I now believe this is some kind of problem with either the xml editor and/or the error-highlighting components of IntelliJ. I can't find any Jira cases about this or any reference to it in the forums, so I opened a Jira case here. Maybe we should all just use Eclipse?

During this research, I did come across another problem (which I originally thought was the cause of our issue). Some IntelliJ fans ran into it a while ago, and there already is a Jira case with MuleSource for this here. The cxf schema wouldn't validate because there was a bad address for an imported schema for Spring. It's a one-line fix. Instead of
<xsd:import namespace="" schemaLocation=""/>
It should have read
<xsd:import namespace="" schemaLocation=""/>
The fix was merged into the 2.2.2 branch last week (and into 2.1.4 as well, I believe). I don't know when they're deploying these fixes, though; in the meantime you can download the files here (for 2.1.x) or here (for 2.2.x). I've tried to validate with these new files and validation still fails, though it's due to the spring-context.xsd, not Mule. Why can't people get xsds right? It would seem kind of important, as it's what everyone will be building their xmls off of.

Coincidentally, in writing this post I've learned how to paste xml/html code in Blogger (found it here). Less-than signs (<) must be written as their html equivalent, which is &lt;. There's also a cool script to translate it for you here. And to quote html special characters like I did above, use &amp; instead of &. Also, Notepad++ has a built-in TextFX command that can escape to HTML.

-- UPDATE 2009-11-17 --
JetBrains has released an update to IDEA 8, which includes the fix for the bug I submitted. The update notes can be found here.
It is available as a free upgrade for those who already have IDEA 8, so get yours.
I believe this fix is also in the upcoming IDEA 9, so it should be in the Community Edition too, for those who don't want to shell out for the full-featured IDE.