Thursday, October 3, 2013

WebHDFS vs Native Performance

So after I heard about WebHDFS, I became curious about its performance characteristics compared to the native client, particularly after reading this blog entry.  Oddly, I found my results to be dramatically different from Andre's findings.

Here are the results of my experiments:

Size      Native Avg % Faster
10 MB     -20.0%
100 MB    34.3%
500 MB    48.3%
1 GB      79.4%
10 GB     90.1%

As you can see, the native client handily beats WebHDFS for everything except the smallest file (where it was actually slower), and the performance gap grows with the file size.  I haven't had the time yet to look into the technical details of why this might be.  There are some differences between our tests to note:
  • The latency between my client and the server is much lower (about 0.29ms instead of 23ms)
  • My client is in the same data center rather than a remote data center, with 10GbE connecting it to the server
  • I used wget instead of a Python WebHDFS client

It's possible there are network or cluster configuration differences that contribute as well (including differences in Hadoop versions).  My takeaway from this was that it's better to measure your actual performance before deciding which approach to take.
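
If you want to run the same comparison on your own cluster, here's a rough sketch of how a "% faster" figure could be computed from two average transfer times.  The formula (the difference relative to the WebHDFS time) is my assumption about a reasonable definition, and the class name, method name and timings are all made up for illustration; they are not measurements from the tests above.

```java
public class PercentFaster {
    /**
     * Percentage by which the native client is faster than WebHDFS,
     * relative to the WebHDFS time. A negative result means the
     * native client was slower. (Assumed definition, for illustration.)
     */
    public static double percentFaster(double webHdfsSeconds, double nativeSeconds) {
        return 100.0 * (webHdfsSeconds - nativeSeconds) / webHdfsSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical average timings in seconds, not real measurements.
        double webHdfs = 10.0;
        double nativeClient = 2.0;
        System.out.printf("Native is %.1f%% faster%n", percentFaster(webHdfs, nativeClient));
    }
}
```

With this definition the value can never exceed 100%, and a negative value means WebHDFS won.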

HDFS NameNode Username Hack

I created a userscript to override the username (when programmatically detectable) to allow you to read HDFS directories and files that aren't world readable.  Nothing fancy here; you could edit the URL yourself, this just makes it easier.  The script is hosted here:, and the source is available here:

Sunday, September 8, 2013

NSIS SelectSection

So, I haven't seen a working example of how Sections.nsh's SelectSection could be used (at least not one that worked for me anyway).  So here's an example based on what I did in the installer for Meld.  The goal was to include some default sections if the install was silent.
!include "Sections.nsh"

Section "Foo" foo
    ; install stuff here
SectionEnd

Section "" unfoo
    ; uninstall stuff here
SectionEnd

Function .onInit
    IfSilent isSilent notSilent
    isSilent:
        ; section indexes are referenced as ${name}
        !insertmacro SelectSection ${foo}
        !insertmacro SelectSection ${unfoo}
    notSilent:
FunctionEnd

Monday, July 22, 2013

Hadoop SleepInputFormat

I whipped up a little class to provide dummy input to Hadoop jobs for testing purposes. Hadoop had something like this, but they haven't updated it for Hadoop 2 for some reason. My class works with the new API.

Edit: They've now updated it.

Thursday, April 25, 2013

Hadoop Writing Bytes

There are times when you might want to write bytes directly to HDFS.  Maybe you're writing binary data.  Maybe you're writing data with varying encodings.  In our case, we were doing both (depending on profile) and were trying to use MultipleOutputs to do so.  We discovered that there was no built-in OutputFormat that supported bytes, nor were there any examples on the web of how to do this with the new API. Granted, it's not overly complicated, but to save you a little time, here's what I came up with.

import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

public class BytesValueOutputFormat extends FileOutputFormat<NullWritable, BytesWritable> {

    @Override
    public RecordWriter<NullWritable, BytesWritable> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        Configuration conf = taskAttemptContext.getConfiguration();
        boolean isCompressed = getCompressOutput(taskAttemptContext);
        CompressionCodec codec = null;
        String extension = "";
        if (isCompressed) {
            // Use the configured codec, falling back to gzip
            Class<? extends CompressionCodec> codecClass = getOutputCompressorClass(taskAttemptContext, GzipCodec.class);
            codec = ReflectionUtils.newInstance(codecClass, conf);
            extension = codec.getDefaultExtension();
        }
        Path file = getDefaultWorkFile(taskAttemptContext, extension);
        FileSystem fs = file.getFileSystem(conf);
        // false = fail rather than overwrite an existing file
        FSDataOutputStream fileOut = fs.create(file, false);
        if (!isCompressed) {
            return new ByteRecordWriter(fileOut);
        } else {
            return new ByteRecordWriter(new DataOutputStream(codec.createOutputStream(fileOut)));
        }
    }

    protected static class ByteRecordWriter extends RecordWriter<NullWritable, BytesWritable> {
        private DataOutputStream out;

        public ByteRecordWriter(DataOutputStream out) {
            this.out = out;
        }

        @Override
        public void write(NullWritable key, BytesWritable value) throws IOException {
            boolean nullValue = value == null;
            if (!nullValue) {
                // getBytes() may return a padded buffer, so only write getLength() bytes
                out.write(value.getBytes(), 0, value.getLength());
            }
        }

        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            // Flush and release the underlying stream
            out.close();
        }
    }
}

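Here's an example usage with MultipleOutputs:
MultipleOutputs<NullWritable, BytesWritable> multipleOutputs = new MultipleOutputs<NullWritable, BytesWritable>(context);
byte[] bytesToWrite = someAppLogic();
// write() is an instance method, so call it on the object rather than the class
multipleOutputs.write(NullWritable.get(), new BytesWritable(bytesToWrite), fileName);
// remember to call multipleOutputs.close() in cleanup() so the output gets flushed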

And of course, since it's like any other OutputFormat, it can also work with LazyOutputFormat if desired (as well as just about anything else you might choose to do with an OutputFormat).
LazyOutputFormat.setOutputFormatClass(job, BytesValueOutputFormat.class);

In our case, this was the last step in our sequence of Hadoop jobs, so we had no further need for the key. One could conceive of situations in which further manipulation is needed. In such cases, you could attempt some sort of delimited binary (to separate the key from the value), but it might be easier to just keep it all as text and use Commons Codec's Base64 to pass the bytes value between jobs.
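
Here's a minimal sketch of that Base64 idea, in case it helps.  To keep it runnable without extra jars it uses java.util.Base64 from the JDK (Java 8+) in place of Commons Codec; the class and method names (Base64Record, toLine, keyOf, valueOf) are made up for illustration, and it assumes the key never contains a tab.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Record {
    // Encode a key and raw bytes as a single tab-delimited text line.
    // Base64 output never contains tabs or newlines, so the line stays intact.
    public static String toLine(String key, byte[] value) {
        return key + "\t" + Base64.getEncoder().encodeToString(value);
    }

    // Everything before the first tab is the key (assumes no tab in the key).
    public static String keyOf(String line) {
        return line.substring(0, line.indexOf('\t'));
    }

    // Everything after the first tab is the Base64-encoded value.
    public static byte[] valueOf(String line) {
        return Base64.getDecoder().decode(line.substring(line.indexOf('\t') + 1));
    }

    public static void main(String[] args) {
        // Bytes that would break a naive text format: NUL, tab, newline, 0xFF
        byte[] original = {0, 9, 10, (byte) 0xFF};
        String line = toLine("profileA", original);
        System.out.println(keyOf(line) + " round-trips: " + Arrays.equals(original, valueOf(line)));
    }
}
```

The intermediate job would emit toLine(...) as ordinary text, and the final job would decode it with valueOf(...) before handing the bytes to BytesValueOutputFormat.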

Tuesday, March 12, 2013

Hadoop overwrite options

There's an undocumented feature (Hadoop's documentation needs some serious love) in the cp and copyFromLocal commands of Hadoop's dfs: a -f option that overwrites the destination, just like Unix's cp -f.  I've filed HADOOP-9381 with a patch to document this feature in both the help page and the web page.

While I was looking at this, I realized that the mv and moveFromLocal commands didn't recognize the -f option, even though Unix's mv command does. Since it was simple to add, I created HADOOP-9382 with a patch to address that issue.

Friday, January 4, 2013

The Bluffer's Prayer

I had a pretty lucky night at a poker game when this idea came to me.  Here's a prayer sure to turn your luck around (inspired by The Lord's Prayer).

The Bluffer's Prayer
Our River which art in heaven, Hallowed be thy name.
My upswing come. Thy will be done in the cards, as it is without.
Give us this hand all the chips.
And forgive us our raises, as we forgive those who have raised against us.
And lead us not into check-raises, but deliver us from bullies: For thine is the kingdom, and the power, and the glory, for ever. Amen.

Psalm 23 for Materialists

I think I was driving home from somewhere when this idea popped into my head.  Apologies to those favoring more modern language; the King James Version continues to hold a special place in my heart.

Psalm 23 for Materialists

Materialism is my shepherd; I shall always want.
It maketh me to lie down in pricey furniture stores: it leadeth me beside the Jacuzzi waters.
It consumes my soul: it leadeth me in the paths of greed for its name's sake.
Yea, though I walk through the valley of the shadow of recession, I will see no evil: for thou art with me; thy early termination fees and thy clearances they comfort me.
Thou preparest a shopping cart before me in the presence of rival Black Friday shoppers: thou anointest my head with expensive hair products; my Big Gulp runneth over.
Surely greed and meaninglessness shall follow me all the days of my life: and I will dwell in the house of materialism for ever.

Medal of Honor: Warfighter Review

So, I had to write this review in the light of how terribly wrong pretty much every critic out there is.  I generally respect IGN's reviews of games.  They're usually a bit hard on games, but they're usually equally hard on games, so they do tend to be fair.  My review of the first game in the reboot pretty much agrees with IGN's review.  But IGN's review of Warfighter, while bringing up some legitimate problems, doesn't accurately reflect what kind of game it truly is.

First, to acknowledge the issues people have raised against this game: Yes, there do seem to be some framerate issues here and there.  Though it's more something you see after playing the game a while, not something so bad that it disrupts gameplay.  Graphically, for the most part, it's on par with Battlefield 3.  Yes, the campaign does over-use breaching.  It doesn't feel exactly annoying, just pointless.  Yes, the campaign doesn't introduce anything that hasn't been done in other games (though that's usually the case, and when games do, e.g. Bulletstorm, they're not usually received well).  Yes, enemies aren't the sharpest knives in the box, and could make better use of cover and work together better as a team.  I'd add that it's frustrating to see them run past the rest of your team to target you specifically.  This makes no tactical sense and is the same lame AI they used in the 2010 title.  This isn't the only series to do this, but boy is it annoying.  Some have complained about the story as well.  I thought it wasn't exactly an exciting plot, but it does tell the story fairly well.  Complainers are advised to note that it's been some time since we've had good storytelling in a shooter, and no one buys shooters for their storyline anyway.  And finally, yes, the menu system is confusing at first.  But once you get used to it, it's actually rather nice.

As far as weapon design (something I've not really heard anyone comment on), the weapons seem to have less recoil than they ought, and all sound suppressed, but the damage is more realistic (or at least the bullet tracking seems to work better) than games like Call of Duty, particularly with headshots (though in my opinion it could be cranked up even further).  Actually, the weapons are more responsive and damaging than Battlefield 3 as well, and I have far fewer latency issues than I do in Call of Duty.  There are also some cool animations in multiplayer, like rappelling out of a Blackhawk spawn point, cutting wires as you deactivate explosives placed by opposing players, throwing a UAV to deploy it, sliding or diving into cover, and great takedown sequences when you melee with the tomahawk.  The multi-nation features they've added, like the ability to play as different nations and apply tokens towards different nations' victory, are a nice touch.  I do agree with those that say the nations feature needs adjusting.  Since it is scored by # of tokens / # of contributing players, countries that have only a few big contributors beat out nations with far more tokens in total, because those nations also have more players contributing.  This has led to Portugal being the victor in each of the first 3 seasons.  The absolute biggest thing Warfighter has going for it is the buddy system.  The buddy system lets you get health and ammo from your buddy, and gives you points for your buddy spawning on you, for saving or avenging your buddy, and for being close to your buddy while he makes kills.  Speaking of the fact it's a he, why are there no female fighters in the game?  It's unfathomable to me how critics can knock this game for not having made any innovations, when they've added this really fun and effective mechanic, and then go on to say "This is not the shooter you expected" about Black Ops 2, of which the only new thing is the strike force missions with their overhead tactical operations (something that is neat to see, but not something I care to play).  Everything else about it is really a tweak of what they've been doing the last several years, and the engine hasn't been touched at all.  I loved Modern Warfare 2, and Call of Duty is still a solid series (and offers more types of multiplayer), but I feel like they have to do more to deserve remaining the top shooter out there, and in my opinion (having bought both games), Warfighter is actually the better game and the one I spend most of my time in these days.  It's not as realistic as the Battlefield series (Bad Company 2 having more realistic destruction in my opinion), but I'm finding it more fun for some reason.

Thursday, January 3, 2013

Meld Windows Installer

I'm a huge fan of Meld as a diffing / merging tool.  It's the nicest looking, most powerful tool of its kind that I've come across.  (Although I'd also like to give a shout out to KDiff3 for being able to handle larger files than anything else I've seen besides GNU Diff, and to WinMerge as another popular choice).  The trouble is, if you run Windows, it's a bit of a pain to set up.  You have to install Python, GTK+ and PyGTK (made easier by the nice all-in-one installer they now have), then Meld itself, and finally write a script to launch the appropriate Python command and create shortcuts to that script for convenience.  This is a shame because I'd really like to get more Windows users using this wonderful tool.
So I made an installer that includes all of these and has no extra dependencies.  Just install and go.  The only things it doesn't currently have are support for syntax highlighting (needs PyGtkSourceView, which is not included in Portable Python, which I used) and VCS browsing (needs GNU Patch via Cygwin).  Here's where you can get it:  Lemme know if you find any mistakes, it's my first time using NSIS.

Edit: I've now also created a portable .zip archive.

Edit: Vote here to ask for PyGtkSourceView to be added to Portable Python, so I can use it (edit: it appears PyGTK in Portable Python already includes PyGtkSourceView).  I'll look into doing this without Portable Python (though that'll be easier for me) and possibly GNU Patch as well, but am not sure how hard it will be.

Edit: I've made a significant update to this.  My thanks to Angel Ezquerra of TortoiseHG for his suggestions and testing assistance.