
Friday, June 26, 2020

Build latest libvips

As mentioned in the libvips README, you can easily compile libvips if you need a newer version than is provided by your distribution. For example,
FROM ubuntu:18.04

# a minimal assumed toolchain -- add the dev packages for whatever image formats you need
RUN set -o errexit -o nounset && \
    apt-get update && \
    apt-get install --no-install-recommends --yes \
        build-essential \
        ca-certificates \
        libexpat1-dev \
        libglib2.0-dev \
        pkg-config \
        wget && \
    rm --recursive --force /var/lib/apt/lists/*

ENV VIPS_VERSION=8.9.2
RUN set -o errexit -o nounset && \
    wget --no-verbose https://github.com/libvips/libvips/releases/download/v${VIPS_VERSION}/vips-${VIPS_VERSION}.tar.gz && \
    tar -xf vips-${VIPS_VERSION}.tar.gz && \
    rm vips-${VIPS_VERSION}.tar.gz && \
    cd vips-${VIPS_VERSION} && \
    ./configure && \
    make && \
    make install && \
    ldconfig && \
    cd .. && \
    rm --recursive vips-${VIPS_VERSION}

But if you want to save some time recompiling (or want a deb for any other reason), you can modify the packaging to use the version you compiled instead. Here's an example with Ubuntu.
# docker build -t vips-deb .
# docker run --rm -v $PWD/debs:/debs vips-deb

FROM ubuntu:18.04

RUN mkdir /debs

WORKDIR /build

RUN set -o errexit -o nounset \
    && apt-get update --yes \
    && apt-get install --no-install-recommends --no-install-suggests --yes \
        build-essential \
        ca-certificates \
        devscripts \
        equivs \
        wget \
    && rm --recursive --force /var/lib/apt/lists/*

# use the Debian version that matches the base image from https://launchpad.net/ubuntu/+source/vips
ENV DEBIAN_VIPS_VERSION=8.4.5-1build1
ENV VIPS_VERSION=8.9.2
RUN set -o errexit -o nounset \
    && wget --no-verbose https://github.com/libvips/libvips/releases/download/v${VIPS_VERSION}/vips-${VIPS_VERSION}.tar.gz \
    && tar -xf vips-${VIPS_VERSION}.tar.gz \
    && rm vips-${VIPS_VERSION}.tar.gz \
    && cd vips-${VIPS_VERSION} \
    && wget --no-verbose https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/vips/${DEBIAN_VIPS_VERSION}/vips_${DEBIAN_VIPS_VERSION}.debian.tar.xz \
    && tar -xf vips_${DEBIAN_VIPS_VERSION}.debian.tar.xz \
    && rm vips_${DEBIAN_VIPS_VERSION}.debian.tar.xz \
    && cd debian \
    && EMAIL=john@example.com NAME="John Doe" dch -v ${VIPS_VERSION}-1 "Update to version ${VIPS_VERSION}" \
    && sed -Ei 's/^# deb-src /deb-src /' /etc/apt/sources.list \
    && apt-get update --yes \
    # don't install Python bindings (this will be supplied by pyvips)
    && sed -Ei '/Package: python-vipscc/,/^$/d' control \
    && mk-build-deps -t 'apt-get -y -o Debug::pkgProblemResolver=yes --no-install-recommends' -i control \
    && debuild -i -us -uc -b \
    && rm --recursive --force /var/lib/apt/lists/*

CMD cp --force /build/*.deb /debs
One thing to note: as the pyvips README says,
If you have the development headers for libvips installed and have a working C compiler, this module will use cffi API mode to try to build a libvips binary extension for your Python. If it is unable to build a binary extension, it will use cffi ABI mode instead and only needs the libvips shared library. This takes longer to start up and is typically ~20% slower in execution.
You can confirm whether it's running in ABI mode after installing libvips by running this command:
echo -e 'import logging\nlogging.basicConfig(level=logging.DEBUG)\nimport pyvips' | python3
Example ABI Mode Output
DEBUG:pyvips:Binary module load failed: No module named '_libvips'
DEBUG:pyvips:Falling back to ABI mode
DEBUG:pyvips:Loaded lib <cffi.api._make_ffi_library.<locals>.FFILibrary object at 0x7f6c56754978>
DEBUG:pyvips:Loaded lib <cffi.api._make_ffi_library.<locals>.FFILibrary object at 0x7f6c56754860>
DEBUG:pyvips:Inited libvips
Example Non-ABI Mode Output
DEBUG:pyvips:Loaded binary module _libvips
DEBUG:pyvips:Module generated for libvips 8.9
DEBUG:pyvips:Linked to libvips 8.9
DEBUG:pyvips:Inited libvips

Sunday, May 3, 2020

Blogger Syntax Highlighting (Again)

I was working on a blog post with Kotlin and noticed SyntaxHighlighter had no brush for Kotlin, not even in V4.  Given that, and the popularity of highlight.js, I've decided to switch to it. You just need to add something like this in the <head> of your Blogger template.
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/default.min.css' rel='stylesheet'/>-->
<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/androidstudio.min.css' rel='stylesheet'/>
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/atom-one-dark.min.css' rel='stylesheet'/>-->
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/agate.min.css' rel='stylesheet'/>-->
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/darcula.min.css' rel='stylesheet'/>-->
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/rainbow.min.css' rel='stylesheet'/>-->
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/railscast.min.css' rel='stylesheet'/>-->
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/solarized-dark.min.css' rel='stylesheet'/>-->
<!--<link href='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/styles/zenburn.min.css' rel='stylesheet'/>-->
<script src='//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/highlight.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/autohotkey.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/bash.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/c-like.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/c.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/clojure.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/cpp.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/csharp.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/css.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/dockerfile.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/dos.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/fsharp.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/java.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/javascript.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/go.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/groovy.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/markdown.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/nsis.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/pgsql.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/plaintext.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/powershell.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/properties.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/python.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/ruby.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/scala.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/scss.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/sql.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/typescript.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/vbscript.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/xml.min.js'/>
<script charset='UTF-8' src='https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.0.0/languages/yaml.min.js'/>
<script>hljs.initHighlightingOnLoad();</script>

There are many other languages and styles available. And as with SyntaxHighlighter, you need to escape < with &lt; and > with &gt;.
Then you put your code inside
<pre><code class="kotlin">

</code></pre>
or
<pre><code class="lang-kotlin">

</code></pre>
or
<pre><code class="language-kotlin">

</code></pre>
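For example, a (hypothetical) one-line Kotlin function containing angle brackets would be entered in the post HTML like this:
<pre><code class="kotlin">fun max(a: Int, b: Int): Int = if (a &gt; b) a else b</code></pre>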

Wednesday, August 26, 2015

A Maven Plugin Developer's Thoughts on Gradle

Introduction
I use both Gradle and Maven, and I maintain a plugin for working with Groovy (GMavenPlus). I thought this gave me a somewhat unique perspective. I know this comparison is far from exhaustive, but I wanted to share some thoughts I'd jotted down.

When I first heard about Gradle, I was fairly dubious. Maven had been around for about 6 years at that point and they were making some rather large promises. Although I was intrigued, my reaction was "come back to me when you have a non-RC 1.0 release". Since then, they've had several stable releases and about 6 years of their own maturing.

What I like about Gradle
  • They kept the best parts of Maven
    • The lifecycle concept
    • The standardized project layout
    • The dependency management concept
  • Very concise syntax.  It's an amazing feeling to be able to do a standard Java build with a single line of code (see the sketch after this list).
  • Keeps the simple things simple, while allowing you to use the power of Groovy and Ant if you need something non-standard.
  • Gradle Wrapper (gradlew) is a nice way to help keep builds reproducible
  • Application Plugin (Maven has ways to do this, but having it in a plugin is convenient)
  • Much more flexible than Maven (in nearly every way -- dependency management, lifecycle customization, custom tasks, etc)
  • Can put environment config and dev helper scripts right into the build as tasks instead of separate script files
  • HTML test report is nicer than Surefire's, and I now prefer it over console text
  • Has a test runtime scope (unlike Maven)
  • implementation vs api scopes are brilliant
  • Can lock transitive dependency versions for superior build reproducibility
  • Can share dependency versions between projects (called platforms)
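As a sketch of that one-liner, this is an entire working build.gradle for a project with the standard layout:
// the Java plugin alone wires up compileJava, test, jar, and friends
apply plugin: 'java'
And the implementation/api split mentioned above comes from the java-library plugin (Gradle 3.4+); the coordinates below are placeholders:
apply plugin: 'java-library'
dependencies {
    // appears in this library's public API signatures, so it leaks to consumers
    api 'org.example:exposed-in-signatures:1.0'
    // purely an internal detail, kept off consumers' compile classpaths
    implementation 'org.example:internal-detail:1.0'
}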
What I don't like about Gradle
  • Allowing people to do custom things so easily might lead to the tangled mess that was Ant. There's already some pretty crazy builds out there (though arguably this isn't much riskier than what you could do with AntRun or GMavenPlus).
  • Because it's Groovy instead of XML, you don't get hints from your IDE that are as helpful.
  • Can't include multiple source directories each with their own includes/excludes patterns (they have a TODO for this).
  • Groovy building Groovy can sometimes be problematic
  • No equivalent of mvn install, which makes end-to-end testing of plugins clunkier (though they now offer TestKit that should help with this)
  • Some areas might be perceived as being less mature
  • No isolation of plugin dependencies -- they all run on the same classpath (unlike Maven), unless the author took the time to do custom classloader work, which most don't.
  • Plugins use whatever repository is declared instead of project repositories. So if you don't want to use JCenter, for example, but a plugin you use does, then you end up using that repository too (see #10376).
Notable differences from Maven
  • In Gradle, the default target bytecode version is the version of the JDK, whereas in Maven the default is currently 1.5
  • Gradle's commands are case-insensitive (something I actually really like)
  • Gradle doesn't compile tests unless specifically asked for (gradle classes != mvn compile, gradle testClasses == mvn compile) (not really a positive or negative, just something to be aware of)
  • Gradle will stop testing after the first module to have a test failure. You have to use the --continue argument to continue on to the other modules the way Maven does. I kinda wish this was the default.
What I like About Maven
  • Maven's inflexibility has largely proven very successful in preventing pointless deviation from the established pattern.
  • The lifecycle concept
  • The standardized project layout
  • The dependency management concept
What I don't like About Maven
  • XML is pretty cumbersome (though Polyglot for Maven isn't completely dead). This leads to out of control copy-and-pasting of POMs, with dependencies that have no business being there.
  • Convention over configuration doesn't go far enough (ideally, a basic Java project should be just a few lines, like it is in Gradle)
  • No ability to specify ranges of versions that are acceptable for transitive resolution, like Gradle has (I haven't used it much, but it's a really cool feature)
  • No TestRuntime scope
  • No Maven Wrapper equivalent of gradlew (though there is an unofficial project)
How Gradle has improved Maven
It took a while, but you can finally opt out of transitive dependencies. I think Gradle (and Ivy) earned a good deal of credit for the pressure to offer this feature.
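For reference, the opt-out looks like this in a POM (wildcard exclusions require Maven 3.2.1+; the coordinates are placeholders):
<dependency>
  <groupId>org.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <exclusions>
    <!-- exclude every transitive dependency of this artifact -->
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>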

Conclusion?
I honestly think that despite there being some areas where Gradle can be improved, it's generally the better choice for most projects.  There are circumstances where this isn't the case (like if you're building a Maven plugin), but in most cases, I think Gradle's pros outweigh its cons. If you prefer Maven (or your CTO/CIO/Architect is making the choice for you), rest assured that despite my recommendation, I'm committed to maintaining GMavenPlus for the indefinite future so that the choice is always yours to make.

Update (2021): I've added some new bullet points above. Some of the newer features (Gradle 5 and 6 especially) are pretty compelling and might tip the scales for you, depending on your use case.

Wednesday, July 15, 2015

Custom configuration script ASTs

Nikolay Totomanov asked on the Groovy mailing list how one could add a default constructor (if it doesn't already exist) to all classes. I realized there were no examples on the internet (that I could find anyway) of
  1. How to pass parameters into an AST in a configuration script
  2. How to use a custom AST in a configuration script
So I decided to try to remedy that with this post.

1. AST parameters in a configuration script
Let's say you wanted to do
@groovy.transform.TupleConstructor(includes=['foo'])
in a configuration script. How do you pass the includes? Use a map. The equivalent configuration script would be
withConfig(configuration) {
    ast(groovy.transform.TupleConstructor, includes:['foo'])
}

2. Custom AST in a configuration script
If you wanted to create your own AST and use it in a configuration script, I suggest looking at Groovy as a starting point. Here, I'll use groovy.transform.TupleConstructor and org.codehaus.groovy.transform.TupleConstructorASTTransformation as an example to solve Nikolay's problem. Here is the result
import java.lang.annotation.ElementType
import java.lang.annotation.Retention
import java.lang.annotation.RetentionPolicy
import java.lang.annotation.Target
import org.codehaus.groovy.ast.ASTNode
import org.codehaus.groovy.ast.AnnotatedNode
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.ConstructorNode
import org.codehaus.groovy.ast.Parameter
import org.codehaus.groovy.ast.stmt.BlockStatement
import org.codehaus.groovy.control.CompilePhase
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.transform.AbstractASTTransformation
import org.codehaus.groovy.transform.GroovyASTTransformation
import org.codehaus.groovy.transform.GroovyASTTransformationClass

@GroovyASTTransformation(phase = CompilePhase.CANONICALIZATION)
public class DefaultConstructorASTTransformation extends AbstractASTTransformation {
    public void visit(ASTNode[] nodes, SourceUnit source) {
        init(nodes, source)
        AnnotatedNode parent = (AnnotatedNode) nodes[1]
        if (parent instanceof ClassNode) {
            ClassNode cNode = (ClassNode) parent
            // doesn't already have a default constructor
            if (!cNode.getDeclaredConstructor(new Parameter[0])) {
                cNode.addConstructor(new ConstructorNode(
                    ACC_PUBLIC, new Parameter[0], cNode.EMPTY_ARRAY, new BlockStatement()))
            }
        }
    }
}
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@GroovyASTTransformationClass("DefaultConstructorASTTransformation")
public @interface DefaultConstructor {}

withConfig(configuration) {
    ast(DefaultConstructor)
}
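To try it out, you'd point the compiler at the configuration script, e.g. (assuming the script above is saved as config.groovy and the transformation classes are already compiled and on the compiler's classpath):
groovyc --configscript config.groovy MyClass.groovy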

Tuesday, May 26, 2015

Guava Hadoop Classpath Issue

Blogging this because it was slightly too large for a tweet.  If you've got a stacktrace like


java.lang.NoClassDefFoundError: com/google/common/io/LimitInputStream
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:467)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)


You may find this problematic dependency tree:
\---org.apache.hadoop:hadoop-client
    +---org.apache.hadoop:hadoop-common
        +---org.apache.hadoop:hadoop-auth

It seems Google has once again broken compatibility in Guava, by removing LimitInputStream in Guava 15.  And while much of Hadoop (except the new versions, which have upgraded their Guava version) is on an older version of Guava, the hadoop-auth module pulls in a newer version of Guava that most dependency management tools (i.e. Maven and Gradle) will choose over the older version.  Adding an exclusion for this transitive dependency should resolve the issue.
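For example, in Maven that exclusion might look like this (14.0.1 was the last Guava release that still shipped LimitInputStream; adjust versions to your setup):
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- then declare the older Guava directly -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
</dependency>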

Friday, May 15, 2015

IntelliJ and junit-hierarchicalcontextrunner

For those using junit-hierarchicalcontextrunner getting an exception like
java.lang.Exception: The inner class com.mycompany.SomeClassTest$InnerClass is not static.
 at org.junit.runners.BlockJUnit4ClassRunner.validateNoNonStaticInnerClass(BlockJUnit4ClassRunner.java:113)
...
You might be able to get rid of that error (especially if it happens when running in IntelliJ but not on the command line) by upgrading to JUnit 4.12 and junit-hierarchicalcontextrunner 4.12.0. However, if you're an IntelliJ user, you'll find that running an individual test still runs all the tests. This will be fixed in an upcoming release.

Friday, May 1, 2015

Codehaus Migration

Since Codehaus is shutting down, you may be wondering where a project you use has moved. Here's where some of the more popular projects have moved.

Project Old Homepage New Homepage
EasyMock http://easymock.codehaus.org/ http://easymock.org/
Enunciate http://enunciate.codehaus.org/ http://enunciate.webcohesion.com/
Esper http://esper.codehaus.org/ http://www.espertech.com/esper/index.php
Fabric3 http://fabric3.codehaus.org/ http://www.fabric3.org/
Gant http://gant.codehaus.org/ http://gant.github.io/
Geb http://geb.codehaus.org/ http://www.gebish.org/
GMavenPlus http://gmavenplus.codehaus.org/ https://github.com/groovy/GMavenPlus/
GPars http://gpars.codehaus.org/ http://gpars.github.io/
Griffon http://griffon.codehaus.org/ http://new.griffon-framework.org/
Groovy http://groovy.codehaus.org/ http://www.groovy-lang.org/
GroovyFX http://docs.codehaus.org/display/GROOVY/GroovyFX http://groovyfx.org/
Gumtree http://gumtree.codehaus.org/ https://github.com/Gumtree/gumtree
IzPack http://docs.codehaus.org/display/IZPACK/Home http://izpack.org/
Jackson http://jackson.codehaus.org/ https://github.com/FasterXML/jackson
JavaNCSS http://javancss.codehaus.org/ NONE YET (though Codehaus has a mirror on Github)
jMock http://jmock.codehaus.org/ http://www.jmock.org/
JRuby http://jruby.codehaus.org/ http://jruby.org/
M2Eclipse http://m2eclipse.codehaus.org/ http://eclipse.org/m2e/
Mojo http://mojo.codehaus.org/ https://github.com/mojohaus/ (still transitioning)
MVEL http://mvel.codehaus.org/ https://github.com/mvel/mvel
PicoContainer http://picocontainer.codehaus.org/ https://github.com/picocontainer
Plexus Classworlds http://plexus.codehaus.org/plexus-classworlds/ https://github.com/sonatype/plexus-classworlds
Plexus Containers http://plexus.codehaus.org/plexus-containers/ https://github.com/sonatype/plexus-containers
Sonar http://sonar.codehaus.org/ http://www.sonarqube.org/
SiteMesh http://sitemesh.codehaus.org/ http://wiki.sitemesh.org/wiki/display/sitemesh/Home
StaxMate http://staxmate.codehaus.org/ https://github.com/FasterXML/StaxMate
SVN4J http://svn4j.codehaus.org/ http://sourceforge.net/projects/svn4j/
Woodstox http://woodstox.codehaus.org/ https://github.com/FasterXML/woodstox
XStream http://xstream.codehaus.org/ http://x-stream.github.io/


I'll keep this page updated as more information becomes available (let me know if you spot something incorrect or out of date). Also if there's a project not on this list that you think should be, let me know and I'll add it. I know some of these moves are old news, but I listed them anyway since once Codehaus shuts down, any redirects they may have had will stop working.

Thursday, October 3, 2013

WebHDFS vs Native Performance

So after I heard about WebHDFS, I became curious about its performance characteristics compared to the native client, particularly after reading this blog entry.  Oddly, I found my results to be dramatically different from Andre's findings.

Here are the results of my experiments:
Size    Native client avg % faster
10 MB   -20.0%
100 MB  34.3%
500 MB  48.3%
1 GB    79.4%
10 GB   90.1%

As you can see, the native client generally handily beats WebHDFS, and there seems to be a correlation between the performance gap and the file size.  I haven't had the time yet to look into the technical details of why this might be.  There are some differences between our tests to note:
  • The latency between my client and the server is much lower (about 0.29ms instead of 23ms)
  • My client is in the same data center rather than a remote data center, with 10GbE connecting it to the server
  • I used wget instead of a Python WebHDFS client (sketched below)
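A sketch of the kind of read that was timed (host and path are hypothetical; 50070 was the default WebHDFS HTTP port at the time):
wget -O /dev/null "http://namenode.example.com:50070/webhdfs/v1/benchmarks/10gb.dat?op=OPEN"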

It's possible there's network or cluster configuration differences that could contribute as well (including differences in Hadoop versions).  My takeaway from this was that it's better to observe your actual performance before deciding which approach to take.

HDFS NameNode Username Hack

I created a userscript to override the username (where programmatically detectable) to allow you to read HDFS directories and files that aren't world-readable.  Nothing fancy here; you could edit the URL yourself, this just makes it easier.  The script is hosted here: http://userscripts.org/scripts/show/179132, and the source is available here: https://gist.github.com/keeganwitt/6810986.

Sunday, September 8, 2013

NSIS SelectSection

So, I haven't seen a working example of how Sections.nsh's SelectSection could be used (at least, not one that worked for me).  So here's an example based on what I did in the installer for Meld.  The goal was to select some default sections when the install is silent.
!include "Sections.nsh"
Section "Foo" foo
    ; install stuff here
SectionEnd
Section "un.foo" unfoo
     ; uninstall stuff here
SectionEnd
Function .onInit
     IfSilent isSilent notSilent
     isSilent:
     !insertMacro SelectSection foo
     !insertMacro SelectSection unfoo
     notSilent:
FunctionEnd

Monday, July 22, 2013

Hadoop SleepInputFormat

I whipped up a little class to provide dummy input to Hadoop jobs for testing purposes. Hadoop had something like this, but they haven't updated it for Hadoop 2 for some reason. My class works with the new API.
https://gist.github.com/keeganwitt/6053872

Edit: They've now updated it.

Thursday, April 25, 2013

Hadoop Writing Bytes

There are times where you might want to write bytes directly to HDFS.  Maybe you're writing binary data.  Maybe you're writing data with varying encodings.  In our case, we were doing both (depending on profile) and were trying to use MultipleOutputs to do so.  We discovered that there was no built-in OutputFormat that supported bytes, nor were there any examples on the web of how to do this with the new API. Granted, it's not overly complicated, but to save you a little time, here's what I came up with.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

import java.io.DataOutputStream;
import java.io.IOException;


public class BytesValueOutputFormat extends FileOutputFormat<NullWritable, BytesWritable> {

    @Override
    public RecordWriter<NullWritable, BytesWritable> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        Configuration conf = taskAttemptContext.getConfiguration();
        boolean isCompressed = getCompressOutput(taskAttemptContext);
        CompressionCodec codec = null;
        String extension = "";
        if (isCompressed) {
            Class<? extends CompressionCodec> codecClass = getOutputCompressorClass(taskAttemptContext, GzipCodec.class);
            codec = ReflectionUtils.newInstance(codecClass, conf);
            extension = codec.getDefaultExtension();
        }
        Path file = getDefaultWorkFile(taskAttemptContext, extension);
        FileSystem fs = file.getFileSystem(conf);
        if (!isCompressed) {
            FSDataOutputStream fileOut = fs.create(file, false);
            return new ByteRecordWriter(fileOut);
        } else {
            FSDataOutputStream fileOut = fs.create(file, false);
            return new ByteRecordWriter(new DataOutputStream(codec.createOutputStream(fileOut)));
        }
    }

    protected static class ByteRecordWriter extends RecordWriter<NullWritable, BytesWritable> {
        private DataOutputStream out;

        public ByteRecordWriter(DataOutputStream out) {
            this.out = out;
        }

        @Override
        public void write(NullWritable key, BytesWritable value) throws IOException {
            boolean nullValue = value == null;
            if (!nullValue) {
                out.write(value.getBytes(), 0, value.getLength());
            }
        }

        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            out.close();
        }
    }

}

Here's an example usage with MultipleOutputs
Instantiation
MultipleOutputs<NullWritable, BytesWritable> multipleOutputs = new MultipleOutputs<NullWritable, BytesWritable>(context);
Writing
byte[] bytesToWrite = someAppLogic();
multipleOutputs.write(NullWritable.get(), new BytesWritable(bytesToWrite), fileName);

And of course, since it's like any other OutputFormat, it can also work with LazyOutputFormat if desired (as well as just about anything else you might choose to do with an OutputFormat).
LazyOutputFormat.setOutputFormatClass(job, BytesValueOutputFormat.class);

In our case, this was the last step in our sequence of Hadoop jobs, so we had no further need for the key. One could conceive of situations in which further manipulation is needed. In such cases, you could attempt some sort of delimited binary (to separate the key from the value), but it might be easier to just keep it all as text and use Commons Codec's Base64 to pass the bytes value between jobs.
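For instance, that hand-off with Commons Codec might look like this (variable names are hypothetical):
import org.apache.commons.codec.binary.Base64;

// in the upstream job: encode the bytes as text output
String encoded = Base64.encodeBase64String(bytesToWrite);
// in the downstream job: decode them back
byte[] decoded = Base64.decodeBase64(encoded);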

Tuesday, March 12, 2013

Hadoop overwrite options

There's an undocumented feature (Hadoop's documentation needs some serious love) that allows the cp and copyFromLocal commands of Hadoop's dfs to overwrite the destination, just like Unix's cp -f does.  I've added HADOOP-9381 with a patch to document this feature in both the help page and the web page.
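For example (paths here are hypothetical):
hadoop fs -cp -f /user/keegan/data.txt /backup/data.txt
hadoop fs -copyFromLocal -f data.txt /user/keegan/data.txt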

While I was looking at this, I realized that the mv and moveFromLocal commands didn't recognize the -f option, even though Unix's mv command does. Since it was simple to add, I created HADOOP-9382 with a patch to address that issue.

Thursday, January 3, 2013

Meld Windows Installer

I'm a huge fan of Meld as a diffing / merging tool.  It's the nicest looking, most powerful tool of its kind that I've come across.  (Although I'd also like to give a shout-out to KDiff3 for being able to handle larger files than anything else I've seen besides GNU Diff, and to WinMerge as another popular choice.)  The trouble is, if you run Windows, it's a bit of a pain to set up.  You have to install Python, GTK+, and PyGTK (made easier by the nice all-in-one installer they now have), then Meld itself, and finally write a script to launch the appropriate Python command and create shortcuts to that script for convenience.  This is a shame, because I'd really like to get more Windows users using this wonderful tool.
So I made an installer that includes all of these, with no extra dependencies needed.  Just install and go.  The only things it doesn't currently support are syntax highlighting (needs PyGtkSourceView, which is not included in Portable Python, which I used) and VCS browsing (needs GNU Patch via Cygwin).  Here's where you can get it: http://code.google.com/p/meld-installer/.  Lemme know if you find any mistakes; it's my first time using NSIS.

Edit: I've now also created a portable .zip archive.

Edit: Vote here to ask for PyGtkSourceView to be added to Portable Python so I can use it (edit: it appears PyGTK in Portable Python already includes PyGtkSourceView).  I'll look into doing this without Portable Python (though using it is easier for me), and possibly GNU Patch as well, but am not sure how hard it will be.

Edit: I've made a significant update to this.  My thanks to Angel Ezquerra of TortoiseHG for his suggestions and testing assistance.

Tuesday, May 15, 2012

A Mercurial Userscript

I've got a minor complaint against the Mercurial web interface: when you are browsing around using the tip version, links point to the nodeid that tip currently points to, rather than keeping the URLs relative to tip.  I think the way it should work is that you stay on tip unless you click into a revision log (so that you can click specific revisions).  This is especially nice for sharing a link to the latest with someone; you would want to keep that link relative.
For the most part, I don't really care what Mercurial does, but since I do use their web interface because it's the SCM of choice for the OpenJDK project, I thought I'd whip up a userscript to remedy the situation.  Check out the script here, then test its effects by browsing the latest JDK sources:
http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/tip/src/share/classes/
http://hg.openjdk.java.net/jdk7/jdk7-gate/jdk/file/tip/src/share/classes/

Let me know if you have any problems or suggestions for improvement.

Wednesday, December 28, 2011

Running Lego Mindstorms RIS 2.0 on newer Windows

While getting my old Lego Mindstorms (an RCX 2.0 setup) set up for my little brother to play with, I learned some things about using it with newer versions of Windows (Windows 7 in my case). I found out here that there is a patch needed for the USB IR tower, which resolves the problem of the system locking up when the tower is plugged in. You can download it here: http://cache.lego.com/downloads/education/tower164.zip.
After installing this, things will run fine unless you use the official program and launch it more than once, at which point you will see a message like "A critical error has occurred. You may be running out of memory, or you may need to reinstall Robotics Invention System 2.0." The solution to this is a bit messy. There is a file left behind here:
Vista/Win7: C:\Users\<user>\AppData\Local\Temp\Ris 2.0.mov.#res
XP: C:\Documents and Settings\<user>\Application Data\Temp\Ris 2.0.mov.#res
This file must be deleted each time before the program is launched. What I did was create a batch script to do this, and a Visual Basic script to launch the batch script (so it could happen without launching a command prompt window).
You'll find both scripts here: https://gist.github.com/1531705
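As a rough sketch (the gist has the real versions, and the exe path here is hypothetical), the launcher batch script boils down to:
@echo off
rem delete the leftover file that causes the critical error on relaunch
del "%TEMP%\Ris 2.0.mov.#res"
start "" "%PROGRAMFILES(X86)%\LEGO MINDSTORMS\RIS 2.0\LaunchRis2.exe"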
I recommend replacing all shortcuts to LaunchRis2.exe with a shortcut to the Visual Basic script. Note that the VB script assumes you will put the batch script in %PROGRAMFILES(X86)%\LEGO MINDSTORMS\launchRis.bat (or %PROGRAMFILES%\LEGO MINDSTORMS\launchRis.bat on a 32-bit system), but you can easily edit the script to change the location.
Of course, this second problem is irrelevant if you decide to write the programs yourself (I preferred this over the graphical tool Lego provided). There are several languages available.
Let me know if you have suggestions for improvement or run into any issues.

Thursday, September 1, 2011

A userscript for ViewVC

I've posted a userscript I've written as a workaround for a ViewVC feature request that hasn't been implemented yet (despite the fact that a patch has already been submitted).  The missing feature is a link to the log view for directories.

A few Autohotkey scripts

I decided to post the source for a few Autohotkey scripts a couple of days ago:

  • GW Tonic Bot : This one is a bot to drink tonics for you in Guild Wars to help you get your Party Animal title. It maps back and forth to your Guild Hall and drinks 2 tonics (yes, 2 tonics, because of a bug in GW) each time it stops.
  • GW Drunkard Bot : This one was meant to drink alcohol at specific intervals to achieve optimal points towards your Drunkard title.  This was made obsolete by the March 3, 2011 update.  You can now click them (or have an autoclicker click them) as fast as you want.
  • Generic Autoclicker : This one can be for any kind of automated clicking, it just clicks (or double clicks) where you tell it to at an interval you specify.
Note that I haven't tested them in IronAHK, only AutoHotkey on Windows.  Also note that it's easy to convert these to an .exe file if you don't want to install AutoHotkey.  And if a non-techie is reading this, I'm happy to provide that for you.  Just leave a comment below.

Also, feel free to comment on (or fork) any of these scripts if you have improvements.  Some people have found that the delay between mappings in the tonic bot is not long enough for slower internet connections, so you have to tell it to drink more tonics than you'd think (since some clicks will be wasted).

Friday, December 17, 2010

Maven 3 & profiles.xml

There are some pretty cool things in Maven 3 (although mixins and global dependency exclusions have been tabled until 3.1). Matt Raible talks about some of them here. Significant points include dramatically increased performance (50% to 400% faster) and the fact that unspecified plugin versions will pull the latest release rather than snapshot versions (though it's best to be explicit about plugin versions). Also, Sonatype has developed Maven Shell and Polyglot Maven to work with Maven 3.

There are, however, some compatibility concerns when moving from Maven 2.x, and not all plugins work yet (most notably, the Site Plugin is being refactored and isn't completely working). But the Maven team intends the new release to be usable as a drop-in replacement for Maven 2 (though this won't be the case for Maven 3.1).

Most changes for compatibility seemed pretty trivial. The biggest thing I see preventing Maven 3 from being a drop-in replacement for Maven 2 is their decision to remove support for profiles.xml. This was documented in MNG-4060, though not much discussed there. A few people have complained about this already here, and it was discussed a little here. I haven't seen much justification for this, other than that it's supposed to be difficult to test. Though I'm not sure why it'd be much harder to test than the use of profiles in the pom.xml. But the Maven team seems, for whatever reason, fairly committed to this idea, despite one of their committers (Milos Kleint) disagreeing with their position.

People use profiles for a variety of things. Where I work, they are commonly used for environment settings. For example, which version of a particular web service or database to use in prod, qa, etc., but also which version of that service or database to use for a particular developer's sandbox (another common difference between developers is their log level). We typically use Maven filtering in conjunction with external profiles to accomplish this. These profile properties are also kept in our SCM, so if something is used by all developers' sandboxes, it is easily changeable and transparent to all developers on the project. It is currently impossible to run Maven 3 in this way.

We are not entirely without options when it comes to addressing this, but none of the solutions, in my opinion, is as nice as the stable, built-in profiles.xml feature. Recently, I've asked if there is some other mechanism I should be using to accomplish this. To date, I've not seen anyone fully explore this issue, which I think is important for Maven 3 going forward. So that's what I did here. If you'd like any of the sample projects I created for this exploration, just drop me a line. Maybe I'll put them up on GitHub or something at some point. So without further ado, here are the options I've found and their pros & cons.

1. Fork Maven and put the feature back in
Maybe it's my cynical nature, but this was actually the first thing that came to my mind. But I don't know that very many people would feel comfortable running a patched version of Maven. Plus, I'd want to keep my fork up to date to keep getting all the other goodies they add, which would be a lot of work for me. Then I thought of submitting a patch to Maven for this. But when I found out the decision was pretty deliberate and not simply a lack of resources, I backed off that idea.
Pros: It's the way Maven should be
Cons: A lot of work; who's brave enough to use it?

2. Stick with Maven 2
Hey, there's nothing wrong with being old fashioned. There's strong logic to the "If it ain't broke, don't fix it" argument. Some people are even still happily on the 2.0.x branch rather than the 2.2.x branch.
Pros: The least work of any solution
Cons: No Maven 3 goodies

3. Place all profiles properties in the POM
POMs in Maven 3 still use a <modelVersion>4.0.0</modelVersion>. Therefore, you could put everything inside the POM, like so:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.foo</groupId>
  <artifactId>pomProfileTest</artifactId>
  <version>1.0</version>

  <properties>
    <!-- these are defaults, they can be overridden with a settings.xml -->
    <javaVersion>1.5</javaVersion>
    <junitVersion>4.8.2</junitVersion>
    <sourceEncoding>UTF-8</sourceEncoding>
    <resourceEncoding>UTF-8</resourceEncoding>
    <profile>dev</profile>
    <prop1>null</prop1>
    <prop2>null</prop2>
  </properties>

  <profiles>
    <profile>
      <id>developer1</id>
      <properties>
        <prop1>prop1Value</prop1>
        <prop2>prop2Value</prop2>
      </properties>
    </profile>
    <profile>
      <id>developer2</id>
      <properties>
        <prop1>anotherProp1Value</prop1>
        <prop2>anotherProp2Value</prop2>
      </properties>
    </profile>
  </profiles>
<!-- ... -->
</project>
This could then be invoked in the standard way:
mvn -P developer1 <goal>
Pros: Simple to implement (scripts can continue to invoke Maven in the current way); also works with Maven 2
Cons: Clutters the POM, which is particularly troublesome since the POM is also deployed
Variations
1. Multiple pom.xml files could be checked into the project to cut down on the clutter inside the main pom.xml.

4. Place all an environment's (or developer's) settings in a single settings.xml
This seems to be Maven's official answer on the subject. For things that are common across many projects, this might be a decent solution. But for the many project-specific settings (e.g. a db.url property), you'd have to make sure they are named uniquely across all projects so as not to conflict with each other. This makes for a real maintenance problem.
As one commenter in the mailing list noted, some developers have a fear of changing their settings.xml (even if they should be comfortable with this) and would prefer to simply change a few properties. But this should make us pause to make sure there's nothing we're using profiles.xml for when we should actually be using settings.xml.
Pros: Maven's official solution; also works with Maven 2
Cons: Uniquely named properties cause maintenance issues; changes are not normally visible to all developers

5. Use separate settings.xml files
You can specify a different user settings file with
mvn -s path/someSettingsFile.xml <goal>
and create a file like
developer1.xml
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                      http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <profiles>
    <profile>
      <id>developer1</id>
      <properties>
        <!-- global properties -->
        <siteLocation>file://${user.dir}/site</siteLocation>

        <!-- project specific properties -->
        <prop1>prop1Value</prop1>
        <prop2>prop2Value</prop2>
      </properties>
      <!-- ... -->
    </profile>
  </profiles>
</settings>

Pros: Little profile clutter in the POM; checking all these settings files in lets developers make sure they're all using and deploying to the same repos; also works with Maven 2
Cons: Massive clutter & overhead, as all your normal settings.xml content now lives in every profile settings file; profile settings are machine-dependent, since <localRepository/> is also in the settings file

6. Use the Properties Maven Plugin
The Properties Maven Plugin allows for properties files to be loaded (and saved) just as if you had used <properties/> in the pom.xml itself.
Pros: Properties are cleanly externalized; also works with Maven 2
Cons: The plugin is considered an alpha version

Variations
1. Use profiles to select the properties file
This could be invoked in the standard way:
mvn -P developer1 <goal>
And your POM might look like this:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.foo</groupId>
  <artifactId>propertiesProfileTest</artifactId>
  <version>1.0</version>

  <properties>
    <!-- these are defaults, they can be overridden with a settings.xml -->
    <javaVersion>1.5</javaVersion>
    <junitVersion>4.8.2</junitVersion>
    <sourceEncoding>UTF-8</sourceEncoding>
    <resourceEncoding>UTF-8</resourceEncoding>
    <profile>dev</profile>
    <prop1>null</prop1>
    <prop2>null</prop2>
  </properties>

  <profiles>
    <profile>
      <id>developer1</id>
      <properties>
        <profile>developer1.properties</profile>
      </properties>
    </profile>
    <profile>
      <id>developer2</id>
      <properties>
        <profile>developer2.properties</profile>
      </properties>
    </profile>
  </profiles>

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>properties-maven-plugin</artifactId>
        <version>1.0-alpha-2</version>
        <executions>
          <execution>
            <phase>initialize</phase>
            <goals>
              <goal>read-project-properties</goal>
            </goals>
            <configuration>
              <files>
                <file>${project.basedir}/filters/${profile}</file>
              </files>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <!-- ... -->
</project>
Pros: Standard way of selecting the profile
Cons: The POM is now cluttered with mappings from profiles to properties files

2. Use a variable to select the properties file
In my sample, the invocation would be
mvn -Dprofile=developer1 <goal>
And the POM would look like this:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.foo</groupId>
  <artifactId>propertiesProfileTest</artifactId>
  <version>1.0</version>

  <properties>
    <!-- these are defaults, they can be overridden with a settings.xml -->
    <javaVersion>1.5</javaVersion>
    <junitVersion>4.8.2</junitVersion>
    <sourceEncoding>UTF-8</sourceEncoding>
    <resourceEncoding>UTF-8</resourceEncoding>
    <profile>dev</profile>
    <prop1>null</prop1>
    <prop2>null</prop2>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>properties-maven-plugin</artifactId>
        <version>1.0-alpha-2</version>
        <executions>
          <execution>
            <phase>initialize</phase>
            <goals>
              <goal>read-project-properties</goal>
            </goals>
            <configuration>
              <files>
                <file>${project.basedir}/filters/${profile}.properties</file>
              </files>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <!-- ... -->
</project>
Pros: Little profile clutter in the POM
Cons: No standard way of selecting the profile

Conclusion
I still really wish the Maven folks would change their mind on this. But until then (unless you're sticking with Maven 2 -- which might not be a bad idea, at least for now, because of plugin incompatibilities), the best option seems to me to use the Properties Maven Plugin, either with a property or with a profile. (Personally, I'm leaning toward the profile option, because I think the fact that it can be invoked in a standard way is worth the extra POM clutter.) Though it is technically an alpha version, it seemed to work fine for me, and I believe Maven's decision to remove support for profiles.xml will cause people to flock to this plugin, and therefore likely only increase its stability. Of course, IDE support for this practice is another question.

Conclusion Part 2
I do feel obligated to mention that the practice of using Maven filtering for things like database URLs where I work is actually changing to use runtime properties instead. (Generally by building a property reader class using java.util.Properties to read different properties files based on the name passed with -Denv=<environmentName>). This has the advantage (besides working with Maven 3) of not requiring separate deployments just for environment differences or redeploys for changes to an environment property.
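A bare-bones sketch of such a reader (the class and property-file names are made up; error handling omitted):
import java.io.InputStream;
import java.util.Properties;

public class EnvProperties {
    public static Properties load() throws Exception {
        // -Denv=qa selects qa.properties from the classpath (defaulting to dev)
        String env = System.getProperty("env", "dev");
        Properties props = new Properties();
        InputStream in = EnvProperties.class.getResourceAsStream("/" + env + ".properties");
        props.load(in);
        in.close();
        return props;
    }
}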
I still think removing profiles.xml support is bad, since the intent was to keep backward compatibility; it makes pointing my M2_HOME to Maven 3 a bit painful when working on both old and new projects. It also seems a bit strange that they removed support for profiles.xml but not for profiles in general. It was nice to be able to have those external to the POM.
However, for my use case (and probably that of the gentleman on Nabble also), Maven filtering with profiles probably wasn't the right idea in the first place.

Wednesday, July 14, 2010

I opened a new IntelliJ issue

I opened an issue for IntelliJ IDEA to highlight TODOs in blue in gsp files like they do for other files tonight. Vote if you're interested.