Tuesday, May 26, 2015

Guava Hadoop Classpath Issue

Blogging this because it was slightly too large for a tweet.  If you've got a stacktrace like

java.lang.NoClassDefFoundError: com/google/common/io/LimitInputStream
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:467) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)

You may find this problematic dependency tree

It seems Google has once again broken compatibility in Guava by removing LimitInputStream in Guava 15.  And while much of Hadoop (except the new versions which have upgraded their Guava version) are on an older version of Guva, the hadoop-auth module contains a newer version of Guava that most dependency management tools (aka Maven and Gradle) will choose over the older version.  Adding an exclusion for this transitive dependency should resolve this issue.