NoSuchMethodError Sets.newConcurrentHashSet() when running a jar with hadoop

I am using the cassandra-all 2.0.7 API with hadoop 2.2.0. My pom.xml looks like this:

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>zazzercode</groupId>
      <artifactId>doctorhere-engine-writer</artifactId>
      <version>1.0</version>
      <packaging>jar</packaging>
      <name>DoctorhereEngineWriter</name>

      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <cassandra.version>2.0.7</cassandra.version>
        <hector.version>1.0-2</hector.version>
        <guava.version>15.0</guava.version>
        <hadoop.version>2.2.0</hadoop.version>
      </properties>

      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
              <source>1.6</source>
              <target>1.6</target>
            </configuration>
          </plugin>
          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
              <archive>
                <manifest>
                  <mainClass>zazzercode.DiseaseCountJob</mainClass>
                </manifest>
              </archive>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
            </configuration>
          </plugin>
        </plugins>
      </build>

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>me.prettyprint</groupId>
          <artifactId>hector-core</artifactId>
          <version>${hector.version}</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.cassandra</groupId>
          <artifactId>cassandra-all</artifactId>
          <version>${cassandra.version}</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.cassandra</groupId>
          <artifactId>cassandra-thrift</artifactId>
          <version>${cassandra.version}</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
        </dependency>
        <dependency>
          <groupId>org.apache.thrift</groupId>
          <artifactId>libthrift</artifactId>
          <version>0.7.0</version>
        </dependency>
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <version>${guava.version}</version>
        </dependency>
        <dependency>
          <groupId>com.googlecode.concurrentlinkedhashmap</groupId>
          <artifactId>concurrentlinkedhashmap-lru</artifactId>
          <version>1.3</version>
        </dependency>
      </dependencies>
    </project>

When I launch the jar as user hduser (after building it with mvn assembly:assembly as the normal user prayagupd), as follows,

 hduser@prayagupd$ hadoop jar target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar /user/hduser/shakespeare 

I get the following Guava collections error from the cassandra API,

    14/11/23 17:51:04 WARN mapred.LocalJobRunner: job_local800673408_0001
    java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
        at org.apache.cassandra.config.Config.<clinit>(Config.java:53)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:105)
        at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:105)
        at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:90)
        at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:69)
        at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:29)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:558)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:632)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)
    14/11/23 17:51:04 INFO mapreduce.Job:  map 100% reduce 0%

Line 53 of Config.java in the cassandra API has this code,

    public Set<String> hinted_handoff_enabled_by_dc = Sets.newConcurrentHashSet();
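
For context, Sets.newConcurrentHashSet() was only added in Guava 15.0, so a minimal program like the sketch below (a hypothetical repro, not part of this project) compiles fine against guava-15.0 but fails with exactly this NoSuchMethodError when an older guava such as 11.0.2 is first on the runtime classpath:

    import java.util.Set;

    import com.google.common.collect.Sets;

    // Hypothetical repro: compile against guava-15.0, then run with
    // guava-11.0.2 first on the classpath to trigger the same error.
    public class GuavaVersionRepro {
        public static void main(String[] args) {
            // Resolves at runtime to Sets.newConcurrentHashSet()Ljava/util/Set;
            // which does not exist before Guava 15.0.
            Set<String> dcs = Sets.newConcurrentHashSet();
            dcs.add("dc1");
            System.out.println("Guava is new enough: " + dcs);
        }
    }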

However, I do find the Sets class inside the jar itself,

    hduser@prayagupd$ jar tvf target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar | grep com/google/common/collect/Sets
     2358 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$1.class
     2019 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$2.class
     1705 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$3.class
     1327 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet$1.class
     4224 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet.class
     5677 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$DescendingSet.class
     4187 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredNavigableSet.class
     1567 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSet.class
     2614 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSortedSet.class
     1174 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$ImprovedAbstractSet.class
     1361 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet$1.class
     3727 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet.class
     1398 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SetView.class
     1950 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet$1.class
     2058 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet.class
     4159 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$UnmodifiableNavigableSet.class
    17349 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets.class

Also, when I inspect the jar as follows, the method is present,

    hduser@prayagupd$ javap -classpath target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar com.google.common.collect.Sets | grep newConcurrentHashSet
      public static <E> java.util.Set<E> newConcurrentHashSet();
      public static <E> java.util.Set<E> newConcurrentHashSet(java.lang.Iterable<? extends E>);
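
When hunting a conflict like this it can also help to ask the JVM which jar a class was actually loaded from at runtime. A small diagnostic sketch (WhichGuava is a hypothetical class name; run it the same way the job runs, e.g. via hadoop jar, so the classpath matches):

    import com.google.common.collect.Sets;

    // Diagnostic sketch: print the jar the Sets class was really loaded from.
    public class WhichGuava {
        public static void main(String[] args) {
            System.out.println(
                Sets.class.getProtectionDomain().getCodeSource().getLocation());
        }
    }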

I can also see the com.google.guava artifact under the /META-INF/maven directory when navigating through the jar file.

I have the following guava artifacts in ~/.m2 outside the hduser account,

    $ ll ~/.m2/repository/com/google/guava/guava
    total 20
    drwxrwxr-x 5 prayagupd prayagupd 4096 Nov 23 20:05 ./
    drwxrwxr-x 4 prayagupd prayagupd 4096 Nov 23 20:05 ../
    drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 11.0.2/
    drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:06 15.0/
    drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 r09/

And the hadoop classpath is,

    $ hadoop classpath
    /usr/local/hadoop-2.2.0/etc/hadoop:
    /usr/local/hadoop-2.2.0/share/hadoop/common/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/common/*:
    /usr/local/hadoop-2.2.0/share/hadoop/hdfs:
    /usr/local/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/hdfs/*:
    /usr/local/hadoop-2.2.0/share/hadoop/yarn/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/yarn/*:
    /usr/local/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/mapreduce/*:
    /usr/local/hadoop-2.2.0/contrib/capacity-scheduler/*.jar

The dependency tree is shown below; com.google.guava:guava:jar:r09:compile comes in via me.prettyprint:hector-core:jar:1.0-2:compile, while hadoop-2.2.0 uses guava-11.0.2.jar and cassandra-2.0.6 uses guava-15.0.jar,

    $ find /usr/local/apache-cassandra-2.0.6/ -name "guava*"
    /usr/local/apache-cassandra-2.0.6/lib/guava-15.0.jar
    /usr/local/apache-cassandra-2.0.6/lib/licenses/guava-15.0.txt

    $ mvn dependency:tree
    [INFO] Scanning for projects...
    [INFO]
    [INFO] ------------------------------------------------------------------------
    [INFO] Building DoctorhereEngineWriter 1.0
    [INFO] ------------------------------------------------------------------------
    [INFO]
    [INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ doctorhere-engine-writer ---
    [INFO] zazzercode:doctorhere-engine-writer:jar:1.0
    [INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
    [INFO] +- me.prettyprint:hector-core:jar:1.0-2:compile
    [INFO] |  +- commons-lang:commons-lang:jar:2.4:compile
    [INFO] |  +- commons-pool:commons-pool:jar:1.5.3:compile
    [INFO] |  +- com.google.guava:guava:jar:r09:compile
    [INFO] |  +- org.slf4j:slf4j-api:jar:1.6.1:compile
    [INFO] |  +- com.github.stephenc.eaio-uuid:uuid:jar:3.2.0:compile
    [INFO] |  \- com.ecyrd.speed4j:speed4j:jar:0.9:compile
    [INFO] +- org.apache.cassandra:cassandra-all:jar:2.0.7:compile
    [INFO] |  +- org.xerial.snappy:snappy-java:jar:1.0.5:compile
    [INFO] |  +- net.jpountz.lz4:lz4:jar:1.2.0:compile
    [INFO] |  +- com.ning:compress-lzf:jar:0.8.4:compile
    [INFO] |  +- commons-cli:commons-cli:jar:1.1:compile
    [INFO] |  +- commons-codec:commons-codec:jar:1.2:compile
    [INFO] |  +- org.apache.commons:commons-lang3:jar:3.1:compile
    [INFO] |  +- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.3:compile
    [INFO] |  +- org.antlr:antlr:jar:3.2:compile
    [INFO] |  |  \- org.antlr:antlr-runtime:jar:3.2:compile
    [INFO] |  |     \- org.antlr:stringtemplate:jar:3.2:compile
    [INFO] |  |        \- antlr:antlr:jar:2.7.7:compile
    [INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.2:compile
    [INFO] |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.2:compile
    [INFO] |  +- jline:jline:jar:1.0:compile
    [INFO] |  +- com.googlecode.json-simple:json-simple:jar:1.1:compile
    [INFO] |  +- com.github.stephenc.high-scale-lib:high-scale-lib:jar:1.1.2:compile
    [INFO] |  +- org.yaml:snakeyaml:jar:1.11:compile
    [INFO] |  +- edu.stanford.ppl:snaptree:jar:0.1:compile
    [INFO] |  +- org.mindrot:jbcrypt:jar:0.3m:compile
    [INFO] |  +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
    [INFO] |  +- com.addthis.metrics:reporter-config:jar:2.1.0:compile
    [INFO] |  |  \- org.hibernate:hibernate-validator:jar:4.3.0.Final:compile
    [INFO] |  |     +- javax.validation:validation-api:jar:1.0.0.GA:compile
    [INFO] |  |     \- org.jboss.logging:jboss-logging:jar:3.1.0.CR2:compile
    [INFO] |  +- com.thinkaurelius.thrift:thrift-server:jar:0.3.3:compile
    [INFO] |  |  \- com.lmax:disruptor:jar:3.0.1:compile
    [INFO] |  +- net.sf.supercsv:super-csv:jar:2.1.0:compile
    [INFO] |  +- log4j:log4j:jar:1.2.16:compile
    [INFO] |  +- com.github.stephenc:jamm:jar:0.2.5:compile
    [INFO] |  \- io.netty:netty:jar:3.6.6.Final:compile
    [INFO] +- org.apache.cassandra:cassandra-thrift:jar:2.0.7:compile
    [INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
    [INFO] |  |  +- org.apache.commons:commons-math:jar:2.1:compile
    [INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
    [INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
    [INFO] |  |  +- commons-io:commons-io:jar:2.1:compile
    [INFO] |  |  +- commons-net:commons-net:jar:3.1:compile
    [INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
    [INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
    [INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
    [INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
    [INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
    [INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
    [INFO] |  |  +- org.apache.avro:avro:jar:1.7.4:compile
    [INFO] |  |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
    [INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
    [INFO] |  |  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
    [INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
    [INFO] |  |     \- org.tukaani:xz:jar:1.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
    [INFO] |  |  \- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
    [INFO] |  |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
    [INFO] |  |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
    [INFO] |  |  \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
    [INFO] |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
    [INFO] |  \- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
    [INFO] \- org.apache.thrift:libthrift:jar:0.7.0:compile
    [INFO]    +- javax.servlet:servlet-api:jar:2.5:compile
    [INFO]    \- org.apache.httpcomponents:httpclient:jar:4.0.1:compile
    [INFO]       \- org.apache.httpcomponents:httpcore:jar:4.0.1:compile
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 27.124s
    [INFO] Finished at: Wed Mar 18 01:39:42 CDT 2015
    [INFO] Final Memory: 15M/982M
    [INFO] ------------------------------------------------------------------------

Here is the hadoop script of hadoop 2.2.0,

    $ cat /usr/local/hadoop-2.2.0/bin/hadoop
    #!/usr/bin/env bash

    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # This script runs the hadoop core commands.

    bin=`which $0`
    bin=`dirname ${bin}`
    bin=`cd "$bin"; pwd`

    DEFAULT_LIBEXEC_DIR="$bin"/../libexec
    HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
    . $HADOOP_LIBEXEC_DIR/hadoop-config.sh

    export HADOOP_USER_CLASSPATH_FIRST=true

    function print_usage(){
      echo "Usage: hadoop [--config confdir] COMMAND"
      echo "       where COMMAND is one of:"
      echo "  fs                   run a generic filesystem user client"
      echo "  version              print the version"
      echo "  jar <jar>            run a jar file"
      echo "  checknative [-a|-h]  check native hadoop and compression libraries availability"
      echo "  distcp <srcurl> <desturl> copy file or directories recursively"
      echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
      echo "  classpath            prints the class path needed to get the"
      echo "                       Hadoop jar and the required libraries"
      echo "  daemonlog            get/set the log level for each daemon"
      echo " or"
      echo "  CLASSNAME            run the class named CLASSNAME"
      echo ""
      echo "Most commands print help when invoked w/o parameters."
    }

    if [ $# = 0 ]; then
      print_usage
      exit
    fi

    COMMAND=$1
    case $COMMAND in
      # usage flags
      --help|-help|-h)
        print_usage
        exit
        ;;

      #hdfs commands
      namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
        echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
        echo "Instead use the hdfs command for it." 1>&2
        echo "" 1>&2
        #try to locate hdfs and if present, delegate to it.
        shift
        if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
          exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
        elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
          exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
        else
          echo "HADOOP_HDFS_HOME not found!"
          exit 1
        fi
        ;;

      #mapred commands for backwards compatibility
      pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker)
        echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
        echo "Instead use the mapred command for it." 1>&2
        echo "" 1>&2
        #try to locate mapred and if present, delegate to it.
        shift
        if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
          exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
        elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
          exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
        else
          echo "HADOOP_MAPRED_HOME not found!"
          exit 1
        fi
        ;;

      classpath)
        echo $CLASSPATH
        exit
        ;;

      #core commands
      *)
        # the core commands
        if [ "$COMMAND" = "fs" ] ; then
          CLASS=org.apache.hadoop.fs.FsShell
        elif [ "$COMMAND" = "version" ] ; then
          CLASS=org.apache.hadoop.util.VersionInfo
        elif [ "$COMMAND" = "jar" ] ; then
          CLASS=org.apache.hadoop.util.RunJar
        elif [ "$COMMAND" = "checknative" ] ; then
          CLASS=org.apache.hadoop.util.NativeLibraryChecker
        elif [ "$COMMAND" = "distcp" ] ; then
          CLASS=org.apache.hadoop.tools.DistCp
          CLASSPATH=${CLASSPATH}:${TOOL_PATH}
        elif [ "$COMMAND" = "daemonlog" ] ; then
          CLASS=org.apache.hadoop.log.LogLevel
        elif [ "$COMMAND" = "archive" ] ; then
          CLASS=org.apache.hadoop.tools.HadoopArchives
          CLASSPATH=${CLASSPATH}:${TOOL_PATH}
        elif [[ "$COMMAND" = -* ]] ; then
          # class and package names cannot begin with a -
          echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
          exit 1
        else
          CLASS=$COMMAND
        fi
        shift

        # Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
        HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

        #make sure security appender is turned off
        HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

        export CLASSPATH=$CLASSPATH
        exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
        ;;

    esac

How can this google collections problem be fixed?

The actual code is here,

    git clone --branch doctor-engine-writer https://github.com/prayagupd/doctorhere
    cd doctorhere/doctorhere-engine-writer

Reference

Hadoop libraries conflict at mapreduce time

You are basically facing a version conflict. The problem is this:

  • Both the hadoop native libraries and cassandra use Google Guava.
  • But your hadoop version uses an older Guava (11.x.x), while your cassandra is newer and uses Guava 15.0. It is not common for enterprise-scale hadoop setups to update their environment with every new release.
  • The cassandra config loader uses the newConcurrentHashSet() method, which does not exist in that older version.
  • The jars used by hadoop are always loaded before any third-party jars. So even though the correct version of Guava is present in your "with dependencies" jar, the older Guava jar is loaded from the hadoop classpath and distributed to your mappers/reducers.

Solution:

  • Set the configuration parameter "mapreduce.job.user.classpath.first" to true in your Job's run method (see the fuller sketch after this list):

     job.getConfiguration().set("mapreduce.job.user.classpath.first", "true"); 
  • Now, add the following statement in bin/hadoop, which will tell hadoop to load user-defined libraries first:

     export HADOOP_USER_CLASSPATH_FIRST=true
  • Make sure the newer version of your library appears on the hadoop classpath ahead of the older one.
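
Putting the first point together, a driver along these lines would set the flag before the job is submitted. This is only a sketch against the Hadoop 2.x API: the class name zazzercode.DiseaseCountJob comes from the pom's mainClass, while the job name and the omitted mapper/reducer/output wiring are assumptions:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Sketch of a Tool-based driver; the mapper, reducer and output
    // configuration are omitted for brevity.
    public class DiseaseCountJob extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = Job.getInstance(getConf(), "disease-count");
            job.setJarByClass(DiseaseCountJob.class);

            // Prefer the Guava bundled in the jar-with-dependencies over
            // the guava-11.0.2 that ships with hadoop 2.2.0.
            job.getConfiguration().set("mapreduce.job.user.classpath.first", "true");

            FileInputFormat.addInputPath(job, new Path(args[0]));
            // ... set mapper, reducer, output format/path here ...

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new DiseaseCountJob(), args));
        }
    }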