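ClassNotFoundException when running GiraphRunner on a modified SimpleShortestPathsVertex
I'm fairly new to Giraph, and I'm trying to get a Giraph edit-compile-deploy loop working for our code. I've been able to run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/, but I keep hitting a ClassNotFoundException when I run my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and would really appreciate your help. Details follow.
Versions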
- Hadoop: Hadoop 2.0.0-cdh4.4.0
- Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
PageRankBenchmark runs fine
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
    org.apache.giraph.benchmark.PageRankBenchmark \
    -Dgiraph.zkList=:2181 \
    -e 1 -s 3 -v -V 50 -w 1
...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)
GiraphRunner with SimpleShortestPathsVertex also runs fine
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    -Dgiraph.zkList=:2181 \
    org.apache.giraph.examples.SimpleShortestPathsVertex \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip ginput/tiny_graph.txt \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op goutput/shortestpathsC2 \
    -ca SimpleShortestPathsVertex.source=2 \
    -w 1
...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)
Bonus: the results are correct:
$ hadoop fs -cat goutput/shortestpathsC2/p*
0 1.0
2 2.0
1 0.0
3 1.0
4 5.0
But my modified version of SimpleShortestPathsVertex gets a ClassNotFoundException
The jar containing the modified vertex (KdlSimpleShortestPathsVertex, in the default package) looks fine:
$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
But my run fails:
$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    -Dgiraph.zkList=:2181 \
    -libjars ~/kdl_hadoop_play.jar \
    KdlSimpleShortestPathsVertex \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/cornell/ginput/tiny_graph.txt \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/cornell/goutput/shortestpathsC2 \
    -ca KdlSimpleShortestPathsVertex.source=2 \
    -w 1
Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
    at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
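My best guess…
…after looking around, is that GiraphRunner may not be handling -libjars correctly, as http://grepalex.com/2013/02/25/hadoop-libjars/ hints ("make sure your code is using GenericOptionsParser"). Browsing the Giraph source, I can't see that class being used anywhere. I also tried setting HADOOP_CLASSPATH to my jar, but that didn't solve the problem.
Any help would be great!
PageRankBenchmark output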
14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient: File System Counters
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient: Job Counters
14/08/01 11:42:44 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient: Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient: Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient: CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient: Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient: Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient: Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient: Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient: Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient: Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient: Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient: Total (milliseconds)=3442
SimpleShortestPathsVertex output
14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient: File System Counters
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient: Job Counters
14/08/01 11:47:46 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient: Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient: Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient: CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient: Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient: Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient: Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient: Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient: Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient: Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient: Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient: Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient: Total (milliseconds)=805
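OK, after reading the hadoop script and the Hadoop and Giraph source code, I think I've figured it out. The big hints came from "Using the libjars option with Hadoop" and from this line in the output:
WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
The cause appears to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to obtain an org.apache.commons.cli.CommandLine, rather than the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which honors the -libjars option. That sent me back to Hadoop's generic classpath mechanisms: CLASSPATH and/or HADOOP_CLASSPATH. Here is what worked: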
- Set HADOOP_CLASSPATH, using a colon separator, to include both the application jar and the Giraph core jar.
- Pass -libjars with the same classpath, but using a comma separator.
For example, on my machine:
$ export GIRAPH_HOME=/share/apps/giraph
$ export HADOOP_CLASSPATH=/home//kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
$ export LIBJARS=/home/ /kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
$ hadoop fs -rm -R goutput/shortestpathsC2
$ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
    -Dgiraph.zkList=:2181 \
    -libjars ${LIBJARS} \
    KdlSimpleShortestPathsVertex \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/cornell/ginput/tiny_graph.txt \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/cornell/goutput/shortestpathsC2 \
    -ca SimpleShortestPathsVertex.source=2 \
    -w 1
...
$ hadoop fs -cat goutput/shortestpathsC2/p*
This gives the expected output and results.
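Since the two settings differ only in their separator, one way to avoid typing the jar list twice is to keep it in comma form and derive the colon form from it. This is only a sketch: libjars_to_classpath is a hypothetical helper name, and the /path/to/... jar locations are placeholders rather than the paths from my machine.

```shell
# Hypothetical helper (not part of Hadoop or Giraph): convert a
# comma-separated -libjars value into a colon-separated classpath fragment.
libjars_to_classpath() {
  printf '%s' "$1" | tr ',' ':'
}

# Placeholder jar paths; substitute your application jar and Giraph jar.
LIBJARS="/path/to/app.jar,/path/to/giraph-core.jar"

# Prepend the derived fragment, keeping any existing HADOOP_CLASSPATH.
# The ${VAR:+...} form avoids a trailing colon when the variable is unset.
HADOOP_CLASSPATH="$(libjars_to_classpath "$LIBJARS")${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
export LIBJARS HADOOP_CLASSPATH

echo "$HADOOP_CLASSPATH"
```

With both variables exported, the hadoop jar command can be run as before with -libjars ${LIBJARS}.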
More generally, it would be helpful if the Giraph team changed the code to use the (apparently) more standard parser.
Hope that helps!
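I don't know why this isn't working, but there is a quick-and-dirty way around it. Try putting your code in the giraph-examples/src/main/java/org/apache/giraph/examples/ directory (where SimpleShortestPath lives). Then build the giraph-examples jar by running mvn -DskipTests --projects giraph-examples --also-make package. After that, just run the program the same way you ran SimpleShortestPath, replacing SimpleShortestPath with the name of your file. I hope that helps.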