How do I build and run this simple Mahout program without getting exceptions?
I want to run this code, which I found in Mahout in Action:
package org.help;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

public class SeqPrep {

    public static void main(String[] args) throws IOException {
        List<NamedVector> apples = new ArrayList<NamedVector>();

        NamedVector apple;
        apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}),
                "small round green apple");
        apples.add(apple);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("appledata/apples");

        // Write each named vector to a SequenceFile, keyed by its name.
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                Text.class, VectorWritable.class);
        VectorWritable vec = new VectorWritable();
        for (NamedVector vector : apples) {
            vec.set(vector);
            writer.append(new Text(vector.getName()), vec);
        }
        writer.close();

        // Read the file back and print every key/value pair.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                new Path("appledata/apples"), conf);
        Text key = new Text();
        VectorWritable value = new VectorWritable();
        while (reader.next(key, value)) {
            System.out.println(key.toString() + " , " + value.get().asFormatString());
        }
        reader.close();
    }
}
I compile it:
$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java
I package it into a jar:
$ jar -cvf SeqPrep.jar -C myjavac/ .
Now I want to run it on my local Hadoop node. I tried:
hadoop jar SeqPrep.jar org.help.SeqPrep
But I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
So I tried using the -libjars option:
$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar
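and got the same problem. I don't know what else to try.
My eventual goal is to read a .csv file on the Hadoop file system into a sparse matrix and then multiply it by a random vector.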
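To make the stated goal concrete, here is a minimal sketch of a sparse matrix times random vector computation in plain Java. It does not use Mahout's matrix classes or read from HDFS: the class name SparseCsvMultiply, its methods, and the inline CSV sample are all made up for illustration, and the CSV is assumed to be purely numeric with no header row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout or Hadoop.
class SparseCsvMultiply {

    // Parse numeric CSV text into sparse rows: each row maps
    // column index -> value and stores only the non-zero cells.
    static List<Map<Integer, Double>> parseSparse(String csv) {
        List<Map<Integer, Double>> rows = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Map<Integer, Double> row = new TreeMap<>();
            String[] cells = line.split(",");
            for (int col = 0; col < cells.length; col++) {
                double value = Double.parseDouble(cells[col].trim());
                if (value != 0.0) {
                    row.put(col, value);
                }
            }
            rows.add(row);
        }
        return rows;
    }

    // Compute y = A * x, touching only the stored (non-zero) entries of A.
    static double[] multiply(List<Map<Integer, Double>> rows, double[] x) {
        double[] y = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> cell : rows.get(i).entrySet()) {
                sum += cell.getValue() * x[cell.getKey()];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        // Inline sample standing in for the .csv file (assumption).
        String csv = "0.11,510,1\n0,0,2\n";
        List<Map<Integer, Double>> rows = parseSparse(csv);

        // Random vector with one component per CSV column.
        Random rng = new Random(42);
        double[] x = new double[3];
        for (int i = 0; i < x.length; i++) {
            x[i] = rng.nextDouble();
        }

        for (double v : multiply(rows, x)) {
            System.out.println(v);
        }
    }
}
```

In a real job the CSV text would come from a stream opened through Hadoop's FileSystem, and Mahout's own sparse vector and matrix types would presumably replace the maps used here.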
Edit: It looks like Razvan got it (note: see below for another way to do this that does not mess with your Hadoop installation). For reference:
$ find /usr/local/hadoop-1.0.3/. |grep mah
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar
Then:
$ hadoop jar SeqPrep.jar org.help.SeqPrep
small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
Edit: I am trying to do this without copying the Mahout jars into the Hadoop lib/ directory:
$ rm /usr/local/hadoop-1.0.3/lib/mahout-*
And then, of course:
$ hadoop jar SeqPrep.jar org.help.SeqPrep
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
When I try the Mahout job file:
$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
If I try to include the .jar file I made:
$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep
Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar
Edit: Apparently I can only pass one jar at a time to hadoop. This means I need to add the class I made to the Mahout core job file:
~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup
~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class .
~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class
Then:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
Edit: OK, now I can do this without messing with my Hadoop installation. I was updating the .jar incorrectly in the previous edit: the class was added at the root of the jar instead of under its package path. It should be:
~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class
Then:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
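You need to use the "job" JAR file that Mahout provides. It packages up all of the dependencies. You also need to add your own classes to it. This is how all of the Mahout examples work. You shouldn't put the Mahout jars in the Hadoop lib directory, because that "installs" a program too deeply into Hadoop.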
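Adding a class to the job JAR only works if its entry sits under the package path, which is the difference between the failing `jar uf ... SeqPrep.class` and the working `jar uf ... org/help/SeqPrep.class` in the question. The following sketch checks that with the standard java.util.jar API. The class JarEntryCheck and its methods are hypothetical helpers, and the demo runs against throwaway jars with empty entries rather than the real Mahout job file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper, not part of Mahout or Hadoop.
class JarEntryCheck {

    // A class loader looks up org.help.SeqPrep as the entry org/help/SeqPrep.class.
    static String entryPathFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True if the jar has an entry at the path a class loader would look for.
    static boolean containsClass(File jar, String className) {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry(entryPathFor(className)) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build a throwaway jar holding empty entries with the given names (demo only).
    static File makeJar(String... entryNames) {
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
                for (String name : entryNames) {
                    out.putNextEntry(new JarEntry(name));
                    out.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Class file at the jar root, as after "jar uf job.jar SeqPrep.class".
        File atRoot = makeJar("SeqPrep.class");
        // Class file under its package path, as after "jar uf job.jar org/help/SeqPrep.class".
        File inPackage = makeJar("org/help/SeqPrep.class");

        System.out.println("root entry:    " + containsClass(atRoot, "org.help.SeqPrep"));
        System.out.println("package entry: " + containsClass(inPackage, "org.help.SeqPrep"));
    }
}
```

Pointing containsClass at the updated job JAR before calling hadoop jar would be one way to catch the misplaced entry early.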
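If you take the example code from the https://github.com/tdunning/MiA repository, it includes a pom.xml file that you can use with Maven. When you compile the code with mvn package, it creates mia-0.1-job.jar in the target directory. This archive contains all of the dependencies except Hadoop, so you can run it on a Hadoop cluster without problems.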
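<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>0.7</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-collections</artifactId>
  <version>1.0</version>
</dependency>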
What I did was set HADOOP_CLASSPATH to my jar plus all of the Mahout jar files, as shown below.
export HADOOP_CLASSPATH=/home/xxx/my.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-examples-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-integration-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-math-0.7-cdh4.3.0.jar
Then I was able to run:
hadoop com.mycompany.mahout.CSVtoVector iris/nb/iris1.csv iris/nb/data/iris.seq
So you have to include all of your own jars and the Mahout jars in HADOOP_CLASSPATH; then you can run your class with the hadoop command.
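A quick way to confirm that the classpath really exposes the Mahout classes is to probe for them before running the real program. This is a minimal sketch, assuming the probe class itself is on the same classpath; ClasspathProbe and isVisible are names invented for this example.

```java
// Hypothetical helper, not part of Mahout or Hadoop.
class ClasspathProbe {

    // True if the named class can be located by this class's loader.
    // The class is loaded but not initialized, so no static code runs.
    static boolean isVisible(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names to probe; defaults to the class from the stack trace.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.mahout.math.Vector"};
        for (String name : names) {
            System.out.println(name + " -> " + (isVisible(name) ? "found" : "MISSING"));
        }
    }
}
```

If the probe reports MISSING for org.apache.mahout.math.Vector, the real program would fail with the same NoClassDefFoundError shown in the question.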