Running a Hadoop job without using JobConf

I cannot find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which is not deprecated, still only supports methods that take a JobConf parameter.

Can someone point me to an example of Java code that submits a Hadoop map/reduce job using only the Configuration class (not JobConf), and the mapreduce.lib.input package rather than mapred.input?

Hope this is useful:

    import java.io.File;

    import org.apache.commons.io.FileUtils;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MapReduceExample extends Configured implements Tool {

        // Identity-style mapper that also increments a custom counter per record
        static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                context.getCounter("mygroup", "jeff").increment(1);
                context.write(key, value);
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = new Job();
            job.setMapperClass(MyMapper.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            FileUtils.deleteDirectory(new File("data/output"));
            args = new String[] { "data/input", "data/output" };
            ToolRunner.run(new MapReduceExample(), args);
        }
    }
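One caveat: in later releases (Hadoop 2.x) the no-argument Job constructor used above was itself deprecated in favour of the Job.getInstance factory method. A small sketch of how run() could look with that change, assuming the same Tool setup as above (the job name string is arbitrary):

    @Override
    public int run(String[] args) throws Exception {
        // Job.getInstance(...) is the non-deprecated replacement for new Job();
        // getConf() returns the Configuration that ToolRunner populated for this Tool
        Job job = Job.getInstance(getConf(), "MapReduceExample");
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }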

I believe this tutorial illustrates getting rid of the deprecated JobConf class with Hadoop 0.20.1.

Here is a good example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html It is also over two years old, and there is no official documentation discussing the new API. Sad.

In the old API there were three ways of submitting a job, one of which was to submit the job, get back a reference to the RunningJob, and use that RunningJob's id.

 submitJob(JobConf) : only submits the job, then poll the returned handle to the RunningJob to query status and make scheduling decisions. 

How can I do the same with the new API, i.e. get a reference to the running job and obtain its id, given that none of the new APIs return a reference to a RunningJob?

 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html 

Thanks
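For what it's worth, with the new API the Job object itself takes over the role of RunningJob: submit() returns immediately, and getJobID(), isComplete() and mapProgress()/reduceProgress() can then be polled on the same instance. A minimal sketch along those lines (the input/output paths and the identity Mapper are placeholders, not taken from the question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubmitAndPoll {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "submit-and-poll");
            job.setJarByClass(SubmitAndPoll.class);
            job.setMapperClass(Mapper.class);   // identity mapper, placeholder
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // submit() returns immediately, unlike waitForCompletion(true)
            job.submit();

            // the Job instance is the handle: query its id and progress
            System.out.println("Job id: " + job.getJobID());
            while (!job.isComplete()) {
                System.out.printf("map %.0f%% reduce %.0f%%%n",
                        job.mapProgress() * 100, job.reduceProgress() * 100);
                Thread.sleep(5000);
            }
            System.out.println("Succeeded: " + job.isSuccessful());
        }
    }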

Try using Configuration and Job. Here is an example:

(Replace the Mapper, Combiner, and Reducer classes, and the rest of the configuration, with your own.)

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            if (args.length != 2) {
                System.err.println("Usage: WordCount <input path> <output path>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "Word Count");

            // set jar
            job.setJarByClass(WordCount.class);

            // set Mapper, Combiner, Reducer
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);

            /* Optional, set a custom Partitioner:
             * job.setPartitionerClass(MyPartitioner.class);
             */

            // set map and final output key/value classes
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // set input and output paths
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // by default, Hadoop uses TextInputFormat and TextOutputFormat;
            // any user-defined input/output class must extend InputFormat/OutputFormat
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
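The snippet above assumes TokenizerMapper and IntSumReducer already exist. For completeness, here is a sketch of the two classes following the standard WordCount pattern (in the official example they are nested static classes of WordCount; adapt as needed):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Splits each input line into tokens and emits (word, 1)
    class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also usable as the Combiner
    class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }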