Running a Hadoop job without using JobConf

I cannot find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which is not deprecated, still only supports methods that take a JobConf parameter.

Can someone point me to an example of Java code that submits a Hadoop map/reduce job using only the Configuration class (not JobConf), and the mapreduce.lib.input package rather than mapred.input?

Hope this is useful:

    import java.io.File;

    import org.apache.commons.io.FileUtils;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MapReduceExample extends Configured implements Tool {

        // Identity-style mapper that also increments a custom counter per record
        static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                context.getCounter("mygroup", "jeff").increment(1);
                context.write(key, value);
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = new Job();
            job.setMapperClass(MyMapper.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            FileUtils.deleteDirectory(new File("data/output"));
            args = new String[] { "data/input", "data/output" };
            ToolRunner.run(new MapReduceExample(), args);
        }
    }
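One caveat: in later releases (Hadoop 2.x) the no-argument Job constructor used above was itself deprecated in favour of the Job.getInstance factory method. A small sketch of how run() could look with that change, assuming the same Tool setup as above (the job name string is arbitrary):

    @Override
    public int run(String[] args) throws Exception {
        // Job.getInstance(...) is the non-deprecated replacement for new Job();
        // getConf() returns the Configuration that ToolRunner populated for this Tool
        Job job = Job.getInstance(getConf(), "MapReduceExample");
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }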

I believe this tutorial illustrates getting rid of the deprecated JobConf class with Hadoop 0.20.1.

Here is a good example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html It is also over two years old, and there is no official documentation discussing the new API. Sad.

In the old API there were three ways of submitting a job, one of which was to submit the job, get back a reference to the RunningJob, and use that RunningJob's id.

 submitJob(JobConf) : only submits the job, then poll the returned handle to the RunningJob to query status and make scheduling decisions. 

How can I do the same with the new API, i.e. get a reference to the running job and obtain its id, given that none of the new APIs return a reference to a RunningJob?

 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html 

Thanks
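For what it's worth, with the new API the Job object itself takes over the role of RunningJob: submit() returns immediately, and getJobID(), isComplete() and mapProgress()/reduceProgress() can then be polled on the same instance. A minimal sketch along those lines (the input/output paths and the identity Mapper are placeholders, not taken from the question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubmitAndPoll {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "submit-and-poll");
            job.setJarByClass(SubmitAndPoll.class);
            job.setMapperClass(Mapper.class);   // identity mapper, placeholder
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // submit() returns immediately, unlike waitForCompletion(true)
            job.submit();

            // the Job instance is the handle: query its id and progress
            System.out.println("Job id: " + job.getJobID());
            while (!job.isComplete()) {
                System.out.printf("map %.0f%% reduce %.0f%%%n",
                        job.mapProgress() * 100, job.reduceProgress() * 100);
                Thread.sleep(5000);
            }
            System.out.println("Succeeded: " + job.isSuccessful());
        }
    }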

Try using Configuration and Job. Here is an example:

(Replace the Mapper, Combiner, and Reducer classes, and the rest of the configuration, with your own.)

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            if (args.length != 2) {
                System.err.println("Usage: WordCount <input path> <output path>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "Word Count");

            // set jar
            job.setJarByClass(WordCount.class);

            // set Mapper, Combiner, Reducer
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);

            /* Optional, set a custom Partitioner:
             * job.setPartitionerClass(MyPartitioner.class);
             */

            // set map and final output key/value classes
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // set input and output paths
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // by default, Hadoop uses TextInputFormat and TextOutputFormat;
            // any user-defined input/output class must extend InputFormat/OutputFormat
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
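The snippet above assumes TokenizerMapper and IntSumReducer already exist. For completeness, here is a sketch of the two classes following the standard WordCount pattern (in the official example they are nested static classes of WordCount; adapt as needed):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Splits each input line into tokens and emits (word, 1)
    class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also usable as the Combiner
    class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }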