Hadoop still runs reduce tasks even after passing -D mapred.reduce.tasks=0 on the command line
I have a MapReduce program:
public static class MapClass extends MapReduceBase
        implements Mapper<Text, Text, IntWritable, IntWritable> {

    private final static IntWritable uno = new IntWritable(1);
    private IntWritable citationCount = new IntWritable();

    // Emits (citationCount, 1) for every input record.
    public void map(Text key, Text value,
                    OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException {
        citationCount.set(Integer.parseInt(value.toString()));
        output.collect(citationCount, uno);
    }
}

public static class Reduce extends MapReduceBase
        implements Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    // Sums the 1s emitted for each citation count.
    public void reduce(IntWritable key, Iterator<IntWritable> values,
                       OutputCollector<IntWritable, IntWritable> output,
                       Reporter reporter) throws IOException {
        int count = 0;
        while (values.hasNext()) {
            count += values.next().get();
        }
        output.collect(key, new IntWritable(count));
    }
}
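I want to run only the map tasks, so that the output is the raw (citationCount, 1) pairs emitted by the mapper. When I run the job from the command line I use: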
$ hadoop jar Hadoop-programs.jar com/hadoop/patent/CitationHistogram input output -Dmapred.reduce.tasks=0
But this is what I see in the command-line output:
12/07/30 06:13:14 INFO mapred.JobClient: map 50% reduce 0%
12/07/30 06:13:23 INFO mapred.JobClient: map 58% reduce 0%
12/07/30 06:13:26 INFO mapred.JobClient: map 60% reduce 8%
12/07/30 06:13:29 INFO mapred.JobClient: map 68% reduce 8%
12/07/30 06:13:32 INFO mapred.JobClient: map 76% reduce 8%
12/07/30 06:13:35 INFO mapred.JobClient: map 85% reduce 16%
12/07/30 06:13:38 INFO mapred.JobClient: map 93% reduce 16%
12/07/30 06:13:41 INFO mapred.JobClient: map 98% reduce 16%
12/07/30 06:13:44 INFO mapred.JobClient: map 100% reduce 16%
12/07/30 06:13:55 INFO mapred.JobClient: map 100% reduce 69%
12/07/30 06:13:58 INFO mapred.JobClient: map 100% reduce 78%
12/07/30 06:14:01 INFO mapred.JobClient: map 100% reduce 94%
12/07/30 06:14:08 INFO mapred.JobClient: map 100% reduce 100%
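When I look at the job's output, I see entries like these:
1 2
13 2
24 1
29 1
31 2
42 3
6796 7
6799 1
6806 1
6815 1
6824 2
The second column contains values other than 1, which means the data is being aggregated by the reducer.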
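To make the symptom concrete, here is a small plain-Java sketch (no Hadoop involved) of what the two phases do to the data. The class name MapOnlyVersusReduce and the five input values are hypothetical, chosen only for illustration; the two methods mirror the logic of MapClass.map and Reduce.reduce above rather than calling them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this is not Hadoop code.
class MapOnlyVersusReduce {

    // Mirrors MapClass.map: one (citationCount, 1) pair per input record.
    static List<int[]> mapPhase(int[] citationCounts) {
        List<int[]> pairs = new ArrayList<>();
        for (int count : citationCounts) {
            pairs.add(new int[] {count, 1});
        }
        return pairs;
    }

    // Mirrors Reduce.reduce: sums the 1s per key, with keys in sorted order.
    static Map<Integer, Integer> reducePhase(List<int[]> pairs) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        for (int[] pair : pairs) {
            histogram.merge(pair[0], pair[1], Integer::sum);
        }
        return histogram;
    }

    public static void main(String[] args) {
        int[] input = {1, 13, 1, 24, 13}; // hypothetical citation counts

        // What a map-only job would write: every value is 1.
        for (int[] pair : mapPhase(input)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }

        // What the job writes once the reducer has run.
        System.out.println(reducePhase(mapPhase(input)));
        // prints {1=2, 13=2, 24=1}
    }
}
```

If the job were really map-only, every line of the output would end in 1, as in the first loop. Values such as 2, 3 or 7 in the second column can only come from the summing step.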
How can I keep the reducer from running?
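The -D option only takes effect if your driver implements the Tool interface and your main method hands the arguments to ToolRunner.run, which parses generic options such as -D into the Configuration before calling your run method.
ToolRunner.run(new Configuration(), new YourClassImplementingTool(), args);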
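One more thing worth checking, as far as I understand the generic options handling: the usage is hadoop jar <jar> <class> [genericOptions] [args], and parsing stops at the first argument that is not an option. The sketch below is a simplified plain-Java model of that behaviour, not Hadoop's actual parser. The class GenericOptionsSketch and its fields are hypothetical names, and it only understands -D.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of generic-option handling; NOT Hadoop's actual parser.
class GenericOptionsSketch {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remainingArgs = new ArrayList<>();

    static GenericOptionsSketch parse(String... args) {
        GenericOptionsSketch result = new GenericOptionsSketch();
        int i = 0;
        // Consume leading -D options; stop at the first non-option argument.
        while (i < args.length && args[i].startsWith("-D")) {
            String pair;
            if (args[i].equals("-D")) {          // form: -D key=value
                if (i + 1 == args.length) {
                    break;                       // dangling -D, leave it alone
                }
                pair = args[i + 1];
                i += 2;
            } else {                             // form: -Dkey=value
                pair = args[i].substring(2);
                i += 1;
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {                        // malformed pairs are ignored here
                result.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        // Everything from the first non-option onward goes to the application.
        for (; i < args.length; i++) {
            result.remainingArgs.add(args[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        GenericOptionsSketch leading =
                parse("-D", "mapred.reduce.tasks=0", "input", "output");
        System.out.println(leading.properties + " " + leading.remainingArgs);
        // prints {mapred.reduce.tasks=0} [input, output]

        GenericOptionsSketch trailing =
                parse("input", "output", "-Dmapred.reduce.tasks=0");
        System.out.println(trailing.properties + " " + trailing.remainingArgs);
        // prints {} [input, output, -Dmapred.reduce.tasks=0]
    }
}
```

Under this model, a -D placed after input and output is never seen as a property and is passed to the program as a third ordinary argument. If the real job behaves the same way, moving the option in front of the input and output paths is the first thing to try.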
If you don't want to do that, try setting the number of reducers directly in the driver:
job.setNumReduceTasks(0);
Another option is to set the value in the Configuration and create the job from that configuration:
Configuration conf = new Configuration();
conf.set("mapred.reduce.tasks", "0");
Job job = new Job(conf, "My job Name");
Add a space after the -D option and it should work ;)
hadoop jar Hadoop-programs.jar com/hadoop/patent/CitationHistogram input output -D mapred.reduce.tasks=0