Output file contains Mapper output instead of Reducer output

Hi, I'm trying to find the average of a few numbers using MapReduce in standalone mode. I have two input files. file1 contains: 25 25 25 25 25 and file2 contains: 15 15 15 15 15

My program runs fine, but the output file contains the mapper's output instead of the reducer's output.

Here is my code:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import java.io.*;

    public class Average {

        public static class SumCount implements Writable {
            public int sum;
            public int count;

            @Override
            public void write(DataOutput out) throws IOException {
                out.writeInt(sum);
                out.writeInt(count);
            }

            @Override
            public void readFields(DataInput in) throws IOException {
                sum = in.readInt();
                count = in.readInt();
            }
        }

        public static class TokenizerMapper extends Mapper<Object, Text, Text, SumCount> {

            private final static IntWritable valueofkey = new IntWritable();
            private Text word = new Text();
            SumCount sc = new SumCount();

            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                int sum = 0;
                int count = 0;
                int v;
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    v = Integer.parseInt(word.toString());
                    count = count + 1;
                    sum = sum + v;
                }
                word.set("average");
                sc.sum = sum;
                sc.count = count;
                context.write(word, sc);
            }
        }

        public static class IntSumReducer extends Reducer {

            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<SumCount> values, Context context) throws IOException, InterruptedException {
                int sum = 0;
                int count = 0;
                int wholesum = 0;
                int wholecount = 0;
                for (SumCount val : values) {
                    wholesum = wholesum + val.sum;
                    wholecount = wholecount + val.count;
                }
                int res = wholesum / wholecount;
                result.set(res);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "");
            job.setJarByClass(Average.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(SumCount.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

After running the program, my output file looks like this:

    average Average$SumCount@434ba039
    average Average$SumCount@434ba039

You can't use your Reducer class IntSumReducer as a combiner. A combiner must receive and emit the same key/value types.

So I would remove job.setCombinerClass(IntSumReducer.class);

Remember that the output of combine becomes the input of reduce, so writing out Text / IntWritable from the combiner is not going to work when the reducer expects Text / SumCount.
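The type constraint can be sketched in plain Java, outside Hadoop (CombineSketch and its helpers are illustrative names, not part of the question's code): partial (sum, count) pairs merge associatively and stay (sum, count) pairs, and the division into an average happens exactly once, at the very end.

```java
// Minimal sketch of why a combine step must emit the same value type it
// consumes: merging partial (sum, count) pairs is associative, but merging
// already-computed averages is not.
public class CombineSketch {

    static class SumCount {
        int sum, count;
        SumCount(int sum, int count) { this.sum = sum; this.count = count; }
    }

    // combine step: (sum, count) pairs in, one (sum, count) pair out
    static SumCount combine(SumCount a, SumCount b) {
        return new SumCount(a.sum + b.sum, a.count + b.count);
    }

    // reduce step: the division to an average happens once, at the end
    static int average(SumCount sc) {
        return sc.sum / sc.count;
    }

    public static void main(String[] args) {
        SumCount file1 = new SumCount(125, 5); // mapper output for "25 25 25 25 25"
        SumCount file2 = new SumCount(75, 5);  // mapper output for "15 15 15 15 15"
        System.out.println(average(combine(file1, file2))); // prints 20
    }
}
```

If the combiner divided early, file1 would contribute 25 and file2 would contribute 15, and averaging those two averages only works here because both files hold the same number of values; with unequal counts the result would be wrong.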

If your output file looks like part-m-xxxxx, the problem above probably means it only ran the Map phase and stopped. Your counters would confirm this.

You also have your reducer declared as just Reducer, without the right type parameters; it should be Reducer<Text, SumCount, Text, IntWritable>.
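Why the wrong Reducer declaration silently produces mapper-style output can be sketched without Hadoop (BaseReducer and MyReducer below are illustrative stand-ins, not real Hadoop classes): when a subclass method's parameter types don't match the superclass signature, it overloads instead of overrides, so the framework keeps calling the inherited pass-through implementation, and the raw SumCount objects end up in the output via toString().

```java
// Sketch of the override-vs-overload trap behind the stray mapper output.
// BaseReducer stands in for Hadoop's Reducer, whose default reduce()
// passes values through unchanged (the "identity" reducer).
public class OverrideDemo {

    static class BaseReducer {
        // default behavior: emit the value unchanged
        String reduce(Object value) {
            return "identity:" + value;
        }
    }

    static class MyReducer extends BaseReducer {
        // Parameter type String does not match Object, so this is a new
        // overload, NOT an override -- analogous to a reduce() whose
        // parameter types don't line up with the declared Reducer generics.
        String reduce(String value) {
            return "averaged:" + value;
        }
    }

    public static void main(String[] args) {
        BaseReducer r = new MyReducer(); // how the framework holds the reducer
        Object value = "25";
        // Dispatch resolves to the inherited reduce(Object): prints identity:25
        System.out.println(r.reduce(value));
    }
}
```

With the correct type parameters on the class, adding @Override to reduce() would have turned this silent mismatch into a compile-time error.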