Tag: elastic map reduce

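Running a MapReduce job on AWS EMR from Eclipse: I have the WordCount MapReduce example in Eclipse. I exported it to a JAR, copied it to S3, and then ran it on AWS EMR. It succeeded. Then I read this article, http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-common-programming-sample.html, which shows how to use the AWS EMR API to run MapReduce jobs. It still assumes that your MapReduce code is packaged in a JAR. I would like to know whether there is a way to run MapReduce code on AWS EMR directly from Eclipse, without exporting it to a JAR.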

Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.Text is not class Myclass: I have my mapper and reducer as shown below, but I am getting a strange exception and cannot figure out why it is thrown. public static class MyMapper implements Mapper { @Override public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException { Text text = new Text("someText"); /* process */ output.collect(text, infoObjeject); } } public static class MyReducer implements Reducer { @Override public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) throws IOException { […]

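Reduce fails because the task attempt failed to report status for 600 seconds. Killing! Solution?: The reduce phase of the job fails with: number of failed Reduce tasks exceeded the allowed limit. The reason each task fails is: Task attempt_201301251556_1637_r_000005_0 failed to report status for 600 seconds. Killing! Problem in detail: The map phase takes in records of the format: time, rid, data. The data has the format: data element and its count. For example, a,1 b,4 c,7 is the data of one record. For each data element, the mapper outputs the data of the whole record. For example: key: (time, a), val: (rid, data); key: (time, b), val: (rid, data); key: (time, c), val: (rid, data). Each reduce receives all the data corresponding to the same key from all records. For example, key: (time, a), val: (rid1, data) and key: (time, a), val: (rid2, data) reach the same reduce instance. The reduce does some processing and outputs similar rids. My program runs without trouble on a small dataset such as 10MB, but fails for the reason above when the data grows to 1G. I don't know why this happens. Please help! Reduce code: there are two classes below: VCLReduce0Split and CoreSplit. a. VCLReduce0Split public class VCLReduce0Split extends MapReduceBase implements Reducer{ /* @SuppressWarnings("unchecked") */ public void reduce (Text key, Iterator values, OutputCollector output, Reporter reporter) throws IOException { String key_str = key.toString(); StringTokenizer stk = […]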
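The map-side fan-out described in the question above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the described key/value layout, not the asker's actual mapper: the class name `FanOutSketch`, the `Pair` holder, the tab-separated record layout, and the space-separated `element,count` items are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the mapper logic described in the question.
class FanOutSketch {

    /** One emitted pair: key = "time,element", value = "rid,data". */
    static final class Pair {
        final String key;
        final String value;

        Pair(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Assumed record layout: time TAB rid TAB data,
     * where data is a space-separated list such as "a,1 b,4 c,7".
     * Emits one pair per data element; every pair carries the full data string.
     */
    static List<Pair> fanOut(String record) {
        String[] fields = record.split("\t", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected time, rid, data: " + record);
        }
        String time = fields[0];
        String rid = fields[1];
        String data = fields[2].trim();

        List<Pair> out = new ArrayList<>();
        for (String item : data.split("\\s+")) {
            if (item.isEmpty()) {
                continue; // skip the empty token produced by an empty data field
            }
            String element = item.split(",", 2)[0]; // "a,1" -> "a"
            out.add(new Pair(time + "," + element, rid + "," + data));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Pair p : fanOut("t1\tr1\ta,1 b,4 c,7")) {
            System.out.println(p.key + " -> " + p.value);
        }
    }
}
```

In this sketch the full data string is repeated once per element, so a record with n elements yields n copies of its data in the map output, and every copy that shares a (time, element) key ends up in the same reduce call.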