Reading a text file from the system into HBase with MapReduce

I need to load data from a text file using MapReduce. I have been trying for many days, but I haven't found a solution that works for me. Is there any method or class that reads a text/CSV file from the system and stores the data into an HBase table? This is really quite urgent for me. Can anyone help me with the MapReduce framework?

To read from a text file, the file first has to be in HDFS, and you need to tell the job which input and output formats to use.
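If the file only exists on the local filesystem, copy it into HDFS first, either with the hadoop fs -put shell command or programmatically. A minimal sketch using the Hadoop FileSystem API (the paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // copy the local text file into HDFS so the job can read it
            fs.copyFromLocalFile(new Path("/local/path/input.txt"),
                                 new Path("/user/hadoop/input/input.txt"));
            fs.close();
        }
    }

With the file in HDFS, the job is configured like this: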

    Job job = new Job(conf, "example");
    FileInputFormat.addInputPath(job, new Path("PATH to text file"));
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(YourMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    TableMapReduceUtil.initTableReducerJob("hbase_table_name", YourReducer.class, job);
    job.waitForCompletion(true);
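Note that the job writes into an existing table; HBase will not create it for you. A sketch of pre-creating the table with the HBaseAdmin API from the same (pre-1.0) era as the code above, using the table and column family names from these snippets:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            if (!admin.tableExists("hbase_table_name")) {
                // one column family, matching what the reducer writes to
                HTableDescriptor desc = new HTableDescriptor("hbase_table_name");
                desc.addFamily(new HColumnDescriptor("colName"));
                admin.createTable(desc);
            }
            admin.close();
        }
    }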

YourReducer should extend org.apache.hadoop.hbase.mapreduce.TableReducer.

Sample reducer code:

    import java.io.IOException;
    import java.util.Date;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;

    public class YourReducer extends TableReducer<Text, IntWritable, Text> {

        private byte[] rawUpdateColumnFamily = Bytes.toBytes("colName");

        /**
         * Called once at the beginning of the task.
         */
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // anything that needs to happen once at the start of the reducer
        }

        @Override
        public void reduce(Text keyin, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // aggregate counts
            int valuesCount = 0;
            for (IntWritable val : values) {
                valuesCount += 1;
                // put the data in the table: the word is the row key, the value
                // lands in family/qualifier "colName" with an explicit timestamp
                Put put = new Put(keyin.toString().getBytes());
                long explicitTimeInMs = new Date().getTime();
                // Put.add is the pre-1.0 API; newer HBase versions use addColumn
                put.add(rawUpdateColumnFamily, Bytes.toBytes("colName"),
                        explicitTimeInMs, val.toString().getBytes());
                context.write(keyin, put);
            }
        }
    }
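Each context.write hands the Put to the output format, which commits it to hbase_table_name. After a run you can sanity-check a row with the client API of the same era; a small sketch ("someWord" stands in for an actual word from your input):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CheckRow {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "hbase_table_name");
            // each word from the input becomes a row key
            Get get = new Get(Bytes.toBytes("someWord"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("colName"), Bytes.toBytes("colName"));
            System.out.println(Bytes.toString(value));
            table.close();
        }
    }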

Sample mapper class:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class YourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // emit every whitespace-separated token of the line with a count of 1
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
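Putting the pieces together, a driver class could look like the sketch below. The class name LoadTextToHBase and the args-based input path are my own additions; HBaseConfiguration.create() is used so the job picks up hbase-site.xml in addition to the Hadoop configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class LoadTextToHBase {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "text-to-hbase");
            job.setJarByClass(LoadTextToHBase.class);

            // plain text input from HDFS
            FileInputFormat.addInputPath(job, new Path(args[0]));
            job.setInputFormatClass(TextInputFormat.class);

            job.setMapperClass(YourMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            // wires up TableOutputFormat with hbase_table_name as the target
            TableMapReduceUtil.initTableReducerJob("hbase_table_name", YourReducer.class, job);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }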