Appendix code:
HBase ----> HDFS (export from HBase to HDFS with MapReduce)
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HBase2HDFS {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, HBase2HDFS.class.getSimpleName());
        job.setJarByClass(HBase2HDFS.class);
        // An MR job needs an input and an output. The input side is normally wired up with
        // FileInputFormat etc., but for reading from HBase the helper class TableMapReduceUtil
        // is used instead: it configures the table input with the table name and the Scan to run.
        TableMapReduceUtil.initTableMapperJob(args[0], new Scan(), HBase2HDFSMapper.class,
                Text.class, Text.class, job);
        // Ship the HBase client jars with the MR job so the tasks can find them.
        TableMapReduceUtil.addDependencyJars(job);
        job.setMapperClass(HBase2HDFSMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        //FileOutputFormat.setOutputPath(job, new Path("/t1-out"));
        job.setNumReduceTasks(0);
        job.waitForCompletion(true);
    }
    static class HBase2HDFSMapper extends TableMapper<Text, Text> {
        private Text rowKeyText = new Text();
        private Text value = new Text();

        // TableMapper fixes the map input types to ImmutableBytesWritable (the row key) and Result
        // (the row's cells); the two type parameters declared here are the map output key/value types.
        // TableMapper is simply a subclass of Mapper that only leaves these two output types open.
        // Values in HBase are stored as raw bytes (in the HBase shell they are written as quoted strings).
        @Override
        protected void map(
                ImmutableBytesWritable key,
                Result result,
                Mapper<ImmutableBytesWritable, Result, Text, Text>.Context context)
                throws IOException, InterruptedException {
            // All of the row's cells are in the Result object. raw() used to be the way to pull
            // the data out of it, but raw() is deprecated:
            /*
            KeyValue[] raw = result.raw();
            for (KeyValue keyValue : raw) {
                keyValue.getValue();
            }
            */
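            /*
             * Non-deprecated alternative (a sketch against the newer client API, not code from
             * the original post; the calls are Result.listCells() and CellUtil.cloneValue()):
             *
             *   for (Cell cell : result.listCells()) {
             *       byte[] v = CellUtil.cloneValue(cell);   // copy of the cell value
             *   }
             */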
            /*
             * Desired output, one line per row:  1    zhangsan    13    (row key, name, age)
             *                                    2    lisi        14
             */

            // To read one specific column precisely you address it by row key, column family,
            // column qualifier and timestamp.
            // getColumnLatestCell returns the cell with the newest timestamp, so the timestamp
            // does not have to be given explicitly.
            byte[] nameBytes = CellUtil.cloneValue(result.getColumnLatestCell("cf".getBytes(), "name".getBytes()));
            byte[] ageBytes = CellUtil.cloneValue(result.getColumnLatestCell("cf".getBytes(), "age".getBytes()));

            rowKeyText.set(key.get());
            value.set(new String(nameBytes) + "\t" + new String(ageBytes));
            context.write(rowKeyText, value);
            // The mapper already emits lines of the form "1    name    age", so no reducer is needed.
        }
    }
}
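
The Scan passed to initTableMapperJob above is a default full-table scan. For larger tables it is common to narrow and tune the scan before handing it to the helper. A minimal sketch of what those lines in main() could look like (the family "cf" is the one the mapper reads; the caching value is an assumed tuning number, not something from the original post):

        Scan scan = new Scan();
        scan.addFamily("cf".getBytes());   // only fetch the family the mapper actually reads
        scan.setCaching(500);              // rows per RPC; 500 is an assumed value, tune per cluster
        scan.setCacheBlocks(false);        // keep the full scan from churning the region server block cache
        TableMapReduceUtil.initTableMapperJob(args[0], scan, HBase2HDFSMapper.class,
                Text.class, Text.class, job);
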
HDFS ----> HBase (import into HBase with MapReduce)
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HDFS2HBaseImport {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Which table the data ends up in is decided here, via TableOutputFormat.OUTPUT_TABLE.
        conf.set(TableOutputFormat.OUTPUT_TABLE, args[0]);

        Job job = Job.getInstance(conf, HDFS2HBaseImport.class.getSimpleName());
        job.setJarByClass(HDFS2HBaseImport.class);

        // TableMapReduceUtil is still needed here, to ship the HBase client jars with the job.
        TableMapReduceUtil.addDependencyJars(job);
        job.setMapperClass(HDFS2HBaseMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(HDFS2HBaseReducer.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.setInputPaths(job, args[1]);
        job.waitForCompletion(true);
    }

    static class HDFS2HBaseMapper extends Mapper<LongWritable, Text, Text, Text> {
        private Text rowKeyText = new Text();
        private Text value = new Text();

        @Override
        protected void map(LongWritable key, Text text,
                Mapper<LongWritable, Text, Text, Text>.Context context)
                throws IOException, InterruptedException {
            String[] splits = text.toString().split("\t");
            rowKeyText.set(splits[0]);
            value.set(splits[1] + "\t" + splits[2]);    // name\tage
            context.write(rowKeyText, value);
        }
    }
    // This reducer mirrors the "Mapper extends TableMapper" used on export: TableReducer already
    // fixes the output value type to Mutation, and because the data goes into HBase the MR output
    // key is not needed, so NullWritable is used for the remaining type parameter.
    static class HDFS2HBaseReducer extends TableReducer<Text, Text, NullWritable> {
        @Override
        protected void reduce(Text k2, Iterable<Text> v2s,
                Reducer<Text, Text, NullWritable, Mutation>.Context context)
                throws IOException, InterruptedException {
            // Rows are inserted into HBase with a Put object keyed by the row key.
            // copyBytes() is used instead of getBytes(): Text.getBytes() returns the whole backing
            // array, which may be longer than the actual value.
            Put put = new Put(k2.copyBytes());

            for (Text text : v2s) {
                String[] splits = text.toString().split("\t");
                // Load the columns and their values (addColumn in newer HBase client versions).
                put.add("cf".getBytes(), "name".getBytes(), splits[0].getBytes());
                put.add("cf".getBytes(), "age".getBytes(), splits[1].getBytes());
                // The first argument is the key, the second the value; HBase does not need the key,
                // so NullWritable is written together with the populated Put.
                context.write(NullWritable.get(), put);
            }
        }
    }
}
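
The mapper expects one record per input line in the form "rowkey \t name \t age". As an aside, the manual reducer wiring in main() (the OUTPUT_TABLE property plus setReducerClass and setOutputFormatClass) can also be done with a single helper call; a minimal sketch, not from the original post:

        // replaces the conf.set(TableOutputFormat.OUTPUT_TABLE, ...), setReducerClass(...)
        // and setOutputFormatClass(...) lines: points TableOutputFormat at the given table
        TableMapReduceUtil.initTableReducerJob(args[0], HDFS2HBaseReducer.class, job);
        TableMapReduceUtil.addDependencyJars(job);
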
This article is reproduced from the SummerChill cnblogs blog. Original link: http://www.cnblogs.com/DreamDrive/p/5583135.html. Please contact the original author before reprinting.