CDH5: Configuring LZO Using Parcels
1. Parcel Deployment Steps
1. Download: First, download the parcel. Once the download completes, the parcel resides in a local directory on the Cloudera Manager host.
2. Distribute: After the parcel has been downloaded, it is distributed to all hosts in the cluster and unpacked there.
3. Activate: Once distributed, activate the parcel to prepare it for use after the cluster restarts. An upgrade may also be required before activation.
2. Hosting the LZO Parcel Locally
1. Download the latest LZO parcel from http://archive-primary.cloudera.com/gplextras/parcels/latest/, choosing the package that matches the operating system of your Hadoop cluster's servers. My servers run RHEL 6.2, so I downloaded HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel.
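For example, on the machine that will host the local repository (assuming wget is available):
wget http://archive-primary.cloudera.com/gplextras/parcels/latest/HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel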
2. Download manifest.json from the same location, and create a .sha file containing the hash value recorded in manifest.json for this parcel (note: the .sha file must carry the same name as the parcel package).
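A minimal sketch: download the manifest, then write the parcel's hash into the .sha file. The placeholder below must be replaced with the hash value copied out of this parcel's entry in manifest.json:
wget http://archive-primary.cloudera.com/gplextras/parcels/latest/manifest.json
PARCEL=HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel
echo "<hash-from-manifest.json>" > "$PARCEL.sha"   # creates HADOOP_LZO-...-el6.parcel.sha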
3. On the command line, go to the web root of Apache (install Apache first if it is not present), which defaults to /var/www/html. Create an lzo directory there and place the three files in it.
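A sketch of this step, assuming the default document root and that the three files are in the current directory:
mkdir -p /var/www/html/lzo
cp HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel \
   HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel.sha \
   manifest.json /var/www/html/lzo/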
4. Start the httpd service and open the directory in a browser, e.g. http://ip/lzo; the three files should be listed.
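On RHEL 6 this looks roughly as follows; the curl call is an optional check that Apache is serving the directory:
service httpd start
curl http://localhost/lzo/   # the listing should contain the parcel, the .sha file and manifest.json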
5. Add the URL of the locally published parcel repository to the Remote Parcel Repository URLs setting in Cloudera Manager.
6. On the Parcels page in Cloudera Manager, the LZO parcel now appears among the downloadable parcels; click it to start the download.
7. Following the parcel deployment steps from Part 1, distribute and activate the parcel.
3. Modifying the Configuration
Modify the HDFS configuration:
Append org.apache.hadoop.io.compress.Lz4Codec and com.hadoop.compression.lzo.LzopCodec to the value of the io.compression.codecs property.
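Once the client configuration has been redeployed, the effective value can be checked from any gateway host (a sketch; the full codec list depends on your cluster defaults):
hdfs getconf -confKey io.compression.codecs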
Modify the YARN configuration:
Change the value of mapreduce.application.classpath to: $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*
Change the value of mapreduce.admin.user.env to: LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
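After saving these changes, redeploy the client configuration and restart the affected services. Whether the activated parcel actually provides the native LZO library can be sanity-checked from a shell on any node (a sketch; exact file names may vary):
ls /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native   # should list libgplcompression.so among others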
4. Verification
Verification uses an external table whose input format, DeprecatedLzoTextInputFormat, reads LZO-compressed text files:
create external table lzo(id int,name string) row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
Create a file data.txt with the following content:
1#tianhe
2#gz
3#sz
4#sz
5#bx
Then compress this file with the lzop command and upload it to the /test directory in HDFS. Start Hive, create the table, and query the data; the session output is shown after the sketch below.
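A minimal sketch of the compress-and-upload step, assuming data.txt is in the current directory and lzop is installed:
lzop data.txt                       # produces data.txt.lzo next to the original
hdfs dfs -mkdir -p /test            # create the table location if it does not exist
hdfs dfs -put data.txt.lzo /test/   # upload the compressed file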
hive> create external table lzo(id int,name string) row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
OK
Time taken: 0.108 seconds
hive> select * from lzo where id>2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1404206497656_0002, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0002/
Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job -kill job_1404206497656_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-01 17:30:27,547 Stage-1 map = 0%, reduce = 0%
2014-07-01 17:30:37,403 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.84 sec
2014-07-01 17:30:38,469 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.84 sec
2014-07-01 17:30:39,527 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.84 sec
MapReduce Total cumulative CPU time: 2 seconds 840 msec
Ended Job = job_1404206497656_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.84 sec HDFS Read: 295 HDFS Write: 15 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 840 msec
OK
3 sz
4 sz
5 bx
Time taken: 32.803 seconds, Fetched: 3 row(s)
hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
hive> create external table lzo2(id int,name string) row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
OK
Time taken: 0.092 seconds
hive> insert into table lzo2 select * from lzo;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1404206497656_0003, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0003/
Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job -kill job_1404206497656_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-01 17:33:47,351 Stage-1 map = 0%, reduce = 0%
2014-07-01 17:33:57,114 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
2014-07-01 17:33:58,170 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1404206497656_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01.kt:8020/tmp/hive-hdfs/hive_2014-07-01_17-33-22_504_966970548620625440-1/-ext-10000
Loading data to table default.lzo2
Table default.lzo2 stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 171, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 295 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
Time taken: 36.625 seconds
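Because hive.exec.compress.output was enabled with LzopCodec, the rows written by the INSERT land in /test as LZO-compressed output (the table stats above report num_files: 2). A quick check from the shell (file names will differ):
hdfs dfs -ls /test   # data.txt.lzo plus the file written by the INSERT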
