Sqoop数据导入/导出
1. 从HDFS导出到RDBMS数据库
1.1 准备工作
写一个文件
sqoop_export.txt
1201,laojiao, manager,50000, TP
1202,fantj,preader,50000,TP
1203,jiao,dev,30000,AC
1204,laowang,dev,30000,AC
1205,laodu,admin,20000,TP
1206,laop,grp des,20000,GR
上传到hdfs:hadoop fs -put sqoop_export.txt /sqoop/export/
创建mysql数据库并增加授权:
create database sqoopdb;
grant all privileges on sqoopdb.* to 'sqoop'@'%' identified by 'sqoop';
grant all privileges on sqoopdb.* to 'sqoop'@'localhost' identified by 'sqoop';
grant all privileges on sqoopdb.* to 'sqoop'@'s166' identified by 'sqoop';
flush privileges;
创建表:
use sqoopdb;
CREATE TABLE employee (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20),
deg VARCHAR(20),
salary INT,
dept VARCHAR(10));
1.2 执行导出命令
bin/sqoop export \
--connect jdbc:mysql://s166:3306/sqoopdb \
--username sqoop \
--password sqoop \
--table employee \
--export-dir /sqoop/export/emp/ \
--input-fields-terminated-by ','
我执行的时候发现它总在报这个错:
ERROR tool.ExportTool: Encountered IOException running export job: java.io.FileNotFoundException: File does not exist: hdfs://s166/home/fantj/sqoop/lib/avro-mapred-1.5.3.jar
然后找了很多解决方案:
- 替换mysql-java的jar包,换个高版本的。
- 修改hadoop的
mapred-site.xml
文件(先更名mv mapred-site.xml.template mapred-site.xml
)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
解决后再执行:
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=107
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=95
CPU time spent (ms)=1210
Physical memory (bytes) snapshot=97288192
Virtual memory (bytes) snapshot=2075623424
Total committed heap usage (bytes)=17006592
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
22:34:37 INFO mapreduce.ExportJobBase: Transferred 274 bytes in 47.346 seconds (5.7872 bytes/sec)
22:34:37 INFO mapreduce.ExportJobBase: Exported 6 records.
说明处理成功!
1.3 验证mysql表
mysql> select * from employee;
+------+---------+----------+--------+------+
| id | name | deg | salary | dept |
+------+---------+----------+--------+------+
| 1201 | laojiao | manager | 50000 | TP |
| 1202 | fantj | preader | 50000 | TP |
| 1203 | jiao | dev | 30000 | AC |
| 1204 | laowang | dev | 30000 | AC |
| 1205 | laodu | admin | 20000 | TP |
| 1206 | laop | grp des | 20000 | GR |
+------+---------+----------+--------+------+
6 rows in set (0.07 sec)
2. 导入表表数据到HDFS
bin/sqoop import \
--connect jdbc:mysql://s166:3306/sqoopdb \
--username sqoop \
--password sqoop \
--table employee --m 1
22:44:26 INFO mapreduce.Job: The url to track the job: http://s166:8088/proxy/application_1532679575794_0002/
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=123111
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=161
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=5972
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5972
Total vcore-seconds taken by all map tasks=5972
Total megabyte-seconds taken by all map tasks=6115328
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=195
CPU time spent (ms)=970
Physical memory (bytes) snapshot=99921920
Virtual memory (bytes) snapshot=2079825920
Total committed heap usage (bytes)=18358272
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=161
22:44:57 INFO mapreduce.ImportJobBase: Transferred 161 bytes in 34.5879 seconds (4.6548 bytes/sec)
22:44:57 INFO mapreduce.ImportJobBase: Retrieved 6 records.
3. 导入关系表到HIVE
sqoop import --connect jdbc:mysql://s166:3306/sqoopdb --username sqoop --password sqoop --table employee --hive-import --m 1
4. 导入到HDFS指定目录
sqoop import \
--connect jdbc:mysql://s166:3306/sqoopdb \
--username sqoop \
--password sqoop \
--target-dir /queryresult \
--table employee --m 1
5. 导入表数据子集
我们可以导入表的使用Sqoop导入工具,"where"子句的一个子集。并将结果存储在HDFS的目标目录。
sqoop import \
--connect jdbc:mysql://s166:3306/sqoopdb \
--username sqoop \
--password sqoop \
--where "salary>10000" \
--target-dir /wherequery \
--table employee --m 1

低调大师中文资讯倾力打造互联网数据资讯、行业资源、电子商务、移动互联网、网络营销平台。
持续更新报道IT业界、互联网、市场资讯、驱动更新,是最及时权威的产业资讯及硬件资讯报道平台。
转载内容版权归作者及来源网站所有,本站原创内容转载请注明来源。
-
上一篇
Hadoop单词统计
1. 本地创建文本文件 [root@s166 fantj]# mkdir input [root@s166 fantj]# cd input/ [root@s166 input]# echo "hello fantj" > file1.txt [root@s166 input]# echo "hello hadoop" > file2.txt [root@s166 input]# echo "hello mapreduce" > file3.txt [root@s166 input]# ls file1.txt file2.txt file3.txt 2. 将文件上传到hadoop [root@s166 input]# hadoop fs -mkdir /wordcount [root@s166 input]# hadoop fs -ls / Found 1 items drwxr-xr-x - root supergroup 0 2018-07-27 07:51 /wordcount [root@s166 input]# cd .. [root@s166 fantj...
-
下一篇
专家教你使用MaxCompute玩转大数据分析!
摘要:传统的数据分析,经常使用的工具是Hadoop或Spark,在使用之前环境是需要用户自己去搭建的。随着业务逐渐向云迁移,如何在云上进行大数据分析是需要解决的问题。为此,阿里云提供了一项很重要的服务——大数据计算服务MaxCompute。本视频中,上海驻云科技的专家将带领大家了解如何使用阿里云的大数据计算服务MaxCompute来进行数据分析的。 演讲嘉宾简介: 翟永东,来自上海驻云科技。上海驻云科技是阿里云的合作伙伴之一,其宗旨是帮助客户将自己的业务系统迁移到云上,公司在云上会使用阿里云的一些大数据产品帮助用户做数据分析。 以下内容根据演讲嘉宾视频分享以及PPT整理而成。 本次演讲PPT下载,请戳这里! 关于MaxCompute更多精彩文章,请移步云栖社区MaxCompute公众号! 本次的分享主要分为三部分: 一、企业云上搭建的数据分析平
相关文章
文章评论
共有0条评论来说两句吧...
文章二维码
点击排行
推荐阅读
最新文章
- SpringBoot2整合Thymeleaf,官方推荐html解决方案
- MySQL数据库在高并发下的优化方案
- CentOS7编译安装Cmake3.16.3,解决mysql等软件编译问题
- Dcoker安装(在线仓库),最新的服务器搭配容器使用
- MySQL8.0.19开启GTID主从同步CentOS8
- SpringBoot2配置默认Tomcat设置,开启更多高级功能
- Springboot2将连接池hikari替换为druid,体验最强大的数据库连接池
- CentOS8编译安装MySQL8.0.19
- SpringBoot2初体验,简单认识spring boot2并且搭建基础工程
- SpringBoot2编写第一个Controller,响应你的http请求并返回结果