
Hive: Basic SQL

Date: 2019-10-19

1. Creating Tables

(1) The difference between internal (managed) and external tables

By default, Hive creates an internal (managed) table. You may specify a directory for it; if you don't, Hive creates a default one under the warehouse path. Once the table is dropped, both that directory and the data in it are deleted.

When creating an external table, you specify its storage directory with LOCATION. Dropping the table removes only the metadata; the directory and the data under it are left intact.
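
If you are unsure which kind an existing table is, desc formatted reports it in its "Table Type" row. A minimal check, using the external table t10 created below:

0: jdbc:hive2://192.168.163.102:10000> desc formatted t10;

# Among the output rows, "Table Type:" reads EXTERNAL_TABLE for an
# external table and MANAGED_TABLE for an internal one.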


# Create an external table

0: jdbc:hive2://192.168.163.102:10000> create external table t10(c1 int,c2 string) row format delimited fields terminated by ',' stored as textfile location "/dir1";


[root@Darren2 tmp]# hdfs dfs -put file1 /dir1

[root@Darren2 tmp]# hdfs dfs -ls -R /dir1

-rw-r--r--   1 root supergroup         24 2017-11-25 20:53 /dir1/file1


0: jdbc:hive2://192.168.163.102:10000> drop table t10;

No rows affected (0.41 seconds)


[root@Darren2 tmp]# hdfs dfs -ls -R /dir1

17/11/25 20:56:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

-rw-r--r--   1 root supergroup         24 2017-11-25 20:53 /dir1/file1


# Create a default internal table

0: jdbc:hive2://192.168.163.102:10000> create table t2(c1 int,c2 string) row format delimited fields terminated by ',' stored as textfile;
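
By contrast, dropping an internal table removes its directory and data as well. A minimal sketch, assuming t2 lives in database testdb1 under the default warehouse path (the exact error text may vary by Hadoop version):

0: jdbc:hive2://192.168.163.102:10000> drop table t2;

# The table's directory is gone along with its data:
[root@Darren2 tmp]# hdfs dfs -ls /user/hive/warehouse/testdb1.db/t2
ls: `/user/hive/warehouse/testdb1.db/t2': No such file or directory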


(2) Storage file formats supported by Hive

textfile, sequencefile, orc, parquet, avro

0: jdbc:hive2://192.168.163.102:10000> create table t5(c1 int,c2 string) row format delimited fields terminated by ',' stored as sequencefile ;

0: jdbc:hive2://192.168.163.102:10000> insert into t5 select * from t4;


# A file stored in sequencefile format cannot be viewed directly as plain text

[root@Darren2 tmp]# hdfs dfs -ls  /user/hive/warehouse/testdb1.db/t5/

-rwxr-xr-x   1 root supergroup        146 2017-11-26 03:03 /user/hive/warehouse/testdb1.db/t5/000000_0

0: jdbc:hive2://192.168.163.102:10000> desc formatted t5;
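
The other formats are declared the same way. A sketch (table names t6 and t7 are illustrative, not from the original session); binary formats such as orc and parquet should be populated through insert ... select rather than by loading a plain-text file:

0: jdbc:hive2://192.168.163.102:10000> create table t6(c1 int,c2 string) stored as orc;

0: jdbc:hive2://192.168.163.102:10000> create table t7(c1 int,c2 string) stored as parquet;

0: jdbc:hive2://192.168.163.102:10000> insert into t6 select * from t4;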


2. Loading Data into Hive

Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

With LOCAL, the file is copied from the local filesystem into the table's directory; without LOCAL, the file at 'filepath' on HDFS is moved there.


(1) Load a local file directly into a Hive table

0: jdbc:hive2://192.168.163.102:10000> load data local inpath '/tmp/file1' into table t1;

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+--------+--+

| t1.c1  | t1.c2  |

+--------+--------+--+

| 1      | aaa    |

| 2      | bbb    |

| 3      | ccc    |

| 4      | ddd    |

+--------+--------+--+


(2) Load data into the table, overwriting everything already in it; in effect this replaces all files under the t1 directory

0: jdbc:hive2://192.168.163.102:10000> load data local inpath '/tmp/file3' overwrite into table t1;

No rows affected (0.597 seconds)

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+---------+--+

| t1.c1  |  t1.c2  |

+--------+---------+--+

| 1      | yiyi    |

| 2      | erer    |

| 3      | sansan  |

| 4      | sisi    |

+--------+---------+--+

4 rows selected (0.073 seconds)


(3) Load a file on HDFS into a Hive table

[root@Darren2 tmp]# cat /tmp/file2

5,eee


[root@Darren2 tmp]# hdfs dfs -put /tmp/file2 /user/hive/warehouse/testdb1.db/t1

0: jdbc:hive2://192.168.163.102:10000> load data inpath '/user/hive/warehouse/testdb1.db/t1/file2' into table t1;

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+--------+--+

| t1.c1  | t1.c2  |

+--------+--------+--+

| 1      | aaa    |

| 2      | bbb    |

| 3      | ccc    |

| 4      | ddd    |

| 5      | eee    |

+--------+--------+--+


(4) Create another table from an existing one and populate it in the same step (CTAS)

0: jdbc:hive2://192.168.163.102:10000> create table t2 as select * from t1;


(5) First copy only the table structure from an existing table, then insert the data

0: jdbc:hive2://192.168.163.102:10000> create table t3 like t1;

0: jdbc:hive2://192.168.163.102:10000> insert into t3  select * from t1;
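
Note that insert into appends to whatever t3 already contains; to replace the contents instead, use insert overwrite (a sketch):

0: jdbc:hive2://192.168.163.102:10000> insert overwrite table t3 select * from t1;

# overwrite replaces all existing files under t3's directory,
# while insert into adds new files alongside them.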


3. Exporting Query Results to a Filesystem

(1) Export query results to the HDFS filesystem

0: jdbc:hive2://192.168.163.102:10000> select * from t1;

+--------+---------+--+

| t1.c1  |  t1.c2  |

+--------+---------+--+

| 1      | yiyi    |

| 2      | erer    |

| 3      | sansan  |

| 4      | sisi    |

+--------+---------+--+


0: jdbc:hive2://192.168.163.102:10000> insert overwrite directory '/user/hive/warehouse/tmp' select * from testdb1.t1;

[root@Darren2 tmp]# hdfs dfs -ls -R /user/hive/warehouse/tmp

-rwxr-xr-x   1 root supergroup         30 2017-11-26 00:25 /user/hive/warehouse/tmp/000000_0

[root@Darren2 tmp]# hdfs dfs -get /user/hive/warehouse/tmp/000000_0 /tmp/


The field delimiter in the exported file is Ctrl+A, i.e. ASCII \001 (Hive's default).

[root@Darren2 tmp]# vim /tmp/000000_0

1^Ayiyi

2^Aerer

3^Asansan

4^Asisi
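
If you would rather not deal with \001 at all, Hive 0.11+ lets you set the delimiter on the export statement itself (a sketch; the walkthrough below keeps the default):

0: jdbc:hive2://192.168.163.102:10000> insert overwrite directory '/user/hive/warehouse/tmp' row format delimited fields terminated by ',' select * from testdb1.t1;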


Create an external table over this file, using \001 as the delimiter:

0: jdbc:hive2://192.168.163.102:10000> create external table t5(c1 int,c2 string) row format delimited fields terminated by '\001' location '/user/hive/warehouse/tmp/';

0: jdbc:hive2://192.168.163.102:10000> select * from t5;

+--------+---------+--+

| t5.c1  |  t5.c2  |

+--------+---------+--+

| 1      | yiyi    |

| 2      | erer    |

| 3      | sansan  |

| 4      | sisi    |

+--------+---------+--+


(2) Export query results to a local directory

Note that OVERWRITE replaces the contents of the target directory, so in practice point it at a dedicated path rather than /tmp itself.

0: jdbc:hive2://192.168.163.102:10000> insert overwrite  local directory '/tmp' select * from testdb1.t1;

[root@Darren2 tmp]#  ls /tmp/000000_0

/tmp/000000_0
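
When all you need is a flat file, an alternative is to run the query from the shell and redirect the output. A sketch using beeline's CSV output mode (connection URL as in the sessions above):

[root@Darren2 tmp]# beeline -u jdbc:hive2://192.168.163.102:10000 --outputformat=csv2 -e "select * from testdb1.t1" > /tmp/t1.csv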


4. insert

(1) What insert really does is create a new file

0: jdbc:hive2://192.168.163.102:10000> insert into t5 values(4,'sisi');

No rows affected (17.987 seconds)

0: jdbc:hive2://192.168.163.102:10000> dfs -ls /user/hive/warehouse/testdb1.db/t5 ;

+----------------------------------------------------------------------------------------------------------------+--+

|                                                   DFS Output                                                   |

+----------------------------------------------------------------------------------------------------------------+--+

| Found 2 items                                                                                                  |

| -rwxr-xr-x   1 root supergroup        146 2017-11-26 03:03 /user/hive/warehouse/testdb1.db/t5/000000_0         |

| -rwxr-xr-x   1 root supergroup        106 2017-11-26 04:22 /user/hive/warehouse/testdb1.db/t5/000000_0_copy_1  |

+----------------------------------------------------------------------------------------------------------------+--+
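
Because each insert statement lands as a new file, many single-row inserts scatter the table across many small files on HDFS. Batching rows into one statement (multi-row values is supported since Hive 0.14) keeps the file count down; a sketch with illustrative values:

0: jdbc:hive2://192.168.163.102:10000> insert into t5 values(5,'wuwu'),(6,'liuliu');

# one statement, one additional file under the table directory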


Original article: https://yq.aliyun.com/articles/647753