If you have used any relational database before, the operations in this section will look very familiar.
CREATE TABLE member (name string, age int, sex int);
hive> CREATE TABLE member (name string, age int, sex int);
OK
Time taken: 0.687 seconds
hive>
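To see exactly what DDL Hive recorded for a table, SHOW CREATE TABLE prints the complete statement; a sketch (output not captured from the session above):
hive> SHOW CREATE TABLE member;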
Create a table from existing data (CTAS):
hive> create table newtable as select * from oldtable;
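CTAS accepts an arbitrary query, so you can project and filter while copying; a sketch with made-up names:
-- newtable2, oldtable and the age filter are illustrative only
hive> create table newtable2 as select name, age from oldtable where age > 18;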
hive> SHOW TABLES;
OK
test
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive>
Match table names with a wildcard:
show tables '*t*';
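The pattern also accepts '|' to match against several globs at once; a sketch:
hive> SHOW TABLES 'member|*user*';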
hive> DROP TABLE test;
OK
Time taken: 1.337 seconds
hive>
hive> CREATE TABLE member (name string, age int, sex int);
OK
Time taken: 0.273 seconds
hive> desc member;
OK
name string
age int
sex int
Time taken: 0.035 seconds, Fetched: 3 row(s)
hive>
Add a phone field of type string:
hive> ALTER TABLE member ADD COLUMNS (phone String);
OK
Time taken: 0.188 seconds
hive> desc member;
OK
name string
age int
sex int
phone string
Time taken: 0.033 seconds, Fetched: 4 row(s)
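ALTER TABLE can also rename or retype an existing column with CHANGE COLUMN; a sketch that renames the phone column added above to mobile:
hive> ALTER TABLE member CHANGE COLUMN phone mobile string;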
Rename the test table to vipuser:
hive> CREATE TABLE test (name string, age int, sex int);
OK
Time taken: 0.311 seconds
hive> ALTER TABLE test RENAME TO vipuser;
OK
Time taken: 0.115 seconds
hive> desc vipuser;
OK
name string
age int
sex int
Time taken: 0.032 seconds, Fetched: 3 row(s)
hive>
CREATE TABLE ... LIKE creates only the table structure; no data is copied over.
hive> CREATE TABLE news_2017 LIKE news;
OK
Time taken: 0.311 seconds
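Since only the schema came over, a count on the new table should return 0; a quick check (this launches a job):
hive> select count(*) from news_2017;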
Create a partitioned table:
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
A partition can also be added by hand; this assumes the table was created with a matching PARTITIONED BY column (here, province):
hive> alter table member add partition (province='shenzhen');
The passwd table below is partitioned by computer and splits fields on ':' so that /etc/passwd can be loaded directly:
hive> CREATE TABLE passwd (a string, b string, c string, d string, e string, f string) PARTITIONED BY (computer string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':';
OK
Time taken: 0.323 seconds
hive> load data local inpath '/etc/passwd' overwrite into table passwd partition(computer="hive");
Loading data to table default.passwd partition (computer=hive)
OK
Time taken: 0.499 seconds
hive> select * from passwd;
OK
root x 0 0 root /root hive
bin x 1 1 bin /bin hive
daemon x 2 2 daemon /sbin hive
adm x 3 4 adm /var/adm hive
lp x 4 7 lp /var/spool/lpd hive
sync x 5 0 sync /sbin hive
shutdown x 6 0 shutdown /sbin hive
halt x 7 0 halt /sbin hive
mail x 8 12 mail /var/spool/mail hive
operator x 11 0 operator /root hive
games x 12 100 games /usr/games hive
ftp x 14 50 FTP User /var/ftp hive
nobody x 99 99 Nobody / hive
dbus x 81 81 System message bus / hive
polkitd x 999 998 User for polkitd / hive
avahi x 70 70 Avahi mDNS/DNS-SD Stack /var/run/avahi-daemon hive
avahi-autoipd x 170 170 Avahi IPv4LL Stack /var/lib/avahi-autoipd hive
postfix x 89 89 /var/spool/postfix hive
sshd x 74 74 Privilege-separated SSH /var/empty/sshd hive
ntp x 38 38 /etc/ntp hive
rpc x 32 32 Rpcbind Daemon /var/lib/rpcbind hive
qemu x 107 107 qemu user / hive
unbound x 998 996 Unbound DNS resolver /etc/unbound hive
rpcuser x 29 29 RPC Service User /var/lib/nfs hive
nfsnobody x 65534 65534 Anonymous NFS User /var/lib/nfs hive
saslauth x 997 76 "Saslauthd user" /run/saslauthd hive
radvd x 75 75 radvd user / hive
nagios x 1000 1000 /home/nagios hive
apache x 48 48 Apache /usr/share/httpd hive
exim x 93 93 /var/spool/exim hive
tss x 59 59 Account used by the trousers package to sandbox the tcsd daemon /dev/null hive
git x 996 994 /var/opt/gitlab hive
gitlab-www x 995 993 /var/opt/gitlab/nginx hive
gitlab-redis x 994 992 /var/opt/gitlab/redis hive
gitlab-psql x 993 991 /var/opt/gitlab/postgresql hive
nginx x 992 990 nginx user /var/cache/nginx hive
www x 80 80 Web Application /www hive
mysql x 27 27 MySQL Server /var/lib/mysql hive
redis x 991 989 Redis Database Server /var/lib/redis hive
epmd x 990 988 Erlang Port Mapper Daemon /tmp hive
rabbitmq x 989 987 RabbitMQ messaging server /var/lib/rabbitmq hive
solr x 1001 1001 Apache Solr /srv/solr hive
mongodb x 184 986 MongoDB Database Server /var/lib/mongodb hive
test x 1002 1002 /home/test hive
sysaccount x 988 985 /home/sysaccount hive
systemd-bus-proxy x 987 983 systemd Bus Proxy / hive
systemd-network x 986 982 systemd Network Management / hive
elasticsearch x 985 980 elasticsearch user /home/elasticsearch hive
zabbix x 984 979 Zabbix Monitoring System /var/lib/zabbix hive
mysqlrouter x 983 978 MySQL Router /var/lib/mysqlrouter hive
hadoop x 1003 1003 /home/hadoop hive
Time taken: 0.118 seconds, Fetched: 51 row(s)
So far only one partition exists:
hive> SHOW PARTITIONS passwd;
OK
computer=hive
Time taken: 0.058 seconds, Fetched: 1 row(s)
After partitions computer=hadoop and computer=hbase have also been created (by further loads or ALTER TABLE ... ADD PARTITION), SHOW PARTITIONS lists each of them:
hive> SHOW PARTITIONS passwd;
OK
computer=hadoop
computer=hbase
computer=hive
Time taken: 0.056 seconds, Fetched: 3 row(s)
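A partition (and, for a managed table, its data) can be removed again with DROP PARTITION; a sketch:
hive> ALTER TABLE passwd DROP PARTITION (computer='hbase');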
Create a view over the member table:
hive> CREATE VIEW v_test AS SELECT name,age FROM member where age>20;
hive> select * from v_test;
hive> drop view v_test;
OK
Time taken: 0.276 seconds
Use IF EXISTS to avoid an error when the view may not exist:
hive> DROP VIEW IF EXISTS v_test;
OK
Time taken: 0.495 seconds
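A view's query can also be replaced in place with ALTER VIEW, and Hive 2.2+ adds SHOW VIEWS to list views separately from tables; a sketch (assuming v_test had not just been dropped):
hive> ALTER VIEW v_test AS SELECT name, age FROM member WHERE age > 30;
hive> SHOW VIEWS;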
First create a text file as follows:
[root@localhost ~]# cat /tmp/hive.txt
1 2 3
2 3 4
3 4 5
6 7 8
hive> CREATE TABLE test (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.294 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/hive.txt' OVERWRITE INTO TABLE test;
Loading data to table default.test
OK
Time taken: 0.541 seconds
hive> select * from test;
OK
1 2 3
2 3 4
3 4 5
6 7 8
Time taken: 0.952 seconds, Fetched: 4 row(s)
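Anything beyond a bare SELECT * (filters, joins, aggregates) is compiled into a job on the execution engine; a small sketch against the same table:
-- sum column c for each distinct value of column a
hive> select a, sum(c) from test group by a;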
Create a new table directly from a query:
hive> CREATE TABLE mytable AS SELECT * FROM anytable;
17.3.4.3. Query data from another table and insert it into a specified table
INSERT OVERWRITE TABLE mytable SELECT * FROM other;
17.3.4.4. Query data from an existing table and insert it into a new partitioned table
hive> insert into table table2 partition(created_date) select * from table1;
hive> insert into table newtable partition(type='1') select * from oldtable;
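The first form above is a dynamic-partition insert: the value of created_date comes from the data itself, and the partition column must be the last column of the SELECT. By default Hive rejects it until dynamic partitioning is enabled:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;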
Hive stores its tables under /user/hive/warehouse on HDFS, where they can be inspected and copied directly:
[hadoop@localhost ~]$ hdfs dfs -ls /user/hive/warehouse
Found 3 items
drwxrwxr-x - hadoop supergroup 0 2017-06-29 03:36 /user/hive/warehouse/member
drwxrwxr-x - hadoop supergroup 0 2017-06-29 03:32 /user/hive/warehouse/test
drwxrwxr-x - hadoop supergroup 0 2017-06-29 03:41 /user/hive/warehouse/vipuser
[hadoop@localhost ~]$ hdfs dfs -cp /user/hive/warehouse/vipuser /user/hive/warehouse/vipuser2
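Copying the directory only duplicates the files; the metastore still knows nothing about vipuser2. One way to make the copy queryable is to point a new, external table at it; a sketch assuming vipuser kept the default text format:
hive> CREATE EXTERNAL TABLE vipuser2 (name string, age int, sex int) LOCATION '/user/hive/warehouse/vipuser2';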
Export query results to a local directory:
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test' SELECT * FROM test;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test' SELECT * FROM member;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20170629040540_ddeda146-efed-44c4-bb20-a6453c21cc8e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498716998098_0001, Tracking URL = http://localhost:8088/proxy/application_1498716998098_0001/
Kill Command = /srv/apache-hadoop/bin/hadoop job -kill job_1498716998098_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-06-29 04:05:49,221 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1498716998098_0001
Moving data to local directory /tmp/test
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 10.54 seconds
Omit LOCAL to write to an HDFS directory instead:
hive> insert overwrite directory '/usr/tmp/test' select * from test;
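By default the exported files use Hive's \001 (^A) field delimiter; since Hive 0.11 a readable delimiter can be set on the export itself, a sketch:
hive> insert overwrite local directory '/tmp/test' row format delimited fields terminated by ',' select * from test;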
Original source: the Netkiller series of handbooks
Author: 陈景峯 (Neo Chen)
Please contact the author before reprinting, and always retain the original source, author information, and this notice.