A first look at Hive 2.1.0 and the pitfalls along the way

2016-11-01

Please credit the source when reposting: http://blog.csdn.net/gamer_gyt
Author's Weibo: http://weibo.com/234654758
Github: https://github.com/thinkgamer

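Preface

Hive 2.x has been out for a while now, and at the time of writing the stable release in the 2.x line is 2.1.0.
GitHub: https://github.com/apache/hive/tree/master
Download (Tsinghua mirror of the Apache releases): https://mirrors.tuna.tsinghua.edu.cn/apache/hive/
In my spare time from work, let's see what has changed in Hive 2.1.0 compared with 1.2.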

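Deployment

An earlier post on deploying Hive 1.2 with MySQL:
http://blog.csdn.net/gamer_gyt/article/details/52032579

Deploying Hive 2.1.0 is essentially the same as deploying 1.2, but as a refresher we will walk through the process again and see which pitfalls turn up.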

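1: Download the release

2: Extract it to the target directory

I keep Hive under /opt/bigdata/hive, so the extracted directory is renamed to hive.

tar -zxvf /home/thinkgamer/下载/apache-hive-2.1.0-bin.tar.gz -C /opt/bigdata/
cd /opt/bigdata
mv apache-hive-2.1.0-bin/ hive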

3: Create the hive21 user in MySQL

Create the hive21 user, grant it privileges, and reload the privilege tables.

CREATE USER 'hive21' IDENTIFIED BY 'hive21';
grant all privileges on *.* to 'hive21' with grant option;
flush privileges;

4: Copy the MySQL connector JAR into hive/lib

cp /path/to/mysql-connector-java-5.1.38-bin.jar hive/lib

5: Edit the configuration file

Each setting below is a <property> element in hive-site.xml.

(1): javax.jdo.option.ConnectionURL

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive21?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>

(2): javax.jdo.option.ConnectionDriverName

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

(3): javax.jdo.option.ConnectionUserName

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive21</value>
</property>

(4): javax.jdo.option.ConnectionPassword

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive21</value>
</property>
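
Put together, the four properties above sit inside a single <configuration> element. The script below is a minimal sketch that writes such a file. The CONF_DIR variable is my own name for illustration and defaults to a scratch directory, so nothing under /opt/bigdata/hive/conf is touched unless you point it there. A real hive-site.xml will normally carry more settings than these four.

```shell
#!/bin/sh
# Sketch: write a minimal hive-site.xml with only the four metastore
# connection properties from this step.
# Assumption: CONF_DIR is a hypothetical variable; it defaults to a scratch
# directory so the script is safe to run as-is.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
SITE_FILE="$CONF_DIR/hive-site.xml"

# The quoted 'EOF' stops the shell from expanding '$' or altering '&amp;'.
cat > "$SITE_FILE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive21?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive21</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive21</value>
  </property>
</configuration>
EOF

echo "wrote $SITE_FILE"
```

Note that the ampersands in the JDBC URL must stay escaped as &amp; because the value lives inside an XML file.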

6: Start and test

bin/hive

hive> show databases;
OK
default
Time taken: 1.123 seconds, Fetched: 1 row(s)
hive> create table table_name (  
    >   id                int,  
    >   dtDontQuery       string,  
    >   name              string  
    > );  
OK
Time taken: 0.983 seconds
hive> show tables;
OK
table_name
Time taken: 0.094 seconds, Fetched: 1 row(s)

At this point MySQL contains a hive21 database:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| hive21             |
| mysql              |
| performance_schema |
+--------------------+

7: Pitfalls I ran into

(1): The Hive metastore database was not initialized

The error:

Caused by: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema.  
If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))

Fix:

bin/schematool -initSchema -dbType mysql --verbose
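
To avoid re-running the initialization against a metastore that already has a schema, the call can be wrapped in a check. This is a sketch under assumptions: the function name ensure_metastore_schema is mine, HIVE_HOME defaults to the install directory used in this post, and it relies on schematool's -info option exiting non-zero when no schema is found, which you should confirm on your own installation.

```shell
#!/bin/sh
# Sketch: initialise the metastore schema only when it is not there yet.
# Assumption: HIVE_HOME is the install directory used in this post.
HIVE_HOME="${HIVE_HOME:-/opt/bigdata/hive}"

# ensure_metastore_schema is a hypothetical helper, not part of Hive.
ensure_metastore_schema() {
    tool="$1/bin/schematool"
    if [ ! -x "$tool" ]; then
        echo "schematool not found under $1" >&2
        return 2
    fi
    # -info prints the schema version; it is expected to fail when the
    # metastore database holds no schema yet.
    if "$tool" -dbType mysql -info >/dev/null 2>&1; then
        echo "metastore schema already present"
    else
        "$tool" -dbType mysql -initSchema --verbose
    fi
}

ensure_metastore_schema "$HIVE_HOME" \
    || echo "schema setup did not complete (exit $?)" >&2
```

On a machine without Hive under HIVE_HOME the script only prints a notice and changes nothing.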

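While looking this up I found advice online saying a Derby database should be initialized here. I think that is wrong, because we have already configured MySQL as the metastore database.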

(2): Log and temporary directories not configured

The error:

Logging initialized using configuration in file:/opt/bigdata/hive/conf/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)
    ... 12 more

Fix:

Edit hive-site.xml and replace the ${system:java.io.tmpdir} and ${system:user.name} placeholders with a concrete path, here /opt/bigdata/hive/tmp.
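
Doing the replacement by hand is tedious when the placeholders appear in several properties, so here is a sed-based sketch. The helper name fix_tmpdir_placeholders is hypothetical. As a variation on the fix above, it substitutes the directory and the user name separately, so a value like ${system:java.io.tmpdir}/${system:user.name} becomes /opt/bigdata/hive/tmp/hive; the user name "hive" is only an example.

```shell
#!/bin/sh
# Sketch: replace the unresolved ${system:...} placeholders in hive-site.xml.
# Assumption: fix_tmpdir_placeholders is an illustrative helper; replacing the
# user name separately from the directory is my choice, not the post's.
fix_tmpdir_placeholders() {
    file="$1"; tmpdir="$2"; user="$3"
    # '|' as the sed delimiter avoids escaping the slashes in the path.
    # Output goes to a new file first so the edit works with any sed.
    sed -e 's|\${system:java\.io\.tmpdir}|'"$tmpdir"'|g' \
        -e 's|\${system:user\.name}|'"$user"'|g' \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Demo on a scratch file so no real configuration is modified.
demo="$(mktemp)"
echo '<value>${system:java.io.tmpdir}/${system:user.name}</value>' > "$demo"
fix_tmpdir_placeholders "$demo" /opt/bigdata/hive/tmp hive
cat "$demo"
```

For a real run, back up hive-site.xml first, pass its path as the first argument, and make sure the target directory exists.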

A quick comparison of Hive 2.1 and Hive 1.2

hive1.2

[master@master1 hive]$ bin/hive --service help  
Usage ./hive <parameters> --service serviceName <service parameters>  
Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version   
Parameters parsed:  
  --auxpath : Auxillary jars   
  --config : Hive configuration directory  
  --service : Starts specific service/component. cli is default  
Parameters used:  
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory  
  HIVE_OPT : Hive options  
For help on a particular service:  
  ./hive --service serviceName --help  
Debug help:  ./hive --debug --help

hive2.1

root@thinkgamer-pc:/opt/bigdata/hive# bin/hive --service help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cleardanglingscratchdir cli hbaseimport hbaseschematool help hiveburninclient hiveserver2 hplsql hwi jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool version 
Parameters parsed:
  --auxpath : Auxillary jars 
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help:  ./hive --debug --help
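
The two "Service List" lines above can be compared mechanically instead of by eye. The sketch below copies both lists verbatim from the help output and uses comm to report which services exist in only one version. The variable names are mine.

```shell
#!/bin/sh
# Sketch: diff the two "Service List" lines quoted in the help output above.
LC_ALL=C; export LC_ALL   # comm needs both inputs sorted in the same collation

v12="beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version"
v21="beeline cleardanglingscratchdir cli hbaseimport hbaseschematool help hiveburninclient hiveserver2 hplsql hwi jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool version"

work="$(mktemp -d)"
# One service per line, sorted, as comm requires.
echo "$v12" | tr ' ' '\n' | sort > "$work/v12"
echo "$v21" | tr ' ' '\n' | sort > "$work/v21"

# comm -13 keeps lines unique to the second file, -23 those unique to the first.
added="$(comm -13 "$work/v12" "$work/v21" | paste -s -d ' ' -)"
removed="$(comm -23 "$work/v12" "$work/v21" | paste -s -d ' ' -)"

echo "only in 2.1: $added"
echo "only in 1.2: $removed"
```

For these two lists it reports cleardanglingscratchdir, hbaseimport, hbaseschematool, hplsql, llap, llapdump and llapstatus as new in 2.1, and hiveserver as present only in 1.2.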

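As the output shows, Hive 2.1 adds HBase-related services as well as hplsql and others. These are new in the 2.x line; here are a few of the more commonly used ones.

beeline: should work the same way as in Hive 1.2. I expect some performance improvement but have not measured it. For usage, see
http://blog.csdn.net/gamer_gyt/article/details/52062460
cleardanglingscratchdir: removes dangling scratch directories (leftover temporary data)
Usage: bin/hive --service cleardanglingscratchdir
hbaseimport/hbaseschematool: tools for working with HBase (they appear to target the HBase-backed metastore rather than general HBase access)
hiveserver2: exposes a JDBC interface so external programs can work with Hive
hplsql: a tool that implements procedural SQL for Apache Hive, SparkSQL, other SQL-on-Hadoop engines, NoSQL stores and relational databases
Official description: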

HPL/SQL (previously known as PL/HQL) is an open source tool (Apache License 2.0) that implements procedural SQL language for Apache Hive, SparkSQL as well as any other SQL-on-Hadoop implementations, NoSQL and RDBMS.

HPL/SQL language is compatible to a large extent with Oracle PL/SQL, ANSI/ISO SQL/PSM (IBM DB2, MySQL, Teradata i.e), PostgreSQL PL/pgSQL (Netezza), Transact-SQL (Microsoft SQL Server and Sybase) that allows you leveraging existing SQL/DWH skills and familiar approach to implement data warehouse solutions on Hadoop. It also facilitates migration of existing business logic to Hadoop.

HPL/SQL is an efficient way to implement ETL processes in Hadoop.

LLAP: also new in the 2.x line. Roughly speaking, it aims to shorten Hive query execution time. For details see:
http://zh.hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/

Original article: https://yq.aliyun.com/articles/413027
