Hadoop Cluster (1): Setting Up ZooKeeper
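As a Hadoop beginner, installation is the natural place to start. Hadoop's strength is that it is distributed, so it makes sense to install a distributed system from the outset. The overall installation covers ZooKeeper + HDFS + HBase; to keep each post short, I record the steps in three blog posts. This post covers preparing the cluster environment and installing ZooKeeper.

~~~~~~~~~~~ Environment preparation ~~~~~~~~~~~

IP deployment plan (three virtual machines running CentOS 6.7)

ZooKeeper host allocation

    192.168.67.101 c6701
    192.168.67.102 c6702
    192.168.67.103 c6703

HDFS host allocation

    192.168.67.101 c6701 -- Namenode + datanode
    192.168.67.102 c6702 -- datanode
    192.168.67.103 c6703 -- datanode

HBase host allocation

    192.168.67.101 c6701 -- Master + region
    192.168.67.102 c6702 -- region
    192.168.67.103 c6703 -- region

1. To manage the services separately, I created a dedicated user for each one. The users are:

    zk
    hdfs
    hbase

2. Set up passwordless SSH for each user and distribute the key to all nodes. That way I only run commands on c6701 and install c6702 and c6703 remotely.

    ssh-keygen -t rsa -f ~/.ssh/id_rsa

Copy the key to every node. (Only c6701 can then reach c6702 and c6703 without a password; access in the opposite direction still prompts for one.)

    ssh-copy-id c6701
    ssh-copy-id c6702
    ssh-copy-id c6703

3. Note that /etc/hosts must be set up on the nodes:

    [root@c6701 ~]# more /etc/hosts
    192.168.67.101 c6701.org c6701
    192.168.67.102 c6702.org c6702
    192.168.67.103 c6703.org c6703

4. Everything below runs on c6701 and calls ssh to execute commands remotely, so first verify that passwordless SSH works:

    ssh c6702 "cat /proc/cpuinfo"
    ssh c6702 "hostname"

5. Download the software (from an intranet address):

    cd /tmp/software
    wget http://192.21.104.48/deploy/jdk-8u144-linux-x64.tar.gz
    wget http://192.21.104.48/deploy/zookeeper-3.4.6.tar.gz
    wget http://192.21.104.48/deploy/hbase-1.1.3.tar.gz
    wget http://192.21.104.48/deploy/hadoop-2.6.0-EDH-0u2.tar.gz
    wget http://192.21.104.48/deploy/hadoop-2.7.1.tar.gz

6. Install the JDK. This is required on every node. (In the listings, a leading # is the root shell prompt.)

    # tar -xzvf jdk-8u144-linux-x64.tar.gz -C /usr/local

7. Add the following to .bash_profile, then reload it:

    export JAVA_HOME=/usr/local/jdk1.8.0_144
    export JRE_HOME=/usr/local/jdk1.8.0_144/jre
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$PATH
    source ~/.bash_profile

8. Check the version:

    # java -version
    java version "1.8.0_144"
    Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

~~~~~~~~~~~ ZooKeeper installation ~~~~~~~~~~~

1. Install ZooKeeper on c6701:

    useradd zk
    echo "zk:zk" | chpasswd
    su - zk
    mkdir zk
    tar -zxvf /tmp/software/zookeeper-3.4.6.tar.gz -C /home/zk/zk

~~~~~~~~~~~ zoo.cfg configuration ~~~~~~~~~~~

The configuration file is /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg. Its non-comment lines are:

    $ cat /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg | grep -v '^#'
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/data/zookeeper/data
    dataLogDir=/data/zookeeper/log
    clientPort=2181
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=6
    server.1=c6701:2888:3888
    server.2=c6702:2888:3888
    server.3=c6703:2888:3888

2. Create the data and log directories named in zoo.cfg (as root) and hand them over to the zk user:

    # mkdir -p /data/zookeeper/data
    # mkdir -p /data/zookeeper/log
    # chown -R zk:zk /data/zookeeper
    # chown -R zk:zk /data/zookeeper/data
    # chown -R zk:zk /data/zookeeper/log

3. In the data directory (dataDir), create a file named myid containing 1. This number is the 1 of server.1 in zoo.cfg. Once the configuration has been copied to the other machines, each machine gets its own myid: c6702 (server.2) gets 2, and the remaining nodes follow the same pattern with their own numbers. The log directory (dataLogDir) is the path where ZooKeeper writes its transaction logs.

    # su - zk -c "echo 1 > /data/zookeeper/data/myid"

Keep the space between the number and the > sign: written as echo 1>file, the shell treats 1> as a redirection of standard output and the file ends up without the number.

4. Add an environment variable: in the /etc/profile file, append ":$ZOOKEEPER_HOME/bin" to the end of the existing PATH. Instead of /etc/profile, the variable can also go into the .bashrc file in the user's home directory. The difference is scope: /etc/profile applies to all users and is read by login shells, while ~/.bashrc applies only to the user who owns it and is read by that user's interactive shells. The two are not searched one after the other as fallbacks; each file is simply executed when its kind of shell starts, and a later assignment overrides an earlier one. Then reload the file:

    source /etc/profile

5. Install ZooKeeper on c6702. Create the user:

    # ssh c6702 "useradd zk"
    # ssh c6702 'echo "zk:zk" | chpasswd'

Set up passwordless SSH for the zk user:

    # ssh-copy-id zk@c6702

Copy the software (the ZooKeeper tarball is the one needed for this step, so it is listed explicitly next to the hadoop-* files):

    # scp -r /tmp/software/hadoop-* /tmp/software/zookeeper-3.4.6.tar.gz root@c6702:/tmp/software
    # ssh c6702 "chmod 777 /tmp/software/*"

Create the directory, extract the software, and replace zoo.cfg with the copy from c6701:

    # ssh zk@c6702 "mkdir zk"
    # ssh zk@c6702 "tar -zxvf /tmp/software/zookeeper-3.4.6.tar.gz -C /home/zk/zk"
    # ssh zk@c6702 "ls -al zk"
    # ssh zk@c6702 "ls -al zk/zookeeper*"
    # ssh zk@c6702 "rm -f /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg"
    # scp -r /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg zk@c6702:/home/zk/zk/zookeeper-3.4.6/conf/.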
    # ssh zk@c6702 "cat /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg | grep -v '^#'"

Create the directories named in zoo.cfg:

    # ssh c6702 "mkdir -p /data/zookeeper/data"
    # ssh c6702 "chown -R zk:zk /data/zookeeper"
    # ssh c6702 "chown -R zk:zk /data/zookeeper/data"
    # ssh c6702 "mkdir -p /data/zookeeper/log"
    # ssh c6702 "chown -R zk:zk /data/zookeeper/log"

Create the myid file containing 2:

    ssh zk@c6702 "echo 2 > /data/zookeeper/data/myid"

6. Install ZooKeeper on c6703. Create the user:

    # ssh c6703 "useradd zk"
    # ssh c6703 'echo "zk:zk" | chpasswd'

Set up passwordless SSH for the zk user:

    ssh-copy-id zk@c6703

Copy the software (again including the ZooKeeper tarball, which the next step extracts):

    # scp -r /tmp/software/hadoop-* /tmp/software/zookeeper-3.4.6.tar.gz root@c6703:/tmp/software
    # ssh c6703 "chmod 777 /tmp/software/*"

Create the directory, extract the software, and replace zoo.cfg with the copy from c6701:

    # ssh zk@c6703 "mkdir zk"
    # ssh zk@c6703 "tar -zxvf /tmp/software/zookeeper-3.4.6.tar.gz -C /home/zk/zk"
    # ssh zk@c6703 "ls -al zk"
    # ssh zk@c6703 "ls -al zk/zookeeper*"
    # ssh zk@c6703 "rm -f /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg"
    # scp -r /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg zk@c6703:/home/zk/zk/zookeeper-3.4.6/conf/.
    # ssh zk@c6703 "cat /home/zk/zk/zookeeper-3.4.6/conf/zoo.cfg | grep -v '^#'"

Create the directories named in zoo.cfg:

    # ssh c6703 "mkdir -p /data/zookeeper/data"
    # ssh c6703 "chown -R zk:zk /data/zookeeper"
    # ssh c6703 "chown -R zk:zk /data/zookeeper/data"
    # ssh c6703 "mkdir -p /data/zookeeper/log"
    # ssh c6703 "chown -R zk:zk /data/zookeeper/log"

Create the myid file containing 3:

    ssh zk@c6703 "echo 3 > /data/zookeeper/data/myid"

7. Start ZK on the local node and check its status:

    /home/zk/zk/zookeeper-3.4.6/bin/zkServer.sh start
    /home/zk/zk/zookeeper-3.4.6/bin/zkServer.sh status

Remote start commands:

    ssh zk@c6702 "/home/zk/zk/zookeeper-3.4.6/bin/zkServer.sh start"
    ssh zk@c6703 "/home/zk/zk/zookeeper-3.4.6/bin/zkServer.sh start"
    ssh zk@c6702 "/home/zk/zk/zookeeper-3.4.6/bin/zkServer.sh status"
    ssh zk@c6703 "/home/zk/zk/zookeeper-3.4.6/bin/zkServer.sh status"

8. Status before all three ZK servers have been started:

    [vagrant@c7003 bin]$ ./zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /home/vagrant/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.

9. Once all three ZK servers are running, a leader has been elected. (Strictly, an election needs only a quorum, that is, a majority of the ensemble: two of the three servers here.)

    $ ./zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /home/vagrant/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower

    [vagrant@c7002 bin]$ ./zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /home/vagrant/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: leader

======= Update 2018.1.13 =======

How clients connect

The client tries the servers in the connect string in random order. You cannot pin it to a particular ZooKeeper server or make it prefer one for reads.

    ./zkCli.sh -server c6701:2181,c6702:2181,c6703:2181

The output is as follows. On the first connection, the client lands on c6701:

    [zookeeper@c6702 bin]$ /usr/local/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server c6701:2181,c6702:2181,c6703:2181
    Connecting to c6701:2181,c6702:2181,c6703:2181
    ......
    2018-01-11 21:07:30,797 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=c6701:2181,c6702:2181,c6703:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@4b4bc1e
    Welcome to ZooKeeper!
    2018-01-11 21:07:30,830 [myid:] - INFO  [main-SendThread(c6701.python279.org:2181):ClientCnxn$SendThread@975] - Opening socket connection to server c6701.python279.org/192.168.67.101:2181. Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2018-01-11 21:07:30,873 [myid:] - INFO  [main-SendThread(c6701.python279.org:2181):ClientCnxn$SendThread@852] - Socket connection established to c6701.python279.org/192.168.67.101:2181, initiating session
    2018-01-11 21:07:30,916 [myid:] - INFO  [main-SendThread(c6701.python279.org:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server c6701.python279.org/192.168.67.101:2181, sessionid = 0x160e70285f70001, negotiated timeout = 30000

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null

On the second connection, the client lands on c6702:

    Connecting to c6701:2181,c6702:2181,c6703:2181
    ......
    2018-01-11 21:10:18,442 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=c6701:2181,c6702:2181,c6703:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@4b4bc1e
    Welcome to ZooKeeper!
    2018-01-11 21:10:18,489 [myid:] - INFO  [main-SendThread(c6702.python279.org:2181):ClientCnxn$SendThread@975] - Opening socket connection to server c6702.python279.org/192.168.67.102:2181. Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2018-01-11 21:10:18,508 [myid:] - INFO  [main-SendThread(c6702.python279.org:2181):ClientCnxn$SendThread@852] - Socket connection established to c6702.python279.org/192.168.67.102:2181, initiating session
    2018-01-11 21:10:18,561 [myid:] - INFO  [main-SendThread(c6702.python279.org:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server c6702.python279.org/192.168.67.102:2181, sessionid = 0x260e70284650002, negotiated timeout = 30000

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null

That completes the base environment and the ZooKeeper installation. The next post continues with installing HDFS.

This article is reposted from hsbxxl's 51CTO blog. Original link: http://blog.51cto.com/hsbxxl/1971241. To repost it, please contact the original author.
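
A note on the myid values used in steps 3, 5 and 6 of the ZooKeeper installation: each one is simply the N of the matching server.N line in zoo.cfg, so it can be derived from the file rather than typed by hand. The following is a minimal sketch under that assumption and is not part of the original procedure. The function names myid_for_host and print_myid_commands are hypothetical, the demo runs against a throwaway copy of the server list, and the script only prints the ssh commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: derive each node's myid from the server.N lines of a zoo.cfg file.
# Function names and the dry-run behaviour are illustrative assumptions.

# Print the N of the "server.N=<host>:..." line that matches the given host.
myid_for_host() {
    local cfg="$1" host="$2"
    sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}

# Print (do not run) the command that would write myid on every listed host.
print_myid_commands() {
    local cfg="$1" data_dir host id
    # dataDir is where ZooKeeper expects to find the myid file.
    data_dir=$(sed -n 's/^dataDir=//p' "$cfg")
    for host in $(sed -n 's/^server\.[0-9][0-9]*=\([^:]*\):.*/\1/p' "$cfg"); do
        id=$(myid_for_host "$cfg" "$host")
        echo "ssh zk@${host} \"echo ${id} > ${data_dir}/myid\""
    done
}

# Demo against a temporary file holding the relevant lines from this article.
cfg=$(mktemp)
trap 'rm -f "$cfg"' EXIT
cat > "$cfg" <<'EOF'
dataDir=/data/zookeeper/data
server.1=c6701:2888:3888
server.2=c6702:2888:3888
server.3=c6703:2888:3888
EOF

print_myid_commands "$cfg"
```

Run as written, the script prints one ssh command per server line, in the same form as the myid commands used above. Printing rather than executing keeps the sketch safe to try on any machine; piping the output to a shell on c6701 would be the step that actually touches the nodes.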