
Hadoop 2.7 in Practice v1.0: Data Balancing with start-balancer.sh and hdfs balancer (Revised Edition)

Date: 2016-03-08

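Applicable scenarios:

a. When DataNodes are dynamically added to or removed from the cluster, data inevitably becomes unevenly distributed across the nodes.

b. During routine maintenance.

1. Raise the HDFS balancer bandwidth. The default transfer bandwidth for balancing is low (1 MB/s per DataNode in Hadoop 2.7), so it can be raised to 64 MB/s with:

hdfs dfsadmin -setBalancerBandwidth 67108864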


[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/bin
[root@sht-sgmhadoopnn-01 bin]# ./hdfs dfsadmin -setBalancerBandwidth 67108864
Balancer bandwidth is set to 67108864 for sht-sgmhadoopnn-01/172.16.101.55:8020
Balancer bandwidth is set to 67108864 for sht-sgmhadoopnn-02/172.16.101.56:8020
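
The value passed to -setBalancerBandwidth is in bytes per second, so 64 MB/s becomes 67108864. A minimal sketch of that conversion follows; the helper name mb_per_sec_to_bytes is made up for illustration and is not part of Hadoop.

```shell
#!/usr/bin/env bash
# mb_per_sec_to_bytes MB: convert a bandwidth in MB/s to bytes/s.
# Hypothetical helper, not a Hadoop command.
mb_per_sec_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_per_sec_to_bytes 64   # prints 67108864
```

The result can then be passed on, for example: hdfs dfsadmin -setBalancerBandwidth "$(mb_per_sec_to_bytes 64)".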

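2. The balancer's default threshold is 10%, meaning that each node's storage utilization may deviate from the cluster-wide utilization by no more than 10%. We can tighten it to 5%, then start the balancer with sbin/start-balancer.sh -threshold 5 and wait for the cluster to finish rebalancing itself.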


[root@sht-sgmhadoopnn-01 bin]# cd ../sbin
[root@sht-sgmhadoopnn-01 sbin]# ./start-balancer.sh -threshold 5
starting balancer, logging to /hadoop/hadoop-2.7.2/logs/hadoop-root-balancer-sht-sgmhadoopnn-01.telenav.cn.out

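### Running start-balancer.sh -threshold 5 has the same effect as running hdfs balancer -threshold 5; the script starts the balancer as a background daemon that writes to a log file, whereas hdfs balancer runs in the foreground.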

#### Usage: hdfs balancer


[root@sht-sgmhadoopnn-01 bin]# ./hdfs balancer -threshold 5
16/03/05 18:57:33 INFO balancer.Balancer: Using a threshold of 1.0
16/03/05 18:57:33 INFO balancer.Balancer: namenodes = [hdfs://mycluster]
16/03/05 18:57:33 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
16/03/05 18:57:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
16/03/05 18:57:35 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
16/03/05 18:57:35 INFO balancer.Balancer: 0 over-utilized: []
16/03/05 18:57:35 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Mar 5, 2016 6:57:35 PM 0 0 B 0 B -1 B
Mar 5, 2016 6:57:35 PM Balancing took 2.66 seconds

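1). Why did the command hdfs balancer -threshold 5 appear to do nothing?

The 5 means 5%.

Cluster-wide storage utilization: 1.74%

sht-sgmhadoopdn-01: 1.74%

sht-sgmhadoopdn-02: 1.74%

sht-sgmhadoopdn-03: 1.74%

sht-sgmhadoopdn-04: 0%

With -threshold 5, the difference between each DataNode's storage utilization and the cluster-wide storage utilization must stay below the 5% threshold; balancing is performed only when some node exceeds it.

Here the largest deviation is 1.74 percentage points (sht-sgmhadoopdn-04), which is below 5%, so the balancer treats the cluster as balanced and exits.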

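2). How can you tell whether the command will take effect and actually balance data?

if |cluster-wide storage utilization - storage utilization of any DataNode| > threshold (here 5)

    # balancing is performed

else

    # the command has no effect

end if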
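
The check above can be sketched as a small shell function. This is a simplification under stated assumptions: the function name needs_balancing and its argument order are invented for illustration, and the real balancer also sorts nodes into over-utilized and under-utilized groups, which this sketch does not model.

```shell
#!/usr/bin/env bash
# needs_balancing THRESHOLD CLUSTER_UTIL NODE_UTIL...
# Prints "balance" if any node deviates from the cluster-wide utilization by
# more than THRESHOLD percentage points, otherwise "no-op".
# Hypothetical helper; awk is used because bash has no floating-point math.
needs_balancing() {
  local threshold=$1 cluster=$2
  shift 2
  local node
  for node in "$@"; do
    if awk -v t="$threshold" -v c="$cluster" -v n="$node" \
         'BEGIN { d = c - n; if (d < 0) d = -d; exit !(d > t) }'; then
      echo "balance"
      return 0
    fi
  done
  echo "no-op"
}

# Utilization figures from this article: cluster 1.74%, nodes 1.74/1.74/1.74/0.
needs_balancing 5 1.74 1.74 1.74 1.74 0   # prints no-op
needs_balancing 1 1.74 1.74 1.74 1.74 0   # prints balance
```

With the article's figures, a threshold of 5 leaves nothing to do, while a threshold of 1 is exceeded by the empty node.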

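3). The threshold must lie in the range [1.0, 100.0], so the smallest value that can be set is -threshold 1.

4). The balancer command can be run on a NameNode or on a DataNode; it is best run on a newly added or otherwise idle DataNode.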
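
Because values outside [1.0, 100.0] are rejected, a wrapper script might check the threshold before invoking the balancer. The sketch below assumes a helper named valid_threshold, which is illustrative only and not part of Hadoop.

```shell
#!/usr/bin/env bash
# valid_threshold VALUE: succeeds only if VALUE is a plain decimal number
# in the range [1.0, 100.0]. Hypothetical helper.
valid_threshold() {
  awk -v v="$1" 'BEGIN {
    if (v !~ /^[0-9]+(\.[0-9]+)?$/) exit 1   # reject non-numeric input
    exit !(v >= 1.0 && v <= 100.0)           # exit status 0 means "in range"
  }'
}

for t in 0.5 1 5 100.0 101; do
  if valid_threshold "$t"; then
    echo "$t: accepted"
  else
    echo "$t: rejected"
  fi
done
```

A caller could then guard the real command, for example: valid_threshold "$t" && hdfs balancer -threshold "$t".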

3. Run the command hdfs balancer -threshold 1


[root@sht-sgmhadoopnn-01 hadoop]# hdfs balancer -threshold 1
……………..
……………..
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
16/03/08 16:08:09 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
16/03/08 16:08:09 INFO balancer.Balancer: 0 over-utilized: []
16/03/08 16:08:09 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Mar 8, 2016 4:08:09 PM 1 382.22 MB 0 B -1 B
Mar 8, 2016 4:08:09 PM Balancing took 6.7001 minutes

### The newly added DataNode now holds 411.7 MB, and its deviation is below 1%.

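For historical reasons, the machines in the Hadoop cluster have disks of different sizes, and HDFS does not take this into account when writing data. As the data volume grows, DataNodes with smaller disks fill up quickly, which triggers alerts.

At that point start-balancer.sh had to be run by hand. Even with dfs.balance.bandwidthPerSec (named dfs.datanode.balance.bandwidthPerSec in Hadoop 2.x) set to 10 MB/s, the cluster took a long time to reach balance. So a crontab entry was written to run start-balancer.sh every day in the early morning; because the imbalance never becomes severe between runs, each run of start-balancer.sh finishes quickly.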
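
The nightly job could look roughly like the sketch below. The article does not show the actual crontab, so everything here is an assumption: the 02:00 schedule, the wrapper function name run_nightly_balancer, the script and log paths in the comment, and the default threshold of 10. Only the install path /hadoop/hadoop-2.7.2 is taken from the transcripts above.

```shell
#!/usr/bin/env bash
# Hypothetical nightly wrapper around start-balancer.sh.
# Example crontab entry (the time and paths are assumptions):
#   0 2 * * * /path/to/nightly-balancer.sh >> /tmp/nightly-balancer.log 2>&1

# run_nightly_balancer [THRESHOLD]: start the balancer daemon with the given
# threshold (default 10), failing early if the script cannot be found.
run_nightly_balancer() {
  local hadoop_home=${HADOOP_HOME:-/hadoop/hadoop-2.7.2}
  local threshold=${1:-10}
  local script="$hadoop_home/sbin/start-balancer.sh"
  if [ ! -x "$script" ]; then
    echo "start-balancer.sh not found under $hadoop_home/sbin" >&2
    return 1
  fi
  "$script" -threshold "$threshold"
}
```

To use it from cron, the script would end with a call such as run_nightly_balancer 5.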

Also note that HDFS starts a separate Rebalance Server to carry out the rebalancing, so avoid running start-balancer.sh on the NameNode where possible; pick a relatively idle machine instead.

Background reading: http://www.aboutyun.com/thread-7354-1-1.html

Source code walkthrough:


[root@sht-sgmhadoopnn-01 sbin]# more start-balancer.sh
#!/usr/bin/env bash
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh
# Start balancer daemon.
"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer $@
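Analysis: start-balancer.sh ultimately hands off to hadoop-daemon.sh, which runs hdfs balancer $@ as a background daemon. $@ is the list of arguments passed to the shell script, typically -threshold 5.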
[root@sht-sgmhadoopnn-01 sbin]# more stop-balancer.sh
#!/usr/bin/env bash
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh
# Stop balancer daemon.
# Run this on the machine where the balancer is running
"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs stop balancer
Analysis: stop-balancer.sh likewise hands off to hadoop-daemon.sh, this time with the stop action, which stops the running balancer daemon.
[root@sht-sgmhadoopnn-01 sbin]#