Managing Hadoop

2017-11-07

I. HDFS

Persistent data structures

1.1 Directory structure of the namenode

 
   
     
       
      
    [root@datanode1 name]# cd /data0/hadoop/dfs/name/current/
    [root@datanode1 current]# ls
    edits  edits.new  fsimage  fstime  VERSION
    [root@datanode1 current]# ls -l
    total 56
    -rw-rw-r--. 1 hadoop hadoop     789 Jan 15 16:59 edits
    -rw-rw-r--. 1 hadoop hadoop 1049088 Jan 15 18:00 edits.new
    -rw-rw-r--. 1 hadoop hadoop   14557 Jan 14 18:47 fsimage
    -rw-rw-r--. 1 hadoop hadoop       8 Jan 14 18:47 fstime
    -rw-rw-r--. 1 hadoop hadoop     100 Jan 14 18:47 VERSION
        
 
     

    
  

1.1.2 The VERSION file is a Java properties file that contains information about the version of HDFS that is running

 
    [root@datanode1 current]# cat VERSION
    #Thu Jan 14 18:47:15 CST 2016
    namespaceID=688384215
    cTime=0
    storageType=NAME_NODE
    layoutVersion=-32

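layoutVersion: a negative integer that describes the version of HDFS's persistent data structures (also called the layout). This number is unrelated to the version number of the Hadoop release. Whenever the layout changes, the version number is decremented (for example, the version after -18 is -19). HDFS then has to be upgraded. Otherwise the disks still use the old layout, and a newer namenode or datanode cannot work with it.

namespaceID: a unique identifier for the filesystem. It is set when the filesystem is first formatted.

cTime: marks the creation time of the namenode's storage. For newly formatted storage, its value is 0.

storageType: indicates that this storage directory contains namenode data structures.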
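
A VERSION file is plain `key=value` text, so it is easy to inspect from a script. The sketch below is a simplified parser. It assumes one `key=value` pair per line and ignores the escapes and line continuations that the full java.util.Properties format allows. The helper `layout_is_newer` is hypothetical. It encodes only the rule above: a newer layout has a more negative layoutVersion. The sample input is the namenode VERSION file shown earlier.

```python
from typing import Dict


def parse_version_file(text: str) -> Dict[str, str]:
    """Parse a VERSION file made of simple key=value lines (simplified Properties parser)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments such as the "#Thu Jan 14 ..." timestamp line.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def layout_is_newer(candidate: int, current: int) -> bool:
    # Hypothetical helper: the layout version is decremented on every change,
    # so a newer layout has a smaller (more negative) number.
    return candidate < current


SAMPLE = """#Thu Jan 14 18:47:15 CST 2016
namespaceID=688384215
cTime=0
storageType=NAME_NODE
layoutVersion=-32
"""

if __name__ == "__main__":
    props = parse_version_file(SAMPLE)
    print(props["storageType"], int(props["layoutVersion"]))  # NAME_NODE -32
    print(layout_is_newer(-19, -18))  # True
```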

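1.1.3 The filesystem image and the edit log

When a filesystem client performs a write operation (such as creating or moving a file), the operation is first recorded in the edit log. The namenode keeps the filesystem metadata in memory and updates it after the edit log has been modified. This in-memory metadata is used to serve read requests.

After each write operation, and before a success code is returned to the client, the edit log is flushed and synced. When the namenode writes to multiple directories, it returns the success code only after the write has completed in all of them. This ensures that no operation is lost because of a machine failure.

fsimage is a persistent checkpoint of the filesystem metadata. If the namenode fails, its latest metadata can be reconstructed in two steps: load fsimage into memory, then apply each operation recorded in the edit log.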
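
Recovery from fsimage plus the edit log amounts to "load the snapshot, then replay the log in order". The Python sketch below models that idea under heavy simplifying assumptions:

- The namespace is a dict from path to metadata.
- Each edit-log record is a hypothetical `(operation, args)` tuple. The operation names are illustrative, not the real HDFS edit-log opcodes.
- Renaming a directory does not move its children.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified types: path -> metadata dict, and (operation, args) edit records.
Namespace = Dict[str, dict]
Edit = Tuple[str, tuple]


def apply_edit(ns: Namespace, edit: Edit) -> None:
    """Apply one logged operation to the in-memory namespace."""
    op, args = edit
    if op == "mkdir":
        (path,) = args
        ns[path] = {"type": "dir"}
    elif op == "create":
        path, replication = args
        ns[path] = {"type": "file", "replication": replication}
    elif op == "rename":
        src, dst = args
        ns[dst] = ns.pop(src)  # only the entry itself moves in this toy model
    elif op == "delete":
        (path,) = args
        ns.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def recover(fsimage: Namespace, edits: List[Edit]) -> Namespace:
    """Load the checkpoint (fsimage), then replay the edit log in order."""
    ns = {path: dict(meta) for path, meta in fsimage.items()}  # leave the checkpoint untouched
    for edit in edits:
        apply_edit(ns, edit)
    return ns


if __name__ == "__main__":
    fsimage = {"/": {"type": "dir"}, "/data": {"type": "dir"}}
    edits = [
        ("create", ("/data/a.txt", 3)),
        ("rename", ("/data/a.txt", "/data/b.txt")),
        ("mkdir", ("/tmp",)),
    ]
    print(sorted(recover(fsimage, edits)))  # ['/', '/data', '/data/b.txt', '/tmp']
```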

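fsimage contains a serialized form of all the directory and file inodes in the filesystem. Each inode is an internal representation of a file's or directory's metadata.

- For a file, it holds the replication level, modification and access times, access permissions, block size, and the blocks that make up the file.
- For a directory, it holds the modification time, access permissions, and quota metadata.

Blocks are stored on the datanodes, but fsimage does not record which datanodes hold which blocks. Instead, the namenode keeps this block mapping in memory. It builds the mapping by asking each datanode for its block list when the datanode joins the cluster. Afterwards, it periodically queries the datanodes to keep the mapping up to date.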

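The secondary namenode is run to create checkpoints of the primary namenode's in-memory filesystem metadata. The checkpoint process works as follows:

(1) The secondary namenode asks the primary to stop using its edits file and to record new write operations in a new file (edits.new) for the time being.

(2) The secondary namenode retrieves the fsimage and edits files from the primary (using HTTP GET).

(3) The secondary namenode loads fsimage into memory, applies each operation from edits, and creates a new fsimage file.

(4) The secondary namenode sends the new fsimage file back to the primary (using HTTP POST).

(5) The primary namenode replaces its old fsimage with the one received from the secondary, and replaces the old edits file with the edits.new file started in step 1. It also updates the fstime file to record when the checkpoint was taken.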
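
The five steps can be simulated in a few lines. This is a hypothetical Python model with these simplifications:

- The namespace is just a set of paths.
- Edit records are `("create" | "delete", path)` tuples.
- The HTTP GET/POST transfers become method calls.

The model shows how edits.new lets the primary keep logging writes while the secondary merges the old edits into a new fsimage.

```python
import time
from typing import List, Optional, Tuple

Edit = Tuple[str, str]  # hypothetical ("create" | "delete", path) record


def replay(image: frozenset, edits: List[Edit]) -> frozenset:
    """Apply edit records to a set-of-paths namespace image (toy model)."""
    paths = set(image)
    for op, path in edits:
        if op == "create":
            paths.add(path)
        elif op == "delete":
            paths.discard(path)
    return frozenset(paths)


class PrimaryNameNode:
    """Toy primary namenode holding fsimage, edits, edits.new and fstime."""

    def __init__(self, paths):
        self.fsimage = frozenset(paths)
        self.edits: List[Edit] = []
        self.edits_new: Optional[List[Edit]] = None  # exists only during a checkpoint
        self.fstime = 0.0

    def log(self, edit: Edit) -> None:
        # While a checkpoint is running, new writes go to edits.new.
        target = self.edits if self.edits_new is None else self.edits_new
        target.append(edit)

    def roll_edits(self) -> None:  # step 1
        self.edits_new = []

    def get_image_and_edits(self):  # step 2 (HTTP GET in Hadoop)
        return self.fsimage, list(self.edits)

    def put_image(self, new_image: frozenset, now: float) -> None:  # steps 4 and 5
        self.fsimage = new_image
        self.edits, self.edits_new = self.edits_new, None  # edits.new becomes edits
        self.fstime = now  # record when the checkpoint was taken


def secondary_checkpoint(primary: PrimaryNameNode, now: Optional[float] = None) -> None:
    primary.roll_edits()
    image, edits = primary.get_image_and_edits()
    new_image = replay(image, edits)  # step 3: merge in the secondary's memory
    primary.put_image(new_image, time.time() if now is None else now)


if __name__ == "__main__":
    nn = PrimaryNameNode({"/a"})
    nn.log(("create", "/b"))
    nn.log(("delete", "/a"))
    secondary_checkpoint(nn, now=1452768435.0)
    nn.log(("create", "/c"))  # after the checkpoint, writes go to the new edits file
    print(sorted(nn.fsimage), nn.edits)  # ['/b'] [('create', '/c')]
```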

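Checkpoint creation is triggered by two configuration parameters:

(1) The secondary namenode creates a checkpoint every hour (set by the fs.checkpoint.period property, in seconds).

(2) When the edit log reaches 64 MB (set by the fs.checkpoint.size property, in bytes), a checkpoint is created even if an hour has not yet passed. The edit log size is checked every five minutes.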
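
The two trigger conditions combine with a logical OR. Below is a simplified Python model of that decision. It assumes the defaults quoted above (3600 s, and 64 MB taken as 64 × 1024 × 1024 bytes) and a five-minute polling interval. `first_checkpoint_poll` is a hypothetical helper: it assumes an edit log that grows at a steady rate and reports the poll at which the first checkpoint would fire.

```python
# Simplified model of the checkpoint trigger, not Hadoop's implementation.
CHECKPOINT_PERIOD_S = 3600                # fs.checkpoint.period (seconds)
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024  # fs.checkpoint.size (bytes)
POLL_INTERVAL_S = 5 * 60                  # edit-log size is checked every 5 minutes


def checkpoint_due(now_s: float, last_checkpoint_s: float, edits_size_bytes: float,
                   period_s: float = CHECKPOINT_PERIOD_S,
                   size_bytes: float = CHECKPOINT_SIZE_BYTES) -> bool:
    """True if the period has elapsed OR the edit log has reached the size limit."""
    return (now_s - last_checkpoint_s >= period_s) or (edits_size_bytes >= size_bytes)


def first_checkpoint_poll(growth_bytes_per_s: float,
                          poll_s: float = POLL_INTERVAL_S) -> float:
    """Time of the first poll at which a checkpoint fires, starting from an empty log at t=0."""
    t = 0.0
    while True:  # terminates at the latest once the period has elapsed
        t += poll_s
        if checkpoint_due(t, 0.0, growth_bytes_per_s * t):
            return t


if __name__ == "__main__":
    # At 100,000 bytes of edits per second, the 64 MB limit is crossed between
    # the 600 s and 900 s polls, so the checkpoint fires at the 900 s poll.
    print(first_checkpoint_poll(100_000))  # 900.0
```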

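1.2 Directory structure of the secondary namenode

The secondary namenode's checkpoint directory (fs.checkpoint.dir) has the same layout as the namenode's current directory, plus a previous.checkpoint subdirectory. The VERSION file shown below, however, was read from /data0/hadoop/dfs/data/current on slave-two. That is a datanode storage directory, which is why it reports storageType=DATA_NODE.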

 
    [root@slave-two current]# pwd
    /data0/hadoop/dfs/data/current
    [root@slave-two current]# cat VERSION
    #Fri Jan 15 15:34:22 CST 2016
    namespaceID=688384215
    storageID=DS-1030151558-10.1.2.216-50010-1452481280886
    cTime=0
    storageType=DATA_NODE
    layoutVersion=-32

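If the primary namenode fails (assuming there is no timely backup, not even on NFS), its data can be recovered from the secondary namenode. There are two ways to do this:

(1) Copy the relevant storage directory to a new namenode.

(2) Start the namenode daemon with the -importCheckpoint option, so that the secondary namenode is used as the new primary namenode. With this option, the namenode loads the latest checkpoint data from the fs.checkpoint.dir directory. It does so only if the directories defined by the dfs.name.dir property contain no metadata; otherwise startup fails.

1.3 Directory structure of the datanode

A datanode's storage directories are created automatically when it starts up. They do not need to be formatted separately.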

 
    [root@slave-one current]# ls
    blk_-1342046564177101301            blk_3255346014128987307             blk_-4378222930931288631            blk_7478159877522346339            blk_-8475713792677154223
    blk_-1342046564177101301_1004.meta  blk_3255346014128987307_1010.meta   blk_-4378222930931288631_1065.meta  blk_7478159877522346339_1002.meta  blk_-8475713792677154223_1063.meta
    blk_-1859875086242295767            blk_3484901243420393976             blk_5202437766650751967             blk_7579826132350507903            blk_-9058686418693604829
    blk_-1859875086242295767_1061.meta  blk_3484901243420393976_1067.meta   blk_5202437766650751967_1072.meta   blk_7579826132350507903_1080.meta  blk_-9058686418693604829_1062.meta
    blk_253660519371394588              blk_-350256639016866731             blk_5450455005443823908             blk_774901497839428573             dncp_block_verification.log.curr
    blk_253660519371394588_1014.meta    blk_-350256639016866731_1077.meta   blk_5450455005443823908_1076.meta   blk_774901497839428573_1068.meta   VERSION
    blk_2653614491429524571             blk_-4332403947618992681            blk_6996247191717220870             blk_7996063171811697628
    blk_2653614491429524571_1066.meta   blk_-4332403947618992681_1012.meta  blk_6996247191717220870_1064.meta   blk_7996063171811697628_1013.meta
    [root@slave-one current]# pwd
    /data0/hadoop/dfs/data/current
    [root@slave-one current]# cat VERSION
    #Fri Jan 15 15:34:16 CST 2016
    namespaceID=688384215
    storageID=DS-444750413-10.1.2.215-50010-1452481260852
    cTime=0
    storageType=DATA_NODE
    layoutVersion=-32

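The other files in a datanode's current directory carry the blk_ prefix and come in two types: HDFS block files, which hold only raw data, and block metadata files, which have a .meta suffix. A block file contains the raw bytes of a portion of a stored file. A metadata file consists of a header (with version and type information) followed by a series of checksums for sections of the block.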
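
Each block file `blk_<id>` has a companion `blk_<id>_<n>.meta` file, so a listing like the one above can be checked for consistency with a short script. This is a hypothetical Python sketch. The number between the block ID and `.meta` is the block's generation stamp. Block IDs may be negative, as the listing shows. Other files (VERSION, the verification log) are ignored.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

BLOCK_RE = re.compile(r"^blk_(-?\d+)$")            # raw block data file
META_RE = re.compile(r"^blk_(-?\d+)_(\d+)\.meta$")  # checksum file: blk_<id>_<genstamp>.meta


def pair_blocks(names: Iterable[str]) -> Tuple[Dict[int, Optional[int]], List[int]]:
    """Map each block ID to its .meta generation stamp (None if missing); list orphan .meta IDs."""
    block_ids = set()
    meta_stamps: Dict[int, int] = {}
    for name in names:
        m = META_RE.match(name)
        if m:
            meta_stamps[int(m.group(1))] = int(m.group(2))
            continue
        m = BLOCK_RE.match(name)
        if m:
            block_ids.add(int(m.group(1)))
        # Anything else (VERSION, dncp_block_verification.log.curr, ...) is ignored.
    paired = {bid: meta_stamps.get(bid) for bid in block_ids}
    orphan_metas = sorted(set(meta_stamps) - block_ids)
    return paired, orphan_metas


if __name__ == "__main__":
    listing = """blk_-1342046564177101301  blk_-1342046564177101301_1004.meta
blk_3255346014128987307  blk_3255346014128987307_1010.meta
dncp_block_verification.log.curr  VERSION"""
    paired, orphans = pair_blocks(listing.split())
    print(paired[-1342046564177101301], orphans)  # 1004 []
```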

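When the number of blocks in a directory grows to a certain size, the datanode creates a subdirectory to hold new blocks and their metadata. By default, a subdirectory is created once a directory stores 64 blocks (set by the dfs.datanode.numblocks property).

If the dfs.data.dir property specifies multiple directories on different disks, blocks are written to them in round-robin fashion. Note that a block is never duplicated across the disks of a single datanode. Replicas of the same block are placed only on different datanodes.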
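
The two placement rules in this subsection can be modelled as follows:

- round-robin across the dfs.data.dir directories;
- a new subdirectory every 64 blocks.

This is a deliberately flattened Python sketch. Real datanodes build a tree of subdirectories. The `subdirN` naming and the disk paths here are illustrative assumptions.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def place_blocks(block_ids: Iterable[int], data_dirs: List[str],
                 blocks_per_dir: int = 64) -> Dict[int, str]:
    """Assign each new block to a storage location (flattened, simplified model).

    Blocks rotate round-robin over data_dirs. For each directory, the first
    blocks_per_dir blocks go into it directly, the next blocks_per_dir into
    subdir0, then subdir1, and so on.
    """
    stored = {d: 0 for d in data_dirs}
    rotation = cycle(data_dirs)
    placement: Dict[int, str] = {}
    for bid in block_ids:
        d = next(rotation)
        level = stored[d] // blocks_per_dir
        placement[bid] = d if level == 0 else f"{d}/subdir{level - 1}"
        stored[d] += 1
    return placement


if __name__ == "__main__":
    p = place_blocks(range(130), ["/disk1/dfs/data", "/disk2/dfs/data"])  # hypothetical disks
    print(p[128])  # /disk1/dfs/data/subdir0
```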

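2. Safe mode

When the namenode starts, it first loads fsimage into memory and applies the operations in edits. Once it has built an in-memory image of the filesystem metadata, it creates a new fsimage file (without the help of the secondary namenode) and an empty edit log. At this point the namenode starts listening for RPC and HTTP requests. However, it is in safe mode, which means the filesystem is read-only for clients.

Entering and leaving safe mode

To check whether the namenode is in safe mode:

    [hadoop@slave-one current]$ hadoop dfsadmin -safemode get
    Safe mode is ON

The HDFS web UI also shows whether the namenode is in safe mode.

To enter safe mode manually, run the command below. To keep the namenode in safe mode permanently, set the dfs.safemode.threshold.pct property to a value greater than 1.

    [hadoop@slave-one current]$ hadoop dfsadmin -safemode enter
    Safe mode is ON

To leave safe mode:

    [hadoop@slave-one current]$ hadoop dfsadmin -safemode leave
    Safe mode is OFF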
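
The namenode leaves safe mode automatically once a large enough fraction of blocks satisfies the minimum replication condition. Below is a simplified Python model of that rule. It assumes the Hadoop 1.x defaults dfs.safemode.threshold.pct = 0.999 and dfs.replication.min = 1, and it ignores the extra wait configured by dfs.safemode.extension. It shows why a threshold above 1 keeps the namenode in safe mode forever.

```python
from typing import Sequence

# Simplified model of the safe-mode exit rule, not Hadoop's implementation.
THRESHOLD_PCT = 0.999  # dfs.safemode.threshold.pct (Hadoop 1.x default)
MIN_REPLICATION = 1    # dfs.replication.min (default)


def can_leave_safe_mode(replica_counts: Sequence[int],
                        threshold_pct: float = THRESHOLD_PCT,
                        min_replication: int = MIN_REPLICATION) -> bool:
    """replica_counts[i] is the number of replicas reported for block i."""
    if threshold_pct > 1:
        return False  # the fraction of safe blocks can never exceed 1
    if not replica_counts:
        return True
    safe = sum(1 for n in replica_counts if n >= min_replication)
    return safe / len(replica_counts) >= threshold_pct


if __name__ == "__main__":
    print(can_leave_safe_mode([3] * 1000, threshold_pct=1.5))  # False
```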

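1.4 Tools

1.4.1 The dfsadmin tool

The dfsadmin tool can query HDFS status information and also perform administrative operations on HDFS. It is invoked as:

hadoop dfsadmin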

[hadoop@slave-one current]$ hadoop dfsadmin -help
hadoop dfsadmin is the command to execute DFS administrative commands.
The full syntax is:
hadoop dfsadmin [-report] [-safemode <enter | leave | get | wait>]
[-saveNamespace]
[-refreshNodes]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[refreshSuperUserGroupsConfiguration]
[-setBalancerBandwidth <bandwidth>]
[-help [cmd]]
-report: Reports basic filesystem information and statistics.
-safemode <enter|leave|get|wait>: Safe mode maintenance command.
Safe mode is a Namenode state in which it
1. does not accept changes to the name space (read-only)
2. does not replicate or delete blocks.
Safe mode is entered automatically at Namenode startup, and
leaves safe mode automatically when the configured minimum
percentage of blocks satisfies the minimum replication
condition. Safe mode can also be entered manually, but then
it can only be turned off manually as well.
-saveNamespace: Save current namespace into storage directories and reset edits log.
Requires superuser permissions and safe mode.
-refreshNodes: Updates the set of hosts allowed to connect to namenode.
Re-reads the config file to update values defined by
dfs.hosts and dfs.host.exclude and reads the
entires (hostnames) in those files.
Each entry not defined in dfs.hosts but in
dfs.hosts.exclude is decommissioned. Each entry defined
in dfs.hosts and also in dfs.host.exclude is stopped from
decommissioning if it has aleady been marked for decommission.
Entires not present in both the lists are decommissioned.
-finalizeUpgrade: Finalize upgrade of HDFS.
Datanodes delete their previous version working directories,
followed by Namenode doing the same.
This completes the upgrade process.
-upgradeProgress <status|details|force>:
request current distributed upgrade status,
a detailed status or force the upgrade to proceed.
-metasave <filename>: Save Namenode's primary data structures
to <filename> in the directory specified by hadoop.log.dir property.
<filename> will contain one line for each of the following
1. Datanodes heart beating with Namenode
2. Blocks waiting to be replicated
3. Blocks currrently being replicated
4. Blocks waiting to be deleted
-setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each directory <dirName>.
The directory quota is a long integer that puts a hard limit
on the number of names in the directory tree
Best effort for the directory, with faults reported if
1. N is not a positive integer, or
2. user is not an administrator, or
3. the directory does not exist or is a file, or
-clrQuota <dirname>...<dirname>: Clear the quota for each directory <dirName>.
Best effort for the directory. with fault reported if
1. the directory does not exist or is a file, or
2. user is not an administrator.
It does not fault if the directory has no quota.
-setSpaceQuota <quota> <dirname>...<dirname>: Set the disk space quota <quota> for each directory <dirName>.
The space quota is a long integer that puts a hard limit
on the total size of all the files under the directory tree.
The extra space required for replication is also counted. E.g.
a 1GB file with replication of 3 consumes 3GB of the quota.
Quota can also be speciefied with a binary prefix for terabytes,
petabytes etc (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).
Best effort for the directory, with faults reported if
1. N is not a positive integer, or
2. user is not an administrator, or
3. the directory does not exist or is a file, or
-clrSpaceQuota <dirname>...<dirname>: Clear the disk space quota for each directory <dirName>.
Best effort for the directory. with fault reported if
1. the directory does not exist or is a file, or
2. user is not an administrator.
It does not fault if the directory has no quota.
-refreshServiceAcl: Reload the service-level authorization policy file
Namenode will reload the authorization policy file.
-refreshUserToGroupsMappings: Refresh user-to-groups mappings
-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
-setBalancerBandwidth <bandwidth>:
Changes the network bandwidth used by each datanode during
HDFS block balancing.
<bandwidth> is the maximum number of bytes per second
that will be used by each datanode. This value overrides
the dfs.balance.bandwidthPerSec parameter.
--- NOTE: The new value is not persistent on the DataNode.---
-help [cmd]: Displays help for the given command or all commands if none
is specified.

Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

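-help: displays help for the given command, or for all commands if none is specified. In the example below, -help is passed as an argument to -safemode, so dfsadmin prints only the usage line for -safemode. Running hadoop dfsadmin -help safemode prints the full description of that command.

    [hadoop@slave-one current]$ hadoop dfsadmin -safemode -help
    Usage: java DFSAdmin [-safemode enter | leave | get | wait]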

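-report: displays filesystem statistics, similar to what the web UI shows.

-metasave: saves information to a file in Hadoop's log directory, including blocks that are being replicated or deleted and a list of connected datanodes.

-safemode: changes or queries the safe mode state.

-saveNamespace: saves the in-memory filesystem image to a new fsimage file and resets the edits file. This operation can be performed only in safe mode.

-refreshNodes: updates the set of datanodes that are permitted to connect to the namenode.

-upgradeProgress: gets information on the progress of an HDFS upgrade, or forces an upgrade to proceed.

-finalizeUpgrade: removes the previous version of the datanodes' and namenode's storage directories. It is typically run after an upgrade has been applied and the cluster is running correctly on the new version.

-setQuota: sets directory quotas, i.e., the maximum number of files and directories in the tree rooted at that directory. This is an effective way to stop users from creating large numbers of small files. It protects the namenode's memory, because information about every file, directory, and block in the filesystem is held in memory.

-clrQuota: clears the specified directory quotas.

-setSpaceQuota: sets space quotas on directories, limiting the total size of all files stored in the directory tree. This is useful for giving each user a limited amount of storage.

-clrSpaceQuota: clears the specified space quotas.

-refreshServiceAcl: refreshes the namenode's service-level authorization policy file.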
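
The two kinds of quota count different things:

- -setQuota limits the number of names (files and directories). The directory itself counts against its own quota.
- -setSpaceQuota limits bytes, including every replica. A 1 GB file with replication 3 uses 3 GB, as the help text says.

The Python sketch below encodes those rules with hypothetical helper names. Quota strings use binary prefixes (k, m, g, t, p). Real HDFS enforces space quotas as blocks are allocated, which this simplified check does not model.

```python
# Simplified quota arithmetic; helper names are hypothetical, not dfsadmin internals.
PREFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40, "p": 1 << 50}


def parse_space_quota(text: str) -> int:
    """Convert '50t', '5m', '3p' or a plain byte count into bytes (binary prefixes)."""
    text = text.strip().lower()
    if text and text[-1] in PREFIXES:
        return int(text[:-1]) * PREFIXES[text[-1]]
    return int(text)


def space_quota_allows(quota_bytes: int, used_bytes: int,
                       file_bytes: int, replication: int) -> bool:
    # Every replica is charged against the space quota.
    return used_bytes + file_bytes * replication <= quota_bytes


def name_quota_allows(quota_names: int, used_names: int, new_names: int = 1) -> bool:
    # Files and directories both count, and the directory counts against itself,
    # so a name quota of 1 keeps the directory empty.
    return used_names + new_names <= quota_names


if __name__ == "__main__":
    # A 1 GB file with replication 3 needs 3 GB of space quota.
    print(space_quota_allows(parse_space_quota("3g"), 0, 1 << 30, 3))  # True
```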

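1.4.2 The fsck tool

Hadoop provides the fsck tool for checking the health of files in HDFS. It looks for blocks that are missing from all datanodes, as well as blocks that are under- or over-replicated. Note that fsck only retrieves information from the namenode. It does not interact with any datanodes, so it never actually reads block data.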

hadoop fsck /

 
        
    [root@hadoop1 ~]# hadoop fsck /
    -bash: hadoop: command not found
    [root@hadoop1 ~]# cd /home/hadoop/hadoop-1.0.4/bin/
    [root@hadoop1 bin]# ./hadoop fsck /
    FSCK started by root from /10.1.2.184 for path / at Tue Jan 19 20:59:55 CST 2016
    ........................
    /data/appstore/chDownloadForPlayer/2016/01/14/00/output/_logs/history/job_201601082048_0706_conf.xml: CORRUPT block blk_-1739242649335851318
    ............................................................................
    ...............................................................................................
    /data/appstore/chRetainAndFresh/2016/01/14/00/output/_logs/history/job_201601082048_0707_conf.xml: CORRUPT block blk_5175780252882211574
    .....
    ....................................................................................................
    ....................................................................................................
    ..................................................................................FSCK ended at Tue Jan 19 20:59:56 CST 2016 in 1469 milliseconds
    Permission denied: user=root, access=READ_EXECUTE, inode=".staging":hadoop:supergroup:rwx------
    Fsck on path '/' FAILED

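In this run, fsck reported two CORRUPT blocks and then ended with FAILED. It was started as root, which has no permission to read the hadoop user's .staging directory (rwx------). Running fsck as the HDFS superuser avoids this error; the superuser is the user that started the namenode, here hadoop.

The fsck output contains the following information: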

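Over-replicated blocks

Blocks with more replicas than the target replication level of the file they belong to. Strictly speaking, this is not a big problem: HDFS automatically deletes the excess replicas.

Under-replicated blocks

Blocks with fewer replicas than the target replication level. HDFS automatically creates new replicas of these blocks until they reach the target replication level. Use hadoop dfsadmin -metasave FILE to see which blocks are being (or are waiting to be) replicated.

Mis-replicated blocks

Blocks that violate the replica placement policy. For example, suppose a multi-rack cluster uses a replication level of 3 and all three replicas of a block are on the same rack. The block is then mis-replicated, because its replicas should be spread across at least two racks for reliability.

Corrupt blocks

Blocks whose replicas are all corrupt. A block with at least one intact replica is not reported as corrupt. The namenode creates new replicas until the target replication level is reached.

Missing replicas

Blocks with no replicas anywhere in the cluster.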
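
To make the categories concrete, here is a simplified Python classifier for a single block. It is an illustration, not fsck's implementation. The following are assumptions made for the sketch:

- the `Replica` type;
- the rack names;
- the "at least two racks" check, applied only when the cluster has several racks and the block has more than one healthy replica.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    datanode: str
    rack: str
    corrupt: bool = False


def classify_block(replicas: List[Replica], target_replication: int,
                   multi_rack: bool = True) -> List[str]:
    """Return fsck-style labels for one block (an empty list means healthy). Simplified."""
    if not replicas:
        return ["missing"]
    healthy = [r for r in replicas if not r.corrupt]
    if not healthy:
        return ["corrupt"]  # every replica is corrupt
    labels = []
    if len(healthy) > target_replication:
        labels.append("over-replicated")
    elif len(healthy) < target_replication:
        labels.append("under-replicated")
    if multi_rack and len(healthy) > 1 and len({r.rack for r in healthy}) < 2:
        labels.append("mis-replicated")  # all healthy replicas on a single rack
    return labels
```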

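1.5 The balancer

The balancer is a Hadoop daemon that redistributes blocks by moving them from over-utilized datanodes to under-utilized ones.

Start the balancer with the command below. The -threshold argument gives the percentage used to decide whether the cluster is balanced; the default is 10%.

    start-balancer.sh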
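
The balancer treats the cluster as balanced when every datanode's utilization (used space divided by capacity) is within the threshold, in percentage points, of the utilization of the cluster as a whole. The Python sketch below applies that test to hypothetical node figures. It only decides which nodes are out of balance; it does not model how the balancer chooses which blocks to move.

```python
from typing import Dict, Tuple


def utilization(used: float, capacity: float) -> float:
    return 100.0 * used / capacity


def unbalanced_nodes(nodes: Dict[str, Tuple[float, float]],
                     threshold_pct: float = 10.0) -> Dict[str, float]:
    """nodes maps datanode name -> (used, capacity).

    Returns the nodes whose utilization differs from the cluster-wide
    utilization by more than threshold_pct percentage points, with the difference.
    """
    total_used = sum(u for u, _ in nodes.values())
    total_cap = sum(c for _, c in nodes.values())
    cluster = utilization(total_used, total_cap)
    result = {}
    for name, (used, cap) in nodes.items():
        diff = utilization(used, cap) - cluster
        if abs(diff) > threshold_pct:
            result[name] = diff
    return result


if __name__ == "__main__":
    nodes = {"dn1": (90, 100), "dn2": (50, 100), "dn3": (40, 100)}  # hypothetical (used, capacity)
    print(unbalanced_nodes(nodes))  # {'dn1': 30.0, 'dn3': -20.0}
```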

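III. Maintenance

Metadata backups

If the namenode's persistent metadata is lost or damaged, the entire filesystem becomes unusable. One way to back it up is to write a script that periodically archives the secondary namenode's previous.checkpoint subdirectory and stores the archive at an off-site location. Note that this subdirectory lives inside the directory defined by the fs.checkpoint.dir property.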
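
Below is a minimal sketch of such a script in Python, with these assumptions:

- The checkpoint directory path is passed in; it is whatever fs.checkpoint.dir points to on your secondary namenode.
- Shipping the resulting archive off-site is handled separately.
- The archive name format is an arbitrary choice.

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def archive_checkpoint(checkpoint_dir: str, backup_dir: str,
                       now: Optional[float] = None) -> Path:
    """Write <checkpoint_dir>/previous.checkpoint into a timestamped .tar.gz in backup_dir."""
    src = Path(checkpoint_dir) / "previous.checkpoint"
    if not src.is_dir():
        raise FileNotFoundError(f"no previous.checkpoint under {checkpoint_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))  # now=None means current time
    dest = Path(backup_dir) / f"namenode-checkpoint-{stamp}.tar.gz"
    dest.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(dest), "w:gz") as tar:
        tar.add(str(src), arcname="previous.checkpoint")
    return dest
```

Run from a scheduler such as cron, each call leaves one compressed snapshot of the latest checkpoint in backup_dir, ready to be copied off-site.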

Data backups

distcp is an ideal backup tool: its parallel file copying can store backup files on another HDFS cluster.

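3. Adding new nodes

3.1 Commissioning new nodes

(1) Configure the hdfs-site.xml file to point to the namenode.

(2) Configure the mapred-site.xml file to point to the jobtracker.

(3) Start the datanode and tasktracker daemons.

Note: the datanodes permitted to connect to the namenode are listed in a file whose name is specified by the dfs.hosts property. This file resides on the namenode's local filesystem and has one line per datanode, giving its network address. If a datanode needs several network addresses, put them all on the same line, separated by spaces. Nodes in a cluster usually run both a datanode and a tasktracker daemon, so dfs.hosts and mapred.hosts typically point to the same file, known as the include file.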
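
Reading such a file is a matter of splitting each line on whitespace. In this hypothetical Python sketch, every address on a line is accepted, because extra addresses on a line name the same datanode. The sample addresses are illustrative.

```python
from typing import Set


def read_hosts_file(text: str) -> Set[str]:
    """Collect every address from an include file (dfs.hosts / mapred.hosts). Simplified."""
    hosts: Set[str] = set()
    for line in text.splitlines():
        # One datanode per line; several space-separated addresses name the same node.
        hosts.update(line.split())
    return hosts


if __name__ == "__main__":
    print(sorted(read_hosts_file("10.1.2.215\n10.1.2.216 slave-two\n")))
    # ['10.1.2.215', '10.1.2.216', 'slave-two']
```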

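3.2 The file(s) specified by the dfs.hosts and mapred.hosts properties are different from the slaves file

The former are used by the namenode and the jobtracker to decide which worker nodes may connect to them.

The slaves file is used by the Hadoop control scripts to perform cluster-wide operations, such as restarting the cluster.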

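3.3 Steps for adding new nodes to a cluster

(1) Add the network addresses of the new nodes to the include file.

(2) Update the namenode with the new set of permitted datanodes:

    hadoop dfsadmin -refreshNodes

(3) Update the jobtracker with the new set of permitted tasktrackers:

    hadoop mradmin -refreshNodes

(4) Add the new nodes to the slaves file, so that the Hadoop control scripts include them in future operations.

(5) Start the new datanodes and tasktrackers.

(6) Check that the new datanodes and tasktrackers appear in the web UI.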

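4. Decommissioning old nodes

4.1 The user tells the namenode which datanodes are to be retired. Before these datanodes are shut down, Hadoop replicates their blocks to other datanodes.

4.2 The HDFS include and exclude files

Node in include file?	Node in exclude file?	Interpretation
No	No	Node may not connect
No	Yes	Node may not connect
Yes	No	Node may connect
Yes	Yes	Node may connect and will be decommissioned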
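
The include/exclude table reduces to two membership tests. The Python sketch below encodes it, with one addition the table leaves out: an unset or empty include file is treated as if it listed every node, so all nodes may connect. Host names are hypothetical.

```python
from typing import Set


def node_status(host: str, include: Set[str], exclude: Set[str]) -> str:
    """Apply the include/exclude rules to one node (simplified model)."""
    included = not include or host in include  # empty include file = all nodes included
    if not included:
        return "may not connect"
    if host in exclude:
        return "may connect, will be decommissioned"
    return "may connect"


if __name__ == "__main__":
    for host in ["dn1", "dn2", "dn3"]:
        print(host, node_status(host, {"dn1", "dn2"}, {"dn2", "dn3"}))
```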

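4.3 Steps for removing nodes from a cluster

(1) Add the network addresses of the nodes to be decommissioned to the exclude file. Do not update the include file.

(2) Update the namenode with the new set of permitted datanodes:

    hadoop dfsadmin -refreshNodes

(3) Update the jobtracker with the new set of permitted tasktrackers:

    hadoop mradmin -refreshNodes

(4) Go to the web UI and check whether the admin state of the datanodes being decommissioned has changed to "Decommission In Progress". These datanodes now copy their blocks to other datanodes.

(5) When the state of all these datanodes has changed to "Decommissioned", all their blocks have been replicated. Shut down the decommissioned nodes.

(6) Remove the nodes from the include file, then run:

    hadoop dfsadmin -refreshNodes

    hadoop mradmin -refreshNodes

(7) Remove the nodes from the slaves file.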

This article was reposted from zouqingyun's 51CTO blog. Original link: http://blog.51cto.com/zouqingyun/1736088. To republish it, please contact the original author.


Original link: https://yq.aliyun.com/articles/460732
