A Hands-On HDFS Deployment

Date: 2019-03-01

Contents

  1. Overview
  2. Key configuration parameters and where to set them
  3. Deployment practice: recorded parameter changes

    1. local machine, NameNode
    2. local machine, DataNode
    3. 192.168.1.101, DataNode
  4. Starting the HDFS cluster
  5. Verifying the startup

    1. Creating a file via the HDFS shell
    2. Problems and fixes
  6. Stopping the HDFS cluster
  7. Conclusions

Overview

  • Download a Hadoop distribution
    Each release ships three tarballs (as far as I can tell, the plain tarball is the prebuilt binary distribution, -src is the source code, and -site is the generated documentation):

    1. hadoop-x.y.z-site.tar.gz
    2. hadoop-x.y.z-src.tar.gz
    3. hadoop-x.y.z.tar.gz
  • Hadoop is composed of several components; each component has its own daemons, and every daemon is a separate Java process. Daemon startup options are passed through environment variables:

    • HDFS
      Configured in etc/hadoop/hadoop-env.sh

      • NameNode daemon: HDFS_NAMENODE_OPTS
      • DataNode daemon: HDFS_DATANODE_OPTS
      • Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS
    • YARN
      Configured in etc/hadoop/yarn-env.sh

      • ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
      • NodeManager daemon: YARN_NODEMANAGER_OPTS
      • WebAppProxy daemon: YARN_PROXYSERVER_OPTS
    • MapReduce
      Configured in etc/hadoop/mapred-env.sh

      • MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS
  • Hadoop-wide settings go into the shell environment (e.g. ~/.bashrc); a minimal sketch follows this list

    • HADOOP_HOME: home directory of the Hadoop distribution; at a minimum this one must be set
    • HADOOP_PID_DIR
    • HADOOP_LOG_DIR
    • HADOOP_HEAPSIZE_MAX
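
For concreteness, here is a minimal sketch of both layers, assuming an install under /opt/hadoop-3.2.0 (the paths and heap sizes are placeholders, not taken from the deployment recorded below):

     # ~/.bashrc -- Hadoop-wide settings, seen by every hadoop/hdfs command
     export HADOOP_HOME="/opt/hadoop-3.2.0"
     export HADOOP_PID_DIR="$HOME/hadoop_pid_dir"
     export HADOOP_LOG_DIR="$HOME/hadoop_log_dir"
     export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"

     # etc/hadoop/hadoop-env.sh -- per-daemon JVM options for the HDFS daemons
     export HDFS_NAMENODE_OPTS="-Xmx1g"
     export HDFS_DATANODE_OPTS="-Xmx1g"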

Key configuration parameters and where to set them

  • Configured on every node (effective values can be checked with hdfs getconf; see the sketch after this list)

    • etc/hadoop/core-site.xml
      Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml

      • fs.defaultFS

        URI of the HDFS NameNode
      • io.file.buffer.size
  • Configured on the NameNode

    • etc/hadoop/hdfs-site.xml
      Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

      • dfs.namenode.name.dir
      • dfs.hosts / dfs.hosts.exclude
      • dfs.blocksize
      • dfs.namenode.handler.count
  • Configured on the DataNodes

    • etc/hadoop/hdfs-site.xml

      • dfs.datanode.data.dir
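
After editing these files it is easy to mistype a key, and hdfs getconf reads the effective configuration back. A quick check against the deployment recorded below (the second value assumes dfs.blocksize was left at its 3.x default):

     $ $HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS
     hdfs://195.90.3.212:9988/
     $ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.blocksize
     134217728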

Deployment practice: recorded parameter changes

local machine, NameNode

  • System environment variables

     export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
     export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
     export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
  • etc/hadoop/core-site.xml

    • fs.defaultFS

       <property>
         <name>fs.defaultFS</name>
         <value>hdfs://195.90.3.212:9988/</value>
         <description>The name of the default file system. A URI whose
         scheme and authority determine the FileSystem implementation. The
         uri's scheme determines the config property (fs.SCHEME.impl) naming
         the FileSystem implementation class. The uri's authority is used to
         determine the host, port, etc. for a filesystem.</description>
       </property>
    • io.file.buffer.size

       <property>
         <name>io.file.buffer.size</name>
         <value>4096</value>
         <description>The size of buffer for use in sequence files.
         The size of this buffer should probably be a multiple of hardware
         page size (4096 on Intel x86), and it determines how much data is
         buffered during read and write operations.</description>
       </property>
  • etc/hadoop/hdfs-site.xml

    • dfs.namenode.name.dir

       <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
         <description>Determines where on the local filesystem the DFS name node
         should store the name table(fsimage). If this is a comma-delimited list
         of directories then the name table is replicated in all of the
         directories, for redundancy.</description>
       </property>

local machine, DataNode

  • etc/hadoop/hdfs-site.xml

    • dfs.datanode.data.dir

       <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
         <description>Determines where on the local filesystem an DFS data node
         should store its blocks. If this is a comma-delimited
         list of directories, then data will be stored in all named
         directories, typically on different devices. The directories should be tagged
         with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
         storage policies. The default storage type will be DISK if the directory does
         not have a storage type tagged explicitly. Directories that do not exist will
         be created if local filesystem permission allows.</description>
       </property>

192.168.1.101, DataNode

  • System environment variables

     export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
     export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
     export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"
  • etc/hadoop/core-site.xml

    • fs.defaultFS

       <property>
         <name>fs.defaultFS</name>
         <value>hdfs://195.90.3.212:9988/</value>
         <description>The name of the default file system. A URI whose
         scheme and authority determine the FileSystem implementation. The
         uri's scheme determines the config property (fs.SCHEME.impl) naming
         the FileSystem implementation class. The uri's authority is used to
         determine the host, port, etc. for a filesystem.</description>
       </property>
    • io.file.buffer.size

       <property>
         <name>io.file.buffer.size</name>
         <value>4096</value>
         <description>The size of buffer for use in sequence files.
         The size of this buffer should probably be a multiple of hardware
         page size (4096 on Intel x86), and it determines how much data is
         buffered during read and write operations.</description>
       </property>
  • etc/hadoop/hdfs-site.xml

    • dfs.datanode.data.dir

       <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
         <description>Determines where on the local filesystem an DFS data node
         should store its blocks. If this is a comma-delimited
         list of directories, then data will be stored in all named
         directories, typically on different devices. The directories should be tagged
         with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
         storage policies. The default storage type will be DISK if the directory does
         not have a storage type tagged explicitly. Directories that do not exist will
         be created if local filesystem permission allows.</description>
       </property>
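
Since core-site.xml must be identical on every node, one way to keep the two machines in sync (a sketch, assuming the directory layouts shown above; rsync itself is standard tooling, not part of Hadoop) is to push that file from the NameNode machine:

     # push core-site.xml from the NameNode machine to the 192.168.1.101 DataNode
     $ rsync -av $HADOOP_HOME/etc/hadoop/core-site.xml \
         mhb@192.168.1.101:/home/mhb/installed/hadoop/hadoop-3.2.0/etc/hadoop/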

Starting the HDFS cluster

  • The first time the cluster is brought up, HDFS must be formatted

     # run on the NameNode machine; this writes the initial fsimage into dfs.namenode.name.dir
     $ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
  • Start the NameNode

     # run on the NameNode machine
     $ $HADOOP_HOME/bin/hdfs --daemon start namenode
  • Start a DataNode

     # run on each DataNode machine
     $ $HADOOP_HOME/bin/hdfs --daemon start datanode
  • Optional one-shot startup (prerequisites sketched below)

     # prerequisites, both required: 1) etc/hadoop/workers is correctly populated;
     # 2) passwordless ssh from the NameNode machine to every DataNode machine is set up
     $ $HADOOP_HOME/sbin/start-dfs.sh
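
A sketch of those prerequisites, plus a quick post-start check, using the two nodes from this walkthrough (the PIDs below are illustrative):

     # etc/hadoop/workers -- one DataNode host per line
     localhost
     192.168.1.101

     # passwordless ssh from the NameNode machine to each worker (standard OpenSSH tooling)
     $ ssh-keygen -t rsa -b 4096        # accept the defaults, empty passphrase
     $ ssh-copy-id mhb@192.168.1.101

     # after startup, jps on each machine should list the expected daemons
     $ jps
     12345 NameNode
     12399 DataNode
     12500 Jps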

Verifying the startup

Creating a file via the HDFS shell

Copy a large file from the local filesystem into HDFS and watch how the data directories of the NameNode and the DataNodes change in size.

  • Before the copy

    • dfs.namenode.name.dir on the local machine acting as NameNode

       [j@j dfs_namenode_name_dir]$ pwd
       /home/jng/installed/hadoop/dfs_namenode_name_dir
       [j@j dfs_namenode_name_dir]$ du -hs
       2.1M    .
    • dfs.datanode.data.dir on the local machine acting as DataNode

       [j@j dfs_datanode_data_dir]$ pwd
       /home/jng/installed/hadoop/dfs_datanode_data_dir
       [j@j dfs_datanode_data_dir]$ du -hs
       44K     .
    • dfs.datanode.data.dir on the 192.168.1.101 DataNode

       m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
       /home/mhb/installed/hadoop/dfs_datanode_data_dir
       m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
       44K     .
  • Move the file in (note that -moveFromLocal deletes the local copy; -put would keep it)

     # run on the NameNode machine
     [j@j hadoop-3.2.0]$ pwd
     /home/jng/installed/hadoop/hadoop-3.2.0
     [j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
     -rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
     [j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
  • After the copy

    • dfs.namenode.name.dir on the local machine acting as NameNode

       [j@j dfs_namenode_name_dir]$ pwd
       /home/jng/installed/hadoop/dfs_namenode_name_dir
       [j@j dfs_namenode_name_dir]$ du -hs
       2.1M    .
    • dfs.datanode.data.dir on the local machine acting as DataNode

       [j@j dfs_datanode_data_dir]$ pwd
       /home/jng/installed/hadoop/dfs_datanode_data_dir
       [j@j dfs_datanode_data_dir]$ du -hs
       333M    .
    • dfs.datanode.data.dir on the 192.168.1.101 DataNode

       m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
       /home/mhb/installed/hadoop/dfs_datanode_data_dir
       m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
       333M    .

    Both DataNodes grew by roughly the file's size, i.e. each holds a full replica (dfs.replication defaults to 3, and with only two DataNodes each stores one copy), while the NameNode's name dir stayed at 2.1M because it stores only metadata. Some direct checks are sketched below.
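
Directory sizes aside, HDFS can report the placement directly. A few standard read-only checks (output omitted here; exact numbers will vary):

     # list the file just moved into HDFS, with human-readable sizes
     $ $HADOOP_HOME/bin/hdfs dfs -ls -h /tmp
     # per-DataNode capacity and usage summary
     $ $HADOOP_HOME/bin/hdfs dfsadmin -report
     # block-level layout and replication of the uploaded file
     $ $HADOOP_HOME/bin/hdfs fsck /tmp/hadoop-3.2.0.tar.gz -files -blocks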

Problems and fixes

  • The namenode log (viewable in the NameNode web UI) may show a WARN like: "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"

    ref: <https://blog.csdn.net/qqpy789/article/details/78189335>
    Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false (an alternative that keeps the check enabled is sketched after this item)

     <property>
       <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
       <value>false</value>
       <description>
       If true (the default), then the namenode requires that a connecting
       datanode's address must be resolved to a hostname. If necessary, a reverse
       DNS lookup is performed. All attempts to register a datanode from an
       unresolvable address are rejected.

       It is recommended that this setting be left on to prevent accidental
       registration of datanodes listed by hostname in the excludes file during a
       DNS outage. Only set this to false in environments where there is no
       infrastructure to support reverse DNS lookup.
       </description>
     </property>
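
Since the description itself recommends leaving the check on, an alternative is to make the DataNode's address resolvable on the NameNode machine, e.g. with a static /etc/hosts entry (the hostname below is a hypothetical name for the second machine, not taken from this deployment):

     # /etc/hosts on the NameNode machine -- lets the reverse lookup succeed
     192.168.1.101   datanode-101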

Stopping the HDFS cluster

  • Stop the NameNode

     # run on the NameNode machine
     $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
  • Stop the DataNodes (a one-shot counterpart to start-dfs.sh is noted below)

     # run on each DataNode machine
     $ $HADOOP_HOME/bin/hdfs --daemon stop datanode
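
Mirroring the optional one-shot startup, the distribution also ships a one-shot shutdown script with the same workers/passwordless-ssh prerequisites:

     $ $HADOOP_HOME/sbin/stop-dfs.sh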

Conclusions

  1. HDFS can exist and run independently of YARN

    That is, HDFS works fine without YARN being started, at least as far as the HDFS shell can tell
  2. The NameNode machine can also run a DataNode at the same time
Original article: https://yq.aliyun.com/articles/692077