HDFS Deployment Experience

Brief notes

- Download a Hadoop distribution. Three packages are offered (hadoop-x.y.z.tar.gz is the precompiled binary release, -src is the source release, and -site is the generated documentation):
  - hadoop-x.y.z-site.tar.gz
  - hadoop-x.y.z-src.tar.gz
  - hadoop-x.y.z.tar.gz
- Hadoop is made up of different components, and each component has its own daemons; every daemon runs as an independent Java process. A daemon's startup parameters are configured through environment variables.
HDFS
Configured in etc/hadoop/hadoop-env.sh:
- NameNode daemon: HDFS_NAMENODE_OPTS
- DataNode daemon: HDFS_DATANODE_OPTS
- Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS
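As a sketch, these variables are ordinary shell exports added to etc/hadoop/hadoop-env.sh; the JVM flag values below are illustrative assumptions, not recommended settings:

```shell
# Illustrative additions to etc/hadoop/hadoop-env.sh (heap sizes are
# assumptions -- tune them for your hardware). Each variable passes extra
# JVM flags to exactly one daemon at startup.
export HDFS_NAMENODE_OPTS="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError"
export HDFS_DATANODE_OPTS="-Xmx1g"
export HDFS_SECONDARYNAMENODE_OPTS="-Xmx1g"
# Quick sanity check that the variable holds what we expect:
echo "$HDFS_NAMENODE_OPTS"
```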
YARN
Configured in etc/hadoop/yarn-env.sh:
- ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
- NodeManager daemon: YARN_NODEMANAGER_OPTS
- WebAppProxy daemon: YARN_PROXYSERVER_OPTS

MapReduce
Configured in etc/hadoop/mapred-env.sh:
- MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS
Hadoop-wide settings, configured in a system file (e.g. ~/.bashrc):
- HADOOP_HOME: home directory of the Hadoop distribution; must be set at a minimum
- HADOOP_PID_DIR
- HADOOP_LOG_DIR
- HADOOP_HEAPSIZE_MAX
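The global settings above can be sketched as ~/.bashrc exports; the paths are assumptions borrowed from the deployment record later in this note:

```shell
# Illustrative ~/.bashrc additions; adjust paths to your installation.
export HADOOP_HOME="$HOME/installed/hadoop/hadoop-3.2.0"   # required at a minimum
export HADOOP_PID_DIR="$HOME/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="$HOME/installed/hadoop/hadoop_log_dir"
# Putting the hadoop binaries on PATH is a common convenience:
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
echo "$HADOOP_HOME"
```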
Important configuration parameters and choices

To be configured on all nodes:

etc/hadoop/core-site.xml
Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
- fs.defaultFS: the URI of the HDFS NameNode
- io.file.buffer.size
NameNode configuration:

etc/hadoop/hdfs-site.xml
Reference defaults: ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
- dfs.namenode.name.dir
- dfs.hosts / dfs.hosts.exclude
- dfs.blocksize
- dfs.namenode.handler.count
DataNode configuration:

etc/hadoop/hdfs-site.xml
- dfs.datanode.data.dir
Deployment practice: record of parameter changes

local machine, NameNode

System environment variables:
export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
etc/hadoop/core-site.xml

fs.defaultFS:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://195.90.3.212:9988/</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
io.file.buffer.size:
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
etc/hadoop/hdfs-site.xml

dfs.namenode.name.dir:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table(fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>
local machine, DataNode

etc/hadoop/hdfs-site.xml

dfs.datanode.data.dir:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
192.168.1.101, DataNode

System environment variables:
export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"

etc/hadoop/core-site.xml

fs.defaultFS:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://195.90.3.212:9988/</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

io.file.buffer.size:
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files.
  The size of this buffer should probably be a multiple of hardware
  page size (4096 on Intel x86), and it determines how much data is
  buffered during read and write operations.</description>
</property>
etc/hadoop/hdfs-site.xml

dfs.datanode.data.dir:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.</description>
</property>
Starting the HDFS cluster

The first start of an HDFS cluster requires formatting:
# run on the NameNode machine
$ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>

Start the NameNode:
# run on the NameNode machine
$ $HADOOP_HOME/bin/hdfs --daemon start namenode

Start the DataNode (on each DataNode machine):
$ $HADOOP_HOME/bin/hdfs --daemon start datanode

Optional one-command start:
# prerequisites, both required: 1) etc/hadoop/workers is populated correctly;
# 2) passwordless ssh from the NameNode machine to the DataNode machines is set up
$ $HADOOP_HOME/sbin/start-dfs.sh
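The two prerequisites can be sketched as follows (hostnames are the two machines from this walkthrough; the file is written to a temp path purely for illustration):

```shell
# Sketch: populate etc/hadoop/workers -- one DataNode host per line -- so
# start-dfs.sh knows where to launch DataNodes. Written to a temp file here
# instead of the real config path.
WORKERS_FILE="$(mktemp)"
printf '%s\n' "195.90.3.212" "192.168.1.101" > "$WORKERS_FILE"
cat "$WORKERS_FILE"
# Passwordless ssh is the second prerequisite, typically set up with:
#   ssh-keygen -t rsa && ssh-copy-id user@192.168.1.101
```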
Verifying the start
- NameNode web UI: http://ip:port (default port: 9870)
- DataNode web UI: http://ip:port (default port: 9864)
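A minimal reachability check, assuming the default Hadoop 3 UI ports and the hosts used in this walkthrough:

```shell
# Build the web UI URLs from the hosts used in this walkthrough.
NAMENODE_HOST="195.90.3.212"
DATANODE_HOST="192.168.1.101"
NN_UI="http://${NAMENODE_HOST}:9870"
DN_UI="http://${DATANODE_HOST}:9864"
echo "$NN_UI"
echo "$DN_UI"
# With the cluster running, probe each UI (curl exits non-zero on failure):
#   curl -sf --max-time 5 "$NN_UI" >/dev/null && echo "NameNode UI reachable"
```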
Creating files with the HDFS shell

Move a large file from the local filesystem into HDFS, then compare the sizes of the NameNode and DataNode data directories before and after.
Before the copy:

local machine as NameNode, dfs.namenode.name.dir path:
[j@j dfs_namenode_name_dir]$ pwd
/home/jng/installed/hadoop/dfs_namenode_name_dir
[j@j dfs_namenode_name_dir]$ du -hs
2.1M    .
local machine as DataNode, dfs.datanode.data.dir path:
[j@j dfs_datanode_data_dir]$ pwd
/home/jng/installed/hadoop/dfs_datanode_data_dir
[j@j dfs_datanode_data_dir]$ du -hs
44K     .
192.168.1.101 as DataNode, dfs.datanode.data.dir path:
m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
/home/mhb/installed/hadoop/dfs_datanode_data_dir
m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
44K     .
Copying (actually moving) the file:
# run on the NameNode; note that -moveFromLocal deletes the local source
# (use -put / -copyFromLocal to keep it)
[j@j hadoop-3.2.0]$ pwd
/home/jng/installed/hadoop/hadoop-3.2.0
[j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
-rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
[j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
After the copy:

local machine as NameNode, dfs.namenode.name.dir path:
[j@j dfs_namenode_name_dir]$ pwd
/home/jng/installed/hadoop/dfs_namenode_name_dir
[j@j dfs_namenode_name_dir]$ du -hs
2.1M    .
local machine as DataNode, dfs.datanode.data.dir path:
[j@j dfs_datanode_data_dir]$ pwd
/home/jng/installed/hadoop/dfs_datanode_data_dir
[j@j dfs_datanode_data_dir]$ du -hs
333M    .
192.168.1.101 as DataNode, dfs.datanode.data.dir path:
m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
/home/mhb/installed/hadoop/dfs_datanode_data_dir
m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
333M    .
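The ~333M growth on each DataNode is consistent with the defaults; a back-of-the-envelope sketch, assuming the default dfs.blocksize of 128 MB (not overridden in this walkthrough):

```shell
# A 330 MB file splits into ceil(330/128) = 3 blocks. With only two
# DataNodes in this cluster and replication >= 2, every block lands on
# both nodes, so each node holds a full ~330 MB copy plus a little
# block-metadata overhead -- matching the observed 44K -> 333M growth.
FILE_MB=330
BLOCK_MB=128
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))   # ceiling division
echo "$BLOCKS"
```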
Problems and solutions

In the namenode log (visible from the NameNode web UI), a WARN may appear like:
"WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
ref: <https://blog.csdn.net/qqpy789/article/details/78189335>
Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false:
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
  <description>
  If true (the default), then the namenode requires that a connecting
  datanode's address must be resolved to a hostname. If necessary, a reverse
  DNS lookup is performed. All attempts to register a datanode from an
  unresolvable address are rejected. It is recommended that this setting be left on to prevent accidental
  registration of datanodes listed by hostname in the excludes file during a
  DNS outage. Only set this to false in environments where there is no
  infrastructure to support reverse DNS lookup.</description>
</property>
Shutting down the HDFS cluster

Stop the NameNode:
# run on the NameNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop namenode

Stop the DataNode (on each DataNode machine):
$ $HADOOP_HOME/bin/hdfs --daemon stop datanode
Conclusions

- HDFS can exist and run independently of YARN; that is, HDFS works without starting YARN, at least as far as the HDFS shell is concerned.
- The NameNode machine can simultaneously run a DataNode.