Notes on setting up a Hadoop 1.2.1 cluster on 64-bit CentOS 6.5, along with solutions to problems encountered along the way. These notes are for reference only!
1. Operating system configuration
1.1. Operating system environment
| Hostname | IP address | Role | Hadoop user |
| --- | --- | --- | --- |
| hadoop-master | 192.168.30.50 | Hadoop master node | hadoop |
| hadoop-slave01 | 192.168.30.51 | Hadoop slave node | hadoop |
| hadoop-slave02 | 192.168.30.52 | Hadoop slave node | hadoop |
1.2. Disable the firewall and SELinux
1.2.1. Disable the firewall
service iptables stop
chkconfig iptables off
1.2.2. Disable SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux
1.3. hosts configuration
vim /etc/hosts
########## Hadoop host ##########
192.168.30.50 hadoop-master
192.168.30.51 hadoop-slave01
192.168.30.52 hadoop-slave02
Note: run the steps above as root. Verify by pinging each hostname; it should resolve to the matching IP.
1.4. Configure passwordless SSH
Configure passwordless SSH as the hadoop user on all three hosts. The procedure is identical on every host; hadoop-master is shown as the example.
Generate the key pair:
ssh-keygen -t rsa
Copy the public key to each host (this step prompts for the password):
ssh-copy-id hadoop@hadoop-master
ssh-copy-id hadoop@hadoop-slave01
ssh-copy-id hadoop@hadoop-slave02
Note: run the above as the hadoop user. Verify that the hadoop user can ssh to every other host without a password prompt.
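The three ssh-copy-id invocations above can be written as one loop so the same snippet works on every host. This is a sketch: `run=echo` makes it a dry run that only prints each command; set `run=` to actually distribute the key.

```shell
# Dry-run sketch of the key distribution above; set run= to really copy keys.
run=echo
for h in hadoop-master hadoop-slave01 hadoop-slave02; do
  $run ssh-copy-id "hadoop@$h"
done
```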
2. Java environment configuration
2.1. Download the JDK
mkdir -p /home/hadoop/app/java
cd /home/hadoop/app/java
wget -c http://download.oracle.com/otn/java/jdk/6u45-b06/jdk-6u45-linux-x64.bin
2.2. Install Java
cd /home/hadoop/app/java
chmod +x jdk-6u45-linux-x64.bin
./jdk-6u45-linux-x64.bin
2.3. Configure Java environment variables
vim ~/.bash_profile
export JAVA_HOME=/home/hadoop/app/java/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
Load the environment variables:
source ~/.bash_profile
Note: install the JDK as the hadoop user on every machine. Verify with java -version, which should print the Java version information.
3. Hadoop installation and configuration
Install and configure Hadoop as the hadoop user.
3.1. Install Hadoop
- Download hadoop 1.2.1
mkdir -p /home/hadoop/app/hadoop
cd /home/hadoop/app/hadoop
wget -c https://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
tar -zxf hadoop-1.2.1-bin.tar.gz
- Create the Hadoop temporary directory
mkdir -p /home/hadoop/app/hadoop/hadoop-1.2.1/tmp
3.2. Configure Hadoop
The Hadoop configuration files are XML; edit them as the hadoop user.
3.2.1. Configure core-site.xml
vim /home/hadoop/app/hadoop/hadoop-1.2.1/conf/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop/hadoop-1.2.1/tmp</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.30.50:9000</value>
    </property>
</configuration>
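In core-site.xml, fs.default.name is the URI clients use to reach the NameNode, and hadoop.tmp.dir is the base directory under which Hadoop keeps its local data. A quick sanity check (a sketch): pull the NameNode URI back out of the file with sed. It is shown here against an inline copy of the file; on the cluster, point it at /home/hadoop/app/hadoop/hadoop-1.2.1/conf/core-site.xml instead.

```shell
# Extract the fs.default.name value from core-site.xml (inline copy for demo).
core_site='<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.30.50:9000</value>
</property>
</configuration>'
uri=$(printf '%s\n' "$core_site" | sed -n 's|.*<value>\(hdfs://[^<]*\)</value>.*|\1|p')
echo "$uri"   # hdfs://192.168.30.50:9000
```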
3.2.2. Configure hdfs-site.xml
vim /home/hadoop/app/hadoop/hadoop-1.2.1/conf/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
3.2.3. Configure mapred-site.xml
vim /home/hadoop/app/hadoop/hadoop-1.2.1/conf/mapred-site.xml
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.30.50:9001</value>
    </property>
</configuration>
3.2.4. Configure masters
vim /home/hadoop/app/hadoop/hadoop-1.2.1/conf/masters
hadoop-master
3.2.5. Configure slaves
vim /home/hadoop/app/hadoop/hadoop-1.2.1/conf/slaves
hadoop-slave01
hadoop-slave02
3.2.6. Configure hadoop-env.sh
vim /home/hadoop/app/hadoop/hadoop-1.2.1/conf/hadoop-env.sh
Set JAVA_HOME as follows:
export JAVA_HOME=/home/hadoop/app/java/jdk1.6.0_45
3.3. Copy the Hadoop tree to the slaves
From /home/hadoop on the master:
scp -r app hadoop-slave01:/home/hadoop/
scp -r app hadoop-slave02:/home/hadoop/
3.4. Configure Hadoop environment variables
On every machine, edit the hadoop user's ~/.bash_profile and append at the end:
vim /home/hadoop/.bash_profile
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=/home/hadoop/app/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the environment variables:
source /home/hadoop/.bash_profile
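After sourcing, the hadoop scripts should resolve from any directory (on the cluster, simply run hadoop version to confirm). A tiny self-contained check of the same PATH logic, as a sketch:

```shell
# Sketch: confirm $HADOOP_HOME/bin made it onto PATH after sourcing.
HADOOP_HOME=/home/hadoop/app/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) path_ok=yes ;;
  *) path_ok=no ;;
esac
echo "$path_ok"   # yes
```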
3.5. Start Hadoop
Initialize the HDFS filesystem on the Hadoop master node, then start the cluster.
3.5.1. Initialize the HDFS filesystem
hadoop namenode -format
3.5.2. Start and stop the Hadoop cluster
- Start:
start-all.sh
- Stop:
stop-all.sh
Browse the NameNode and JobTracker web interfaces; in Hadoop 1.x they listen by default at http://hadoop-master:50070 (NameNode) and http://hadoop-master:50030 (JobTracker).
3.5.3. Processes started on each node
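The per-node processes are normally verified with jps. In Hadoop 1.x the master should be running NameNode, SecondaryNameNode, and JobTracker, and each slave DataNode and TaskTracker. A small helper (a sketch, not from the original post) that reports which expected daemons are missing from a captured jps listing:

```shell
# check_daemons ROLE JPS_OUTPUT: print a line for each expected daemon
# that does not appear in the jps listing (Hadoop 1.x roles).
check_daemons() {
  role=$1; jps_out=$2
  case "$role" in
    master) expected="NameNode SecondaryNameNode JobTracker" ;;
    slave)  expected="DataNode TaskTracker" ;;
  esac
  for d in $expected; do
    echo "$jps_out" | grep -qw "$d" || echo "missing: $d"
  done
}
# On a real node: check_daemons master "$(jps)"
check_daemons master "9001 NameNode
9002 JobTracker"   # prints: missing: SecondaryNameNode
```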
3.5.4. Verify the cluster after startup
- List the root of HDFS:
hadoop fs -ls hdfs:/
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2018-01-09 16:15 /home
drwxr-xr-x - hadoop supergroup 0 2018-01-10 10:39 /user
- Run a simple MapReduce job
hadoop jar /home/hadoop/app/hadoop/hadoop-1.2.1/hadoop-examples-1.2.1.jar pi 10 10
The computation produces:
Number of Maps = 10
Samples per Map = 10
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Wrote input for Map
Starting Job
18/01/10 13:49:35 INFO mapred.FileInputFormat: Total input paths to process : 10
18/01/10 13:49:36 INFO mapred.JobClient: Running job: job_201801101031_0002
18/01/10 13:49:37 INFO mapred.JobClient: map 0% reduce 0%
18/01/10 13:49:49 INFO mapred.JobClient: map 10% reduce 0%
18/01/10 13:49:50 INFO mapred.JobClient: map 30% reduce 0%
18/01/10 13:49:51 INFO mapred.JobClient: map 40% reduce 0%
18/01/10 13:49:59 INFO mapred.JobClient: map 50% reduce 0%
18/01/10 13:50:00 INFO mapred.JobClient: map 60% reduce 0%
18/01/10 13:50:02 INFO mapred.JobClient: map 80% reduce 0%
18/01/10 13:50:07 INFO mapred.JobClient: map 100% reduce 0%
18/01/10 13:50:12 INFO mapred.JobClient: map 100% reduce 33%
18/01/10 13:50:14 INFO mapred.JobClient: map 100% reduce 100%
18/01/10 13:50:16 INFO mapred.JobClient: Job complete: job_201801101031_0002
18/01/10 13:50:16 INFO mapred.JobClient: Counters: 30
18/01/10 13:50:16 INFO mapred.JobClient: Job Counters
18/01/10 13:50:16 INFO mapred.JobClient: Launched reduce tasks=1
18/01/10 13:50:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=95070
18/01/10 13:50:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
18/01/10 13:50:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
18/01/10 13:50:16 INFO mapred.JobClient: Launched map tasks=10
18/01/10 13:50:16 INFO mapred.JobClient: Data-local map tasks=10
18/01/10 13:50:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=25054
18/01/10 13:50:16 INFO mapred.JobClient: File Input Format Counters
18/01/10 13:50:16 INFO mapred.JobClient: Bytes Read=1180
18/01/10 13:50:16 INFO mapred.JobClient: File Output Format Counters
18/01/10 13:50:16 INFO mapred.JobClient: Bytes Written=97
18/01/10 13:50:16 INFO mapred.JobClient: FileSystemCounters
18/01/10 13:50:16 INFO mapred.JobClient: FILE_BYTES_READ=226
18/01/10 13:50:16 INFO mapred.JobClient: HDFS_BYTES_READ=2450
18/01/10 13:50:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=682653
18/01/10 13:50:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
18/01/10 13:50:16 INFO mapred.JobClient: Map-Reduce Framework
18/01/10 13:50:16 INFO mapred.JobClient: Map output materialized bytes=280
18/01/10 13:50:16 INFO mapred.JobClient: Map input records=10
18/01/10 13:50:16 INFO mapred.JobClient: Reduce shuffle bytes=280
18/01/10 13:50:16 INFO mapred.JobClient: Spilled Records=40
18/01/10 13:50:16 INFO mapred.JobClient: Map output bytes=180
18/01/10 13:50:16 INFO mapred.JobClient: Total committed heap usage (bytes)=1146068992
18/01/10 13:50:16 INFO mapred.JobClient: CPU time spent (ms)=7050
18/01/10 13:50:16 INFO mapred.JobClient: Map input bytes=240
18/01/10 13:50:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=1270
18/01/10 13:50:16 INFO mapred.JobClient: Combine input records=0
18/01/10 13:50:16 INFO mapred.JobClient: Reduce input records=20
18/01/10 13:50:16 INFO mapred.JobClient: Reduce input groups=20
18/01/10 13:50:16 INFO mapred.JobClient: Combine output records=0
18/01/10 13:50:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=1843138560
18/01/10 13:50:16 INFO mapred.JobClient: Reduce output records=0
18/01/10 13:50:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=7827865600
18/01/10 13:50:16 INFO mapred.JobClient: Map output records=20
Job Finished in 41.091 seconds
Estimated value of Pi is 3.20000000000000000000
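The estimate comes out at 3.2 rather than 3.14159 because the job only used 10 maps x 10 samples = 100 points. The same quarter-circle sampling idea in plain awk (a sketch using pseudo-random points; the Hadoop example itself uses a deterministic point sequence) shows how the estimate tightens as the sample count grows:

```shell
# Monte Carlo pi sketch: 4 * (fraction of random points inside the unit
# quarter-circle). More samples => tighter estimate.
estimate_pi() {
  awk -v n="$1" 'BEGIN {
    srand(1)
    hits = 0
    for (i = 0; i < n; i++) {
      x = rand(); y = rand()
      if (x * x + y * y <= 1) hits++
    }
    printf "%.4f\n", 4 * hits / n
  }'
}
estimate_pi 100       # coarse, like the 100-sample job above
estimate_pi 1000000   # close to 3.1416
```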
The Hadoop setup is now complete. If errors occur, check the log files for details.
4. References
Reposted from the 51CTO blog of 巴利奇; original link: http://blog.51cto.com/balich/2059402