Prerequisites
(1) Ubuntu (this tutorial uses Ubuntu 14.04)
(2) Install the JDK
$ sudo apt-get install openjdk-7-jdk
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk
(3) Install ssh
$ sudo apt-get install openssh-server
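Later steps point JAVA_HOME at the /usr/lib/jvm/jdk symlink created in step (2), so it can help to confirm that the link resolves to a real JDK before going further. A minimal sketch; the function name `check_java_home` is made up for illustration and is not part of any package:

```shell
#!/bin/sh
# check_java_home DIR
# Hypothetical helper: succeeds if DIR looks like a JDK root, i.e. it contains
# bin/java and bin/javac. It only checks that the files exist, not that they run.
check_java_home() {
    [ -n "$1" ] && [ -f "$1/bin/java" ] && [ -f "$1/bin/javac" ]
}

# Example usage with the symlink from this tutorial.
if check_java_home /usr/lib/jvm/jdk; then
    echo "JDK found"
else
    echo "no JDK at /usr/lib/jvm/jdk (check the symlink)"
fi
```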
Add a Hadoop group and user (optional)
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo
After creating the user, log back in to Ubuntu as hduser.
Set up the SSH key
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
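The `cat >>` step appends the key again every time it is run, and with sshd's default StrictModes setting, key login can be refused if `authorized_keys` or its directory is writable by other users. A minimal sketch of an idempotent variant; `install_pubkey` is a hypothetical helper name, and both paths are passed in as arguments rather than assumed:

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE AUTH_FILE
# Hypothetical helper: appends the public key to AUTH_FILE only if that exact
# line is not already there, then tightens permissions.
install_pubkey() {
    pub=$1
    auth=$2
    [ -r "$pub" ] || return 1
    mkdir -p "$(dirname "$auth")" || return 1
    touch "$auth" || return 1
    # -F fixed strings, -x whole-line match, -f read the pattern from the key file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 700 "$(dirname "$auth")"
    chmod 600 "$auth"
}

# Example usage (commented out so that pasting this block changes nothing):
# install_pubkey ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```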
Download Hadoop 2.6.2
$ cd ~
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz
$ sudo tar vxzf hadoop-2.6.2.tar.gz -C /home/hduser
$ cd /home/hduser
$ sudo mv hadoop-2.6.2 hadoop
$ sudo chown -R hduser:hadoop hadoop
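A download URL that goes through a mirror-selection page can return an HTML document instead of the archive, in which case `tar` fails with a confusing error. A minimal sketch of a check that could be run before extracting; `is_gzip_tarball` is a hypothetical helper name:

```shell
#!/bin/sh
# is_gzip_tarball FILE
# Hypothetical helper: succeeds only if FILE can be listed as a gzip-compressed
# tar archive. Listing (-t) reads the archive without extracting anything.
is_gzip_tarball() {
    [ -f "$1" ] && tar -tzf "$1" > /dev/null 2>&1
}

# Example usage:
# is_gzip_tarball hadoop-2.6.2.tar.gz || echo "download is not a valid .tar.gz"
```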
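Configure the Hadoop environment variables
(1) Edit the system environment variables
$ cd ~
$ vi .bashrc
Paste the following lines at the end of the .bashrc file opened in vi. If JAVA_HOME is already set, there is no need to set it again.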
#Hadoop variables
#begin of paste
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/home/hduser/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of paste
(2) Edit the Hadoop environment variables
$ cd /home/hduser/hadoop/etc/hadoop
$ vi hadoop-env.sh
# The only setting that must be changed is JAVA_HOME; the rest can be left as they are
export JAVA_HOME=/usr/lib/jvm/jdk/
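When the configuration is done, log in again (close the terminal and open a new one) so that the new variables take effect.
Run the following command to check that the installation succeeded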
$ hadoop version
Hadoop 2.6.2
...
...
...
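Because `.bashrc` is read by every new interactive shell, the two `export PATH=...` lines add the Hadoop directories again whenever one shell is started from another. A minimal sketch of an idempotent alternative; `path_append` is a hypothetical helper, not something Hadoop or Ubuntu provides:

```shell
#!/bin/sh
# path_append DIR
# Hypothetical helper: adds DIR to the end of PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Example usage with the HADOOP_INSTALL value from this tutorial.
HADOOP_INSTALL=/home/hduser/hadoop
path_append "$HADOOP_INSTALL/bin"
path_append "$HADOOP_INSTALL/sbin"
```

Wrapping both ends of PATH in colons before matching lets the same pattern catch the directory whether it sits first, last, or in the middle.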
Configure Hadoop
(1)core-site.xml
$ cd /home/hduser/hadoop/etc/hadoop
$ vi core-site.xml
# Paste the following between <configuration> and </configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
(2)yarn-site.xml
$ vi yarn-site.xml
# Paste the following between <configuration> and </configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
(3)mapred-site.xml
$ mv mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
# Paste the following between <configuration> and </configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(4)hdfs-site.xml
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /home/hduser/hadoop/etc/hadoop
$ vi hdfs-site.xml
# Paste the following between <configuration> and </configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
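Each snippet above goes between the `<configuration>` tags of its file. To show what a complete file looks like, the sketch below writes one with a here-document. `write_core_site` is a hypothetical helper name, and it writes to a path given as an argument rather than touching the real configuration directory:

```shell
#!/bin/sh
# write_core_site FILE URI
# Hypothetical helper: writes a minimal core-site.xml to FILE containing the
# single property used in this tutorial, wrapped in <configuration> tags.
write_core_site() {
    cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# Example usage, writing to a scratch file for inspection:
# write_core_site /tmp/core-site.xml hdfs://localhost:9000
```

Hadoop 2.x treats `fs.default.name` as a deprecated alias of `fs.defaultFS`; either name should be accepted in 2.6.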
Format a new distributed filesystem:
$ cd ~
$ hdfs namenode -format
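Formatting is meant to be done once; re-running it on a NameNode directory that already holds data discards the existing filesystem metadata. A completed format leaves a `current/VERSION` file under the directory named by `dfs.namenode.name.dir`, so a guard can look for it. A minimal sketch, with `namenode_is_formatted` as a hypothetical helper name:

```shell
#!/bin/sh
# namenode_is_formatted NAME_DIR
# Hypothetical helper: succeeds if NAME_DIR already contains current/VERSION,
# the marker file left behind by a completed "hdfs namenode -format".
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Example usage with the NameNode directory configured in hdfs-site.xml.
NAME_DIR=/home/hduser/mydata/hdfs/namenode
if namenode_is_formatted "$NAME_DIR"; then
    echo "already formatted, skipping"
else
    echo "not formatted yet; run: hdfs namenode -format"
fi
```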
Start the Hadoop services
$ start-dfs.sh
....
$ start-yarn.sh
....
$ jps
# If everything is configured correctly, you will see output similar to the following
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
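Checking the `jps` listing by eye is easy to get wrong when one daemon has failed to start. A minimal sketch that reads `jps` output on standard input and prints any of the five daemon names from the listing above that are missing; `missing_daemons` is a hypothetical helper name:

```shell
#!/bin/sh
# missing_daemons
# Hypothetical helper: reads "PID Name" lines (the format jps prints) on stdin
# and prints each expected daemon that does not appear. The exit status is
# non-zero if anything is missing.
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
            bad = 0
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print want[i]; bad = 1 }
            exit bad
        }
    '
}

# Example usage on a machine where the daemons were started:
# jps | missing_daemons && echo "all daemons are up"
```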
Run a Hadoop example
$ cd /home/hduser/hadoop
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar pi 2 5
# You will then see output similar to the following
Number of Maps = 2
Samples per Map = 5
15/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
15/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
15/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
15/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
...
This article is reposted from ZH奶酪's blog on cnblogs (博客园). Original link: http://www.cnblogs.com/CheeseZH/p/5051135.html. To repost it, please contact the original author.