(Hadoop+Spark) Cluster Environment Setup
Environment preparation:
Set up three Linux Ubuntu 14.04 Server x64 virtual machines (download: http://releases.ubuntu.com/14.04.2/ubuntu-14.04.2-server-amd64.iso):
192.168.1.200 master
192.168.1.201 node1
192.168.1.202 node2
Install the Spark environment on Master:
Spark cluster environment setup:
The Hadoop version used for this Spark+Hadoop cluster is hadoop 2.6.4.
The Spark version used here is Spark 1.6.2 (spark-1.6.2-bin-hadoop2.6.tgz).
1. Download the installation package to the master virtual server:
Online download:
hadoop@master:~$ wget http://mirror.bit.edu.cn/apache/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.6.tgz
Offline upload to the cluster:
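If the master has no direct internet access, the archive can be uploaded from a local machine instead; a minimal sketch (the local file path below is an assumption, not from the original):
# Run on the machine that already holds the downloaded archive; the path is hypothetical.
scp ~/Downloads/spark-1.6.2-bin-hadoop2.6.tgz hadoop@192.168.1.200:~/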
2. Extract the Spark package into /usr/local/spark on the master virtual server and assign permissions:
# Extract to /usr/local/
hadoop@master:~$ sudo tar -zxvf spark-1.6.2-bin-hadoop2.6.tgz -C /usr/local/
hadoop@master:~$ cd /usr/local/
hadoop@master:/usr/local$ ls
bin  games  include  man  share  src  etc  hadoop  lib  sbin  spark-1.6.2-bin-hadoop2.6
# Rename to spark
hadoop@master:/usr/local$ sudo mv spark-1.6.2-bin-hadoop2.6/ spark/
hadoop@master:/usr/local$ ls
bin  etc  games  hadoop  include  lib  man  sbin  share  spark  src
# Assign ownership to the hadoop user
hadoop@master:/usr/local$ sudo chown -R hadoop:hadoop spark
hadoop@master:/usr/local$
3. Add the Spark environment variables to /etc/profile on the master virtual server:
Edit the /etc/profile file:
sudo vim /etc/profile
Append the $SPARK_HOME variable at the end. After adding it, the tail of my /etc/profile looks like this:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/opt/scala/scala-2.10.5
# add hadoop bin/ directory to PATH
export HADOOP_HOME=/usr/local/hadoop
export SPARK_HOME=/usr/local/spark
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$JAVA_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
export CLASSPATH=$CLASS_PATH::$JAVA_HOME/lib:$JAVA_HOME/jre/lib
Apply the changes:
source /etc/profile
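To confirm the new variables are picked up in the current shell, a quick check (not part of the original steps; the expected output assumes the paths above):
hadoop@master:~$ echo $SPARK_HOME
/usr/local/spark
hadoop@master:~$ which spark-submit
/usr/local/spark/bin/spark-submit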
Configure Spark on Master:
1. Configure the spark-env.sh file on the master virtual server:
sudo vim /usr/local/spark/conf/spark-env.sh
Note: by default there are no spark-env.sh and slaves files, only the corresponding *.template files, which need to be renamed:
hadoop@master:/usr/local/spark/conf$ ls
docker.properties.template  metrics.properties.template  spark-env.sh
fairscheduler.xml.template  slaves.template              log4j.properties.template
spark-defaults.conf.template
hadoop@master:/usr/local/spark/conf$ sudo vim spark-env.sh
hadoop@master:/usr/local/spark/conf$ mv slaves.template slaves
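The listing above already shows spark-env.sh in place; if it has not been created yet, it can be copied from its template first, in the same way as slaves (a minimal sketch):
hadoop@master:/usr/local/spark/conf$ cp spark-env.sh.template spark-env.sh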
Append the following content to the end of spark-env.sh:
export STANDALONE_SPARK_MASTER_HOST=192.168.1.200
export SPARK_MASTER_IP=192.168.1.200
export SPARK_WORKER_CORES=1
# number of worker instances started on every slave node
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=1g
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
export SCALA_HOME=/opt/scala/scala-2.10.5
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://192.168.1.200:9000/SparkEventLog"
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
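SPARK_HISTORY_OPTS above points the history server at hdfs://192.168.1.200:9000/SparkEventLog. That directory is not created automatically; a sketch of creating it once HDFS is running (this step is an assumption, not shown in the original):
hadoop@master:~$ hdfs dfs -mkdir -p /SparkEventLog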
2. Configure the slaves file on the master virtual server:
sudo vim /usr/local/spark/conf/slaves
The slaves file should contain the following:
192.168.1.201
192.168.1.202
Note: write one machine's IP per line.
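Hostnames work here as well (still one per line), provided every node can resolve them, for example via /etc/hosts; an equivalent slaves file under that assumption:
node1
node2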
3. Create a logs folder under /usr/local/spark/ on the Master virtual machine and give it 777 permissions:
hadoop@master:/usr/local/spark$ mkdir logs
hadoop@master:/usr/local/spark$ chmod 777 logs
Copy the files under /usr/local/spark on the Master virtual server to all slave nodes (node1, node2):
1. Copy the /usr/local/spark/ installation files from the Master virtual server to each slave (node1, node2):
Note: before copying, SSH into every slave node (node1, node2), create the /usr/local/spark/ directory there, and give it 777 permissions.
hadoop@master:/usr/local/spark/conf$ cd ~/
hadoop@master:~$ sudo chmod 777 /usr/local/spark
hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
scp: /usr/local/spark: Permission denied
hadoop@master:~$ sudo scp -r /usr/local/spark hadoop@node1:/usr/local
hadoop@node1's password:
scp: /usr/local/spark: Permission denied
hadoop@master:~$ sudo chmod 777 /usr/local/spark
hadoop@master:~$ ssh node1
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
Last login: Wed Sep 21 16:19:25 2016 from master
hadoop@node1:~$ cd /usr/local/
hadoop@node1:/usr/local$ sudo mkdir spark
[sudo] password for hadoop:
hadoop@node1:/usr/local$ ls
bin  etc  games  hadoop  include  lib  man  sbin  share  spark  src
hadoop@node1:/usr/local$ sudo chmod 777 ./spark
hadoop@node1:/usr/local$ exit
hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
...........
hadoop@master:~$ ssh node2
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
Last login: Wed Sep 21 16:19:47 2016 from master
hadoop@node2:~$ cd /usr/local
hadoop@node2:/usr/local$ sudo mkdir spark
[sudo] password for hadoop:
hadoop@node2:/usr/local$ sudo chmod 777 ./spark
hadoop@node2:/usr/local$ exit
logout
Connection to node2 closed.
hadoop@master:~$ scp -r /usr/local/spark hadoop@node2:/usr/local
...........
2. Modify /etc/profile on all slave nodes (node1, node2) and append the $SPARK_HOME environment variable:
Note: you will usually hit permission problems here, so it is best to log in to each slave node (node1, node2) and edit /etc/profile by hand.
hadoop@master:~$ ssh node1
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
Last login: Fri Sep 23 16:40:52 2016 from master
hadoop@node1:~$ sudo vim /etc/profile
[sudo] password for hadoop:
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~$ ssh node2
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
Last login: Fri Sep 23 16:43:31 2016 from master
hadoop@node2:~$ sudo vim /etc/profile
[sudo] password for hadoop:
hadoop@node2:~$ exit
logout
Connection to node2 closed.
hadoop@master:~$
After the edit, /etc/profile on every slave should match the /etc/profile configuration on the master node.
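Concretely, the Spark-related lines to append on each slave are the same ones added on the master; a minimal sketch of just that addition (the Java, Scala, and Hadoop exports are assumed to already be in place from the earlier Hadoop setup), followed by reloading the file on each node:
# appended to /etc/profile on node1 and node2
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
hadoop@node1:~$ source /etc/profile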
Start Spark on Master and verify the configuration succeeded:
1. Start command:
Make sure Hadoop is already running before you start Spark.
hadoop@master:~$ cd /usr/local/spark/
hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
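Since spark-env.sh above also sets SPARK_HISTORY_OPTS, the history server can optionally be started at this point as well (not shown in the original transcript):
hadoop@master:/usr/local/spark$ ./sbin/start-history-server.sh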
2. Verify that startup succeeded:
Method one: jps
hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-master.out
192.168.1.201: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node1.out
192.168.1.202: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
hadoop@master:/usr/local/spark$ jps
1650 NameNode
1875 SecondaryNameNode
3494 Jps
2025 ResourceManager
3423 Master
hadoop@master:/usr/local/spark$ cd ~/
hadoop@master:~$ ssh node1
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
Last login: Fri Sep 23 17:33:10 2016 from master
hadoop@node1:~$ jps
1392 DataNode
2449 Jps
2330 Worker
2079 NodeManager
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~$ ssh node2
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)
...
Last login: Fri Sep 23 16:51:36 2016 from master
hadoop@node2:~$ jps
2264 Worker
2090 NodeManager
1402 DataNode
2383 Jps
hadoop@node2:~$
Method two: open http://192.168.1.200:8080 in a browser and check that the Spark master web UI comes up normally (it should list both workers).
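As an extra check (not in the original), the bundled SparkPi example can be submitted to the cluster; the examples jar name below matches the prebuilt spark-1.6.2-bin-hadoop2.6 layout but may need adjusting for other builds:
hadoop@master:/usr/local/spark$ ./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.1.200:7077 \
  lib/spark-examples-1.6.2-hadoop2.6.0.jar 10
A line like "Pi is roughly 3.14..." near the end of the output indicates the job ran on the cluster.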
