Learning Spark [1]: Setting Up a Spark Development Environment in IntelliJ IDEA (Scala Project) - Hello World
1. Tool Versions

IntelliJ IDEA: download and install from the official website (omitted here).
JDK version: 1.8.0_151 (install it beforehand).
The following three do not need to be downloaded and installed separately; they are set up from inside IntelliJ (Scala plugin) and pulled in by sbt:
Scala version: 2.11.0
Spark version: 2.1.1
SBT version: 0.13.17

2. Install the Scala Plugin

IntelliJ IDEA -> Preferences -> Plugins -> search for "Scala". After installation it looks like the figure.

3. Create a Scala Project (SBT)

Note: choose an SBT version from the 0.13.x line, not 1.0+ (1.0+ throws an exception complaining that idle_shell cannot be found).

4. Point SBT at Domestic Mirror Repositories

By default SBT downloads artifacts from servers abroad, which is extremely slow. Switch it to a domestic mirror: in the ~/.sbt directory (/Users/lewis.lht/.sbt on my Mac), create a file named repositories with the content below (do not modify the parts in square brackets):

    [repositories]
      local
      alibaba: http://maven.aliyun.com/nexus/content/groups/public/
      alibaba-ivy: http://maven.aliyun.com/nexus/content/groups/public/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
      repo2: http://repo2.maven.org/maven2/
      ivy-typesafe: http://dl.bintray.com/typesafe/ivy-releases, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
      ivy-sbt-plugin: http://dl.bintray.com/sbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
      typesafe-releases: http://repo.typesafe.com/typesafe/releases
      typesafe-ivy-releases: http://repo.typesafe.com/typesafe/ivy-releases, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]

5. Mark the src and test Directories as Sources Roots

For some reason, in a newly created Scala project src/main/scala and src/test/scala are not marked as sources roots by default. Mark them manually, otherwise you cannot create Scala class files or packages:
a) Select the src/main/scala directory, right-click -> Mark Directory as -> Sources Root
b) Select the src/test/scala directory, right-click -> Mark Directory as -> Test Sources Root

6. Add the Spark Dependency in build.sbt

build.sbt:

    name := "sparkexcercise"
    version := "0.1"
    scalaVersion := "2.11.0"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.1"

As shown in the figure.

7. Compile with SBT

Open the SBT Projects panel and click the refresh button (Refresh all SBT Projects), or open the SBT Shell window and run the compile command; this downloads the dependencies and compiles the project. A successful build looks like the figure.

8. Create the Spark Hello World Object

Create a new object and paste the following code:

    package org.lewis

    import org.apache.spark.sql.SparkSession

    object HelloWorld {
      def main(args: Array[String]): Unit = {
        if (args.size == 0) {
          println("input file path !")
          System.exit(1)
        }
        val fpath = args(0)
        val spark = SparkSession
          .builder
          .appName("HdfsHelloWorld")
          .getOrCreate()
        val file = spark.read.textFile(fpath).rdd
        // flatMap turns each input element (a line) into multiple output elements (its words)
        val m = file.flatMap(line => line.split(" "))
        // map each word to a (word, 1) pair, giving a pair RDD
        val g = m.map((_, 1))
        // val g = m.map(word => (word, 1))  // equivalent; _ stands for the current element
        // reduce by word as the key, summing the values; x and y are the accumulated and current counts
        val r = g.reduceByKey(_ + _)
        // val r = g.reduceByKey((x, y) => x + y)  // equivalent
        r.collect().foreach(println)
        // the whole pipeline can also be written in one line:
        // file.flatMap(line => line.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
      }
    }

9. Configure the Run Configuration

Run in local mode, and make sure your SparkOverview.txt file exists:

    VM options: -Dspark.master=local
    Program arguments: /Users/lewis.lht/Downloads/SparkOverview.txt

10. Run

Only local-mode execution is demonstrated here, so no Spark cluster environment is needed.
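
A note on step 9: instead of passing -Dspark.master=local as a VM option, the master can also be set directly on the SparkSession builder while developing locally. This is a minimal sketch, not part of the original code; remove or parameterize the master() call before submitting to a real cluster:

    import org.apache.spark.sql.SparkSession

    // For IDE/local runs only: hard-code local mode instead of using a VM option.
    // local[*] runs Spark with one worker thread per logical CPU core.
    val spark = SparkSession
      .builder
      .appName("HdfsHelloWorld")
      .master("local[*]")
      .getOrCreate()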
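
A note on step 8: since spark-sql is already the project's dependency, the same word count can also be expressed with the Dataset API instead of dropping down to the RDD. This is only a sketch under the same assumptions as the code above (Spark 2.1.1, the spark session and fpath from step 8):

    import spark.implicits._                          // encoders needed for flatMap on Dataset[String]
    val lines = spark.read.textFile(fpath)            // Dataset[String], one row per line
    val words = lines.flatMap(line => line.split(" ")) // Dataset[String], one row per word
    words.groupBy("value").count().show()             // the single text column is named "value" by default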