From "No module named pyspark" to submitting Spark jobs remotely
Being able to submit Spark jobs from Python on a local Mac is very convenient. But after installing spark-1.6-bin-without-hadoop (spark.apache.org/download), running `import pyspark` in Python fails with "No module named pyspark". As usual, this kind of error is a path problem.
To use Spark locally, add two environment variables to ~/.bash_profile: SPARK_HOME and the essential PYTHONPATH:
export SPARK_HOME=/Users/abc/Documents/spark-1.6.0-bin-without-hadoop  # Spark install path
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
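The same fix can also be applied from inside a running interpreter instead of the shell profile. A minimal sketch, assuming the article's example install path (adjust `spark_home` to your own machine; the py4j zip name varies by Spark release, so it is matched with a glob):

```python
import glob
import os
import sys

# Hypothetical install path taken from the article -- adjust to your machine.
spark_home = "/Users/abc/Documents/spark-1.6.0-bin-without-hadoop"
os.environ["SPARK_HOME"] = spark_home

# Replicate the PYTHONPATH export from inside Python: put Spark's python/
# directory and the bundled py4j source zip on sys.path.
sys.path.insert(0, os.path.join(spark_home, "python"))
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, py4j_zip)

# After this, `import pyspark` should succeed in the same interpreter.
```

This is essentially what the `export` lines above do, just scoped to one process rather than every shell session.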
Reload the profile (`source ~/.bash_profile`) and launch the shell:
$ pyspark
Python 2.7.11 (default, Mar 1 2016, 18:40:10)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
16/04/16 21:41:02 INFO spark.SparkContext: Running Spark version 1.6.0
16/04/16 21:41:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/16 21:41:05 INFO spark.SecurityManager: Changing view acls to: abel,hdfs
16/04/16 21:41:05 INFO spark.SecurityManager: Changing modify acls to: abel,hdfs
16/04/16 21:41:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(abel, hdfs); users with modify permissions: Set(abel, hdfs)
16/04/16 21:41:06 INFO util.Utils: Successfully started service 'sparkDriver' on port 55162.
16/04/16 21:41:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/04/16 21:41:06 INFO Remoting: Starting remoting
16/04/16 21:41:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.106:55165]
16/04/16 21:41:07 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 55165.
16/04/16 21:41:07 INFO spark.SparkEnv: Registering MapOutputTracker
16/04/16 21:41:07 INFO spark.SparkEnv: Registering BlockManagerMaster
16/04/16 21:41:07 INFO storage.DiskBlockManager: Created local directory at /private/var/folders/wk/fxn2zdyd7rz8rm66rst4h15w0000gn/T/blockmgr-6de54d08-31c9-430e-ac3c-9f3e0635e486
16/04/16 21:41:07 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
16/04/16 21:41:07 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/04/16 21:41:07 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/16 21:41:07 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/04/16 21:41:07 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/04/16 21:41:07 INFO ui.SparkUI: Started SparkUI at http://192.168.1.106:4040
16/04/16 21:41:07 INFO executor.Executor: Starting executor ID driver on host localhost
16/04/16 21:41:07 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55167.
16/04/16 21:41:07 INFO netty.NettyBlockTransferService: Server created on 55167
16/04/16 21:41:07 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/04/16 21:41:07 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:55167 with 511.5 MB RAM, BlockManagerId(driver, localhost, 55167)
16/04/16 21:41:07 INFO storage.BlockManagerMaster: Registered BlockManager
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Python version 2.7.11 (default, Mar 1 2016 18:40:10)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
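With SPARK_HOME set, the same local install can also submit a job to a remote cluster via spark-submit, which is what the title is pointing at. A command sketch only, assuming a hypothetical standalone master URL and script name (replace both with your own; it needs a reachable cluster to actually run):

```shell
# Hypothetical master URL and script -- substitute your cluster's values.
$SPARK_HOME/bin/spark-submit \
  --master spark://remote-host:7077 \
  --deploy-mode client \
  my_job.py
```

In client deploy mode the driver runs on the Mac, so the PYTHONPATH setup above continues to apply.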