hadoop jython ( windows )-低调大师

2017-12-06

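Reference: hadoop window 搭建 (setting up Hadoop on Windows). Since I like Python's syntax, ever since getting Hadoop running on Windows I had wanted to write my Hadoop jobs in Jython.
This time I finally got it working on my own machine. Here is the process:
Test environment:
Still Windows + Cygwin
hadoop 0.18 # C:/cygwin/home/lky/tools/java/hadoop-0.18.3
jython 2.2.1 # C:/jython2.2.1
Reference: PythonWordCount
Start Hadoop and change into HADOOP_HOME: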

# Create the input directory in HDFS
$>bin/hadoop dfs -mkdir input
# Copy NOTICE.txt from the Hadoop distribution into the input directory
$>bin/hadoop dfs -copyFromLocal c:/cygwin/home/lky/tools/java/hadoop-0.18.3/NOTICE.txt hdfs:///user/lky/input

$>cd src/examples/python

# Create a script that does it all in one step (jy -> jar -> hadoop run)!
# Of course, a shell script on Linux would look nicer than this, heh!
$>vim run.bat

"C:\Program Files\Java\jdk1.6.0_11\bin\java.exe" -classpath "C:\jython2.2.1\jython.jar;%CLASSPATH%" org.python.util.jython C:\jython2.2.1\Tools\jythonc\jythonc.py -p org.apache.hadoop.examples -d -j wc.jar -c %1

sh C:\cygwin\home\lky\tools\java\hadoop-0.18.3\bin\hadoop jar wc.jar %2 %3 %4 %5 %6 %7 %8 %9
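If you would rather not depend on a .bat file, the same two steps (compile with jythonc, then submit with `hadoop jar`) can be driven from Python. This is a minimal CPython 3 sketch, not the author's script: `build_commands` and `run` are hypothetical helper names, and the paths are simply copied from run.bat above on the assumption that your install matches. Building the argument lists needs nothing; actually calling `run` assumes the same JDK, Jython and Cygwin setup.

```python
import os
import subprocess

# Assumed install locations, copied from run.bat; change them for your own machine.
JAVA = r"C:\Program Files\Java\jdk1.6.0_11\bin\java.exe"
JYTHON_HOME = r"C:\jython2.2.1"
HADOOP = r"C:\cygwin\home\lky\tools\java\hadoop-0.18.3\bin\hadoop"


def build_commands(script, job_args, classpath=""):
    """Return the two argv lists run.bat executes: jythonc compile, then hadoop jar."""
    cp = JYTHON_HOME + r"\jython.jar"
    if classpath:
        cp += ";" + classpath  # ';' is the Windows classpath separator
    compile_cmd = [
        JAVA, "-classpath", cp, "org.python.util.jython",
        JYTHON_HOME + r"\Tools\jythonc\jythonc.py",
        "-p", "org.apache.hadoop.examples", "-d", "-j", "wc.jar", "-c", script,
    ]
    # run.bat forwards only %2..%9, i.e. at most 8 job arguments
    submit_cmd = ["sh", HADOOP, "jar", "wc.jar"] + list(job_args)[:8]
    return compile_cmd, submit_cmd


def run(script, job_args):
    """Compile the Jython script into wc.jar, then submit it to Hadoop."""
    compile_cmd, submit_cmd = build_commands(
        script, job_args, os.environ.get("CLASSPATH", ""))
    # check=True raises CalledProcessError if a step exits non-zero,
    # so the submit step is skipped when the compile step reports failure
    subprocess.run(compile_cmd, check=True)
    subprocess.run(submit_cmd, check=True)
```

Usage would be `run("WordCount.py", ["hdfs:///user/lky/input", "file:///c:/cygwin/home/lky/tools/java/hadoop-0.18.3/tmp2"])`. Unlike run.bat, this version stops when the first step fails; since the patched jythonc.py ends with `os._exit(0)`, whether every jythonc error really yields a non-zero exit code is not guaranteed, so treat the check as best-effort.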

# Modify jythonc's packaging environment: add the Hadoop jars
$>vim C:\jython2.2.1\Tools\jythonc\jythonc.py

# Copyright (c) Corporation for National Research Initiatives
# Driver script for jythonc2. See module main.py for details
import sys,os,glob

# Added for Hadoop: put the Hadoop and Jython jars on sys.path so jythonc can resolve the org.apache.hadoop.* imports
for fn in glob.glob('c:/cygwin/home/lky/tools/java/hadoop-0.18.3/*.jar'): sys.path.append(fn)
for fn in glob.glob('c:/jython2.2.1/*.jar'): sys.path.append(fn)
for fn in glob.glob('c:/cygwin/home/lky/tools/java/hadoop-0.18.3/lib/*.jar'): sys.path.append(fn)

import main
main.main()

import os
os._exit(0)
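The three glob loops are what let jythonc see the Hadoop classes: the patch relies on Jython accepting jar files as sys.path entries. The same idea, with duplicates skipped and a deterministic order, is sketched below in plain CPython 3. `collect_jars` and `extend_sys_path` are hypothetical helper names, not part of jythonc, and to use this inside Jython 2.2 itself you would have to replace `sorted()`, which only arrived in Python 2.4.

```python
import glob
import os
import sys


# Hypothetical helper: expand each glob pattern and return the jar paths in order.
def collect_jars(patterns):
    jars = []
    for pattern in patterns:
        # glob order is arbitrary; sorting makes the classpath order repeatable
        for fn in sorted(glob.glob(pattern)):
            if fn not in jars:  # overlapping patterns must not add a jar twice
                jars.append(fn)
    return jars


# Hypothetical helper: append every matching jar to sys.path, like the patched jythonc.py.
def extend_sys_path(patterns):
    for jar in collect_jars(patterns):
        if jar not in sys.path:
            sys.path.append(jar)
```

With the author's layout the call would be `extend_sys_path(['c:/cygwin/home/lky/tools/java/hadoop-0.18.3/*.jar', 'c:/jython2.2.1/*.jar', 'c:/cygwin/home/lky/tools/java/hadoop-0.18.3/lib/*.jar'])`.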

# Run
C:/cygwin/home/lky/tools/java/hadoop-0.18.3/src/examples/python>run.bat WordCount.py hdfs:///user/lky/input file:///c:/cygwin/home/lky/tools/java/hadoop-0.18.3/tmp2

Output:
cat c:/cygwin/home/lky/tools/java/hadoop-0.18.3/tmp2/part-00000
(http://www.apache.org/).       1
Apache 1
Foundation      1
Software        1
The     1
This    1
by      1
developed       1
includes        1
product 1
software        1
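Each line of part-00000 is a key and a value separated by a tab (TextOutputFormat's default separator), and with a single reducer the keys come out sorted byte by byte, which is why "(http...", then capitalised words, then lowercase words appear in that order above. If you want the counts back in Python, a minimal CPython 3 sketch could look like this; `parse_part_file` and `read_part_file` are hypothetical helpers.

```python
# Hypothetical helper: turn "key<TAB>count" lines into a {word: count} dict.
def parse_part_file(lines):
    counts = {}
    for line in lines:
        # Split off the last whitespace-separated field: that is the count.
        # rsplit(None, 1) also drops the trailing newline / "\r\n".
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        word, count = parts
        counts[word] = int(count)
    return counts


# Hypothetical helper: read a local part file such as tmp2/part-00000.
def read_part_file(path):
    with open(path, encoding="utf-8") as f:
        return parse_part_file(f)
```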
Now for the main part: the concise Jython Hadoop code.

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from org.apache.hadoop.fs import Path
from org.apache.hadoop.io import *
from org.apache.hadoop.mapred import *

import sys
import getopt

# Mapper: emit (word, 1) for every whitespace-separated token in the line
class WordCountMap(Mapper, MapReduceBase):
    one = IntWritable(1)
    def map(self, key, value, output, reporter):
        for w in value.toString().split():
            output.collect(Text(w), self.one)

# Reducer, also used as the combiner: add up the counts for each word
class Summer(Reducer, MapReduceBase):
    def reduce(self, key, values, output, reporter):
        total = 0
        while values.hasNext():
            total += values.next().get()
        output.collect(key, IntWritable(total))

def printUsage(code):
    print "wordcount [-m <maps>] [-r <reduces>] <input> <output>"
    sys.exit(code)

def main(args):
    conf = JobConf(WordCountMap)
    conf.setJobName("wordcount")

    # Output types: Text keys, IntWritable values
    conf.setOutputKeyClass(Text)
    conf.setOutputValueClass(IntWritable)

    conf.setMapperClass(WordCountMap)
    conf.setCombinerClass(Summer)
    conf.setReducerClass(Summer)

    # Optional -m/-r flags; exactly two positional arguments must remain
    try:
        flags, other_args = getopt.getopt(args[1:], "m:r:")
    except getopt.GetoptError:
        printUsage(1)
    if len(other_args) != 2:
        printUsage(1)

    for f, v in flags:
        if f == "-m":
            conf.setNumMapTasks(int(v))
        elif f == "-r":
            conf.setNumReduceTasks(int(v))
    conf.setInputPath(Path(other_args[0]))
    conf.setOutputPath(Path(other_args[1]))
    JobClient.runJob(conf)

if __name__ == "__main__":
    main(sys.argv)
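To see what the job computes without a cluster, here is a plain CPython 3 simulation of the same pipeline. It is a single-process sketch with hypothetical function names (`map_phase`, `reduce_phase`, `word_count`), not Hadoop's API: each inner list stands in for one map task's input split. It also shows why `Summer` can double as the combiner: addition is associative and commutative, so summing per map task first does not change the final counts.

```python
from collections import defaultdict


# Simulation only: mirrors WordCountMap, emitting (word, 1) per token.
def map_phase(lines):
    for line in lines:
        for w in line.split():
            yield w, 1


# Simulation only: mirrors Summer, adding up the values for each key.
def reduce_phase(pairs):
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


# Simulation only: map -> combine (per split) -> reduce, all in one process.
def word_count(splits):
    combined = []
    for split in splits:
        # The combiner pre-aggregates each simulated map task's output
        combined.extend(reduce_phase(map_phase(split)).items())
    # Hadoop sorts by key before the reduce; sorted() imitates that ordering
    return dict(sorted(reduce_phase(combined).items()))
```

For ASCII words, Python's default string ordering matches the byte-wise order Hadoop uses for Text keys, so the dict comes out in the same order as part-00000.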

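This article is reposted from Liu Kaiyi's (刘凯毅) blog on cnblogs (博客园); original post: hadoop jython ( windows ). To repost it, please contact the original author.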

Original link: https://yq.aliyun.com/articles/361137

Copyright of reposted content belongs to the author and the source website.
