MapReduce Programming Example: Custom Partitioner
Task description:
Given a set of records, write each record to a different output file according to its year.
Example data:
2013 1
2013 5
2014 5
2014 8
2015 9
2015 4
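
Why a custom partitioner is needed: unless a job sets its own Partitioner, Hadoop distributes map output with the built-in HashPartitioner, which assigns a record to a reduce task by hashing its key, so different years can easily land in the same output file. Its logic is essentially the following (this mirrors org.apache.hadoop.mapreduce.lib.partition.HashPartitioner):

    public int getPartition(K key, V value, int numReduceTasks) {
        // hash the key, clear the sign bit, and take it modulo the reducer count
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

Overriding getPartition instead pins each year to a fixed reduce task, and since each reduce task writes its own file, that yields exactly one file per year.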
Code:
package mrTest;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class zidingyiPartition {

    // Custom partitioner: pins each year to a fixed reduce task, so every
    // year's records end up in their own output file.
    public static class myPartition extends Partitioner<LongWritable, LongWritable> {
        @Override
        public int getPartition(LongWritable key, LongWritable value, int numReduceTasks) {
            if (key.get() == 2013) {
                return 0;
            } else if (key.get() == 2014) {
                return 1;
            } else {
                return 2;
            }
        }
    }

    // Mapper: parses each tab-separated input line "year<TAB>value" into a
    // (year, value) pair of LongWritables.
    public static class Map extends Mapper<Object, Text, LongWritable, LongWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] line = value.toString().split("\t");
            context.write(new LongWritable(Long.parseLong(line[0])),
                          new LongWritable(Long.parseLong(line[1])));
        }
    }

    // Reducer: identity pass-through; writes every (year, value) pair unchanged.
    public static class Reduce extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
        @Override
        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            for (LongWritable v : values) {
                context.write(key, v);
            }
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance();
        job.setJarByClass(zidingyiPartition.class);
        // 1. input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // 2. mapper and its output types
        job.setMapperClass(Map.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 3. custom partitioner
        job.setPartitionerClass(myPartition.class);
        // 4. one reduce task per year, matching the three partitions above
        job.setNumReduceTasks(3);
        // 5. reducer and its output types
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        // 6. output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. submit the job and wait for completion
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Result:
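The partition logic pins 2013, 2014, and 2015 to reduce tasks 0, 1, and 2 respectively, and each reduce task writes one file, so the output directory should contain three files along these lines (a sketch of the expected layout; value order within a key is not guaranteed):

    part-r-00000:
    2013    1
    2013    5

    part-r-00001:
    2014    5
    2014    8

    part-r-00002:
    2015    9
    2015    4

To submit the job (the jar name and paths here are illustrative assumptions):

    hadoop jar zidingyiPartition.jar mrTest.zidingyiPartition <input-dir> <output-dir>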
