The core of Hadoop is still the Map-Reduce model plus the Hadoop Distributed File System (HDFS). The classic word-count program below walks through the Map-Reduce half in three steps.
Step 1: Define the Map process. The Mapper below reads the input one line at a time, splits each line into tokens, and emits a (word, 1) pair for every token.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    // Each word is emitted with a count of 1; the reducer sums these later.
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line on whitespace and emit a (word, 1) pair per token.
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Reuse a single Text object instead of allocating one per token.
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
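To make the mapper's output concrete, here is a minimal stand-alone sketch (plain Java, outside Hadoop, with a made-up input line) that mimics the tokenization step and prints the pairs map() would emit:

import java.util.StringTokenizer;

public class MapDemo {
    public static void main(String[] args) {
        // Hypothetical input line; in a real job it arrives as the Text value.
        String line = "hello hadoop hello";
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Mirrors context.write(word, one): prints (hello, 1) (hadoop, 1) (hello, 1)
            System.out.println("(" + tokenizer.nextToken() + ", 1)");
        }
    }
}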
Step 2: Define the Reduce process. The framework sorts and groups the map output by key, so each reduce() call receives one word together with all of its counts; the reducer sums them and writes out the total.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All counts for one word arrive together; add them up.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
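To see the reduce step in isolation, here is a minimal sketch of the same summation on hypothetical grouped data, runnable as plain Java:

import java.util.Arrays;
import java.util.List;

public class ReduceDemo {
    public static void main(String[] args) {
        // Hypothetical values grouped under the key "hello" after the shuffle.
        List<Integer> values = Arrays.asList(1, 1, 1);
        int sum = 0;
        for (int val : values) {
            sum += val; // same accumulation as MyReduce.reduce()
        }
        System.out.println("(hello, " + sum + ")"); // prints (hello, 3)
    }
}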
Step 3: Write a Driver to run the Map-Reduce job. The driver configures the job, wires up the Mapper, Combiner, and Reducer classes, sets the input and output formats, and submits everything to the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        // Legacy user/group setting from old Hadoop versions; ignored by modern, secured clusters.
        conf.set("hadoop.job.ugi", "root,root123");

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "Hello,hadoop! ^_^");

        job.setJarByClass(MyDriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setMapperClass(MyMap.class);
        // Word count is associative and commutative, so the reducer can double as a combiner.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0]: HDFS input path; args[1]: HDFS output path (must not already exist).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Block until the job finishes and propagate success/failure as the exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
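With the three classes compiled and packaged into a jar (the jar name and HDFS paths below are placeholders for illustration), the job can be launched with the standard hadoop jar command; remember that the output directory must not exist before the run:

hadoop jar wordcount.jar MyDriver /user/root/input /user/root/output
hadoop fs -cat /user/root/output/part-r-00000

The second command prints the reducer output, one word and its count per line.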