A runnable example on hadoop-1.x
My environment: hadoop-0.20.2, with Eclipse SDK 3.3.2.
The source data is:
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84
The data we want to extract is:
Apr 23 14:7d:c5:9e:fb:84
Apr 23 14:7d:c5:9e:fb:84
Apr 23 14:7d:c5:9e:fb:84
Apr 23 14:7d:c5:9e:fb:84
Apr 23 14:7d:c5:9e:fb:84
Apr 23 14:7d:c5:9e:fb:84
The arguments passed at runtime are the HDFS input and output directories, i.e. hdfs://cMaster:/user/joe/in and hdfs://cMaster:/user/joe/out.
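Assuming the job has been packaged into a jar named test.jar (the jar name is my assumption; the main class hadoop.test comes from the source below), the launch command would look roughly like this:

hadoop jar test.jar hadoop.test hdfs://cMaster:/user/joe/in hdfs://cMaster:/user/joe/out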
Source code:
package hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.*;

public class test extends Configured implements Tool {

    // Counter for input lines skipped because they do not match the expected format.
    enum Counter {
        LINESKIP,
    }

    public static class Map extends Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            try {
                // Split on single spaces; for a line like
                // "Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84"
                // field 0 is the month, field 1 the day, field 6 the MAC address.
                String[] lineSplit = line.split(" ");
                String month = lineSplit[0];
                String time = lineSplit[1]; // despite the name, this is the day of month
                String mac = lineSplit[6];
                Text out = new Text(month + ' ' + time + ' ' + mac);
                // NullWritable key: only the value ends up in the text output.
                context.write(NullWritable.get(), out);
            } catch (java.lang.ArrayIndexOutOfBoundsException e) {
                // Malformed line: count it and skip.
                context.getCounter(Counter.LINESKIP).increment(1);
                return;
            }
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "test");
        job.setJarByClass(test.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // No reducer class is set, so the default identity reducer
        // passes the map output through unchanged.
        job.setMapperClass(Map.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        job.waitForCompletion(true);
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new test(), args);
        System.exit(res);
    }
}
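As a quick sanity check on the field indices the mapper relies on, here is a minimal standalone sketch (SplitCheck is a hypothetical class name of mine; the sample line is taken from the source data above, and no Hadoop is needed to run it):

public class SplitCheck {
    public static void main(String[] args) {
        String line = "Apr 23 11:49:54 hostapd: wlan0: STA 14:7d:c5:9e:fb:84";
        // Splitting on single spaces yields:
        // f[0]="Apr", f[1]="23", f[2]="11:49:54", f[3]="hostapd:",
        // f[4]="wlan0:", f[5]="STA", f[6]="14:7d:c5:9e:fb:84"
        String[] f = line.split(" ");
        System.out.println(f[0] + ' ' + f[1] + ' ' + f[6]); // prints: Apr 23 14:7d:c5:9e:fb:84
    }
}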