Using counters in MapReduce
MapReduce counters give us a window into a job's runtime behavior: they expose all kinds of detailed statistics gathered while the job runs. MapReduce ships with many built-in counters.
Counters are organized into groups; a group collects all values that belong to the same logical scope. The built-in counters of a MapReduce job fall into three groups:
- Map-Reduce Framework: Map input records, Map skipped records, Map input bytes, Map output records, Map output bytes, Combine input records, Combine output records, Reduce input records, Reduce input groups, Reduce output records, Reduce skipped groups, Reduce skipped records, Spilled records
- File Systems: FileSystem bytes read, FileSystem bytes written
- Job Counters: Launched map tasks, Launched reduce tasks, Failed map tasks, Failed reduce tasks, Data-local map tasks, Rack-local map tasks, Other local map tasks
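All of these groups and counters can also be read programmatically once a job has finished: in the org.apache.hadoop.mapreduce API, Counters is Iterable over CounterGroup, and each CounterGroup is Iterable over Counter. The following is a minimal helper sketch (the class and method names are illustrative, not from the original article):

    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class CounterDump {

        // Print every counter group and every counter of a job that has already completed.
        // Counters is Iterable<CounterGroup>, and each CounterGroup is Iterable<Counter>.
        public static void dump(Job job) throws Exception {
            Counters counters = job.getCounters();
            for (CounterGroup group : counters) {
                System.out.println(group.getDisplayName());
                for (Counter counter : group) {
                    System.out.println("  " + counter.getDisplayName() + " = " + counter.getValue());
                }
            }
        }
    }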
Running a small job shows these counters in action. Below is the JobClient output of a simple CountRecords job; every built-in counter is printed when the job completes:

    -bash-4.1$ hadoop jar mr.jar com.catt.cdh.mr.CountRecords
    13/11/29 11:38:04 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
    13/11/29 11:38:10 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    13/11/29 11:38:11 INFO input.FileInputFormat: Total input paths to process : 1
    13/11/29 11:38:11 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
    13/11/29 11:38:11 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6298911ef75545c61859c08add6a74a83e0183ad]
    13/11/29 11:38:12 INFO mapred.JobClient: Running job: job_201311251130_0208
    13/11/29 11:38:13 INFO mapred.JobClient: map 0% reduce 0%
    13/11/29 11:38:40 INFO mapred.JobClient: map 100% reduce 0%
    13/11/29 11:38:49 INFO mapred.JobClient: map 100% reduce 100%
    13/11/29 11:38:57 INFO mapred.JobClient: Job complete: job_201311251130_0208
    13/11/29 11:38:57 INFO mapred.JobClient: Counters: 32
    13/11/29 11:38:57 INFO mapred.JobClient: File System Counters
    13/11/29 11:38:57 INFO mapred.JobClient: FILE: Number of bytes read=36
    13/11/29 11:38:57 INFO mapred.JobClient: FILE: Number of bytes written=322478
    13/11/29 11:38:57 INFO mapred.JobClient: FILE: Number of read operations=0
    13/11/29 11:38:57 INFO mapred.JobClient: FILE: Number of large read operations=0
    13/11/29 11:38:57 INFO mapred.JobClient: FILE: Number of write operations=0
    13/11/29 11:38:57 INFO mapred.JobClient: HDFS: Number of bytes read=139
    13/11/29 11:38:57 INFO mapred.JobClient: HDFS: Number of bytes written=7
    13/11/29 11:38:57 INFO mapred.JobClient: HDFS: Number of read operations=2
    13/11/29 11:38:57 INFO mapred.JobClient: HDFS: Number of large read operations=0
    13/11/29 11:38:57 INFO mapred.JobClient: HDFS: Number of write operations=1
    13/11/29 11:38:57 INFO mapred.JobClient: Job Counters
    13/11/29 11:38:57 INFO mapred.JobClient: Launched map tasks=1
    13/11/29 11:38:57 INFO mapred.JobClient: Launched reduce tasks=1
    13/11/29 11:38:57 INFO mapred.JobClient: Data-local map tasks=1
    13/11/29 11:38:57 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=31068
    13/11/29 11:38:57 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=6671
    13/11/29 11:38:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    13/11/29 11:38:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    13/11/29 11:38:57 INFO mapred.JobClient: Map-Reduce Framework
    13/11/29 11:38:57 INFO mapred.JobClient: Map input records=13
    13/11/29 11:38:57 INFO mapred.JobClient: Map output records=1
    13/11/29 11:38:57 INFO mapred.JobClient: Map output bytes=14
    13/11/29 11:38:57 INFO mapred.JobClient: Input split bytes=103
    13/11/29 11:38:57 INFO mapred.JobClient: Combine input records=0
    13/11/29 11:38:57 INFO mapred.JobClient: Combine output records=0
    13/11/29 11:38:57 INFO mapred.JobClient: Reduce input groups=1
    13/11/29 11:38:57 INFO mapred.JobClient: Reduce shuffle bytes=32
    13/11/29 11:38:57 INFO mapred.JobClient: Reduce input records=1
    13/11/29 11:38:57 INFO mapred.JobClient: Reduce output records=1
    13/11/29 11:38:57 INFO mapred.JobClient: Spilled Records=2
    13/11/29 11:38:57 INFO mapred.JobClient: CPU time spent (ms)=4780
    13/11/29 11:38:57 INFO mapred.JobClient: Physical memory (bytes) snapshot=657629184
    13/11/29 11:38:57 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3802001408
    13/11/29 11:38:57 INFO mapred.JobClient: Total committed heap usage (bytes)=1915486208
    13/11/29 11:38:57 INFO mr.CountRecords: sum 13
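The final line, sum 13, is simply the value of the built-in Map input records counter read back after the job completed (the source of CountRecords is not shown in this article). A minimal sketch of the idea, assuming the Hadoop 2 TaskCounter enum (on older MR1 releases the same counter lives in org.apache.hadoop.mapred.Task.Counter):

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class BuiltInCounterExample {

        // Read one built-in counter ("Map input records" in the Map-Reduce Framework group)
        // from a job that has already completed.
        public static long mapInputRecords(Job job) throws Exception {
            Counters counters = job.getCounters();
            return counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
        }
    }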
Defining custom counters with a Java enum
A counter can be based on any Java enum. For example, given a file in which every line records the length of one Internet session, the code below counts how many sessions were longer than 30 minutes and how many were 30 minutes or less; the final counts are printed to the terminal when the job finishes:
    import java.io.IOException;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    /*
     * Custom counters based on a Java enum.
     * A counter can be any enum constant. Given a file in which every line records
     * the length of one Internet session, this job counts the sessions longer than
     * 30 minutes and the sessions of 30 minutes or less. The final counts are
     * printed to the terminal when the job completes.
     */
    public class CounterTest extends Configured implements Tool {

        private final static Log log = LogFactory.getLog(CounterTest.class);

        public static void main(String[] args) throws Exception {
            String[] ars = new String[] { "hdfs://data2.kt:8020/test/input",
                    "hdfs://data2.kt:8020/test/output" };
            int exitcode = ToolRunner.run(new CounterTest(), ars);
            System.exit(exitcode);
        }

        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            // fs.defaultFS is the non-deprecated name of this property
            conf.set("fs.default.name", "hdfs://data2.kt:8020/");
            FileSystem fs = FileSystem.get(conf);
            fs.delete(new Path(args[1]), true);

            Job job = new Job(conf);
            job.setJarByClass(CounterTest.class);
            job.setMapperClass(MyMap.class);
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            int result = job.waitForCompletion(true) ? 0 : 1;

            // Read the custom counters back once the job has finished.
            Counters counters = job.getCounters();
            Counter counter1 = counters.findCounter(NetTimeLong.OVER30M);
            log.info(counter1.getValue());
            log.info(counter1.getDisplayName() + "," + counter1.getName());
            return result;
        }

        public static class MyMap extends Mapper<LongWritable, Text, NullWritable, Text> {

            private Counter counter1, counter2;

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                double minutes = Double.parseDouble(value.toString());
                if (minutes <= 30) {
                    // getCounter creates the counter automatically if it does not exist yet
                    counter2 = context.getCounter(NetTimeLong.LOW30M);
                    counter2.increment(1);
                } else {
                    counter1 = context.getCounter(NetTimeLong.OVER30M);
                    counter1.increment(1);
                }
                context.write(NullWritable.get(), value);
            }
        }
    }

    enum NetTimeLong {
        OVER30M, LOW30M
    }
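An enum works well when the set of counters is fixed at compile time. The mapper context also accepts counters addressed by a group name and a counter name, which is useful when the names are only known at run time. The variant below is a sketch only; it reuses the imports of the example above, and the names "NetTime", "Over30Min" and "Low30Min" are made up for illustration:

    // Same logic as MyMap, but with dynamically named counters instead of an enum.
    public static class MyDynamicMap extends Mapper<LongWritable, Text, NullWritable, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            double minutes = Double.parseDouble(value.toString());
            if (minutes > 30) {
                context.getCounter("NetTime", "Over30Min").increment(1);
            } else {
                context.getCounter("NetTime", "Low30Min").increment(1);
            }
            context.write(NullWritable.get(), value);
        }
    }

On the driver side such counters can be read back after waitForCompletion, for example by iterating the group returned by counters.getGroup("NetTime"), just like the enum-based counters above.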
