Extending C# with a MapReduce method

2015-01-29

Body of the MapReduce method:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Extension methods must live in a static class.
public static class MapReduceExtensions
{
    public static IDictionary<TKey, TResult> MapReduce<TInput, TKey, TValue, TResult>(
        this IList<TInput> inputList,
        Func<MapReduceData<TInput>, KeyValueClass<TKey, TValue>> map,
        Func<TKey, IList<TValue>, TResult> reduce)
    {
        object locker = new object();
        var result = new ConcurrentDictionary<TKey, TResult>();
        // Holds the values emitted by map, grouped by key.
        var mapDic = new ConcurrentDictionary<TKey, IList<TValue>>();
        var parallelOptions = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // Parallel map.
        Parallel.For(0, inputList.Count, parallelOptions, t =>
        {
            var data = new MapReduceData<TInput>
            {
                Data = inputList[t],
                Index = t,
                List = inputList,
            };
            var pair = map(data);
            // A null return means "emit nothing" for this element.
            // (The original also tested pair.Valid, but KeyValueClass as shown has no such member.)
            if (pair != null)
            {
                // List<T> is not thread-safe; lock so concurrent adds do not lose data.
                lock (locker)
                {
                    // Append the emitted value to the list stored under its key.
                    IList<TValue> list;
                    if (!mapDic.TryGetValue(pair.Key, out list))
                    {
                        list = new List<TValue>();
                        mapDic[pair.Key] = list;
                    }
                    list.Add(pair.Value);
                }
            }
        });

        // Parallel reduce: one reduce call per key.
        Parallel.ForEach(mapDic, parallelOptions, pair =>
        {
            result[pair.Key] = reduce(pair.Key, pair.Value);
        });
        return result;
    }
}
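The method above relies on a MapReduceData<TInput> type that the post never shows. A minimal sketch of what it might look like follows; the three properties are inferred from the object initializer inside MapReduce (Data, Index, List), and everything else, including the MapReduceDataDemo helper, is a hypothetical illustration rather than the author's code.

```csharp
using System.Collections.Generic;

// Hypothetical reconstruction: only the member names Data, Index and List
// come from the post; the rest of this type is an assumption.
public class MapReduceData<TInput>
{
    // The element currently being mapped.
    public TInput Data { get; set; }

    // Position of Data within the input list.
    public int Index { get; set; }

    // The whole input list, so a map function can inspect neighbours.
    public IList<TInput> List { get; set; }
}

public static class MapReduceDataDemo
{
    // Illustrative helper (not from the post): returns the element before
    // the current one, or default(TInput) when Data is the first element.
    public static TInput Previous<TInput>(MapReduceData<TInput> d)
    {
        return d.Index > 0 ? d.List[d.Index - 1] : default(TInput);
    }
}
```

Passing Index and List alongside Data is presumably what lets a map function look at neighbouring elements, which a plain Func<TInput, ...> could not do.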

KeyValueClass definition:

// A single key/value pair emitted by the map function.
public class KeyValueClass<K, V>
{
    public KeyValueClass(K key, V value)
    {
        Key = key;
        Value = value;
    }

    // Parameterless constructor, for object-initializer syntax.
    public KeyValueClass()
    {
    }

    public K Key { get; set; }

    public V Value { get; set; }
}

Console test:

List<TestClass> listTestClass = new List<TestClass>();
listTestClass.Add(new TestClass { a = "a", g = 1 });
listTestClass.Add(new TestClass { a = "b", g = 3 });
listTestClass.Add(new TestClass { a = "c", g = 4 });
listTestClass.Add(new TestClass { a = "d", g = 2 });
listTestClass.Add(new TestClass { a = "e", g = 1 });
listTestClass.Add(new TestClass { a = "f", g = 2 });
listTestClass.Add(new TestClass { a = "g", g = 5 });
listTestClass.Add(new TestClass { a = "h", g = 6 });
IDictionary<int, string> dic = listTestClass.MapReduce(t =>
{
    // t is a MapReduceData<TestClass>; the element itself is t.Data.
    if (t.Data.g < 5)
    {
        return new KeyValueClass<int, string>(t.Data.g, t.Data.a);
    }
    // Returning null emits nothing for this element.
    return null;
}, (key, values) =>
{
    return string.Join(",", values);
});
// Print each key with its joined values.
foreach (KeyValuePair<int, string> kv in dic)
{
    Console.WriteLine(kv.Key + ": " + kv.Value);
}

TestClass definition:

using System.Collections.Generic;

public class TestClass
{
    public string a { get; set; }
    public string b { get; set; }

    public string d { get; set; }

    //public DateTime f { get; set; }

    public int g { get; set; }

    public List<TestClass> test { get; set; }

    public Dictionary<string, string> dic { get; set; }
}

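Result (because the map phase runs in parallel, the order of the values within a key can vary between runs):

1: a,e

2: d,f

3: b

4: c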
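As a cross-check on the result above, the same grouping can be written sequentially with LINQ. This is only a sketch for comparison, not the author's method: the GroupByCheck class and the (a, g) tuples standing in for TestClass are assumptions made to keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByCheck
{
    // Sequential equivalent of the console test: keep items with g < 5,
    // group their a values by g, then join each group with commas.
    // The (a, g) tuple is a hypothetical stand-in for TestClass.
    public static IDictionary<int, string> Run(IEnumerable<(string a, int g)> items)
    {
        return items
            .Where(t => t.g < 5)                  // map-side filter
            .GroupBy(t => t.g, t => t.a)          // shuffle: collect values per key
            .ToDictionary(grp => grp.Key,         // reduce: one string per key
                          grp => string.Join(",", grp));
    }
}
```

Since LINQ to Objects keeps elements in source order within each group, this version yields "a,e" for key 1 on every run, which makes it a convenient reference when checking the output of the parallel version.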

Word-frequency performance test

Original link: https://yq.aliyun.com/articles/593047
