Big Data, MapReduce, Hadoop, and Spark with Python
This book is quite good: it is short, and it aims to connect Python with big-data architecture. I have read through it once, and I plan to translate this document. To start, here is a toy simulation of MapReduce (the original code was written for Python 2; it is updated here for Python 3, replacing dict.iteritems() with dict.items() and the print statement with the print() function).

mapper.py:

class Mapper:
    def map(self, data):
        returnval = []
        counts = {}
        for line in data:
            words = line.split()
            for w in words:
                counts[w] = counts.get(w, 0) + 1
        for w, c in counts.items():
            returnval.append((w, c))
        print("Mapper result:")
        print(returnval)
        return returnval

reducer.py:

class Reducer:
    def reduce(self, d):
        returnval = []
        for k, v in d.items():
            returnval.append("%s\t%s" % (k, sum(v)))
        print("Reducer result:")
        print(returnval)
        return returnval
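To show how the two classes fit together, here is a minimal driver sketch of my own (not from the book): it adds the "shuffle" step that a real MapReduce framework performs between the map and reduce phases, grouping the mapper's (key, value) pairs by key before handing them to the reducer. The classes are redefined compactly (without the debug prints) so the snippet runs on its own; the shuffle() helper and the sample data are assumptions for illustration.

```python
class Mapper:
    def map(self, data):
        # Count words across the input lines, emit (word, count) pairs.
        counts = {}
        for line in data:
            for w in line.split():
                counts[w] = counts.get(w, 0) + 1
        return list(counts.items())

class Reducer:
    def reduce(self, d):
        # Sum the per-mapper counts for each word.
        return ["%s\t%s" % (k, sum(v)) for k, v in d.items()]

def shuffle(pairs):
    # Group (key, value) pairs by key, as the framework would
    # between the map and reduce phases.
    grouped = {}
    for k, v in pairs:
        grouped.setdefault(k, []).append(v)
    return grouped

data = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = Mapper().map(data)
result = Reducer().reduce(shuffle(pairs))
print(sorted(result))
# → ['brown\t1', 'dog\t1', 'fox\t2', 'lazy\t1', 'quick\t1', 'the\t3']
```

With several mappers, each would emit its own partial counts and shuffle() would collect a list of values per key, which is why the reducer sums v rather than taking a single value.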