[Spark][Python][DataFrame][RDD] Example: extracting an RDD from a DataFrame
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)   # sc: the SparkContext provided by the pyspark shell
peopleDF = sqlContext.read.json("people.json")
# Spark 1.x: DataFrame.map runs over the underlying RDD and returns an RDD.
# On Spark 2.x and later, write peopleDF.rdd.map(...) instead.
peopleRDD = peopleDF.map(lambda row: (row.pcode, row.name))
peopleRDD.take(5)
Out[5]:
[(u'94304', u'Alice'),
 (u'94304', u'Brayden'),
 (u'10036', u'Carla'),
 (None, u'Diana'),
 (u'94104', u'Etienne')]

# Group the (pcode, name) pairs by postal code
peopleByPCode = peopleRDD.groupByKey()
peopleByPCode.take(5)
[(u'10036', <pyspark.resultiterable.ResultIterable at 0x7f0d683a2290>),
 (u'94104', <pyspark.resultiterable....
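The ResultIterable objects in the last output hide the names that groupByKey() collected for each postal code. As a rough illustration of what those groups contain, here is a plain-Python sketch (no Spark needed) that groups the five (pcode, name) pairs shown in Out[5]. `group_by_key` is a hypothetical helper, not part of PySpark; it runs in a single process and keeps insertion order, whereas Spark spreads the work across partitions and makes no ordering guarantee.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# The five (pcode, name) pairs printed by peopleRDD.take(5) in the session above.
pairs: List[Tuple[Optional[str], str]] = [
    ("94304", "Alice"),
    ("94304", "Brayden"),
    ("10036", "Carla"),
    (None, "Diana"),
    ("94104", "Etienne"),
]


def group_by_key(
    kv_pairs: Iterable[Tuple[Optional[str], str]]
) -> Dict[Optional[str], List[str]]:
    """Hypothetical local stand-in for RDD.groupByKey(): gather every value under its key."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for key, value in kv_pairs:
        groups[key].append(value)  # None is an ordinary key here, as for Diana
    return dict(groups)


grouped = group_by_key(pairs)
for pcode, names in grouped.items():
    print(pcode, names)
```

In the PySpark session itself, `peopleByPCode.mapValues(list).take(5)` turns each ResultIterable into a list, so the grouped names are printed instead of object addresses.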