pyspark: writing a file after aggregating with reduceByKey

My code looks like this:

from pyspark import SparkContext

sc = SparkContext("local", "App Name")
eventRDD = sc.textFile("file:///home/cloudera/Desktop/python/event16.csv")
# Keep only lines containing "Topic" and split them on the pipe delimiter
outRDDExt = eventRDD.filter(lambda s: "Topic" in s).map(lambda s: s.split('|'))
# Key each record by (field 1, field 2 with its last 19 characters dropped)
outRDDExt2 = outRDDExt.keyBy(lambda x: (x[1], x[2][:-19]))
# Count the records per key
outRDDExt3 = outRDDExt2.mapValues(lambda x: 1)
outRDDExt4 = outRDDExt3.reduceByKey(lambda x, y: x + y)
outRDDExt4.saveAsTextFile("file:///home/cloudera/Desktop/python/outDir1")

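The current output file looks like this: ((u'Topic', u'2017/05/08'), 15)

What I want in my file is this:

u'Topic', u'2017/05/08', 15

How can I get the output above (i.e. get rid of the tuples and so on in my current output)?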

Source

2017-05-28 KRey

You can unpack the tuple manually and join all of its elements into a single string:

# Each row is ((topic, date), count); join the three fields with commas
outRDDExt4\
.map(lambda row: ",".join([row[0][0], row[0][1], str(row[1])]))\
.saveAsTextFile("file:///home/cloudera/Desktop/python/outDir1")
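
For reference, saveAsTextFile writes the string representation of each element, which is why the nested tuple shows up in the file. The following is a minimal plain-Python sketch of the flattening step, runnable without Spark. The helper name format_row and the sample row are assumptions made for illustration; they are not part of the original code.

```python
def format_row(row):
    """Flatten ((topic, date), count) into the string 'topic,date,count'."""
    (topic, date), count = row
    return ",".join([topic, date, str(count)])


# Hypothetical sample row with the same shape as the reduceByKey output
sample = (("Topic", "2017/05/08"), 15)

# The tuple's string form, which is what gets written without a map step
print(str(sample))

# The flattened line
print(format_row(sample))
```

Assuming the RDD from the question, the same helper could be passed to map in place of the lambda: outRDDExt4.map(format_row).saveAsTextFile(...).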

Source

2017-05-28 19:09:53 Pushkr

Thanks. This works. – KRey

Could you accept the answer if it works? Thanks. – Pushkr
