-1
>>> rdd.collect()
[1, 2, 3, 4]
定義兩個地圖的功能,我相信應該是相同的pyspark地圖函數返回不同
def map2(l):
result=[]
for i in l:
result.append(((i,i),1))
result.append((i,1))
for j in result:
yield j
輸出
>>>rdd.mapPartitions(map2).collect()
[((1, 1), 1), (1, 1), ((2, 2), 1), (2, 1), ((3, 3), 1), (3, 1), ((4, 4), 1), (4, 1)]
另一個功能
def map2(l):
result=[]
for i in l:
result.append(((i,i),1))
for i in l:
result.append((i,1))
for j in result:
yield j
輸出
>>> rdd.mapPartitions(map2).collect()
[((1, 1), 1), ((2, 2), 1), ((3, 3), 1), ((4, 4), 1)]
爲什麼他們應該是相同的? – khelwood