I have the following data, and what I want to do is use PySpark `reduceByKey` (?) to aggregate the key/tuple pairs:
[(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'), (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'), (49, '2'), (50, '0'), (51, 'X')]
counting, for each key, the instances of each character (a one-character string). So I first did a map:
.map(lambda x: (x[0], [x[1], 1]))
so that the key/tuples are now:
[(13, ['D', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['T', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['T', 1]), (53, ['2', 1]), (54, ['0', 1]), (13, ['A', 1]), (14, ['T', 1]), (32, ['6', 1]), (45, ['A', 1]), (47, ['2', 1]), (48, ['0', 1]), (49, ['2', 1]), (50, ['0', 1]), (51, ['X', 1])]
I just can't figure out the last part: how to count the instances of each letter for every key. For example, key 13 will have one 'D' and one 'A', while key 14 will have two 'T's, and so on.
You want to `groupByKey` first, then perform the count on the grouped characters. – ohruunuruus
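A sketch of what the comment suggests: group the characters under each key with `groupByKey`, then count them with `collections.Counter`. The grouping/counting logic is shown here in plain Python so it runs without a Spark cluster; the equivalent PySpark calls are noted in a comment (assumptions: an RDD built from the list in the question, and a `SparkContext` named `sc`).

```python
from collections import Counter, defaultdict

# Sample data from the question: (key, single-character string) pairs.
data = [(13, 'D'), (14, 'T'), (32, '6'), (45, 'T'), (47, '2'), (48, '0'),
        (49, '2'), (50, '0'), (51, 'T'), (53, '2'), (54, '0'),
        (13, 'A'), (14, 'T'), (32, '6'), (45, 'A'), (47, '2'), (48, '0'),
        (49, '2'), (50, '0'), (51, 'X')]

# Plain-Python emulation of rdd.groupByKey(): collect the characters
# seen under each key.
grouped = defaultdict(list)
for key, char in data:
    grouped[key].append(char)

# Emulation of .mapValues(Counter): count the characters per key.
counts = {key: Counter(chars) for key, chars in grouped.items()}

# In PySpark the same result would be (sketch, not run here):
#   sc.parallelize(data).groupByKey().mapValues(Counter).collect()
```

This gives e.g. `counts[13] == Counter({'D': 1, 'A': 1})` and `counts[14] == Counter({'T': 2})`. An alternative that avoids `groupByKey` is to re-key each record as `((key, char), 1)` and sum with `reduceByKey(add)`, which counts the same pairs without materializing the grouped lists.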