Extracting 4-tuples from an RDD in PySpark
I have an RDD in PySpark built as shown below:
return [word_val+'&'+f_val+'&'+N_val+'&'+n_val+'&'+str(1)]
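The line above returns a one-element list holding a single '&'-joined string, not a tuple. In Python, indexing a string with word[0] gives its first character, so the fields have to be split back out before any tuple indexing works. The sketch below assumes add_m was built with flatMap (so each element is one such string). The helper name to_tuple and the sample record are hypothetical.

```python
def to_tuple(record):
    """Split a hypothetical 'word&f&N&n&1' record back into a 5-tuple.

    The trailing count field is converted to int; the other fields stay strings.
    """
    word, f_val, N_val, n_val, one = record.split('&')
    return (word, f_val, N_val, n_val, int(one))

# In PySpark this would be applied as: add_m.map(to_tuple)  (not run here)
print(to_tuple("spark&3&10&2&1"))
```

Using split('&') assumes no field ever contains '&' itself; returning a tuple directly from the original function would avoid the round trip through a string.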
I want to map these values so that the result is an RDD of 5-tuples. I expect the mapping to work like this:
reducer_3 = add_m.map(lambda word: (word[0],word[1],word[2],word[3],1)).reduceByKey(lambda word[0],1: word[0]+1)
and reducer_3 should return an RDD containing:
word[0] & summation_of_1's & word[1] & word[2] & word[3]
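The lambda in the question is not valid Python, and reduceByKey needs (key, value) pairs whose values are merged pairwise. A minimal sketch of the desired output, assuming the count should be per distinct (word, f, N, n) combination: key on the 4-tuple, sum the 1s, then reorder into (word, count, f, N, n). It uses a plain-Python stand-in for reduceByKey so it runs without Spark; reduce_by_key and the sample add_m data are hypothetical.

```python
def reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey: merge values per key with func."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())  # dicts keep insertion order (Python 3.7+)

# Hypothetical stand-in for add_m: 4-tuples (word, f, N, n)
add_m = [("spark", "3", "10", "2"), ("spark", "3", "10", "2"), ("rdd", "1", "10", "4")]

# Key on the whole 4-tuple and pair each record with a 1
pairs = [((w[0], w[1], w[2], w[3]), 1) for w in add_m]
# Sum the 1s per key
counts = reduce_by_key(pairs, lambda a, b: a + b)
# Reorder into (word, count, f, N, n)
reducer_3 = [(k[0], c, k[1], k[2], k[3]) for k, c in counts]
print(reducer_3)

# Rough PySpark equivalent (not run here):
# reducer_3 = (add_m.map(lambda w: ((w[0], w[1], w[2], w[3]), 1))
#                   .reduceByKey(lambda a, b: a + b)
#                   .map(lambda kv: (kv[0][0], kv[1], kv[0][1], kv[0][2], kv[0][3])))
```

If the '&'-separated string form is needed, each result tuple can be joined with '&'.join(map(str, t)).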
I'm actually computing TF-IDF on a corpus. I want the map function to take the word from word[0] and perform: reduceByKey(lambda word[0],1: word[0]+1) – Sameer
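For the TF-IDF goal, a minimal sketch of the per-record score, assuming (from the variable names only, not confirmed in the question) that f is the term's frequency in a document, N is the number of documents, and n is the number of documents containing the term. This uses the common tf * log(N / n) weighting; other TF-IDF variants exist.

```python
import math

def tf_idf(f, N, n):
    """Hypothetical mapping: f = term frequency, N = corpus size,
    n = document frequency of the term. Returns f * log(N / n)."""
    return f * math.log(N / n)

# The record fields are strings, so convert before scoring
f_val, N_val, n_val = "3", "10", "2"
print(tf_idf(int(f_val), int(N_val), int(n_val)))
```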
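Could you provide a simple example of the input and output? To count only the instances of word[0], you have to drop the rest of the tuple before reduceByKey: 'add_m.map(lambda word: (word[0], 1)).reduceByKey(lambda x, y: x + y)' – RichD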
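The suggestion in the comment above can be checked without a cluster using a plain-Python stand-in for reduceByKey (the helper reduce_by_key and the sample data are hypothetical). Records that share a word but differ in the other fields are counted together, because only word[0] is kept as the key.

```python
def reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey: merge values per key with func."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())

# Hypothetical stand-in for add_m: 4-tuples (word, f, N, n)
add_m = [("spark", "3", "10", "2"), ("spark", "5", "10", "2"), ("rdd", "1", "10", "4")]

# Mirror of add_m.map(lambda word: (word[0], 1)).reduceByKey(lambda x, y: x + y)
word_counts = reduce_by_key([(w[0], 1) for w in add_m], lambda x, y: x + y)
print(word_counts)
```

Because reduceByKey calls its function with two values of the same key, the lambda takes two parameters (x, y), not a key and a literal.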