reduceByKey() with dictionaries not working as expected

I have a function in which I read data from a text file and plan to build, for each key, a dictionary of all its values. I have the following code snippet to do this:

def concat_helper(a, b):
    if a != None and b != None:
        print a, b, a.update(b)
        return dict(a).update(dict(b))
    elif a != None:
        return a
    elif b != None:
        return b

rdd_text=sc.textFile(inputdir) 
new_rdd = rdd_text.map(lambda x: (item_id_to_idx[str(x.strip().split("\t")[0])],{user_id_to_idx[str(x.strip().split("\t")[1])]:1})).reduceByKey(lambda x,y: concat_helper(x,y)).sortByKey()

rdd_text is an RDD of input lines from the file, each of the form:

item_id  user_id  some other tab_separated fields.

item_id_to_idx and user_id_to_idx are both dictionaries that map the respective IDs to an idx (an integer).

In the end, I want to create a dictionary for each item, of the form:

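1, {5:1, 10:1, 34:1, ...}, where the numbers to the left of each colon inside the dictionary (the keys) are user_indices and the numbers to the right of each colon are their scores.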

I tried reduceByKey, but I get a lot of None values.

I wrote the helper function to work around the NoneType errors thrown by reduceByKey.

The output I get has the following form:

(item_idx, {user_idx:1}) 
(item_idx2,None) 
(item_idx3,None) 
...

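I suspect the None values are the result of the a.update(b) operation in the concat_helper method. Can anyone clarify what is happening, or suggest a better, more correct way to achieve what I am after?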

Source

2015-05-08 anonuser0428

Your problem is this line:

return dict(a).update(dict(b))

dict.update always returns None; it is useful only for its side effect of modifying the dictionary in place.
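This is easy to confirm in a plain Python interpreter, without Spark. The following is only a minimal sketch; the variable names and sample values are made up for illustration:

```python
a = {5: 1}
b = {10: 1}

# update() mutates the dict it is called on and returns None,
# so this expression evaluates to None.
result = dict(a).update(dict(b))

# The merge happened on the temporary copy dict(a), which is then
# discarded, so the original a is left untouched.
a_is_unchanged = (a == {5: 1})
```

Here result is None, which is exactly the value the reducer in the question hands back to reduceByKey whenever both arguments are dictionaries.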

You should use this instead:

a.update(b) 
return a

You might also want an empty dictionary as the output when both a and b are None:

def concat_helper(a, b): 
    rval = a or {} 
    rval.update(b or {}) 
    return rval
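To try the helper without a cluster, the sketch below folds a few hand-made (item_idx, {user_idx: 1}) pairs per key in plain Python, roughly imitating what reduceByKey followed by sortByKey would do. The sample pairs and the reduce_by_key_local function are hypothetical stand-ins written for this illustration; they are not part of the Spark API.

```python
from collections import defaultdict
from functools import reduce


def concat_helper(a, b):
    # Same helper as above: merge b into a, treating None as an empty dict.
    rval = a or {}
    rval.update(b or {})
    return rval


def reduce_by_key_local(pairs, func):
    # Hypothetical local stand-in for reduceByKey + sortByKey:
    # group the values by key, fold each group with func, sort by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    reduced = [(key, reduce(func, values)) for key, values in grouped.items()]
    return sorted(reduced, key=lambda kv: kv[0])


# Made-up sample data in the same shape as the mapped RDD in the question.
pairs = [(1, {5: 1}), (2, {7: 1}), (1, {10: 1}), (1, {34: 1})]
merged = reduce_by_key_local(pairs, concat_helper)
# merged == [(1, {5: 1, 10: 1, 34: 1}), (2, {7: 1})]
```

In the original pipeline the helper can be passed directly, as in .reduceByKey(concat_helper), with no wrapping lambda. Note that this helper mutates its first argument, as the a.update(b) version does.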

Source

2015-05-11 12:31:50 nrg
