Spark reduce and map issue

I ran a small experiment in Spark and got stuck.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]


# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
       .map(lambda x: (x,1))  # <==== something may be wrong with this line
       .reduce(sum))  # <==== something may be wrong with this line
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)

# TEST Mean using reduce (3b) 
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')

2015-06-07 BufBills

I figured it out. My solution:

from operator import add 
totalCount = (wordCounts 
       .map(lambda x: x[1]) 
       .reduce(add)) 
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2)
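
As a rough illustration of why the original attempt fails and the version above works, here is a minimal local sketch that needs no Spark. It assumes a plain Python list named word_counts (a hypothetical stand-in for the wordCounts RDD) and uses functools.reduce to mimic the pairwise reduction; the real RDD methods may behave differently in detail.

```python
from functools import reduce
from operator import add

# Assumed sample data, copied from the wordCounts output in the question.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# The original attempt, reproduced locally: every pair is wrapped as (pair, 1)
# and the built-in sum is used as the two-argument reducer.
wrapped = [(pair, 1) for pair in word_counts]
try:
    reduce(sum, wrapped)
    original_failed = False
except TypeError:
    # sum(a, b) treats a as an iterable and b as the start value,
    # so it cannot combine two elements the way add(a, b) does.
    original_failed = True

# The fix: keep only the count from each pair, then add the counts pairwise.
total_count = reduce(add, [pair[1] for pair in word_counts])
average = total_count / float(len(word_counts))

print(original_failed)
print(total_count)
print(round(average, 2))
```

The reducer has to take two elements and return one value of the same kind, which is what add does and what sum does not.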

2015-06-07 18:47:52 BufBills

If you solved your own problem, please mark your answer as accepted. Don't put this information in the question. – Anthon

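I'm not sure myself, but from looking at your code I can see some problems. The 'map' function can't be called on a list as 'list_name.map(some stuff)'; you need to call it like this: 'variable = map(function, arguments)', and if you are using Python 3, you need 'variable = list(map(function, arguments))'. Hope that helps somewhat :)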
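
For reference, the built-in form described above might look like the following sketch. It assumes word_counts is a plain Python list (a hypothetical stand-in, not a Spark RDD); the wordCounts object in the question does appear to have its own .map method, since the asker's solution calls it.

```python
# Assumed sample data: a plain Python list, not a Spark RDD.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# A list has no .map method, so the built-in map function is used instead.
list_has_map = hasattr(word_counts, 'map')

# In Python 3, map returns a lazy iterator; list() materializes it.
counts = list(map(lambda pair: pair[1], word_counts))

print(list_has_map)
print(counts)
```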

2015-06-07 18:47:16 R21

Another similar approach: you can also read the list as key-value pairs and use distinct()

from operator import add 
totalCount = (wordCounts 
      .map(lambda (k,v) : v) 
      .reduce(add)) 
average = totalCount/float(wordCounts.distinct().count()) 
print totalCount 
print round(average, 2)
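
One thing to keep in mind with this approach is that distinct() works on whole (word, count) pairs. The sketch below mimics that locally with a set, assuming a plain Python list named word_counts as a hypothetical stand-in for the RDD. The number of distinct pairs matches the number of words here only because each word appears in exactly one pair.

```python
# Assumed sample data, copied from the wordCounts output in the question.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Local analogue of distinct(): unique (word, count) pairs.
distinct_pairs = set(word_counts)

total_count = sum(count for _, count in word_counts)
average = total_count / float(len(distinct_pairs))

# If the same word showed up with two different counts, the pairs would
# differ and both would be kept, so counting distinct words is safer.
mixed = word_counts + [('rat', 3)]
distinct_pairs_mixed = len(set(mixed))
distinct_words_mixed = len({word for word, _ in mixed})

print(round(average, 2))
print(distinct_pairs_mixed, distinct_words_mixed)
```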

2016-07-27 07:04:06
