2015-06-07
1

Spark reduce and map problem

I ran a small experiment in Spark and ran into trouble.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]


# TODO: Replace <FILL IN> with appropriate code 
from operator import add 
totalCount = (wordCounts 
       .map(lambda x: (x,1))  # <==== maybe something is wrong with this line
       .reduce(sum))          # <==== maybe something is wrong with this line
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2) 

# TEST Mean using reduce (3b) 
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average') 

Answers

2

I figured it out. My solution:

from operator import add 
totalCount = (wordCounts 
       .map(lambda x: x[1]) 
       .reduce(add)) 
average = totalCount/float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count()) 
print totalCount 
print round(average, 2) 
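
As a rough illustration of why the first attempt fails and this version works, here is a minimal sketch that mirrors the same steps on a plain Python list, using `functools.reduce` in place of the RDD's `reduce`. The names `word_counts`, `wrapped`, and `failed` are made up for this example, and it assumes Spark's `map`/`reduce` behave like their list counterparts for this small input.

```python
from functools import reduce
from operator import add

# Same data as in the question, held in a plain list instead of an RDD (assumption).
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Original attempt: wrap every (word, count) pair in another tuple, then reduce with sum.
wrapped = [(pair, 1) for pair in word_counts]
try:
    reduce(sum, wrapped)
    failed = False
except TypeError:
    # sum(a, b) treats b as a start value and ends up adding an int to a tuple.
    failed = True

# Fix: keep only the count from each pair, then add the counts together.
total_count = reduce(add, [pair[1] for pair in word_counts])
average = total_count / float(len(word_counts))

print(failed)             # True
print(total_count)        # 5
print(round(average, 2))  # 1.67
```

The key difference is that `add` takes two numbers and returns their sum, which is the shape of function `reduce` expects, while `sum` expects an iterable plus an optional start value.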
+2

If you solved your own problem, please mark your answer as accepted. Don't put this information in the question. – Anthon

1

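I'm not sure myself, but from looking at your code I can see some problems. The 'map' function can't be used on a list as 'list_name.map(some stuff)'; you need to call map like this: 'variable = map(function, arguments)', and if you are using Python 3 you need 'variable = list(map(function, arguments))'. Hope this helps somewhat :)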
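
For reference, the built-in form described above looks roughly like this on a plain Python list. This is only a sketch with made-up names (`word_counts`, `counts`, `has_method`). It applies to ordinary lists, whereas `wordCounts` in the question is a Spark RDD, which does provide its own `.map` method.

```python
# Plain Python list of (word, count) pairs; not a Spark RDD.
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Built-in map takes the function first and the iterable second.
# list(...) is needed on Python 3, where map returns a lazy iterator.
counts = list(map(lambda pair: pair[1], word_counts))

# A plain list has no .map method, which is what this answer is pointing at.
has_method = hasattr(word_counts, 'map')

print(counts)      # [2, 1, 2]
print(has_method)  # False
```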

0

Another, similar way: you can also read the list as key, value pairs and use distinct():

from operator import add 
totalCount = (wordCounts 
      .map(lambda (k,v) : v) 
      .reduce(add)) 
average = totalCount/float(wordCounts.distinct().count()) 
print totalCount 
print round(average, 2)
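
The same idea can be sketched without Spark, using a `set` as a stand-in for `distinct()`. The names `word_counts` and `distinct_pairs` are hypothetical. Because each word appears only once in the data from the question, removing duplicates leaves the number of pairs unchanged.

```python
# Same data as in the question, as a plain list (assumption: no Spark needed here).
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Read each element as a (key, value) pair and add up the values.
total_count = sum(v for (k, v) in word_counts)

# A set drops duplicate pairs, playing the role of distinct() in this sketch.
distinct_pairs = set(word_counts)
average = total_count / float(len(distinct_pairs))

print(total_count)        # 5
print(round(average, 2))  # 1.67
```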