2015-06-12

Converting Apache Spark Scala code to Python

Can anyone convert this very simple Scala code to Python?

val words = Array("one", "two", "two", "three", "three", "three") 
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1)) 

val wordCountsWithGroup = wordPairsRDD 
    .groupByKey() 
    .map(t => (t._1, t._2.sum)) 
    .collect() 

What do you expect the output of this code to be? I'm guessing it counts word occurrences, right? So the expected result would be {"one": 1, "two": 2, "three": 3}? – eugenioy


`import collections; words = ["one", "two", "two", "three", "three", "three"]; collections.Counter(words)`, if `{"one": 1, "two": 2, "three": 3}` is what you want. –


Yes, I expect output like this: [('one', 1), ('two', 2), ('three', 3)]. What is the Python equivalent of the line .map(t => (t._1, t._2.sum))? – muktadiur
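The `collections.Counter` approach suggested in the comments can be tried as a standalone script. This is plain Python with no Spark involved, so it is only a sketch for data that fits in one process's memory; it does not replace the distributed RDD version.

```python
from collections import Counter

words = ["one", "two", "two", "three", "three", "three"]

# Counter tallies how many times each element occurs in the list.
counts = Counter(words)

# Convert to a plain dict: {"one": 1, "two": 2, "three": 3}
word_counts = dict(counts)
print(word_counts)

# A list of (word, count) pairs, like the RDD output in the question
word_count_pairs = list(counts.items())
print(word_count_pairs)
```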

Answers


Try this:

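# assumes `sc` is an existing SparkContext (e.g. in the pyspark shell)
words = ["one", "two", "two", "three", "three", "three"]
wordPairsRDD = sc.parallelize(words).map(lambda word: (word, 1))

# Parentheses let the method chain span several lines in Python
wordCountsWithGroup = (wordPairsRDD
    .groupByKey()
    .map(lambda t: (t[0], sum(t[1])))  # t[0] is the key (t._1), t[1] the grouped values (t._2)
    .collect())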
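To see what `groupByKey()` followed by `.map(lambda t: (t[0], sum(t[1])))` computes, here is a plain-Python emulation that runs without Spark. The helper `group_by_key` is hypothetical, written only for illustration; it is not part of the PySpark API. Each `t` is a `(key, values)` pair, so `t[0]` plays the role of Scala's `t._1` and `sum(t[1])` the role of `t._2.sum`.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical stand-in for RDD.groupByKey(), for illustration only.

    Collects every value for each key into a list and returns
    (key, list_of_values) tuples in first-seen key order.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())


words = ["one", "two", "two", "three", "three", "three"]
word_pairs = [(word, 1) for word in words]  # like .map(lambda word: (word, 1))

# Same lambda as in the PySpark answer: key from t[0], summed values from t[1]
word_counts_with_group = list(map(lambda t: (t[0], sum(t[1])), group_by_key(word_pairs)))
print(word_counts_with_group)
```

Unlike this sketch, Spark does not guarantee the order of the pairs returned by `collect()`.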

Two translations in Python:

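from operator import add
# assumes `sc` is an existing SparkContext (e.g. in the pyspark shell)
wordsList = ["one", "two", "two", "three", "three", "three"]
# Version 1: reduceByKey adds up the 1s for each word
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).reduceByKey(add).collect()
print(words)
# Version 2: groupByKey, then sum each group's values
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)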
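Both versions produce the same counts. The difference is that `reduceByKey(add)` folds the values for each key together with an associative and commutative function, which lets Spark combine values within each partition before the shuffle, while `groupByKey()` sends every `(word, 1)` pair across the network first. For that reason the Spark programming guide recommends `reduceByKey` (or `aggregateByKey`) when grouping only to aggregate. A plain-Python sketch of the per-key fold follows; `reduce_by_key` is a hypothetical helper for illustration, not a Spark API, and it does not model partitions.

```python
from operator import add


def reduce_by_key(func, pairs):
    """Hypothetical stand-in for RDD.reduceByKey(func), for illustration only.

    Keeps one running value per key and folds each new value into it with
    func, so no per-key list of values is ever built.
    """
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())


words_list = ["one", "two", "two", "three", "three", "three"]
pairs = [(w, 1) for w in words_list]  # like .map(lambda l: (l, 1))

word_counts = reduce_by_key(add, pairs)
print(word_counts)
```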

Assuming you already have a Spark context defined and ready to go:

from operator import add

words = ["one", "two", "two", "three", "three", "three"]
# reduceByKey(add) sums the 1s per word; collect() returns a Python list
wordsPairRDD = (sc.parallelize(words).map(lambda word: (word, 1))
     .reduceByKey(add)
     .collect())

Check out the GitHub examples repo: Python Examples