2015-06-12

Converting Apache Spark Scala code to Python

Can anyone convert this very simple Scala code to Python?

val words = Array("one", "two", "two", "three", "three", "three") 
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1)) 

val wordCountsWithGroup = wordPairsRDD 
    .groupByKey() 
    .map(t => (t._1, t._2.sum)) 
    .collect() 

What do you think the output of the code is? I'd guess the code is counting word occurrences, right? So the expected result would be {"one": 1, "two": 2, "three": 3}? – eugenioy


'import collections; words = ["one", "two", "two", "three", "three", "three"]; collections.Counter(words)' if '{"one": 1, "two": 2, "three": 3}' is what you want. –
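For anyone following along without Spark, a minimal plain-Python sketch of the Counter approach suggested in this comment:

import collections

words = ["one", "two", "two", "three", "three", "three"]
counts = collections.Counter(words)   # multiset-style tally of each word
print(dict(counts))                   # {'one': 1, 'two': 2, 'three': 3}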


Yes, I'm expecting output like this: [('one', 1), ('two', 2), ('three', 3)]. What is the Python equivalent of the line .map(t => (t._1, t._2.sum))? – muktadiur

Answers


Try this:

words = ["one", "two", "two", "three", "three", "three"] 
wordPairsRDD = sc.parallelize(words).map(lambda word : (word, 1)) 

wordCountsWithGroup = wordPairsRDD 
    .groupByKey() 
    .map(lambda t: (t[0], sum(t[1]))) 
    .collect() 
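Collected on the sample data, this should yield something like [('one', 1), ('two', 2), ('three', 3)]; pair order may vary between runs, since nothing here guarantees an ordering. Note the chained calls must be wrapped in parentheses (or use backslash continuations), since Python, unlike Scala, does not allow a bare leading-dot continuation line.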

Two translations in Python:

from operator import add

wordsList = ["one", "two", "two", "three", "three", "three"]

# Translation 1: reduceByKey sums the per-word counts as it shuffles.
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).reduceByKey(add).collect()
print(words)

# Translation 2: groupByKey gathers all the 1s per word, then sum() adds them up.
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)
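Of the two translations, reduceByKey is generally preferred for counting: it combines counts within each partition before the shuffle, whereas groupByKey ships every individual 1 across the network and only sums after grouping.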

Assuming you already have a SparkContext defined and ready to go:

from operator import add

words = ["one", "two", "two", "three", "three", "three"]
wordsPairRDD = (sc.parallelize(words)
                .map(lambda word: (word, 1))   # pair each word with a count of 1
                .reduceByKey(add)              # add up the counts per word
                .collect())
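If no context exists yet, here is a minimal self-contained sketch that also creates one; the "local" master and the app name "word-count" are illustrative assumptions, not part of the original answer:

from operator import add
from pyspark import SparkContext

sc = SparkContext("local", "word-count")   # hypothetical master/app name
words = ["one", "two", "two", "three", "three", "three"]
counts = (sc.parallelize(words)
          .map(lambda word: (word, 1))
          .reduceByKey(add)
          .collect())
print(counts)   # e.g. [('one', 1), ('two', 2), ('three', 3)]
sc.stop()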

Check out the GitHub examples repo: Python Examples