2012-09-25 19 views
1

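MRJob and MapReduce: assign to a dictionary instead of yield?

I am new to MRJob and MapReduce, and I have a question about the classic MRJob word-count example in Python: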

from mrjob.job import MRJob

class MRWordCounter(MRJob):
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRWordCounter.run()

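Is it possible to store the (word, sum(occurrences)) tuples in a dictionary instead of yielding them, so that I can access them later? What would the syntax be? Thanks!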

Answers

2

You can simply use a list instead of yield:

from mrjob.job import MRJob

class MRWordCounter(MRJob):
    def mapper(self, key, line):
        results = []
        for word in line.split():
            # Note that the list should append a tuple here.
            results.append((word, 1))
        return results

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRWordCounter.run()

Thanks for pointing out the tuple! – Vor
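To make the list-versus-yield point concrete, here is a minimal sketch in plain Python (no mrjob involved). The function names `mapper_with_yield` and `mapper_with_list` are made up for illustration. It only shows that both forms hand the caller the same sequence of (word, 1) tuples; it does not by itself keep the results around after the job has finished.

```python
def mapper_with_yield(line):
    # Generator version: pairs are produced one at a time.
    for word in line.split():
        yield word, 1


def mapper_with_list(line):
    # List version: pairs are collected first, then returned together.
    results = []
    for word in line.split():
        results.append((word, 1))
    return results


line = "to be or not to be"
pairs_from_generator = list(mapper_with_yield(line))
pairs_from_list = mapper_with_list(line)

# Any caller that simply iterates over the result sees identical pairs.
print(pairs_from_generator == pairs_from_list)
```

The practical difference is memory: the list holds every pair at once, while the generator produces them lazily.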

0

Keep in mind that the job will run on another server. Input and output are treated as concerns to be managed by the script that runs the module.

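If you want to use the job's output, you need to either read it back from wherever you wrote it (standard output by default) or run the job programmatically.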
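For the first option, here is a rough sketch of reading saved job output back into a dictionary. It assumes the job used mrjob's default output format, where each line is a JSON-encoded key, a tab, and a JSON-encoded value. The helper name `parse_output_lines` and the sample lines are hypothetical, and a job configured with a different output protocol would need different parsing.

```python
import json


def parse_output_lines(lines):
    """Build a {word: count} dict from lines of the form '<json key>\\t<json value>'."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        raw_key, raw_value = line.split("\t", 1)
        counts[json.loads(raw_key)] = json.loads(raw_value)
    return counts


# Hypothetical sample of what the word-count job might have written.
sample_output = ['"hello"\t2\n', '"world"\t1\n']
word_counts = parse_output_lines(sample_output)
print(word_counts["hello"])
```

In practice `lines` could be an open file object holding the redirected standard output of the job.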

It sounds like you want the latter. In a separate module, you would do something like the following:

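# Hypothetical module name; import the job class from wherever you defined it.
from mr_word_counter import MRWordCounter

mr_job = MRWordCounter(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
    runner.run()
    # Newer mrjob releases replace these two calls with
    # runner.cat_output() and mr_job.parse_output().
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        ...  # do something with the parsed output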

See the documentation for more details. The code sample above is taken from: http://pythonhosted.org/mrjob/guides/runners.html#runners-programmatically
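To see how the word-count pairs could end up in a dictionary, the following plain-Python sketch imitates the map, group and reduce phases in a single process. It does not use mrjob at all: `run_locally` is a made-up helper, and the grouping step is only a stand-in for what the framework does between mapper and reducer.

```python
from collections import defaultdict


def mapper(_, line):
    # Same logic as the job's mapper: emit (word, 1) for every word.
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    # Same logic as the job's reducer: sum the counts for one word.
    yield word, sum(occurrences)


def run_locally(lines):
    """Imitate map -> group by key -> reduce, collecting results in a dict."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


counts = run_locally(["the quick brown fox", "the lazy dog", "the end"])
print(counts["the"])
```

When the job is run programmatically, the loop over the parsed output can fill a dictionary in the same way, by assigning each value to its key.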