MRJob and MR: assign to a dictionary instead of yield?

I'm new to MRJob and MapReduce, and I have a question about the classic MRJob word count example in Python:

from mrjob.job import MRJob

class MRWordCounter(MRJob):
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRWordCounter.run()

Is it possible to store the word, sum(occurrences) tuples in a dictionary instead of yielding them, so that I can access them later? What would the syntax be? Thanks!

2012-09-25 Michael

You can simply use a list instead of yield:

from mrjob.job import MRJob

class MRWordCounter(MRJob):
    def mapper(self, key, line):
        results = []
        for word in line.split():
            # Note that the list should append a tuple here.
            results.append((word, 1))
        return results

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRWordCounter.run()
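
To illustrate why returning a list can stand in for yield: the sketch below assumes that whatever consumes the mapper simply iterates over its result, in which case a generator and a list of tuples hand over the same pairs. It is plain Python with no mrjob involved, and the function names and the sample line are made up for illustration.

```python
def mapper_with_yield(line):
    # Generator version: pairs are produced one at a time.
    for word in line.split():
        yield word, 1

def mapper_with_list(line):
    # List version: all pairs are built first, then returned together.
    results = []
    for word in line.split():
        results.append((word, 1))
    return results

line = "to be or not to be"  # made-up sample input
pairs_from_yield = list(mapper_with_yield(line))
pairs_from_list = list(mapper_with_list(line))

# Iterating over either result gives the same (word, 1) pairs.
print(pairs_from_yield == pairs_from_list)
```

Note that in both versions the pairs still flow on to the reducer; neither one leaves a dictionary behind for the calling script to read.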

2012-12-13 07:38:08 MrROY

Thanks for pointing out the tuple! – Vor

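Keep in mind that the job will run on another server. Input and output are treated as concerns handled by the script that runs the module.

If you want to use the job's output, you need to either read it from wherever you wrote it (standard output by default) or run the job programmatically.

It sounds like you want the latter. In a separate module, you would do something like this: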

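# Assumes the job class is saved in mr_word_counter.py (hypothetical file name).
from mr_word_counter import MRWordCounter

mr_job = MRWordCounter(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        ... # do something with the parsed output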

Check the documentation for more details. The code example above is taken from: http://pythonhosted.org/mrjob/guides/runners.html#runners-programmatically
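
Once the output is available as key/value pairs, getting it into a dictionary is ordinary Python. The sketch below covers the "read it from standard output" route and assumes the job writes mrjob's default output format, one line per result with a JSON-encoded key and value separated by a tab. The helper names and the sample lines are made up for illustration, and the lines are treated as text strings.

```python
import json

def parse_line(line):
    """Split one 'key<TAB>value' line and decode both halves as JSON."""
    key, value = line.rstrip("\n").split("\t", 1)
    return json.loads(key), json.loads(value)

def collect_counts(lines):
    """Build a {word: count} dictionary from an iterable of output lines."""
    counts = {}
    for line in lines:
        word, total = parse_line(line)
        counts[word] = total
    return counts

# Made-up sample of what the word count job might print.
sample_output = ['"hello"\t2\n', '"world"\t1\n']
counts = collect_counts(sample_output)
print(counts["hello"])
```

When running the job programmatically, the same idea applies inside the loop: create an empty dictionary before the loop and store each parsed key and value in it.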

2013-07-09 21:16:56 thetainted1
