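Finding the minimum number with Hadoop Streaming and Python
I am new to the Hadoop framework and the MapReduce abstraction.
Basically, I want to find the smallest number in a huge text file (the numbers are separated by ",").
So, here is my code. mapper.py: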
#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into comma-separated numbers
    numbers = line.split(",")
    for number in numbers:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the count of 1 is carried over from the word count example
        print('%s\t%s' % (number, 1))
reducer.py:
#!/usr/bin/env python
from operator import itemgetter
import sys

smallest_number = sys.float_info.max
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    number, count = line.split('\t', 1)
    try:
        number = float(number)
    except ValueError:
        # skip tokens that are not numbers
        continue
    if number < smallest_number:
        smallest_number = number
        print(smallest_number)  # <---- I think the error is here... there is no key/value pair

print(smallest_number)
The error I receive:
12/10/04 12:07:22 ERROR streaming.StreamJob: Job not successful. Error: NA
12/10/04 12:07:22 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
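What result do you get? What is the problem? What "key value" are you talking about? – Junuxx
@Junuxx: Hi, I just posted the error. Basically, what would the MapReduce abstraction for finding the smallest number in a text file look like? The error I was referring to: the mapper emits (number, 1), basically the same format as the mapper in the word count example. In the reducer, all I care about is the number. I compare it with the current smallest number and swap them if it is smaller? – Fraz
Debugging without Hadoop might help: 'cat input | ./mapper.py | sort | ./reducer.py' Does that run successfully? –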
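One possible shape for the abstraction asked about above, as a minimal sketch rather than a confirmed fix for the failing job: each map call emits the local minimum of its line under a single constant key, and the reduce step takes the minimum of those values. The names `KEY`, `map_line` and `reduce_values` are hypothetical and only for illustration, and the sketch is written for Python 3.

```python
KEY = "min"  # hypothetical constant key, so every value ends up in one group


def map_line(line):
    """Return (KEY, local_min) for one comma-separated line, or None if it has no numbers."""
    values = []
    for token in line.strip().split(","):
        try:
            values.append(float(token))
        except ValueError:
            continue  # skip empty or non-numeric tokens
    if not values:
        return None
    return KEY, min(values)


def reduce_values(pairs):
    """Return the smallest value among (key, value) pairs, or None if there are none."""
    smallest = None
    for _key, value in pairs:
        if smallest is None or value < smallest:
            smallest = value
    return smallest


if __name__ == "__main__":
    lines = ["4,8,15", "16,-23.5,42", "", "7,abc,1"]
    pairs = [p for p in (map_line(l) for l in lines) if p is not None]
    print(reduce_values(pairs))
```

Emitting one local minimum per line instead of (number, 1) for every token keeps the intermediate data small, and starting from `None` avoids depending on `sys.float_info.max` as a sentinel.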
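The local pipeline suggested in the comment above can also be approximated in a single Python process, which makes it easy to step through with a debugger. This is only a sketch under assumptions: `mapper`, `reducer` and `run_local` are hypothetical stand-ins for the two scripts (this mapper additionally skips empty tokens), the "min" output key is made up for illustration, and `sorted()` merely imitates what `sort` or Hadoop's shuffle would do.

```python
def mapper(lines):
    """Emit 'number<TAB>1' per comma-separated token (like mapper.py, but skipping empty tokens)."""
    for line in lines:
        for token in line.strip().split(","):
            if token:
                yield "%s\t%s" % (token, 1)


def reducer(lines):
    """Track the smallest parsable number and emit it once, after all input is read."""
    smallest = None
    for line in lines:
        number, _count = line.strip().split("\t", 1)
        try:
            number = float(number)
        except ValueError:
            continue  # ignore keys that are not numbers
        if smallest is None or number < smallest:
            smallest = number
    if smallest is not None:
        yield "min\t%s" % smallest


def run_local(input_lines):
    """Imitate `cat input | ./mapper.py | sort | ./reducer.py` in one process."""
    shuffled = sorted(mapper(input_lines))  # stands in for the sort/shuffle step
    return list(reducer(shuffled))


if __name__ == "__main__":
    for out in run_local(["3,9,2.5", "10,-1,7"]):
        print(out)
```

If this runs cleanly on a small sample but the Hadoop job still fails, the problem is more likely in how the scripts are shipped or executed on the cluster than in the min logic itself.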