Map Reduce: why do I need to specify "python" before piping to a .py file?
I'm running MapReduce locally.
My command line looks like this:
cat testfile | python ./mapper.py | python ./reducer.py
This works fine. However, when I run the command like this:
cat testfile | ./mapper.py | ./reducer.py
I get the following errors:
./mapper.py: line 1: import: command not found
./mapper.py: line 3: syntax error near unexpected token `('
./mapper.py: line 3: `def mapper():
This makes sense: the shell is reading my Python file as a bash script and gets confused by the Python syntax.
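The diagnosis above can be checked with a small sketch. When a script is run as ./mapper.py, the system looks for a "#!" line at the very start of the file to pick an interpreter; if there is none, the shell falls back to running the file as a shell script, which matches the errors shown. The helper name launch_report is hypothetical and not part of any library.

```python
import os
import sys


def launch_report(path):
    """Return (has_shebang, is_executable) for a script file.

    Hypothetical helper for illustration only.
    """
    # A shebang must be the first two bytes of the file.
    with open(path, "rb") as handle:
        has_shebang = handle.read(2) == b"#!"
    # True if the current user is allowed to execute the file.
    is_executable = os.access(path, os.X_OK)
    return has_shebang, is_executable


if __name__ == "__main__":
    # Report on every path given on the command line.
    for script in sys.argv[1:]:
        shebang, executable = launch_report(script)
        print("{0}: shebang={1}, executable={2}".format(script, shebang, executable))
```

For the mapper.py in this question, the likely result is shebang=False and executable=True, since a file without the execute bit would normally fail with "Permission denied" instead of being read by bash.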
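But none of the online examples I've seen (e.g. http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/) put python before the .py files.
How can I configure my machine so that I can pipe through mapper.py and reducer.py without specifying python first?
In case it helps, here is my mapper code: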
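import sys

def mapper():
    # Read tab-separated records from stdin and emit "category<TAB>sales".
    for line in sys.stdin:
        data = line.strip().split('\t')
        # Skip malformed rows that do not have exactly six fields.
        if len(data) == 6:
            category = data[3]
            sales = data[4]
            print('{0}\t{1}'.format(category, sales))

if __name__ == "__main__":
    mapper()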
Here is my reducer code:
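import sys

def reducer():
    # Sum sales per key; assumes the input lines are already sorted by key.
    current_total = 0
    old_key = None
    for line in sys.stdin:
        data = line.strip().split('\t')
        if len(data) == 2:
            current_key, sales = data
            sales = float(sales)
            # A new key means the previous key's total is complete.
            if old_key and current_key != old_key:
                print("{0}\t{1}".format(old_key, current_total))
                current_total = 0
            old_key = current_key
            current_total += sales
    # Emit the last key's total, unless there was no valid input at all.
    if old_key is not None:
        print("{0}\t{1}".format(old_key, current_total))

if __name__ == "__main__":
    reducer()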
My data looks like this:
2012-01-01 09:01 Anchorage DVDs 6.38 Amex
2012-01-01 09:01 Aurora Electronics 117.81 MasterCard
2012-01-01 09:01 Philadelphia DVDs 351.31 Cash
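To see what the mapper and reducer do with these rows, here is a minimal sketch that simulates map, sort, and reduce in one Python process. It assumes the fields are tab-separated (the mapper splits on '\t'), and it uses itertools.groupby in place of the reducer's old_key loop. The names SAMPLE, map_lines, reduce_pairs, and run_local are made up for this illustration. The sort step matters: the reducer only totals consecutive lines with the same key, so its input has to arrive sorted by key.

```python
from itertools import groupby

# The three sample rows, assumed to be tab-separated.
SAMPLE = [
    "2012-01-01\t09:01\tAnchorage\tDVDs\t6.38\tAmex",
    "2012-01-01\t09:01\tAurora\tElectronics\t117.81\tMasterCard",
    "2012-01-01\t09:01\tPhiladelphia\tDVDs\t351.31\tCash",
]


def map_lines(lines):
    """Yield (category, sales) for each well-formed six-field row."""
    for line in lines:
        data = line.strip().split("\t")
        if len(data) == 6:
            yield data[3], float(data[4])


def reduce_pairs(pairs):
    """Sum sales per category; pairs must already be sorted by category."""
    for category, group in groupby(pairs, key=lambda pair: pair[0]):
        yield category, sum(sales for _, sales in group)


def run_local(lines):
    """Map, sort by key, then reduce, all in memory."""
    mapped = sorted(map_lines(lines), key=lambda pair: pair[0])
    return list(reduce_pairs(mapped))


if __name__ == "__main__":
    for category, total in run_local(SAMPLE):
        print("{0}\t{1:.2f}".format(category, total))
```

With the sort in place, the two DVDs rows are combined into one total of 357.69 and Electronics stays at 117.81. Without it, DVDs would be emitted twice, because the rows for that category are not adjacent.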
Add the hashbang line '#!/usr/bin/env python' at the beginning of your Python scripts –
Add the shebang and set the execute attribute using 'chmod +x script.py' – furas
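The two suggestions in the comments can also be applied from Python. This is a minimal sketch, not the only way to do it: the function name make_runnable is hypothetical, and the os.chmod call sets the execute bit for user, group, and others, which is roughly what 'chmod +x' does on a typical setup.

```python
import os
import stat
import sys

# The interpreter line suggested in the comments.
SHEBANG = b"#!/usr/bin/env python\n"


def make_runnable(path):
    """Prepend a shebang if the file has none, then set the execute bits."""
    with open(path, "rb") as handle:
        source = handle.read()
    # Only rewrite the file when it does not already start with "#!".
    if not source.startswith(b"#!"):
        with open(path, "wb") as handle:
            handle.write(SHEBANG + source)
    # Add execute permission on top of the existing mode bits.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Apply the fix to every path given on the command line.
    for script in sys.argv[1:]:
        make_runnable(script)
```

Saved under a name such as add_shebang.py (hypothetical) and run as "python add_shebang.py mapper.py reducer.py", this should let "cat testfile | ./mapper.py | ./reducer.py" start both scripts with whichever python is first on PATH.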