2015-09-23 67 views

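socket.gaierror when trying to run EMR with Python mrjob

I am currently trying to learn mrjob and how to use it with AWS EMR, so please forgive me if this has already been asked (I searched many places but found no answer), and sorry if this is a silly question.

This is my Python script: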

from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()

When I run it in local mode, I get the results.

Command:

python sample.py input.txt

So I wanted to run this on EMR by creating an mrjob.conf file.

It looks like this:

runners:
    emr:
        aws_access_key_id:
        aws_secret_access_key:
        aws_region: us-west-2a
        ec2_key_pair: emr
        ec2_key_pair_file: ~/Desktop/emr.pem
        ec2_instance_type: m1.small
        num_ec2_instances: 5

    local:
        base_tmp_dir: /tmp

First attempt

Trying it locally on my Windows system:

python check.py -r emr --conf-path ./mrjob.conf word.txt 

Note: I also tried keeping the input in an S3 location and passing it as an argument, and the same error came up.

I get this traceback:

Traceback (most recent call last):
    File "check.py", line 16, in <module>
    MRWordFrequencyCount.run()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 461, in run
    mr_job.execute()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 479, in execute
    super(MRJob, self).execute()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 153, in execute
    self.run_job()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 215, in run_job
    with self.make_runner() as runner:
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 502, in make_runner
    return super(MRJob, self).make_runner()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 168, in make_runner
    return EMRJobRunner(**self.emr_job_runner_kwargs())
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 643, in __init__
    self._fix_s3_scratch_and_log_uri_opts()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 760, in _fix_s3_scratch_and_log_uri_opts
    self._set_s3_scratch_uri(s3_conn)
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 776, in _set_s3_scratch_uri
    buckets = s3_conn.get_all_buckets()
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\retry.py", line 149, in call_and_maybe_retry
    return f(*args, **kwargs)
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\s3\connection.py", line 436, in get_all_buckets
    response = self.make_request('GET', headers=headers)
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\s3\connection.py", line 664, in make_request
    retry_handler=retry_handler
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\connection.py", line 1070, in make_request
    retry_handler=retry_handler)
    File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\connection.py", line 1029, in _mexe
    raise ex
socket.gaierror: [Errno 11004] getaddrinfo failed

When I tried running it on an AWS EC2 instance, I got this error:

Traceback (most recent call last): 
    File "check.py", line 16, in <module> 
    MRWordFrequencyCount.run() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 461, in run 
    mr_job.execute() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 479, in execute 
    super(MRJob, self).execute() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 153, in execute 
    self.run_job() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 215, in run_job 
    with self.make_runner() as runner: 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 502, in make_runner 
    return super(MRJob, self).make_runner() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 168, in make_runner 
    return EMRJobRunner(**self.emr_job_runner_kwargs()) 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 643, in __init__ 
    self._fix_s3_scratch_and_log_uri_opts() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 760, in _fix_s3_scratch_and_log_uri_opts 
    self._set_s3_scratch_uri(s3_conn) 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 776, in _set_s3_scratch_uri 
    buckets = s3_conn.get_all_buckets() 
    File "/usr/local/lib/python2.7/dist-packages/mrjob/retry.py", line 149, in call_and_maybe_retry 
    return f(*args, **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 436, in get_all_buckets 
    response = self.make_request('GET', headers=headers) 
    File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 664, in make_request 
    retry_handler=retry_handler 
    File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request 
    retry_handler=retry_handler) 
    File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1030, in _mexe 
    raise ex 
socket.gaierror: [Errno -2] Name or service not known 
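
For context, socket.gaierror is what Python raises when getaddrinfo cannot resolve a hostname, so both tracebacks point to a name lookup failing before any request reaches S3. A minimal sketch of that failure mode, assuming a placeholder hostname under the reserved .invalid TLD (it is not a hostname taken from mrjob or boto), and a hypothetical helper named can_resolve:

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the hostname resolves, False if the lookup raises socket.gaierror."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Same exception type as at the bottom of both tracebacks.
        return False


# Placeholder hostname: the .invalid TLD is reserved and is not expected to resolve.
print(can_resolve("nonexistent.invalid"))
```

Running the same check against whatever endpoint hostname the client is trying to reach would show whether the problem is DNS or something later in the request.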

I don't know what I am doing wrong.

Python version 2.7, mrjob version 0.4.5

Answer


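After hours of searching and investigating, I found the problem.

It was in exactly this line:

aws_region: us-west-2a

whereas it should have been:

aws_region: us-west-2

I want to keep this question alive because it may save other people some time.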
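
The value us-west-2a is an availability zone name, while aws_region expects a region name such as us-west-2. A plausible explanation, not confirmed here, is that the endpoint hostname is derived from the region value, so an invalid region yields a hostname that cannot be resolved. Below is a small sketch of a sanity check one could run on the config value before launching a job. The helpers region_from_zone and check_region are hypothetical and not part of mrjob or boto, and the naming pattern is an assumption:

```python
import re

# Assumed naming pattern: regions look like "us-west-2";
# availability zones append a single letter, e.g. "us-west-2a".
_ZONE_RE = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d+)([a-z])$")


def region_from_zone(name):
    """Return the region part if name looks like an availability zone, else name unchanged."""
    match = _ZONE_RE.match(name)
    return match.group(1) if match else name


def check_region(name):
    """Raise ValueError when an availability zone is given where a region is expected."""
    region = region_from_zone(name)
    if region != name:
        raise ValueError(
            "%r looks like an availability zone; use region %r" % (name, region)
        )
    return name


print(region_from_zone("us-west-2a"))
```

Calling check_region on the aws_region value from mrjob.conf would have flagged us-west-2a immediately instead of surfacing later as a DNS error.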