2015-04-19

mrjob: InstanceProfile is required for creating cluster

I am trying to use Python MRJob to find the most used word in a txt file. Here is the simple Python script:

from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):
    def mapper(self, _, line):
        # Emit three counters for every input line.
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        # Sum the values emitted for each key.
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()
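To check what the job computes without starting a cluster, the mapper and reducer logic above can be imitated in plain Python. This is only a sketch of the map, group and reduce steps, not how mrjob itself runs the job. The names `mapper`, `run_locally` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def mapper(line):
    # Same three key/value pairs the MRJob mapper emits for one input line.
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def run_locally(lines):
    # Group mapper output by key (the shuffle step), then sum each group
    # (the reducer step).
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


# Made-up sample input; a real run would read the lines of the txt file.
sample = ["But soft, what light", "through yonder window breaks"]
totals = run_locally(sample)
print(totals)
```

Note that the script as posted reports total characters, words and lines. It does not yet pick out the most used word.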

Here is my mrjob.conf file, used to run the job on Amazon EC2 instances:

runners:
  emr:
    aws_access_key_id: XXXXXXXXXXXXXXXXXX
    aws_secret_access_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    aws_region: us-west-1
    ec2_key_pair: EMR
    ec2_key_pair_file: ~/EMR.pem # ~/ and $ENV_VARS allowed here
    ssh_tunnel_to_job_tracker: true

When I run the script:

python MRMostUsedWord.py -r emr romeo.txt > most_used_word.out 

I get the following error:

<Error> 
<Type>Sender</Type> 
<Code>ValidationError</Code> 
<Message>InstanceProfile is required for creating cluster</Message> 
</Error> 
<RequestId>4d1a1e3b-e665-11e4-b9e1-a557982e1081</RequestId> 
</ErrorResponse> 

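Do you know why I get this error?

I also ran this command to create the default roles and instance profile:

aws emr create-default-roles

Maybe the mrjob.conf file needs to be modified, but I don't know how.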

Answer


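If you use AWS IAM to configure your AWS permissions, you can use the iam_job_flow_role MRJob option to specify the IAM profile for the job. For more details, see the iam_job_flow_role entry in the mrjob documentation. By default, the following line is needed in the emr runner section of mrjob.conf: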

iam_job_flow_role: EMRDefaultRole
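Where the option sits in the file matters, because the EMR runner only reads options nested under runners, then emr. The sketch below uses a plain Python dict as a stand-in for the parsed mrjob.conf. It assumes the option is spelled iam_job_flow_role, as in the text above. The helper `emr_option` and both dicts are hypothetical and are not part of mrjob.

```python
# Hypothetical stand-in for a parsed mrjob.conf; all values are placeholders.
conf = {
    "runners": {
        "emr": {
            "aws_region": "us-west-1",
            "ec2_key_pair": "EMR",
            "iam_job_flow_role": "EMRDefaultRole",
        }
    }
}

# The same option placed one level too high, next to "emr" instead of inside it.
misplaced = {
    "runners": {
        "emr": None,
        "iam_job_flow_role": "EMRDefaultRole",
    }
}


def emr_option(conf, name):
    """Return the value of an EMR runner option, or None if it is absent."""
    runners = conf.get("runners") or {}
    emr = runners.get("emr") or {}
    return emr.get(name)


print(emr_option(conf, "iam_job_flow_role"))
print(emr_option(misplaced, "iam_job_flow_role"))
```

The second lookup finds nothing. An option at the wrong indentation level would be invisible to the runner in the same way.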