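How to make an AWS Data Pipeline ShellCommandActivity script execute a Python file
I am using AWS Data Pipeline with a ShellCommandActivity whose script URI points to a bash file in an S3 bucket. The bash file copies a Python script from the same S3 bucket to the EmrCluster and then tries to execute that Python script.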
Here is my pipeline export:
{
  "objects": [
    {
      "name": "DefaultResource1",
      "id": "ResourceId_27dLM",
      "amiVersion": "3.9.0",
      "type": "EmrCluster",
      "region": "us-east-1"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://project/bin/scripts/logs/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "stage": "true",
      "scriptUri": "s3://project/bin/scripts/RunPython.sh",
      "name": "DefaultShellCommandActivity1",
      "id": "ShellCommandActivityId_hA57k",
      "runsOn": {
        "ref": "ResourceId_27dLM"
      },
      "type": "ShellCommandActivity"
    }
  ],
  "parameters": []
}
Here is RunPython.sh:
#!/usr/bin/env bash
aws s3 cp s3://project/bin/scripts/Test.py ./
python ./Test.py
Here is Test.py:
__author__ = 'MrRobot'
import re
import os
import sys
import boto3
print "We've entered the python file"
From the Stdout log I get:
download: s3://project/bin/scripts/Test.py to ./
From the Stderr log I get:
python: can't open file 'Test.py': [Errno 2] No such file or directory
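I also tried replacing python ./Test.py with python Test.py, but I got the same result.
How do I get my AWS Data Pipeline to execute my Test.py script?
EDIT
When I set scriptUri to s3://project/bin/scripts/Test.py I get the following errors: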
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 1: author: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 2: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 3: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 4: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 5: import: command not found
/mnt/taskRunner/output/tmp/df-0947490M9EHH2Y32694-59ed8ca814264f5d9e65b2d52ce78a53/ShellCommandActivityIdJiZP720170209T175934Attempt1_command.sh: line 7: print: command not found
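The "command not found" messages above are what a shell prints when it is handed Python source: the lines of Test.py are being run as shell commands. A script with no `#!` first line does not name its own interpreter, so checking for that line before uploading can catch this early. A minimal sketch, not tied to Data Pipeline in any way; `has_python_shebang` is a hypothetical helper name, not part of any AWS tooling:

```python
def has_python_shebang(source_text):
    """Return True if the first line of the script is a '#!' line naming python.

    Hypothetical helper for illustration: it only inspects the text and does
    not check that the named interpreter exists on the target machine.
    """
    lines = source_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    return first_line.startswith("#!") and "python" in first_line
```

Called on the contents of Test.py as posted, it returns False; once `#!/usr/bin/env python` is the first line, it returns True.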
EDIT 2
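I added the following line to Test.py:
#!/usr/bin/env python
Then I received the following error:
error: line 6, in import boto3 ImportError: No module named boto3
Following @franklinsijo's suggestion, I created a bootstrap action on the EmrCluster with the following value: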
s3://project/bin/scripts/BootstrapActions.sh
Here is BootstrapActions.sh:
#!/usr/bin/env bash
sudo pip install boto3
This worked!!!!!!!
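Since the ImportError above is raised on the cluster node, it can help to check the required modules up front and report everything that is missing in one message. A minimal sketch, assuming it runs at the top of the script before the real imports; `missing_modules` is a hypothetical helper, and the module names passed to it are only an illustration:

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported on this machine."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed for this interpreter.
            missing.append(name)
    return missing
```

For example, `missing_modules(["re", "os", "sys", "boto3"])` returns an empty list on a node where boto3 is installed, and `["boto3"]` on a node where it is not.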
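Why not reference the Python script directly in 'scriptUri'? – franklinsijo
Thanks for the suggestion. I get the following error when referencing the Python script in scriptUri: s3://project/bin/scripts/Test.py: No such file or directory. My S3 link is: https://s3.amazonaws.com/project/bin/scripts/Test.py – user908759
Changed the name and got the same error. – user908759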