Python spawns an AWS CLI process for an S3 upload and it becomes very slow

My Python application creates a subprocess for an AWS CLI S3 upload.

command = 'aws s3 sync /tmp/tmp_dir s3://mybucket/tmp_dir' 
# spawn the process 
sp = subprocess.Popen(
    shlex.split(str(command)), 
    stdout=subprocess.PIPE, stderr=subprocess.PIPE) 
# wait for a while 
sp.wait() 
out, err = sp.communicate() 

if sp.returncode == 0: 
    logger.info("aws return code: %s", sp.returncode) 
    logger.info("aws cli stdout `{}`".format(out)) 
    return 

# handle error

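/tmp/tmp_dir is ~0.5 GB and contains about 100 files. The upload takes about 25 minutes, which is very slow.

If I run the AWS command directly (without Python), it takes less than 1 minute.

What is wrong? Any help is appreciated.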

Source

2017-02-03 Andrii Skaliuk

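I noticed the warning in the documentation about using wait() (see below). However, rather than debugging it, why not rewrite it to use the Python SDK instead of shelling out to the aws cli? You would probably get better performance and cleaner code.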

https://boto3.readthedocs.io/en/latest/guide/s3.html
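As a rough illustration of the SDK route, here is a minimal sketch of a recursive directory upload. It assumes a boto3 S3 client is passed in; `upload_dir` is a hypothetical helper name, not part of boto3. Unlike `aws s3 sync`, it uploads every file unconditionally and does not compare sizes or timestamps.

```python
import os


def upload_dir(client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to the bucket and return the keys used.

    `client` is assumed to be a boto3 S3 client (anything with an
    upload_file(Filename, Bucket, Key) method works).
    """
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            # Build the object key from the path relative to local_dir,
            # using forward slashes regardless of the local OS.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            if prefix:
                key = prefix.rstrip("/") + "/" + rel
            else:
                key = rel
            client.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded


# Usage (requires the third-party boto3 package and configured credentials):
#   import boto3
#   upload_dir(boto3.client("s3"), "/tmp/tmp_dir", "mybucket", "tmp_dir")
```

The client is passed as a parameter so the helper can be exercised with a stub before pointing it at a real bucket.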

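Warning: This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.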

https://docs.python.org/2/library/subprocess.html
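To make the warning concrete, here is a minimal sketch of the Popen pattern with communicate() and no preceding wait(). The helper name `run_and_capture` and the noisy stand-in child are illustrative assumptions; the child simply writes about 1 MB to stdout, which is assumed to exceed a typical OS pipe buffer.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv and return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes until EOF and then waits for the
    # process to exit, so no separate wait() call is needed before it.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in child process that produces a large amount of output.
noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
code, out, err = run_and_capture(noisy)
print(code, len(out))
```

For the command in the question, the argument list would be `shlex.split(command)`.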

EDIT3：

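Here is a solution I just tested; it runs without blocking. There are some convenience functions that call wait() or communicate() for you and are easier to use, such as check_output: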

#!/usr/bin/env python 
import subprocess 
from subprocess import CalledProcessError 

command = ['aws','s3','sync','/tmp/test-sync','s3://bucket-name/test-sync'] 
try: 
    result = subprocess.check_output(command) 
    print(result) 
except CalledProcessError as err: 
    # handle error, check err.returncode which is nonzero. 
    pass

Source

2017-02-03 20:42:25

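The Python SDK does not currently provide the same functionality. I am using 'sync'. The SDK might be better, but it would take far more time to implement. Can you provide a code example that avoids the pipe blocking? Thanks.

Hmm, yes, I see what you mean: sync (recursively copying a directory) is not implemented. Here is a gist I found that might be useful: https://gist.github.com/SavvyGuard/6115006#file-botos3upload-py-L30

I also edited my answer to suggest another way of using 'subprocess'.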
