2017-02-18 22 views

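I have code that fetches an AWS S3 object. How can I read the resulting StreamingBody with Python's csv.DictReader? In other words, how do I use csv.DictReader to read a CSV file stored in S3?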

import boto3, csv 

session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>) 
s3_resource = session.resource('s3') 
s3_object = s3_resource.Object(<bucket>, <key>) 
streaming_body = s3_object.get()['Body'] 

#csv.DictReader(???) 

'csv.DictReader(streaming_body)'? – Leon

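'csv.DictReader(streaming_body)' raises "TypeError: argument 1 must be an iterator". If I call read() and decode() before passing it in (which I'd rather not do, since it loads the whole file into memory), it returns each character of the file separately. – Jon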
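Both symptoms Jon describes can be reproduced without S3. The sketch below is illustrative only: the CSV text is made up, and NotIterable is a hypothetical stand-in for a stream object that offers read() but cannot be iterated. It shows that csv walks a plain string one character at a time, that splitting into lines first fixes this, and that a non-iterable argument is rejected with a TypeError.

```python
import csv

# Made-up CSV text standing in for the decoded S3 object body.
text = "name,age\nalice,30\nbob,25\n"

# 1) A str is iterable, but csv iterates over it one character at a time,
#    so each character is parsed as if it were a separate line.
char_rows = list(csv.reader(text))
print(char_rows[:3])

# 2) Splitting the text into lines first gives csv one string per record.
line_rows = list(csv.DictReader(text.splitlines()))
print(line_rows)


# 3) An object that cannot be iterated at all is rejected up front.
class NotIterable:
    """Hypothetical stand-in: has read() but no __iter__."""

    def read(self):
        return text.encode("utf-8")


try:
    csv.DictReader(NotIterable())
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the TypeError may differ between Python versions, so treat the message Jon quotes as one possible form.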

Answer

The code would look something like this:

import boto3 
import csv 

# get a handle on s3 
s3 = boto3.resource(u's3') 

# get a handle on the bucket that holds your file 
bucket = s3.Bucket(u'bucket-name') 

# get a handle on the object you want (i.e. your file) 
obj = bucket.Object(key=u'test.csv') 

# get the object 
response = obj.get() 

# read the contents of the file, decode the bytes to text,
# and split it into a list of lines
lines = response[u'Body'].read().decode(u'utf-8').splitlines()

# now iterate over those lines 
for row in csv.DictReader(lines): 

    # here you get a sequence of dicts 
    # do whatever you want with each line here 
    print(row) 

You could condense this a bit in real code, but I tried to keep it step by step to show the object hierarchy in boto3.
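Because the code above needs AWS credentials and a real bucket, here is a minimal offline sketch of the same read, decode, and split approach. It assumes io.BytesIO is a fair stand-in for response['Body'], since both expose a read() that returns bytes. The helper name dict_rows_from_body and the sample data are hypothetical.

```python
import csv
import io


def dict_rows_from_body(body, encoding="utf-8"):
    """Hypothetical helper: read a whole S3-style Body (anything whose
    read() returns bytes), decode it, and return a DictReader over its lines."""
    text = body.read().decode(encoding)
    # splitlines() breaks only at line boundaries; split() with no
    # argument would also break at the space inside "New York".
    return csv.DictReader(text.splitlines())


# io.BytesIO stands in for response['Body'] so this runs without AWS.
sample = b"city,country\nNew York,USA\nParis,France\n"

rows = list(dict_rows_from_body(io.BytesIO(sample)))
print(rows)

# For comparison: what a plain split() call would hand to csv.
split_lines = sample.decode("utf-8").split()
print(split_lines)
```

Note that this version still loads the entire object into memory with read(), which is exactly what Jon wants to avoid for large files.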

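EDIT, per your comment about avoiding reading the whole file into memory: I haven't run into that requirement myself, so I can't speak with authority, but I would try wrapping the stream so that I get a text-file-like iterator. For example, you could use the codecs library to replace the CSV-parsing part above with something like this: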

import codecs

for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)
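To check that the codecs wrapper really consumes the stream piece by piece rather than all at once, the sketch below swaps in a hypothetical RecordingStream (an io.BytesIO subclass that logs the size of every read() call) for response['Body']. The sample data is made up, and the assumption is that the real Body is read the same way, through its read(size) method.

```python
import codecs
import csv
import io


class RecordingStream(io.BytesIO):
    """Hypothetical stand-in for response['Body']: a bytes stream that
    records the size argument of every read() call made against it."""

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return super().read(size)


# Build 500 rows of made-up CSV data.
lines = ["id,value"] + [f"{i},{i * i}" for i in range(500)]
data = ("\n".join(lines) + "\n").encode("utf-8")
body = RecordingStream(data)

# Wrap the byte stream in a UTF-8 StreamReader; iterating over it yields
# decoded lines, which is what csv.DictReader expects.
text_stream = codecs.getreader("utf-8")(body)
rows = list(csv.DictReader(text_stream))

print(len(rows), rows[0], rows[-1])
print("largest single read:", max(body.read_sizes), "of", len(data), "bytes")
```

The exact chunk sizes are an implementation detail of codecs.StreamReader, so the useful check is only that no single read() asked for the whole payload.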

@Jon, does this answer your question? – gary

Yes. Is there any way to do this without having to read() the whole file into memory? – Jon

The 'codecs.getreader()' solution solved this for me –