How do I use csv.DictReader to read a CSV stored in S3?

I have code that fetches an AWS S3 object. How do I read the resulting StreamingBody with Python's csv.DictReader?

import boto3, csv 

session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>) 
s3_resource = session.resource('s3') 
s3_object = s3_resource.Object(<bucket>, <key>) 
streaming_body = s3_object.get()['Body'] 

#csv.DictReader(???)


2017-02-18 Jon

'csv.DictReader(streaming_body)'? – Leon

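'csv.DictReader(streaming_body)' raises "TypeError: argument 1 must be an iterator". If I call read() and decode() before passing it in (which I'd rather avoid, since it loads the whole file into memory), it returns each character of the file separately. – Jon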
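The character-by-character behaviour Jon describes can be reproduced locally without S3: csv.DictReader iterates over whatever it is given, and iterating a str yields single characters. A minimal sketch, using a hardcoded sample string as an assumed stand-in for the decoded object body; splitting the text into lines first gives one dict per CSV row.

```python
import csv

# Hypothetical sample standing in for the decoded contents of the S3 object.
text = "name,city\nAlice,New York\nBob,Paris\n"

# Whole string: DictReader iterates it character by character, so the header
# becomes the single character "n" and every later character is its own "row".
by_char = list(csv.DictReader(text))

# List of lines: each element is one CSV record, as intended.
by_line = list(csv.DictReader(text.splitlines()))

print(by_char[:3])
print(by_line)
```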

The code would look something like this:

import boto3
import csv

# get a handle on s3
s3 = boto3.resource('s3')

# get a handle on the bucket that holds your file
bucket = s3.Bucket('bucket-name')

# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key='test.csv')

# get the object
response = obj.get()

# read the whole body (bytes), decode it to text and split it into lines;
# splitlines() breaks only at line endings, whereas split() would also
# break at every space inside a row
lines = response['Body'].read().decode('utf-8').splitlines()

# now iterate over those lines
for row in csv.DictReader(lines):

    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)

You could condense this a fair bit in real code, but I tried to keep it step by step to show the boto3 object hierarchy.
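A condensed variant, as a sketch only: it assumes the lower-level boto3 client API (boto3.client('s3').get_object, whose response also carries the body under 'Body') in place of the resource hierarchy, and read_s3_csv is a hypothetical helper name. Like the step-by-step version, it still reads the whole object into memory.

```python
import csv


def read_s3_csv(s3_client, bucket, key, encoding="utf-8"):
    """Return every row of the CSV object at bucket/key as a dict.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The whole object is read into memory before parsing.
    """
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    lines = body.read().decode(encoding).splitlines()
    return list(csv.DictReader(lines))
```

Usage would be along the lines of rows = read_s3_csv(boto3.client('s3'), 'bucket-name', 'test.csv'). Taking the client as a parameter keeps credentials and region setup outside the helper and lets it be exercised with a stub object in tests.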

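Edit, regarding your comment about how to avoid reading the whole file into memory: I haven't run into that requirement myself, so I can't speak authoritatively, but I would try wrapping the stream so that I get a text-file-like iterator. For example, you could use the codecs library to replace the CSV-parsing part above with something like: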

import codecs

for row in csv.DictReader(codecs.getreader('utf-8')(response['Body'])):
    print(row)
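The same idea wrapped in a small generator, as a sketch: iter_csv_rows is a hypothetical name, and io.BytesIO stands in for the S3 StreamingBody because both expose read(size) returning bytes. codecs.getreader('utf-8') returns a StreamReader class; the instance pulls small chunks from the underlying stream and yields one decoded line per iteration, which is the iterator csv.DictReader needs.

```python
import codecs
import csv
import io


def iter_csv_rows(body, encoding="utf-8"):
    """Yield one dict per CSV row, decoding the byte stream incrementally.

    body is any object with read(size) -> bytes, such as the StreamingBody
    from obj.get()["Body"]; the StreamReader reads it in small chunks rather
    than calling read() once for the whole object.
    """
    text_stream = codecs.getreader(encoding)(body)
    yield from csv.DictReader(text_stream)


# Local stand-in for an S3 body, with a non-ASCII value and a quoted newline.
sample = io.BytesIO('name,note\nAlice,"Zürich, CH"\nBob,"two\nlines"\n'.encode("utf-8"))
for row in iter_csv_rows(sample):
    print(row)
```

The StreamReader keeps undecoded trailing bytes until the next chunk arrives, so a multi-byte UTF-8 character split across a chunk boundary is still decoded correctly, and csv.DictReader itself joins quoted fields that span several lines.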


2017-02-19 16:26:43 gary

@Jon, does this answer your question? – gary

Yes. Is there any way to do this without having to read() the whole file into memory? – Jon

The 'codecs.getreader()' solution solved this for me. –
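To check that the codecs approach really avoids reading everything up front, a sketch can count the bytes requested from the stream. CountingBody is a hypothetical wrapper written for this illustration, and the 10,000-row CSV is generated test data; io.BytesIO again stands in for the S3 StreamingBody.

```python
import codecs
import csv
import io


class CountingBody:
    """Hypothetical wrapper that records how many bytes have been read.

    It mimics the read(size) -> bytes interface of an S3 StreamingBody.
    """

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._raw.read(size)
        self.bytes_read += len(chunk)
        return chunk


# Generated test data: a header plus 10,000 short rows.
data = ("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10_000))).encode("utf-8")
body = CountingBody(io.BytesIO(data))

reader = csv.DictReader(codecs.getreader("utf-8")(body))
first = next(reader)

print(first)
print(f"{body.bytes_read} of {len(data)} bytes read after the first row")
```

Only a small prefix of the data has been pulled from the stream when the first row comes back; the remaining bytes are read lazily as the loop continues.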
