2017-04-11 179 views
2

Reading a CSV file from AWS S3 using boto and pandas

I have already read the available answers here and here, and none of them helped.

I am trying to read a CSV object from an S3 bucket, and I have been able to read the data successfully with the following code.

from boto.s3.connection import S3Connection
from boto.s3.key import Key

srcFileName = "gossips.csv"

def on_session_started():
    print("Starting new session.")
    conn = S3Connection()
    # validate=False skips the extra request that checks the bucket exists
    my_bucket = conn.get_bucket("randomdatagossip", validate=False)
    print("Bucket Identified")
    print(my_bucket)
    key = Key(my_bucket, srcFileName)
    key.open()
    print(key.read())
    conn.close()

on_session_started()

However, if I try to read the same object as a DataFrame using pandas, I get errors. The most common one is S3ResponseError: 403 Forbidden.

def on_session_started2(): 
    print("Starting Second new session.") 
    conn = S3Connection() 
    my_bucket = conn.get_bucket("randomdatagossip", validate=False) 
    #  url = "https://s3.amazonaws.com/randomdatagossip/gossips.csv" 
    #  urllib2.urlopen(url) 

    for line in smart_open.smart_open('s3://my_bucket/gossips.csv'): 
        print line
    #  data = pd.read_csv(url) 
    #  print(data) 

on_session_started2() 

What am I doing wrong? I am on Python 2.7 and cannot use Python 3.
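One detail in the second snippet may be worth checking: the string 's3://my_bucket/gossips.csv' names a bucket literally called my_bucket, not the bucket held in the my_bucket variable, so the request may be going to a bucket the credentials cannot access. A minimal sketch of building the URL from the real bucket name follows. The helper s3_url is hypothetical (it is not part of smart_open or boto), and whether this is the actual cause of the 403 is an assumption.

```python
# Values taken from the question; s3_url is a hypothetical helper,
# not a function provided by smart_open or boto.
bucket_name = "randomdatagossip"
src_file_name = "gossips.csv"


def s3_url(bucket, key):
    """Build an s3:// URL from a bucket name and an object key."""
    return "s3://{0}/{1}".format(bucket, key)


url = s3_url(bucket_name, src_file_name)
print(url)  # s3://randomdatagossip/gossips.csv
```

The resulting string can then be passed to smart_open.smart_open(url) in place of the hard-coded one.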

+1

Don't use those outdated examples without knowing what you are doing. Go take a look at boto3 – mootmoot

Answers

3

Here is what I did to successfully read a DataFrame from a CSV on S3.

import pandas as pd 
import boto3 

bucket = "yourbucket" 
file_name = "your_file.csv" 

s3 = boto3.client('s3')
# 's3' is the service name; the client picks up credentials from the default configuration

obj = s3.get_object(Bucket=bucket, Key=file_name)
# fetch the object stored under this key (file name) in the bucket

initial_df = pd.read_csv(obj['Body'])  # 'Body' is the response entry holding the file contents

+3

This does not work with the latest version of pandas. See my answer to a similar question, https://stackoverflow.com/a/46323684/1451649, which works with pandas 0.20.3 – jpobst

0

This worked for me.

import pandas as pd 
import boto3 
import io 

s3_file_key = 'data/test.csv' 
bucket = 'data-bucket' 

s3 = boto3.client('s3') 
obj = s3.get_object(Bucket=bucket, Key=s3_file_key) 

initial_df = pd.read_csv(io.BytesIO(obj['Body'].read()))
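
The snippet above reads the whole object into memory and wraps it in io.BytesIO, which hands pandas an ordinary seekable file object. A minimal sketch of the same idea as reusable functions follows. The names fetch_csv_buffer and read_csv_from_s3 are hypothetical, and the client is passed in as an argument on the assumption that it was created with boto3.client('s3').

```python
import io


def fetch_csv_buffer(s3_client, bucket, key):
    """Download one S3 object and return its bytes in a seekable buffer."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Read the streaming body fully, then wrap it so it behaves like a file.
    return io.BytesIO(response["Body"].read())


def read_csv_from_s3(s3_client, bucket, key, **read_csv_kwargs):
    """Load a CSV object from S3 into a pandas DataFrame."""
    # Imported here so that fetch_csv_buffer can be used without pandas installed.
    import pandas as pd

    buffer = fetch_csv_buffer(s3_client, bucket, key)
    return pd.read_csv(buffer, **read_csv_kwargs)
```

With the names from this answer, usage would look like read_csv_from_s3(boto3.client('s3'), 'data-bucket', 'data/test.csv'). Because the full object is held in memory, this is only suitable for files that fit comfortably in RAM.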