
Answer


Run the following code in a separate, empty cell:

%%storage read --object <path-to-gcs-bucket>/my_pickle_file.pkl --variable test_pickle_var 

Then run the following code:

from io import BytesIO 
import pickle 

# Unpickle the bytes that %%storage read placed in test_pickle_var 
pickle.load(BytesIO(test_pickle_var)) 

I used the following code to upload a pandas DataFrame to Google Cloud Storage as a pickle file and read it back:

from datalab.context import Context 
import datalab.storage as storage 
import pandas as pd 
from io import BytesIO 
import pickle 

df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c']) 

# Create a local pickle file 
df.to_pickle('my_pickle_file.pkl') 

# Create a bucket in GCS 
sample_bucket_name = Context.default().project_id + '-datalab-example' 
sample_bucket_path = 'gs://' + sample_bucket_name 
sample_bucket = storage.Bucket(sample_bucket_name) 
if not sample_bucket.exists(): 
    sample_bucket.create() 

# Write pickle to GCS 
sample_item = sample_bucket.item('my_pickle_file.pkl') 
with open('my_pickle_file.pkl', 'rb') as f: 
    sample_item.write_to(bytearray(f.read()), 'application/octet-stream') 

# Read Method 1 - Read pickle from GCS using %storage read (note single % for line magic) 
path_to_pickle_in_gcs = sample_bucket_path + '/my_pickle_file.pkl' 
%storage read --object $path_to_pickle_in_gcs --variable remote_pickle_1 
df_method1 = pickle.load(BytesIO(remote_pickle_1)) 
print(df_method1) 

# Read Alternate Method 2 - Read pickle from GCS using storage.Bucket.item().read_from() 
remote_pickle_2 = sample_bucket.item('my_pickle_file.pkl').read_from() 
df_method2 = pickle.load(BytesIO(remote_pickle_2)) 
print(df_method2) 
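
As a side note (a sketch that is not part of the original answer, reusing the sample_bucket and df objects from above): pickle.dumps() lets you skip the intermediate local file and write the pickled bytes to GCS straight from memory, using the same Item.write_to() call; the object name below is hypothetical.

# Write pickle to GCS directly from memory (no local file needed) 
in_memory_item = sample_bucket.item('my_pickle_file_from_memory.pkl') 
in_memory_item.write_to(bytearray(pickle.dumps(df)), 'application/octet-stream') 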

Note: there is a known issue where the %storage command does not work if it is the first line of a cell. Put a comment or a line of Python code on the first line.
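
For example, a cell shaped like this should work (a sketch of the workaround, using the same variable names as above):

# this comment keeps %storage off the first line of the cell 
%storage read --object $path_to_pickle_in_gcs --variable remote_pickle_1 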

+1

Thanks. I tried using %%storage with pickle.load. Somehow it does not work for me. Does it work for you? The alternative is fine too - a working workaround. –

+0

I am not sure the problem is in pickle itself. When I read the data by Python means, everything works fine (I do use BytesIO). However, when I try the %%storage clause, nothing happens. –

+1

Could you try the sample code provided (BytesIO) to confirm that it works on your end? Please share a code snippet that does not behave as expected, to help with troubleshooting. –