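I plan to use the Python MongoEngine framework to store Pandas DataFrames in MongoDB by coercing them to Python dictionaries with df.to_dict() and storing them as nested Document attributes. To minimize the code needed to round-trip from a Pandas DataFrame to BSON and back, I'm using a custom field type called DataFrameField, defined in this gist, whose __set__ and __get__ methods coerce a pandas DataFrame to a Python dictionary and back. When I set a DataFrameField using dot notation, __set__ is called correctly and this works great: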
import pandas as pd
import numpy as np
from mongoengine import *
a_pandas_data_frame = pd.DataFrame({
'goods': ['a', 'a', 'b', 'b', 'b'],
'stock': [5, 10, 30, 40, 10],
'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06', '2014-02-09', '2014-03-09'])
})
class my_data(Document):
data_frame = DataFrameField() # defined in the referenced gist
foo = my_data()
foo.data_frame = a_pandas_data_frame
But if I pass a_pandas_data_frame to the constructor instead, I get:
>>> bar = my_data(data_frame = a_pandas_data_frame)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 116, in __init__
setattr(self, key, value)
File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 186, in __setattr__
super(BaseDocument, self).__setattr__(name, value)
File "<stdin>", line 18, in __set__
ValueError: value is not a pandas.DataFrame instance
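The gist isn't reproduced here, so as a rough illustration only, the following is a minimal plain-Python sketch of the descriptor mechanism a field like DataFrameField relies on. Its __set__ rejects anything that isn't a DataFrame (mirroring the ValueError message in the traceback above) and stores df.to_dict(), and its __get__ rebuilds the DataFrame from that dictionary. It is not MongoEngine code and not the gist's implementation. The names DataFrameDescriptor and Holder and the private _data_frame attribute are hypothetical, it assumes the default orient of to_dict(), and it needs Python 3.6+ for __set_name__.

```python
import pandas as pd


class DataFrameDescriptor:
    """Hypothetical stand-in for the gist's DataFrameField (not MongoEngine code).

    __set__ stores a DataFrame as a plain dict; __get__ rebuilds the DataFrame.
    """

    def __set_name__(self, owner, name):
        # Keep the converted dict under a private attribute on each instance.
        self.storage_name = "_" + name

    def __set__(self, instance, value):
        # Same kind of type check that produced the ValueError in the traceback.
        if not isinstance(value, pd.DataFrame):
            raise ValueError("value is not a pandas.DataFrame instance")
        # Default orient="dict": {column -> {index -> value}}
        setattr(instance, self.storage_name, value.to_dict())

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        # Rebuild a DataFrame from the stored {column -> {index -> value}} dict.
        return pd.DataFrame(getattr(instance, self.storage_name))


class Holder:
    # Hypothetical owner class standing in for a MongoEngine Document.
    data_frame = DataFrameDescriptor()


holder = Holder()
holder.data_frame = pd.DataFrame({"goods": ["a", "b"], "stock": [5, 10]})
print(type(holder._data_frame).__name__)  # dict
print(holder.data_frame)
```

Because the check lives in __set__, it only ever sees whatever value actually reaches setattr; dot-notation assignment hands it the DataFrame directly.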
If I add a print statement such as print value to the __set__ method and then call the constructor, it prints:
['category', 'date', 'goods', 'stock']
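which is the list of the DataFrame's column names (i.e., list(a_pandas_data_frame.columns)). Is there a way to stop the MongoEngine Document constructor from passing __set__ something other than the object I passed in?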
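The printed list is exactly what iterating over a DataFrame produces: iteration yields the column labels, not the rows. The short pandas sketch below (using a trimmed, hypothetical stand-in for a_pandas_data_frame) shows this, which suggests that the value reaching __set__ was produced by iterating over the DataFrame somewhere along the way; where that happens inside MongoEngine is not confirmed here. The alphabetical order in the output above likely comes from the older Python 2 / pandas setup, where columns built from a dict were sorted; recent pandas keeps insertion order.

```python
import pandas as pd

# Trimmed-down, hypothetical stand-in for a_pandas_data_frame.
df = pd.DataFrame({
    "goods": ["a", "a", "b"],
    "stock": [5, 10, 30],
})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [label for label in df]
print(labels)  # ['goods', 'stock']

# Same result as the explicit column list.
print(labels == list(df.columns))  # True
```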
Thanks!
PS: I also asked this question in the [MongoEngine repo](https://github.com/MongoEngine/mongoengine/issues/1597), but there are about 300 open issues, so I don't expect a quick response there...