Inserting a pandas DataFrame into MongoDB using PyMongo

What is the simplest way to insert a pandas DataFrame into MongoDB using PyMongo?

Attempts

db.myCollection.insert(df.to_dict())

gave the error InvalidDocument: documents must have only string keys, key was Timestamp('2013-11-23 13:31:00', tz=None)

db.myCollection.insert(df.to_json())

gave the error TypeError: 'str' object does not support item assignment

db.myCollection.insert({id: df.to_json()})

gave the error InvalidDocument: documents must have only string keys, key was <built-in function id>

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 150 entries, 2013-11-23 13:31:26 to 2013-11-23 13:24:07 
Data columns (total 3 columns): 
amount 150 non-null values 
price  150 non-null values 
tid  150 non-null values 
dtypes: float64(2), int64(1)
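A small diagnostic sketch showing where the non-string keys in the first error come from. df.to_dict() defaults to orient='dict', i.e. {column: {index_label: value}}, so with a DatetimeIndex every inner key is a Timestamp, which BSON rejects. The helper find_non_string_keys and the sample frame are hypothetical, chosen to mimic the DataFrame above.

```python
import pandas as pd


def find_non_string_keys(obj, path=()):
    """Recursively collect (path, key) pairs whose key is not a str."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not isinstance(key, str):
                found.append((path, key))
            found.extend(find_non_string_keys(value, path + (key,)))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            found.extend(find_non_string_keys(item, path + (i,)))
    return found


# Hypothetical sample frame with the same layout as the question (DatetimeIndex)
df = pd.DataFrame(
    {"amount": [1.0, 2.0], "price": [10.0, 11.0], "tid": [1, 2]},
    index=pd.to_datetime(["2013-11-23 13:31:00", "2013-11-23 13:30:00"]),
)

# Default orient="dict" nests the index labels as keys under each column
bad = find_non_string_keys(df.to_dict())
print(bad[0])
```

With orient='records' the index is dropped and the only keys are the column names, so the same check finds nothing.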

2013-11-23 Nyxynyx

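What do you want to do with it afterwards? Do you need one document per record or one document per DataFrame? – alko

Each Mongo record has the fields 'date', 'amount', 'price' and 'tid'. 'tid' should be a unique field. – Nyxynyx

I doubt there is a way that is both the fastest and the simplest. If you don't mind the data conversion, you can do: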

>>> import datetime
>>> import json
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
>>> df 
          A 
1 2013-11-23 21:14:34.118531 

>>> records = json.loads(df.T.to_json()).values() 
>>> db.myCollection.insert(records)

But if you try to load the data back, you will get:

>>> df = read_mongo(db, 'myCollection') 
>>> df 
        A 
0 1385241274118531000 
>>> df.dtypes 
A int64 
dtype: object

So you have to convert 'A' in your DataFrame back to datetimes, as well as every field that is not an int, float or str. For this example:

>>> df['A'] = pd.to_datetime(df['A']) 
>>> df 
          A 
0 2013-11-23 21:14:34.118531
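A sketch of the same round trip, assuming you know which columns held datetimes. It uses orient="records" (one dict per row, so no transpose is needed) and an explicit date_unit="ms"; with the default epoch date format, datetimes become integer epoch milliseconds in the JSON. The function names and the sample frame are hypothetical.

```python
import json

import pandas as pd


def to_json_records(df):
    # JSON has no datetime type: with the epoch date format, datetimes are
    # written as integer milliseconds since 1970-01-01 (naive values treated as UTC)
    return json.loads(df.to_json(orient="records", date_unit="ms"))


def from_json_records(records, datetime_columns):
    # Rebuild the frame, then turn the epoch-millisecond integers back into datetimes
    df = pd.DataFrame(records)
    for col in datetime_columns:
        df[col] = pd.to_datetime(df[col], unit="ms")
    return df


original = pd.DataFrame({"A": [pd.Timestamp("2013-11-23 21:14:34.118")], "B": [1.5]})
records = to_json_records(original)           # what would be inserted into MongoDB
restored = from_json_records(records, ["A"])  # what you rebuild after reading it back
print(records)
```

Millisecond precision is the price of this route: a value such as datetime.datetime.now() loses its microseconds.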

2013-11-23 21:17:09 alko

'db.myCollection.insert(records)' should be replaced by 'db.myCollection.insert_many(records)'; see the warning: //anaconda/bin/ipython:1: DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead. –

How about this:

db.myCollection.insert({id: df.to_json()})

The id would be a unique string for that df.

2013-11-23 20:20:38 PasteBT

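Thanks, I get the error 'InvalidDocument: documents must have only string keys, key was <built-in function id>' – Nyxynyx

You have to generate that id yourself. – PasteBT

Is this id the same as the usual '_id' in Mongo documents? If so, it looks like a random hash; how do I generate it? – Nyxynyx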
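A minimal sketch of the one-document-per-DataFrame approach. The error comes from using the built-in function id as a key instead of a string. '_id' is MongoDB's primary key; if you leave it out, PyMongo generates an ObjectId for you on insert. Here a random UUID string stands in as an illustration; the function name and the "data" field are hypothetical.

```python
import uuid

import pandas as pd


def dataframe_to_single_document(df, doc_id=None):
    # All keys are strings ("_id", "data"); the whole frame is stored as one JSON string.
    # If you prefer MongoDB's own ObjectId, drop "_id" and let the driver add it.
    return {
        "_id": doc_id if doc_id is not None else uuid.uuid4().hex,
        "data": df.to_json(),
    }


doc = dataframe_to_single_document(pd.DataFrame({"price": [10.0, 11.0]}))
print(doc["_id"], doc["data"])
```

The resulting dict could then be passed to something like db.myCollection.insert_one(doc).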

Here is the fastest way: use the insert_many method of PyMongo 3 together with the 'records' argument of the to_dict method.

db.myCollection.insert_many(df.to_dict('records'))
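A minimal sketch wrapping this answer in a function (the name insert_dataframe is hypothetical). insert_many is a method of a collection such as db.myCollection, not of the database, and it raises on an empty list, so empty frames are skipped. Note that to_dict('records') drops the index.

```python
import pandas as pd


def insert_dataframe(collection, df):
    """Insert one document per DataFrame row; collection is e.g. a pymongo Collection."""
    records = df.to_dict("records")  # list of {column_name: value}; the index is dropped
    if not records:
        # insert_many rejects an empty list, so there is nothing to do
        return None
    return collection.insert_many(records)
```

With PyMongo, the return value is an InsertManyResult whose inserted_ids lists the generated '_id' values.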

2015-06-26 16:34:33 dieguico

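This is the best idea IMO, but I don't think this syntax will work for the original use case. The basic problem is that Mongo needs string keys, and your df has a Timestamp index. You need to use the argument passed to 'to_dict()' so that the keys in Mongo are not dates. A case I use often is that you really want each row of the df to be a record with a 'date' field. –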
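A sketch of that case, matching the schema from the question's comments ('date', 'amount', 'price', 'tid'). Since to_dict('records') drops the index, the DatetimeIndex is first turned into an ordinary 'date' column. pd.Timestamp is a subclass of datetime.datetime, so PyMongo should store these values as BSON dates. The function name and sample data are hypothetical.

```python
import pandas as pd


def dataframe_to_records_with_date(df, date_field="date"):
    # Name the index, then move it into a regular column so it survives to_dict
    out = df.rename_axis(date_field).reset_index()
    return out.to_dict("records")


df = pd.DataFrame(
    {"amount": [1.0], "price": [10.0], "tid": [7]},
    index=pd.to_datetime(["2013-11-23 13:31:26"]),
)
recs = dataframe_to_records_with_date(df)
print(recs)
```

To make 'tid' unique as requested, a unique index could be created before inserting, e.g. db.myCollection.create_index('tid', unique=True), followed by db.myCollection.insert_many(recs).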

odo can do it:

odo(df, db.myCollection)

2015-12-27 17:37:44

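I really like 'odo', but it fails when the Mongo URI has a username/password with non-alphanumeric characters. I wouldn't recommend it except with unauthenticated Mongo. – armundle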

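Do this if your DataFrame has missing data (i.e. None, NaN) and you don't want null key values in your documents:

db.myCollection.insert_many(df.to_dict("records")) will insert keys with null values. If you don't want null key values in your documents, you can use the modified version of pandas' .to_dict("records") code below:

from pandas.core.common import _maybe_box_datetimelike
my_list = [dict((k, _maybe_box_datetimelike(v)) for k, v in zip(df.columns, row) if v is not None and v == v) for row in df.values]
db.myCollection.insert_many(my_list)

where if v is not None and v == v is the check I added to make sure the value is neither None nor NaN (NaN is the only value that is not equal to itself) before putting it into the row's dict. Now your .insert_many will only include keys that have values in the document (and no null data type).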
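The helper _maybe_box_datetimelike is a private pandas function, not part of the public API, and may not be importable in newer pandas versions. A sketch of the same idea using only public calls (the function name is hypothetical): pd.notna is False for None, NaN and NaT, so those keys are simply left out. It assumes every cell holds a scalar; pd.notna on a list-valued cell would return an array.

```python
import pandas as pd


def records_without_nulls(df):
    # One dict per row, keeping only keys whose value is not None/NaN/NaT
    return [
        {key: value for key, value in row.items() if pd.notna(value)}
        for row in df.to_dict("records")
    ]


df = pd.DataFrame({"price": [10.0, None], "note": ["a", None]})
docs = records_without_nulls(df)
print(docs)
```

A row with no values at all becomes an empty document, so you may want to filter those out before calling insert_many.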

2016-06-15 00:00:49

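I think this question is cool. In my case I have been spending more time on moving large DataFrames around. In that situation pandas often lets you choose a chunksize (e.g. pandas.DataFrame.to_sql). So I thought I would contribute the function I use for this.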

from pymongo import MongoClient


def write_df_to_mongoDB(my_df,
                        database_name='mydatabasename',
                        collection_name='mycollectionname',
                        server='localhost',
                        mongodb_port=27017,
                        chunk_size=100):
    """
    Write a DataFrame to a MongoDB collection in chunks (you should provide
    the database name, collection name, server and port of the database).

    ---------------------------------------------------------------------------
    Parameters/Input
      my_df: the DataFrame to send to MongoDB
      database_name: database name
      collection_name: collection name (to create)
      server: the server where the MongoDB database is hosted
        Example: server = '132.434.63.86'
      mongodb_port: the port the database is listening on
        Example: mongodb_port = 27017
      chunk_size: the number of rows sent to the database at the same
        time. Default is 100.

    Output
      When finished, prints "Done".
    ---------------------------------------------------------------------------
    FUTURE modifications.
      1. Write to SQL
      2. Write to csv
    ---------------------------------------------------------------------------
    30/11/2017: Rafael Valero-Fernandez. Documentation
    """
    # To connect (use the server argument instead of a hard-coded 'localhost')
    client = MongoClient(server, int(mongodb_port))
    db = client[database_name]
    collection = db[collection_name]

    # WARNING: empties the collection first (removes ALL existing documents)
    collection.delete_many({})
    # my_df = my_df.drop_duplicates(subset=None, keep='last')  # To avoid repetitions

    my_list = my_df.to_dict('records')

    # Insert the records chunk by chunk
    for i in range(0, len(my_list), chunk_size):
        chunk = my_list[i:i + chunk_size]
        collection.insert_many(chunk)  # fill the collection
        print(i + len(chunk))  # progress: rows inserted so far

    print('Done')
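A variant sketch of the function above that does not delete existing documents and takes the collection as an argument, so it works with any object exposing insert_many (e.g. a pymongo Collection) and can be tried without a server. The name insert_in_chunks is hypothetical.

```python
import pandas as pd


def insert_in_chunks(collection, df, chunk_size=100):
    """Insert df's rows into collection in batches of at most chunk_size documents."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    records = df.to_dict("records")
    inserted = 0
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        collection.insert_many(chunk)
        inserted += len(chunk)
    return inserted
```

As far as I know, PyMongo already splits a large insert_many into batches the server accepts, so manual chunking mainly helps with progress reporting or with retrying part of a load. Note that to_dict still materializes all rows in memory at once.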

2018-03-06 09:48:37
