2015-12-04 103 views

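How do I convert a MongoDB document to a pandas DataFrame in Python?

I have the following MongoDB document, which I want to convert into a pandas DataFrame: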

db.dataset2.insert(
{ 
"user_id" : "user_3", 
"order_id" : "order_3", 
"order_lat" : -73.9557413,      ## Order location 
"order_long" : 40.7720266, 
"order_time" : datetime.utcnow(), 

"dish" : [ 
     { 
     "dish_id"   : "005" , 
     "dish_name"  : "Sandwitch", 
     "dish_substitute" : "Yes", 
     "substitute_name" : "Null", 
     "dish_type"  : "Veg",    ## Binary response (Veg or Non-Veg) 
     "dish_price"  : 50, 
     "dish_quantity" : 1, 
     "ratings"   : 3, 
     "reviews"   : "blah blah blah", 
     "home_chef_name" : "ghyty", 
     "expert_chef_name" : "abc" , 
     "coupon_applied" : "Yes",    ## Binary response (Yes or No) 
     "coupon_type"  : "Rs 20 off" 
     }, 
     { 
     "dish_id"   : "006" , 
     "dish_name"  : "Chicken Hundi", 
     "dish_substitute" : "No", 
     "substitute_name" : "Null", 
     "dish_type"  : "Non-Veg", 
     "dish_price"  : 125, 
     "dish_quantity" : 1, 
     "ratings"   : 3, 
     "reviews"   : "blah blah blah", 
     "home_chef_name" : "rtyu", 
     "expert_chef_name" : "vbghy" , 
     "coupon_applied" : "No", 
     "coupon_type"  : "Null" 

     } 
    ], 

} )

When I run the following:

df = pd.DataFrame(list(db.dataset2.find())) 

it gives me the following output:

  _id \ 
    0 566148e3691db01e0cac9d82 
    1 56615926691db01e0cac9d83 
    2 56615c64691db01e0cac9d84 

          dish      order_id order_lat 
0 [{u'dish_substitute': u'Yes', u'home_chef_name... order_1 -73.955741 
1 [{u'dish_substitute': u'Yes', u'home_chef_name... order_2 -73.955741 
2 [{u'dish_substitute': u'Yes', u'home_chef_name... order_3 -73.955741 

    order_long  order_time   user_id 
0 40.772027 2015-12-04 08:03:47.658 user_1 
1 40.772027 2015-12-04 09:13:10.642 user_2 
2 40.772027 2015-12-04 09:27:00.497 user_3 

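dish is a JSON array. When I convert the documents to a DataFrame, it adds a single dish column and puts the whole array under that column. I want to flatten it into a DataFrame so that I can do data exploration. How can I do that? I would like the result to be in the following format: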

 _id     order_id order_lat order_long 
0 566148e3691db01e0cac9d82 order_1 -73.955741 40.772027 
1 566148e3691db01e0cac9d82 order_1 -73.955741 40.772027 

    order_time    user_id coupon_applied coupon_type dish_id 
0 2015-12-04 08:03:47.658 user_1   Yes Rs 20 off  001 
1 2015-12-04 08:03:47.658 user_1    No  Null  001 

    dish_name  dish_price dish_quantity dish_substitute dish_type 
0 Chicken Biryani  120    1    Yes  Non-Veg 
1 Paneer Biryani  100    1    Yes  Veg 

expert_chef_name home_chef_name ratings reviews   substitute_name 
0  abc   xyx  4 blah blah blah   Rice 
1  abc   abc  3 blah blah blah   Paratha 

Please help. Thanks in advance :)

Answer


You just need to create a temporary DataFrame from the records in df.dish and join it back to the original df.

Like this:

df = pd.DataFrame(list(db.dataset2.find())) 
tf = pd.DataFrame.from_records(df.dish) 
df = df.join(tf) 

It gives me the following error: `Argument 'rows' has incorrect type (expected list, got Series)` when I do `tf = pd.DataFrame.from_records(df.dish)` – Neil


@user2927983 Try `tf = pd.DataFrame.from_records(list(df.dish))` – ComputerFellow


It doesn't work as expected. It puts the two dish attributes under the column headers 0 and 1. – Neil
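
The 0 and 1 headers in the last comment are what you would expect: list(df.dish) is a list of lists, so from_records treats each order as one row and each position in the dish array as one column, leaving a whole dict in every cell. One possible way to get one row per dish instead is pandas' json_normalize, which was not suggested in the thread. The sketch below is only an illustration under some assumptions: the docs list is a hand-written stand-in for list(db.dataset2.find()), the field values are abbreviated from the question, and the name flat is made up for this example.

```python
import pandas as pd

# Stand-in for list(db.dataset2.find()); abbreviated, illustrative values only.
docs = [
    {
        "_id": "566148e3691db01e0cac9d82",
        "user_id": "user_1",
        "order_id": "order_1",
        "dish": [
            {"dish_id": "001", "dish_name": "Chicken Biryani", "dish_price": 120},
            {"dish_id": "001", "dish_name": "Paneer Biryani", "dish_price": 100},
        ],
    },
    {
        "_id": "56615926691db01e0cac9d83",
        "user_id": "user_2",
        "order_id": "order_2",
        "dish": [
            {"dish_id": "005", "dish_name": "Sandwitch", "dish_price": 50},
        ],
    },
]

# record_path: produce one output row per element of the "dish" array.
# meta: order-level fields to repeat on every dish row.
flat = pd.json_normalize(
    docs,
    record_path="dish",
    meta=["_id", "order_id", "user_id"],
)
print(flat)
```

With a live collection you would pass list(db.dataset2.find()) in place of docs and list the remaining order-level fields (order_lat, order_long, order_time) in meta. pd.json_normalize is available at the top level from pandas 1.0; older versions expose the same function as pandas.io.json.json_normalize.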