2016-05-26 95 views
1

I have the following data in CSV format. How can I convert this CSV to nested JSON (up to level 3) in Python?

id,category,sub_category,sub_category_type,count 
0,fruits,citrus,lemon,30 
1,fruits,citrus,lemon,40 
2,fruits,citrus,lemon,50 
3,fruits,citrus,grapefruit,20 
4,fruits,citrus,orange,40 
5,fruits,citrus,orange,10 
6,fruits,berries,blueberry,20 
7,fruits,berries,strawberry,50 
8,fruits,berries,strawberry,90 
9,fruits,berries,cranberry,70 
10,fruits,berries,raspberry,16 
11,fruits,berries,raspberry,80 
12,fruits,dried fruit,raisins,10 
13,fruits,dried fruit,dates,15 
14,fruits,dried fruit,dates,10 
15,vegetables,legumes,beans,12 
16,vegetables,legumes,beans,15 
17,vegetables,legumes,chickpea,12 
18,vegetables,green leaf,spinach,18 
19,vegetables,green leaf,cress,19 

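I want to convert the CSV above to nested JSON, because pandas.DataFrame.to_json() doesn't help with producing a nested JSON structure.

Is there a solution?

PS: I am answering this question myself, Q&A style, to share knowledge. I would be glad to know if there is a better solution than this one.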
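To illustrate the limitation, here is a minimal sketch. It assumes a three-row excerpt of the data inlined as a string (not the full file), and it only shows that `to_json(orient="records")` emits one flat object per row, with no grouping into children:

```python
import io
import json

import pandas as pd

# Three-row excerpt of the data above, inlined for illustration (an assumption).
csv_text = """id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
4,fruits,citrus,orange,40
15,vegetables,legumes,beans,12
"""

df = pd.read_csv(io.StringIO(csv_text))

# orient="records" gives a JSON array with one flat object per CSV row.
records = json.loads(df.to_json(orient="records"))

# Every value is a scalar: nothing is nested under category or sub_category.
print(json.dumps(records, indent=2))
```

The other `orient` values rearrange the same flat table (by column, by index, and so on), so the hierarchy still has to be built by hand.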

+0

Why are you answering your own question immediately after asking it? Your answer should be part of your original question. – Alexander

+1

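Then could you explain how I should answer this question? When I tick the checkbox **Answer your own question - share your knowledge, Q&A-style**, it opens a text box for posting an answer. If my answer should be part of my question, why is there a separate text box for posting an answer? –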

+0

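Alternatively, your answer could be presented as what you have tried (which should be part of the question), with the question asking for a better solution. – Alexander

Answers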

0

The following code was inspired by this GitHub link. It converts the CSV into JSON nested up to three levels deep.

import json

import pandas as pd


df = pd.read_csv('data.csv')

# choose columns to keep, in the desired nested JSON hierarchical order
df = df[["category", "sub_category", "sub_category_type", "count"]]

# the order of the columns in the groupby matters: it determines the JSON nesting.
# the groupby call builds a pandas Series grouped by "category", "sub_category"
# and "sub_category_type", while summing the numerical column 'count'
df1 = df.groupby(["category", "sub_category", "sub_category_type"])['count'].sum()
df1 = df1.reset_index()

print(df1)

# root node of the tree
d = {"name": "stock", "children": []}

for line in df1.values:
    category = line[0]
    sub_category = line[1]
    sub_category_type = line[2]
    count = int(line[3])  # plain Python int, so json.dumps can serialize it

    # leaf node for this row
    leaf = {"name": sub_category_type, "count": count}

    # names of the categories already in the tree
    category_list = [item['name'] for item in d['children']]

    # if 'category' is NOT in category_list, append it together with its first sub-category
    if category not in category_list:
        d['children'].append({"name": category, "children": [{"name": sub_category, "children": [leaf]}]})

    # if 'category' IS in category_list, add a new child to it
    else:
        category_node = d['children'][category_list.index(category)]
        # names of the sub-categories already under this category
        sub_list = [item['name'] for item in category_node['children']]

        if sub_category not in sub_list:
            category_node['children'].append({"name": sub_category, "children": [leaf]})
        else:
            category_node['children'][sub_list.index(sub_category)]['children'].append(leaf)


print(json.dumps(d))

When run, the final print produces the following JSON (formatted here for readability):

{ 
"name": "stock", 
"children": [ 
    {"name": "fruits", 
    "children": [ 
     {"name": "berries", 
     "children": [ 
      {"count": 20, "name": "blueberry"}, 
      {"count": 70, "name": "cranberry"}, 
      {"count": 96, "name": "raspberry"}, 
      {"count": 140, "name": "strawberry"}] 
     }, 
     {"name": "citrus", 
     "children": [ 
      {"count": 20, "name": "grapefruit"}, 
      {"count": 120, "name": "lemon"}, 
      {"count": 50, "name": "orange"}] 
     }, 
     {"name": "dried fruit", 
     "children": [ 
      {"count": 25, "name": "dates"}, 
      {"count": 10, "name": "raisins"}] 
     }] 
    }, 
    {"name": "vegetables", 
    "children": [ 
     {"name": "green leaf", 
     "children": [ 
      {"count": 19, "name": "cress"}, 
      {"count": 18, "name": "spinach"}] 
     }, 
     { 
     "name": "legumes", 
     "children": [ 
      {"count": 27, "name": "beans"}, 
      {"count": 12, "name": "chickpea"}] 
     }] 
    }] 
}
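
As a possible alternative, here is a minimal sketch that uses only the standard library and is not limited to three levels. The function name `csv_to_nested` and its parameters `levels`, `value_column` and `root_name` are hypothetical names chosen for this illustration, and the inlined sample is an excerpt of the data, not the full file. It sums the counts into nested dictionaries first, then converts them recursively into the name/children shape, which avoids rescanning the child lists for every row:

```python
import csv
import io
import json


def csv_to_nested(csv_file, levels, value_column, root_name="stock"):
    """Sum value_column per path of `levels`, then build a name/children tree."""
    totals = {}
    for row in csv.DictReader(csv_file):
        node = totals
        # Walk (or create) one nested dict per grouping column except the last.
        for column in levels[:-1]:
            node = node.setdefault(row[column], {})
        # The last grouping column holds the running sum.
        leaf = row[levels[-1]]
        node[leaf] = node.get(leaf, 0) + int(row[value_column])

    def to_children(level):
        children = []
        # Sorted names mirror the ordering that a pandas groupby would give.
        for name in sorted(level):
            value = level[name]
            if isinstance(value, dict):
                children.append({"name": name, "children": to_children(value)})
            else:
                children.append({"name": name, value_column: value})
        return children

    return {"name": root_name, "children": to_children(totals)}


# Excerpt of the data from the question, inlined for illustration.
sample = """id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
4,fruits,citrus,orange,40
6,fruits,berries,blueberry,20
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
"""

nested = csv_to_nested(
    io.StringIO(sample),
    levels=["category", "sub_category", "sub_category_type"],
    value_column="count",
)
print(json.dumps(nested, indent=2))
```

To read from a file instead, pass an open file object, for example `open('data.csv', newline='')`. Adding or removing a column name in `levels` changes the nesting depth without touching the function body.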