2017-01-25

CSV to nested JSON using Python/pandas

I am trying to convert a flat CSV into a nested JSON format. Here is my data:

# data.csv 
company_id,company_name,income_type,income_amt 
1,"Foobar Inc","royalties",5000000 
2,"ACME Corp","sales",3000000 
2,"ACME Corp","rent",1000000 

It needs to be converted to the following JSON structure:

{"data": [{
      "company_id": 1,
      "company_name": "Foobar Inc",
      "income": {"royalties": 5000000}
     },
     {
      "company_id": 2,
      "company_name": "ACME Corp",
      "income": {
       "sales": 3000000,
       "rent": 1000000
      }
     }]
}

But my current code (based on this, using Python and the pandas library):

# script.py 
import json 
import pandas as pd 

df = pd.read_csv('data.csv') 

def get_nested_rec(key, grp):
    rec = {}

    rec['company_id'] = key[0]
    rec['company_name'] = key[1]

    for field in ['income_type']:
        income_types = list(grp[field].unique())
        rec['income'] = income_types

    return rec

records = [] 

for key, grp in df.groupby(['company_id','company_name','income_type','income_amt']): 
    rec = get_nested_rec(key, grp) 
    records.append(rec) 

records = dict(data = records) 

print(json.dumps(records, indent=4)) 

outputs this format:

{"data": [ 
     { 
      "company_id": 1, 
      "company_name": "Foobar Inc", 
      "income": [ 
       "royalties" 
      ] 
     }, 
     { 
      "company_id": 2, 
      "company_name": "ACME Corp", 
      "income": [ 
       "sales" 
      ] 
     }, 
     { 
      "company_id": 2, 
      "company_name": "ACME Corp", 
      "income": [ 
       "rent" 
      ] 
     } 
    ]} 

I have hit a wall trying to figure out how to combine the rows that share a company_id into a single object and include the income_amt values.

Answer


You can do it like this:

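records = []

# One group per company, so each company yields exactly one record.
for key, grp in df.groupby('company_id'):
    records.append({
        "company_id": key,
        "company_name": grp.company_name.iloc[0],
        # Map each income_type to its income_amt within this company.
        "income": {
            row.income_type: row.income_amt for row in grp.itertuples()
        }})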

This gives you:

[{'company_id': 1, 
    'company_name': 'Foobar Inc', 
    'income': {'royalties': 5000000}}, 
{'company_id': 2, 
    'company_name': 'ACME Corp', 
    'income': {'rent': 1000000, 'sales': 3000000}}]
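
For completeness, here is a minimal end-to-end sketch that also wraps the records in the "data" key and serializes them with json.dumps. It makes a few assumptions that are not in the code above: the CSV is inlined through io.StringIO so the snippet runs on its own, the helper name csv_to_nested is made up for illustration, it groups on both company_id and company_name rather than on company_id alone, and the values are passed through int() because pandas usually hands back NumPy integers, which the json module may refuse to serialize.

```python
import io
import json

import pandas as pd

# Same rows as data.csv in the question, inlined so the sketch is self-contained.
CSV_TEXT = """company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
"""


def csv_to_nested(df):
    """Group rows by company and nest income_type -> income_amt (hypothetical helper)."""
    records = []
    # sort=False keeps companies in order of first appearance.
    for (company_id, company_name), grp in df.groupby(
        ["company_id", "company_name"], sort=False
    ):
        records.append({
            # int() turns NumPy integers into plain ints that json can serialize.
            "company_id": int(company_id),
            "company_name": company_name,
            "income": {
                row.income_type: int(row.income_amt)
                for row in grp.itertuples(index=False)
            },
        })
    return {"data": records}


df = pd.read_csv(io.StringIO(CSV_TEXT))
nested = csv_to_nested(df)
print(json.dumps(nested, indent=4))
```

Note that if a company has two rows with the same income_type, the later row overwrites the earlier one in the dict; you would need to sum or collect the amounts first if that can happen in your data.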