2016-12-14 266 views

Python pandas - JSON to DataFrame

I have a complex JSON file that looks like this:

{
    "User A": {
        "Obj1": {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3"
        },
        "Obj2": {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3"
        }
    },
    "User B": {
        "Obj1": {
            "key1": "val1",
            "key2": "val2",
            "key3": "val3",
            "key4": "val4"
        }
    }
}

I would like to turn it into a DataFrame that looks like this:

             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4

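Is this possible with pandas? If so, how can I do it?

  • If it is simpler, I don't mind dropping the first two columns (the user and the Obj) and keeping only the key columns.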

Answer


You can first read the file into a dict:

import json

with open('file.json') as data_file:
    dd = json.load(data_file)

print(dd) 
{'User B': {'Obj1': {'key2': 'val2', 'key4': 'val4', 'key1': 'val1', 'key3': 'val3'}}, 
'User A': {'Obj1': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}, 
'Obj2': {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'}}} 

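Then use a dict comprehension with concat. Each user's dict becomes a small DataFrame (transposed so that the Obj names are rows), and the dict keys passed to concat become the outer index level: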

import pandas as pd

# one transposed frame per user; concat stacks them under the user keys
df = pd.concat({key: pd.DataFrame(val).T for key, val in dd.items()})
print(df)
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
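
For readers who want to try the concat approach without a file on disk, here is a minimal sketch that assumes the parsed JSON is already available as an in-memory dict. It swaps `pd.DataFrame(...).T` for `pd.DataFrame.from_dict(..., orient="index")`, which is a substitution on my part rather than what the answer uses; the names `frames`, `user` and `objs` are illustrative only.

```python
import pandas as pd

# Same structure as the JSON in the question, written inline (assumption:
# json.load would return exactly this dict).
dd = {
    "User A": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3"},
        "Obj2": {"key1": "val1", "key2": "val2", "key3": "val3"},
    },
    "User B": {
        "Obj1": {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"},
    },
}

# orient="index" treats the outer keys (Obj1, Obj2) as row labels,
# so no transpose is needed.
frames = {user: pd.DataFrame.from_dict(objs, orient="index") for user, objs in dd.items()}

# The dict keys become the outer level of a two-level row index;
# keys missing for a given object are filled with NaN.
df = pd.concat(frames)
print(df)
```

The result should have the same two-level index (user, then Obj) as the output shown above.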

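Another solution uses read_json, but the result first has to be reshaped with unstack, and the NaN rows removed with dropna. This leaves a Series of dicts, so the last step is DataFrame.from_records: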

# note: despite the name, df here is a Series of dicts with a (user, Obj) index
df = pd.read_json('file.json').unstack().dropna()
print(df)
User A Obj1  {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'} 
     Obj2  {'key2': 'val2', 'key1': 'val1', 'key3': 'val3'} 
User B Obj1 {'key2': 'val2', 'key4': 'val4', 'key1': 'val1... 
dtype: object 

# without an index argument the (user, Obj) labels are lost
df1 = pd.DataFrame.from_records(df.values.tolist())
print(df1)
   key1  key2  key3  key4
0  val1  val2  val3   NaN
1  val1  val2  val3   NaN
2  val1  val2  val3  val4

# pass the original index to keep the (user, Obj) labels
df1 = pd.DataFrame.from_records(df.values.tolist(), index=df.index)
print(df1)
             key1  key2  key3  key4
User A Obj1  val1  val2  val3   NaN
       Obj2  val1  val2  val3   NaN
User B Obj1  val1  val2  val3  val4
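
The question also mentions that the user and Obj labels could be dropped. A minimal sketch of that, assuming a frame shaped like df1 above; it is rebuilt inline here so the snippet runs on its own, and the level names "user" and "obj" are hypothetical (the original index levels are unnamed):

```python
import pandas as pd

# Rebuild a frame shaped like df1 above (assumption: same labels and values).
index = pd.MultiIndex.from_tuples(
    [("User A", "Obj1"), ("User A", "Obj2"), ("User B", "Obj1")],
    names=["user", "obj"],  # hypothetical level names, not in the original
)
records = [
    {"key1": "val1", "key2": "val2", "key3": "val3"},
    {"key1": "val1", "key2": "val2", "key3": "val3"},
    {"key1": "val1", "key2": "val2", "key3": "val3", "key4": "val4"},
]
df1 = pd.DataFrame(records, index=index)

# Discard the user/Obj labels entirely and keep only the key columns.
keys_only = df1.reset_index(drop=True)

# Or turn the two index levels into ordinary columns instead.
flat = df1.reset_index()
print(keys_only)
print(flat)
```

`reset_index(drop=True)` gives the same plain 0..2 index as calling from_records without the index argument, while `reset_index()` keeps the labels as regular columns.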

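You are so helpful, thank you! I can't believe that something I worked on for an hour can be done in two lines of code, and so elegantly... Is there an easy way to save this DF as an Excel file? – TheDaJon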


Thank you for accepting! Sure, use [`to_excel`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html): `df1.to_excel('file.xlsx')`, or `df1.to_excel('file.xlsx', index=False)` if you need to drop the index. – jezrael