2016-10-20 174 views
2

I have a nested JSON document, shown below, that I want to parse into multiple DataFrames in Python. Please help, using pandas.

{ 
"tableName": "cases", 
"url": "EndpointVoid", 
"tableDataList": [{ 
    "_id": "100017252700", 
    "title": "Test", 
    "type": "TECH", 
    "created": "2016-09-06T19:00:17.071Z", 
    "createdBy": "193164275", 
    "lastModified": "2016-10-04T21:50:49.539Z", 
    "lastModifiedBy": "1074113719", 
    "notes": [{ 
     "id": "30", 
     "title": "Multiple devices", 
     "type": "INCCL", 
     "origin": "D", 
     "componentCode": "PD17A", 
     "issueCode": "IP321", 
     "affectedProduct": "134322", 
     "summary": "testing the json", 

     "caller": { 
      "email": "[email protected]", 
      "phone": "651-744-4522" 
     } 
    }, { 
     "id": "50", 
     "title": "EDU: Multiple Devices - Lightning-to-USB Cable", 
     "type": "INCCL", 
     "origin": "D", 
     "componentCode": "PD17A", 
     "issueCode": "IP321", 
     "affectedProduct": "134322", 
     "summary": "parsing json 2", 
     "caller": { 
      "email": "[email protected]", 
      "phone": "123-345-1111" 
     } 
    }], 
    "syncCount": 2316, 
    "repair": [{ 
      "id": "D208491610", 
      "created": "2016-09-06T19:02:48.000Z", 
      "createdBy": "193164275", 
      "lastModified": "2016-09-21T12:49:47.000Z" 
     }, { 
      "id": "D208491610" 
     }, { 
      "id": "D208491628", 
      "created": "2016-09-06T19:03:37.000Z", 
      "createdBy": "193164275", 
      "lastModified": "2016-09-21T12:49:47.000Z" 
     } 

    ], 
    "enterpriseStatus": "8" 
}], 
"dateTime": 1475617849, 
"primaryKeys": ["$._id"], 
"primaryKeyVals": ["100017252700"], 
"operation": "UPDATE" 

}

I want to parse this and create 3 tables/DataFrames/CSV files as shown below. Please help.

Output table in this format

+0

I think your JSON is invalid - please check [http://jsonlint.com/](http://jsonlint.com/) – jezrael
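The same validity check can also be done locally before handing the file to pandas. The following is only a minimal sketch using the standard-library `json` module; the helper name `check_json` is made up for illustration and is not part of the original thread.

```python
import json


def check_json(text):
    """Return None if `text` parses as JSON, else a short error description.

    `check_json` is a hypothetical helper name used only for this sketch.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the first place the parser gave up
        return "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return None


# A trailing comma, a typical copy-paste mistake, is rejected
problem = check_json('{"tableName": "cases",}')
ok = check_json('{"tableName": "cases"}')
```

Here `problem` holds a message with the position of the error, while `ok` is `None`.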

+0

Thanks jezrael for letting me know. It was a copy-paste error. I have just fixed the JSON file. – Raj

Answers

0

I don't think this is the best way, but I want to show you one possibility.

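import json

import pandas as pd
# json_normalize lives at the top level in pandas 1.0+;
# the older pandas.io.json import path is deprecated
from pandas import json_normalize

# Load the whole document into a plain Python dict
with open('your_sample.json') as f:
    dt = json.load(f)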

Table 1

# One row per case; 'dateTime' is copied from the top level of the document
df1 = json_normalize(dt, record_path='tableDataList', meta='dateTime')[['_id', 'title', 'type', 'created', 'createdBy', 'lastModified', 'lastModifiedBy', 'dateTime']]
print(df1)


            _id title  type                   created  createdBy  \
0  100017252700  Test  TECH  2016-09-06T19:00:17.071Z  193164275

               lastModified lastModifiedBy    dateTime
0  2016-10-04T21:50:49.539Z     1074113719  1475617849

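Table 2

# One row per note; '_id' of the parent case is carried along as metadata
df2 = json_normalize(dt['tableDataList'], record_path='notes', meta='_id')
# Recent pandas versions flatten the nested "caller" dict into
# 'caller.email' and 'caller.phone', so rename instead of mapping over 'caller'
df2 = df2.rename(columns={'caller.email': 'email', 'caller.phone': 'phone'})
df2 = df2[['_id', 'id', 'title', 'email', 'phone']]
print(df2)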


            _id  id                                           title  \
0  100017252700  30                                Multiple devices
1  100017252700  50  EDU: Multiple Devices - Lightning-to-USB Cable

               email         phone
0  [email protected]  651-744-4522
1  [email protected]  123-345-1111

Table 3

# One row per repair; dropna() discards the entry that only has an 'id'
df3 = json_normalize(dt['tableDataList'], record_path='repair', meta='_id').dropna()
print(df3)


                    created  createdBy          id              lastModified  \
0  2016-09-06T19:02:48.000Z  193164275  D208491610  2016-09-21T12:49:47.000Z
2  2016-09-06T19:03:37.000Z  193164275  D208491628  2016-09-21T12:49:47.000Z

            _id
0  100017252700
2  100017252700
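Since the question also asks for CSV output, each resulting frame can be written out with `DataFrame.to_csv`. The sketch below is only an illustration: it uses a small stand-in frame built from the repair IDs in the sample, and the file name `repairs.csv` mentioned afterwards is an assumption, not something from the original post.

```python
import io

import pandas as pd

# Stand-in for df3, built by hand so the sketch runs on its own
repairs = pd.DataFrame({
    "_id": ["100017252700", "100017252700"],
    "id": ["D208491610", "D208491628"],
})

# Write to an in-memory buffer; index=False leaves out the 0, 2, ... row labels
buffer = io.StringIO()
repairs.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing a path instead of the buffer, for example `repairs.to_csv('repairs.csv', index=False)`, writes the same text to disk.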
+0

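This code works. Basically, I am exporting the data from MongoDB via Jongo. If I get multiple case records the code does not work; sometimes a few columns are not populated in the JSON, and I again run into the problem of the JSON keys not being available... – Raj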
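One way to cope with several case records where some keys are absent is to walk the cases manually and treat a missing list as empty. This is a minimal sketch under stated assumptions, not the answerer's method: the helper `child_table`, the second case record, and the e-mail address are all invented for illustration.

```python
import pandas as pd

# Hypothetical payload: the second case has no "notes" or "repair" at all,
# and the second note has no "caller"
payload = {
    "dateTime": 1475617849,
    "tableDataList": [
        {
            "_id": "100017252700",
            "title": "Test",
            "notes": [
                {"id": "30", "title": "Multiple devices",
                 "caller": {"email": "user@example.com",
                            "phone": "651-744-4522"}},
                {"id": "50", "title": "No caller here"},
            ],
            "repair": [{"id": "D208491610"}],
        },
        {"_id": "100017252701", "title": "Second case"},
    ],
}


def child_table(cases, key, columns):
    """Flatten the list stored under `key` in every case into one DataFrame.

    Cases without `key` contribute no rows. Columns that never occur in the
    data are still created (filled with NaN) so the layout stays fixed.
    """
    rows = []
    for case in cases:
        for record in case.get(key) or []:
            row = {"_id": case.get("_id")}
            row.update(record)
            rows.append(row)
    flat = pd.json_normalize(rows)  # nested dicts become 'caller.email', ...
    return flat.reindex(columns=columns)


cases = payload.get("tableDataList", [])
notes = child_table(cases, "notes",
                    ["_id", "id", "title", "caller.email", "caller.phone"])
repairs = child_table(cases, "repair",
                      ["_id", "id", "created", "createdBy", "lastModified"])
```

Because the column list is passed explicitly, `notes` and `repairs` keep the same columns from one export to the next, with NaN wherever the JSON did not populate a field.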