2016-12-16

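Convert nested JSON to a CSV file in Python

I know this question has been asked many times. I have tried several of the suggested solutions, but none of them solved my problem.

I have a large nested JSON file (1.4 GB) that I want to flatten and then convert to a CSV file.

The JSON structure looks like this: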

{
    "company_number": "12345678",
    "data": {
        "address": {
            "address_line_1": "Address 1",
            "locality": "Henley-On-Thames",
            "postal_code": "RG9 1DP",
            "premises": "161",
            "region": "Oxfordshire"
        },
        "country_of_residence": "England",
        "date_of_birth": {
            "month": 2,
            "year": 1977
        },
        "etag": "26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00",
        "kind": "individual-person-with-significant-control",
        "links": {
            "self": "/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl"
        },
        "name": "John M Smith",
        "name_elements": {
            "forename": "John",
            "middle_name": "M",
            "surname": "Smith",
            "title": "Mrs"
        },
        "nationality": "Vietnamese",
        "natures_of_control": [
            "ownership-of-shares-50-to-75-percent"
        ],
        "notified_on": "2016-04-06"
    }
}

I know this is easy to do with the pandas module, but I am not familiar with it.

EDITED

The desired output should look like this:

company_number, address_line_1, locality, country_of_residence, kind

12345678, Address 1, Henley-On-Thames, England, individual-person-with-significant-control

Note that this is only a shortened version. The output should contain all of the fields.

+0

Can you show the desired output? – zipa

+0

I have edited my post. – Porjaz

+0

First you have to resolve that error yourself... but I do not get an error, and the JSON loads fine. – Matthias

Answers

1

You can do this by walking the JSON structure and returning a list of all the leaf nodes, as follows:

import json
import csv

def get_leaves(item, key=None):
    """Recursively collect (key, value) pairs for every leaf in a nested structure."""
    if isinstance(item, dict):
        leaves = []
        for i in item.keys():
            leaves.extend(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        # List elements inherit the key of the list that contains them.
        leaves = []
        for i in item:
            leaves.extend(get_leaves(i, key))
        return leaves
    else:
        return [(key, item)]


# 'wb' is the Python 2 way to open a CSV file for writing.
with open('json.txt') as f_input, open('output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    write_header = True

    # Assumes the file holds a JSON list of entries.
    for entry in json.load(f_input):
        # Sort by key so every row uses the same column order.
        leaf_entries = sorted(get_leaves(entry))

        if write_header:
            csv_output.writerow([k for k, v in leaf_entries])
            write_header = False

        csv_output.writerow([v for k, v in leaf_entries])

If your JSON data is a list of entries in the format you gave, you should get output like the following:

address_line_1,company_number,country_of_residence,etag,forename,kind,locality,middle_name,month,name,nationality,natures_of_control,notified_on,postal_code,premises,region,self,surname,title,year 
Address 1,12345678,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977 
Address 1,12345679,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977 
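The column names above are the bare leaf keys, so two nested objects that happen to share a key name would end up with identical headers. Here is a minimal sketch of one possible alternative that uses dotted key paths as headers instead. The names `flatten` and `sep` are hypothetical and not part of the script above, and the sketch assumes that lists only hold scalar values:

```python
def flatten(item, prefix="", sep="."):
    """Return a flat dict mapping dotted key paths to leaf values."""
    flat = {}
    if isinstance(item, dict):
        for key, value in item.items():
            path = prefix + sep + key if prefix else key
            flat.update(flatten(value, path, sep))
    elif isinstance(item, list):
        # Assumption: lists hold scalars; join them so each list stays in one cell.
        flat[prefix] = ";".join(str(v) for v in item)
    else:
        flat[prefix] = item
    return flat


record = {
    "company_number": "12345678",
    "data": {
        "address": {"locality": "Henley-On-Thames"},
        "date_of_birth": {"month": 2, "year": 1977},
        "natures_of_control": ["ownership-of-shares-50-to-75-percent"],
    },
}
flat = flatten(record)
print(flat["data.address.locality"])
```

With this layout, `month` and `year` appear as `data.date_of_birth.month` and `data.date_of_birth.year`, which keeps their origin visible in the CSV header.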

Note: if you are using Python 3.x, change the following line:

with open('json.txt', newline='') as f_input, open('output.csv', 'w', newline='') as f_output: 
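`json.load` reads the whole document into memory, which may be impractical for the 1.4 GB file in the question. Below is a minimal sketch of record-by-record reading. It rests on an assumption the question does not confirm: that the file stores one JSON object per line. `iter_records` is a hypothetical helper name:

```python
import io
import json


def iter_records(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for a large file; with a real file, pass the open file object instead.
sample = io.StringIO(
    '{"company_number": "12345678", "data": {"kind": "individual-person-with-significant-control"}}\n'
    '\n'
    '{"company_number": "12345679", "data": {"kind": "individual-person-with-significant-control"}}\n'
)
numbers = [record["company_number"] for record in iter_records(sample)]
print(numbers)
```

Because the generator yields one record at a time, each record can be flattened and written out before the next is parsed. If the file is instead a single large JSON array, this approach does not apply as written.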
+0

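I think this could cause problems if the nested keys are not consistent throughout the JSON file. If one of the structures is missing a field, the data in that row will be shifted out of its column. –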
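The shift described in this comment can be avoided by writing rows by column name rather than by position. A minimal sketch, assuming each record has already been flattened into a dict; `rows_to_csv` is a hypothetical helper, and it holds all rows in memory, so a very large file would need two passes instead:

```python
import csv
import io


def rows_to_csv(rows):
    """Write flat dicts to CSV text; fields missing from a row are left empty."""
    # Collect every key seen in any row, keeping first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO(newline="")
    # restval fills in the cell for any field a row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"company_number": "12345678", "locality": "Henley-On-Thames", "region": "Oxfordshire"},
    {"company_number": "12345679", "region": "Oxfordshire"},  # no "locality"
]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The second row gets an empty `locality` cell, so `Oxfordshire` stays under the `region` header.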

+0

This code does not work for my JSON data. I can only parse this key: "K6v8Ht6nXCjaO_ApNGr". Could you please help me understand why? My Python version is 3.6.4. – tpbafk

+0

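@tpbafk, for Python 3.x you need to make a small change to the 'open()' call (I have updated the script), but without seeing your JSON I cannot tell you why it is not parsing everything. Perhaps you should start a new question? –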