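Nested JSON to CSV - generic approach
I am new to Python and I am struggling to convert a nested `json` file into `csv`. To do so, I load the `json`, transform it so that `json_normalize` produces clean output, and then use the pandas package to write the normalized part to `csv`.
My example JSON: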
[{
    "_id": {
        "id": "123"
    },
    "device": {
        "browser": "Safari",
        "category": "d",
        "os": "Mac"
    },
    "exID": {
        "$oid": "123"
    },
    "extreme": false,
    "geo": {
        "city": "London",
        "country": "United Kingdom",
        "countryCode": "UK",
        "ip": "00.000.000.0"
    },
    "viewed": {
        "$date": "2011-02-12"
    },
    "attributes": [{
        "name": "gender",
        "numeric": 0,
        "value": 0
    }, {
        "name": "email",
        "value": false
    }],
    "change": [{
        "id": {
            "$id": "1231"
        },
        "seen": [{
            "$date": "2011-02-12"
        }]
    }]
}, {
    "_id": {
        "id": "456"
    },
    "device": {
        "browser": "Chrome 47",
        "category": "d",
        "os": "Windows"
    },
    "exID": {
        "$oid": "345"
    },
    "extreme": false,
    "geo": {
        "city": "Berlin",
        "country": "Germany",
        "countryCode": "DE",
        "ip": "00.000.000.0"
    },
    "viewed": {
        "$date": "2011-05-12"
    },
    "attributes": [{
        "name": "gender",
        "numeric": 1,
        "value": 1
    }, {
        "name": "email",
        "value": true
    }],
    "change": [{
        "id": {
            "$id": "1231"
        },
        "seen": [{
            "$date": "2011-02-12"
        }]
    }]
}]
With the following code (here I exclude the nested parts):
import json
# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize


def loading_file():
    # File path
    file_path = ''  # file path here
    # Loading json file (the with-block closes the file afterwards)
    with open(file_path) as json_data:
        data = json.load(json_data)
    return data


# Storing available keys
def data_keys(data):
    keys = {}
    for i in data:
        for k in i.keys():
            keys[k] = 1
    keys = keys.keys()
    # Excluding nested arrays from keys - hard coded -> IMPROVE
    new_keys = [x for x in keys if
                x != 'attributes' and
                x != 'change']
    return new_keys


# Excluding nested arrays from json dictionary
def new_data(data, keys):
    filtered = []
    for i in range(0, len(data)):
        x = {k: v for (k, v) in data[i].items() if k in keys}
        filtered.append(x)
    return filtered


def csv_out(data):
    data.to_csv('out.csv', encoding='utf-8')


def main():
    data_file = loading_file()
    keys = data_keys(data_file)
    table = new_data(data_file, keys)
    csv_out(json_normalize(table))


main()
My current output looks like this:
| _id.id | device.browser | device.category | device.os | ... | viewed.$date |
|--------|----------------|-----------------|-----------|------|--------------|
| 123 | Safari | d | Mac | ... | 2011-02-12 |
| 456 | Chrome 47 | d | Windows | ... | 2011-05-12 |
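My problem is that I would like to include the nested arrays in the CSV, so I have to flatten them. I cannot figure out how to make this generic, so that I do not have to refer to the dictionary `keys` (`numeric`, `id`, `name`) and `values` explicitly when creating the table. It has to be generalisable because the number of keys in `attributes` and `change` can vary. Therefore, I would like to have output like this: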
| _id.id | device.browser | ... | attributes_gender_numeric | attributes_gender_value | attributes_email_value | change_id | change_seen |
|--------|----------------|-----|---------------------------|-------------------------|------------------------|-----------|-------------|
| 123 | Safari | ... | 0 | 0 | false | 1231 | 2011-02-12 |
| 456 | Chrome 47 | ... | 1 | 1 | true | 1231 | 2011-02-12 |
Thank you in advance! Any tips on how to improve my code and make it more efficient are very welcome.
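Hi, thanks for your response, but it is not what I am looking for: it outputs the nested arrays as they are, and I want each value in a separate cell.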
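For the "each value in a separate cell" requirement, one possible direction is a small recursive flattener that walks dicts and lists without knowing any key names in advance. The following is only a minimal sketch, not a tested solution for the full data set: `flatten`, `rows_to_csv`, and the trimmed `sample` records are hypothetical names introduced here for illustration, and it uses only the standard library. Note that it names list columns by position (for example `attributes.0.value`) rather than by the `name` field as in the desired `attributes_gender_value` layout, and that empty dicts or lists produce no column at all.

```python
import csv
import io


def flatten(value, prefix="", sep="."):
    """Recursively flatten nested dicts and lists into one flat dict.

    Dict keys and list positions are joined with `sep` to build column names.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(child, name, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            name = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(child, name, sep))
    else:
        # Scalar leaf: store it under the accumulated column name.
        flat[prefix] = value
    return flat


def rows_to_csv(rows):
    """Return CSV text whose header is the union of all keys, in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed, hypothetical version of the example records from the question.
sample = [
    {
        "_id": {"id": "123"},
        "extreme": False,
        "attributes": [
            {"name": "gender", "numeric": 0, "value": 0},
            {"name": "email", "value": False},
        ],
        "change": [{"id": {"$id": "1231"}, "seen": [{"$date": "2011-02-12"}]}],
    },
    {
        "_id": {"id": "456"},
        "extreme": False,
        "attributes": [
            {"name": "gender", "numeric": 1, "value": 1},
            {"name": "email", "value": True},
        ],
        "change": [{"id": {"$id": "1231"}, "seen": [{"$date": "2011-02-12"}]}],
    },
]

rows = [flatten(record) for record in sample]
csv_text = rows_to_csv(rows)
print(csv_text)
```

Since `rows` is a plain list of flat dicts, it could presumably also be passed to `pandas.DataFrame(rows)` and written with `to_csv`, which would keep the rest of the pipeline from the question unchanged.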