2016-06-08 19 views
1

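Nested JSON to CSV - a generic approach

I am very new to Python and I am struggling to convert a nested JSON file to CSV. To do this, I load the JSON, transform it so that json_normalize produces clean output, and then use the pandas package to write the normalized part to CSV.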

My example JSON:

[{ 
"_id": { 
    "id": "123" 
}, 
"device": { 
    "browser": "Safari", 
    "category": "d", 
    "os": "Mac" 
}, 
"exID": { 
    "$oid": "123" 
}, 
"extreme": false, 
"geo": { 
    "city": "London", 
    "country": "United Kingdom", 
    "countryCode": "UK", 
    "ip": "00.000.000.0" 
}, 
"viewed": { 
    "$date": "2011-02-12" 
}, 
"attributes": [{ 
    "name": "gender", 
    "numeric": 0, 
    "value": 0 
}, { 
    "name": "email", 
    "value": false 
}], 
"change": [{ 
    "id": { 
    "$id": "1231" 
    }, 
    "seen": [{ 
    "$date": "2011-02-12" 
    }] 
}] 
}, { 
"_id": { 
    "id": "456" 
}, 
"device": { 
    "browser": "Chrome 47", 
    "category": "d", 
    "os": "Windows" 
}, 
"exID": { 
    "$oid": "345" 
}, 
"extreme": false, 
"geo": { 
    "city": "Berlin", 
    "country": "Germany", 
    "countryCode": "DE", 
    "ip": "00.000.000.0" 
}, 
"viewed": { 
    "$date": "2011-05-12" 
}, 
"attributes": [{ 
    "name": "gender", 
    "numeric": 1, 
    "value": 1 
}, { 
    "name": "email", 
    "value": true 
}], 
"change": [{ 
    "id": { 
    "$id": "1231" 
    }, 
    "seen": [{ 
    "$date": "2011-02-12" 
    }] 
}] 
}] 

I use the following code (in which I exclude the nested parts):

import json
# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize


def loading_file():
    # File path
    file_path = ''  # file path here

    # Load the JSON file and close it again afterwards
    with open(file_path) as json_data:
        data = json.load(json_data)
    return data


# Collect the available top-level keys
def data_keys(data):
    keys = {}
    for i in data:
        for k in i.keys():
            keys[k] = 1

    keys = keys.keys()

    # Exclude nested arrays from keys - hard coded -> IMPROVE
    new_keys = [x for x in keys if
                x != 'attributes' and
                x != 'change']

    return new_keys


# Exclude nested arrays from the JSON dictionaries
def new_data(data, keys):
    filtered = []
    for i in range(0, len(data)):
        x = {k: v for (k, v) in data[i].items() if k in keys}
        filtered.append(x)
    return filtered


def csv_out(data):
    data.to_csv('out.csv', encoding='utf-8')


def main():
    data_file = loading_file()
    keys = data_keys(data_file)
    table = new_data(data_file, keys)
    csv_out(json_normalize(table))


main()

My current output looks like this:

| _id.id | device.browser | device.category | device.os | ... | viewed.$date |
|--------|----------------|-----------------|-----------|-----|--------------|
| 123    | Safari         | d               | Mac       | ... | 2011-02-12   |
| 456    | Chrome 47      | d               | Windows   | ... | 2011-05-12   |

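My problem is that I want to include the nested arrays in the CSV, so I have to flatten them. I do not know how to make this generic, so that I do not have to hard-code the dictionary keys (numeric, id, name) and values when creating the table. Because of the number of keys in attributes and change, the solution has to be generic. I would like to get output like this: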

| _id.id | device.browser | ... | attributes_gender_numeric | attributes_gender_value | attributes_email_value | change_id | change_seen |
|--------|----------------|-----|---------------------------|-------------------------|------------------------|-----------|-------------|
| 123    | Safari         | ... | 0                         | 0                       | false                  | 1231      | 2011-02-12  |
| 456    | Chrome 47      | ... | 1                         | 1                       | true                   | 1231      | 2011-02-12  |

Thanks in advance! Any tips on how to improve my code and make it more efficient are very welcome.

Answers

2

Thanks to Amir Ziai's great blog post, which you can find here, I managed to output my data as a flat table with the following function:

# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize


# Recursively extracts values out of each object into a flattened dictionary
def flatten_json(data):
    flat = []  # list of flat dictionaries

    def flatten(y):
        out = {}

        def flatten2(x, name=''):
            if isinstance(x, dict):
                for a in x:
                    if a == "name":
                        # Name/value pair: use the content of "name" as the column name
                        flatten2(x["value"], name + x[a] + '_')
                    else:
                        flatten2(x[a], name + a + '_')
            elif isinstance(x, list):
                for a in x:
                    flatten2(a, name + '_')
            else:
                # Scalar value: store it under the accumulated name without the trailing '_'
                out[name[:-1]] = x

        flatten2(y)
        return out

    # Loop needed to flatten multiple objects
    for obj in data:
        flat.append(flatten(obj))

    return json_normalize(flat)

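I am aware that this is not fully generic because of the if statement for the name/value pairs. However, if you remove that special case for building name/value dictionaries, the code can be used with other embedded arrays as well.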
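For comparison, here is a minimal sketch of a flattener without the name/value special case. It is not the code from the blog post: the names `flatten` and `rows_to_csv` and the shortened sample `record` are made up for illustration, and it assumes that numbering list elements by position (`attributes_0_value`, `attributes_1_value`) is acceptable, so that elements of the same array do not overwrite each other. It uses only the standard library.

```python
import csv
import io


def flatten(obj, prefix="", sep="_"):
    """Return a flat dict that maps joined key paths to scalar values."""
    out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten(value, f"{prefix}{key}{sep}", sep))
    elif isinstance(obj, list):
        # Use the position in the list as part of the column name
        for index, item in enumerate(obj):
            out.update(flatten(item, f"{prefix}{index}{sep}", sep))
    else:
        # Scalar: drop the trailing separator from the accumulated path
        out[prefix[: -len(sep)]] = obj
    return out


def rows_to_csv(rows):
    """Write flat dicts to CSV text; columns missing in a row stay empty."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Shortened version of the first object from the question (illustrative only)
record = {
    "_id": {"id": "123"},
    "extreme": False,
    "attributes": [
        {"name": "gender", "numeric": 0, "value": 0},
        {"name": "email", "value": False},
    ],
    "change": [{"id": {"$id": "1231"}, "seen": [{"$date": "2011-02-12"}]}],
}

row = flatten(record)
csv_text = rows_to_csv([row])
print(csv_text)
```

The column names differ from the ones requested in the question (positions instead of attribute names), and empty dictionaries or lists produce no column at all. That is the price of not hard-coding any keys.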

-1

Use pandas (run "pip install pandas" in the console), two lines of code:

import pandas 

json = pandas.read_json('2.json') 
json.to_csv('1.csv') 
+0

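Hi, thanks for your response, but it is not what I am looking for, because it outputs the nested arrays as they are, and I want every value in its own cell.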
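If staying within pandas is preferred, one possible way to get each value of a nested array into its own cell is the `record_path` and `meta` parameters of `json_normalize`. The sketch below is an assumption about what would be acceptable here, not something proposed in the thread: it produces one row per attribute (a long table) rather than the single wide row per object asked for in the question, and the shortened `data` list is illustrative. It assumes pandas 1.0 or later.

```python
import pandas as pd

# Shortened version of the example JSON from the question (illustrative only)
data = [
    {
        "_id": {"id": "123"},
        "attributes": [
            {"name": "gender", "numeric": 0, "value": 0},
            {"name": "email", "value": False},
        ],
    },
    {
        "_id": {"id": "456"},
        "attributes": [
            {"name": "gender", "numeric": 1, "value": 1},
            {"name": "email", "value": True},
        ],
    },
]

# One row per element of "attributes"; the parent id is repeated as "_id.id"
attributes = pd.json_normalize(
    data,
    record_path="attributes",
    meta=[["_id", "id"]],
)

# Without a path, to_csv returns the CSV text instead of writing a file
csv_text = attributes.to_csv(index=False)
print(csv_text)
```

Keys that are missing in some elements, such as "numeric" for the email attribute, end up as empty cells in the CSV.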