Adding column headings to a CSV file in Python when reading JSON

So I have a program that reads JSON, flattens it, and dumps it to CSV:

import json 
import unicodecsv as csv 
import sys 
import glob 
import os 
from flatten_json import flatten_json 

def createcolumnheadings(cols): 
    #create column headings 
    columns = cols.keys() 
    columns = list(set(columns)) 
    return columns 

doOnce=True 

path=os.chdir(sys.argv[1]) 

for f in glob.glob("smallR.txt"): 
    fName=os.path.splitext(f)[0] 
    out_file= open('csv/' + fName+'.csv', 'wb') 
    csv_w = csv.writer(out_file, delimiter="\t", encoding='utf-8' ) 

    with open(f, 'r') as handle:
        for line in handle:
            data = json.loads(line)
            flatdata = flatten_json(data)
            if doOnce:
                columns = createcolumnheadings(flatdata)
                columns.insert(0, 'racism')
                csv_w.writerow(columns)
                doOnce = False
            flatdata['racism'] = 0
            csv_w.writerow(flatdata.get(x, u'') for x in columns)

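This works OK, with one problem. My program only takes the column headings from the first line of smallR.txt (plus the 'racism' column it adds).

Some of the later data (smallR.txt here) has different columns. This makes the output not quite right; see small.csv here.

Is there a simple way to adapt my program to handle new column headings found in later lines?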


2016-10-28 schoon

In that case you need to scan the whole file first, to get all the possible columns:

with open(f, 'r') as handle:
    data = [json.loads(line) for line in handle]

# union of the keys found on every line, with 'racism' first
columns = ['racism'] + list({k for entry in data for k in entry.keys()})

csv_w.writerow(columns)
for entry in data:
    csv_w.writerow(entry.get(c, '') for c in columns)

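This loads all the data into memory. If that is not acceptable, you could read the file twice: once to get the columns, and once to read and write: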

# first pass: collect the keys from every line
with open(f, 'r') as handle:
    columns = ['racism'] + list({k for line in handle for k in json.loads(line).keys()})
csv_w.writerow(columns)

# second pass: write one row per line
with open(f, 'r') as handle:
    for line in handle:
        entry = json.loads(line)
        csv_w.writerow(entry.get(c, '') for c in columns)

The definition of the flatten_json function is missing, so I can only guess what it does.
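
For reference, the two-pass idea can be put together as a self-contained sketch. It assumes Python 3 and the standard-library csv module rather than unicodecsv. The function name jsonl_to_csv and its flatten and extra parameters are hypothetical; pass in whatever flattening function you actually use.

```python
import csv
import json


def jsonl_to_csv(in_path, out_path, flatten=lambda d: d, extra=("racism",)):
    """Convert a file with one JSON object per line to tab-separated CSV.

    Hypothetical helper: `flatten` stands in for flatten_json, and `extra`
    lists columns that are placed first and filled with 0 on every row.
    """
    # Pass 1: collect every key that appears on any line.
    keys = set()
    with open(in_path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                keys.update(flatten(json.loads(line)).keys())

    # Sort for a stable column order; the extra columns go first.
    columns = list(extra) + sorted(keys - set(extra))

    # Pass 2: write the header, then one row per line.
    # restval="" fills in keys that a given line does not have.
    with open(in_path, "r", encoding="utf-8") as handle, \
            open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=columns,
                                delimiter="\t", restval="")
        writer.writeheader()
        for line in handle:
            if line.strip():
                row = flatten(json.loads(line))
                for name in extra:
                    row[name] = 0
                writer.writerow(row)
    return columns
```

Only the set of column names is held in memory, so this should cope with a large input file at the cost of parsing each line twice.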


2016-10-28 14:00:28 Javier

Thanks Javier, the file is very large, so I'll give your second approach a go. flatten_json is imported [from here](https://medium.com/@amirziai/flattening-json-objects-in-python-f5343c794b10#.v8fb0z7bt) – schoon
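
For readers without the linked article to hand, a flattening function of this kind might look roughly like the sketch below. This is an assumption about its behaviour (nested keys joined with an underscore, list items keyed by their index), not necessarily the article's exact code, and the name flatten is hypothetical.

```python
def flatten(obj, sep="_"):
    """Flatten nested dicts and lists into a single-level dict."""
    out = {}

    def walk(value, prefix):
        if isinstance(value, dict):
            # Descend into each key, extending the path.
            for key, child in value.items():
                walk(child, prefix + [str(key)])
        elif isinstance(value, list):
            # List items are keyed by their position.
            for index, child in enumerate(value):
                walk(child, prefix + [str(index)])
        else:
            # Leaf value: join the path into one column name.
            out[sep.join(prefix)] = value

    walk(obj, [])
    return out


record = {"user": {"name": "a", "tags": ["x", "y"]}}
flat = flatten(record)
# flat == {"user_name": "a", "user_tags_0": "x", "user_tags_1": "y"}
```

Because list lengths and nested keys vary from record to record, a key such as user_tags_1 appears on some lines only, which is why the header has to be built from the whole file rather than from the first line.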
