2014-11-06 36 views
1

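Writing a CSV with DictWriter when the fields are not known in advance

I am parsing a large chunk of text into dictionaries. The end goal is to create a CSV file that uses the dictionary keys as column headers.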

csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
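To make the two relevant parameters concrete, here is a minimal sketch (Python 3, writing to an in-memory buffer; the header names and values are made up for illustration). It shows that a key missing from a row is filled with restval, while a key missing from fieldnames raises ValueError under the default extrasaction='raise'.

```python
import csv
import io

# In-memory buffer so the sketch does not touch the filesystem.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Header_1", "Header_2"], restval="")
writer.writeheader()

# Header_2 is absent from this row, so it is filled with restval (an empty string).
writer.writerow({"Header_1": "data_1"})

# Header_3 is not in fieldnames, so the default extrasaction='raise' rejects the row.
try:
    writer.writerow({"Header_1": "data_2", "Header_3": "data_3"})
    raised = False
except ValueError:
    raised = True

lines = buf.getvalue().splitlines()
print(lines)   # header line plus the one row that was accepted
print(raised)
```

With extrasaction='ignore' the second row would be written without the Header_3 value instead of raising.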

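The problem is that any dictionary, 'n' rows in, may contain a new, previously unseen key. When that happens I want the CSV to include a column for the new key as well. In short, not all of my fields are known in advance, so I cannot compile the complete fieldnames list at the start.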

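Is there a recommended way to have csv.DictWriter add fields that are missing from fieldnames rather than ignore them? Simply changing fieldnames at that point would leave the rows already written with the wrong number of fields.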
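The column-count problem can be reproduced with a short sketch (Python 3, in-memory buffer, illustrative sample values). It assumes the writer's fieldnames attribute can simply be reassigned, which works in CPython but is not something the csv documentation promises for DictWriter.

```python
import csv
import io

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Header_1", "Header_2"], restval="")
writer.writeheader()
writer.writerow({"Header_1": "data_1", "Header_2": "data_2"})

# A later row brings a new key, so extend fieldnames on the existing writer.
row = {"Header_1": "data_3", "Header_2": "data_4", "Header_4": "data_5"}
for key in row:
    if key not in writer.fieldnames:
        writer.fieldnames = list(writer.fieldnames) + [key]
writer.writerow(row)

# Read the output back and count the fields on each line.
widths = [len(fields) for fields in csv.reader(io.StringIO(buf.getvalue()))]
print(widths)
```

The header and the first row keep two fields while the later row has three, so the file no longer has a consistent shape.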


Could you please provide a sample dictionary structure? – 2014-11-06 06:37:43


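The problem is that the dictionary keys are unknown before the code runs, but I want to be able to write the CSV from a list of dictionaries. At the moment I compile the entire list of dictionaries and then iterate over the keys to find the unique ones to use as field names. As the dataset grows, however, I would like to be able to write the CSV before I have seen all the dictionaries. – Pranab 2014-11-06 08:41:43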


Pranab, please see my answer below. – 2014-11-06 15:26:35

Answers

2

Instead of DictWriter, which can get messy in your case because dictionaries are unordered, I tried the writerow method of csv.writer. Here is what I did:

""" 
a) Take all the keys of the dictionaries and sort them (sorting is optional).
b) Build a result list: for each dictionary, append the value for every header
   (the headers are the keys of the input dictionaries); if a key is missing,
   .get() returns None. The result list therefore holds one list per row.
c) Write the header and each row from the result list to the CSV file.
""" 

data_dict = [{ "Header_1":"data_1", "Header_2":"data_2", "Header_3":"data_3"}, 
      { "Header_1":"data_4", "Header_2":"data_5", "Header_3":"data_6"}, 
      { "Header_1":"data_7", "Header_2":"data_8", "Header_3":"data_9", "Header_4":"data_10"}, 
      { "Header_1":"data_11", "Header_3":"data_12"}, 
      { "Header_1":"data_13", "Header_2":"data_14", "Header_3":"data_15"}] 

""" 
    The third dict has an extra key/value pair.
    The fourth has no Header_2, where we expect a blank value in the CSV file.
""" 
import csv

# Collect every key that appears in any dictionary, then sort for a stable column order.
headers = sorted({key for _dict in data_dict for key in _dict})

# Build one row per dictionary; keys a dictionary lacks become None,
# which csv.writer writes as an empty field.
result = []
for _dict in data_dict:
    row = []
    for header in headers:
        row.append(_dict.get(header, None))
    result.append(row)

# Python 3: open in text mode with newline='' (the Python 2 original used 'wb').
with open('demo.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=';', dialect='excel',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(headers)
    for r in result:
        spamwriter.writerow(r)
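For comparison, the same two-pass idea can also be expressed with DictWriter itself. This is a sketch rather than part of the answer above: it assumes Python 3.7+ (where dicts keep insertion order), uses a shortened, made-up sample, and writes to an in-memory buffer instead of demo.csv. The first pass collects keys in first-seen order, and restval fills the blanks on the second pass.

```python
import csv
import io

# Shortened sample in the same shape as data_dict above (values are illustrative).
rows = [
    {"Header_1": "data_1", "Header_2": "data_2"},
    {"Header_1": "data_3", "Header_2": "data_4", "Header_4": "data_5"},
    {"Header_1": "data_6"},
]

# First pass: collect every key once, in the order it is first seen.
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

# Second pass: write the header and rows; missing keys are filled with restval.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

csv_lines = out.getvalue().splitlines()
print(csv_lines)
```

Like the csv.writer version, this still needs every dictionary before the header is written. A CSV header cannot be finalized until all keys are known, so writing rows earlier would mean rewriting the file or spooling rows somewhere else first.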

