2013-03-30 65 views
0

Someone sent me this code to convert JSON to CSV format. I get an error when converting from JSON to CSV.

Below is the json2csv code.

import sys, json, csv 

input = open(sys.argv[1]) 
json_array = json.load(input) 
input.close() 

item_data = json_array 
if len(item_data) >= 1: 
    first_item_id = item_data[0]['item_id'] 
    columns = item_data[0].keys() 

csv_file = open(sys.argv[2], "wb") 
writer = csv.writer(csv_file) 
# there is currently a known bug where column names are partially uppercase, this will be fixed soon. the "map(lambda x: x.lower(), columns)" fixes this issue in the mean time 
writer.writerow(map(lambda x: x.lower(), columns)) 

# here .items() is a standard python function 
for item in item_data: 
    row = [] 
    for column_name in columns: 
        if column_name.lower() == 'name_part': # lower required due to above issue 
            row.append(" ".join(item[column_name])) 
        else: 
            row.append(item[column_name]) 
    writer.writerow(row) 

Here is my JSON data, which I saved as transaction.json:

{"comment": "Developer test ", "invoice_intern_external_ids": "", "invoice_payments": [{"payment_id": 8, "payment_method": "Refund", "timestamp": "2013-03-05", "invoice_id": 12, "writeoff_reason": "", "payment": 160.0}, {"payment_id": 9, "payment_method": "Cash", "timestamp": "2013-03-05", "invoice_id": 12, "writeoff_reason": "", "payment": 160.0}], "tax": 0.0, "pay_to_external_id": -1, "total": 0.0, "pay_to_contact_id": 13, "client_external_id": 11, "is_draft": false, "invoice_clinician_external_id": 999925, "location": "Therapy A", "invoice_clinician_id": 7, "bill_to_external_id": 11, "timestamp": "2013-03-05", "client_contact_id": 16, "subtotal": 0.0, "invoice_id": 26, "write_off": 0.0, "invoice_items": [{"item_tax": 0.0, "item_name": "InitialVisit_O", "timestamp": "2013-03-05", "item_unit_price": 160.0, "tax": 0.0, "invoice_item_id": 21, "invoice_instance_id": 26, "total": 0.0, "subtotal": 0.0, "item_description": "Initial Assessment/hour", "quantity": 0.0}], "billing_date": "2013-03-05", "invoice_intern_ids": "[]", "bill_to_contact_id": 16, "balance": 0.0, "invoice_instance_id": 12} 
{"comment": "", "invoice_intern_external_ids": null, "invoice_payments": [], "tax": 0.0, "pay_to_external_id": -1, "total": 260.0, "pay_to_contact_id": 13, "client_external_id": -1, "is_draft": false, "invoice_clinician_external_id": null, "location": "Sports Medicine", "invoice_clinician_id": 7, "bill_to_external_id": -1, "timestamp": "2013-02-25", "client_contact_id": 15, "subtotal": 260.0, "invoice_id": 23, "write_off": 0.0, "invoice_items": [{"item_tax": 0.0, "item_name": "CompAsses", "timestamp": "2013-02-25", "item_unit_price": 260.0, "tax": 0.0, "invoice_item_id": 36, "invoice_instance_id": 23, "total": 260.0, "subtotal": 260.0, "item_description": "Comp Assess Report", "quantity": 1.0}], "billing_date": "2013-02-22", "invoice_intern_ids": "[]", "bill_to_contact_id": 15, "balance": 260.0, "invoice_instance_id": 10} 

I tried running c:\python.exe c:\json2csv.py c:\transaction.json c:\transaction.txt and got this error:

Extra data line2 column 1 - line 12 column1 (char 1105 - char 11267) 

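It would be great if someone could correct the code so that it gets all the fields. I don't even need all the fields in the CSV. I only need client_external_id, invoice_clinician_id, invoice_id, location, item_unit_price, item_description, quantity, billing_date.

I have been at this for a long time and I need to finish it today. Please help.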

+0

Your input contains *multiple* JSON entries, while the code can only handle a single JSON data structure. Perhaps you need to read your input file *line by line*?

+0

How do I do that?

+0

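The code you were sent does *not match* the data structure in your JSON file at all. For example, there is no 'item_id' key and no 'name_part' key. Your JSON structure is also *nested*, which does not translate well to CSV.

Answers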

1

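There are several problems here:

  1. Your JSON data is actually multiple JSON documents. If you have a lot of data this will be hard to fix by hand, although Martijn's suggestion of reading the data line by line may help, assuming the data really is one JSON object per line. Otherwise the data needs to be fixed so that it looks like this: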

    [{"comment": "Developer test ", "invoice_intern_external_ids": "" ...}, 
    {"comment": "", "invoice_intern_external_ids": null, ...}] 
    

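    Note the opening and closing square brackets around the whole list, and the comma after each JSON {} object (except the last).

  2. The script you were given is not particularly general. It assumes that the first JSON object has an 'item_id' key, but it does not. That is fixable, though.

  3. Your invoice_payments data is a list of dictionaries, which means your data is hierarchical. How do you want that converted to CSV, which is just a flat list of data? It is not obvious. The script you show does not deal with this; it is generic and assumes your JSON data is flat.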
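To illustrate point 1, here is a minimal sketch (written for Python 3, using shortened, made-up sample data) of why parsing the whole file at once raises the "Extra data" error and why parsing each line on its own does not. The helper name `parse_json_lines` is hypothetical, and the sketch assumes the file really holds exactly one JSON object per line.

```python
import json


def parse_json_lines(text):
    """Parse text holding one JSON object per line; blank lines are skipped."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two JSON objects, one per line (shortened, made-up sample data).
sample = (
    '{"invoice_id": 26, "location": "Therapy A"}\n'
    '{"invoice_id": 23, "location": "Sports Medicine"}\n'
)

# A JSON document holds a single top-level value, so parsing both
# objects in one call fails with an "Extra data" error.
try:
    json.loads(sample)
    whole_file_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    whole_file_ok = False

# Parsing line by line yields one dict per object.
records = parse_json_lines(sample)
```

If an object can span several lines, this per-line approach will not work and the file has to be turned into a proper JSON list first.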

A fixed converter:

import sys, json, csv 

# Read one JSON object per line instead of parsing the whole file at once. 
json_array = [] 
input_file = open(sys.argv[1]) 
for data in input_file: 
    if data.strip(): # skip blank lines 
        json_array.append(json.loads(data)) 
input_file.close() 

item_data = json_array 
# Use the keys of the first object as the CSV columns. 
columns = item_data[0].keys() if item_data else [] 

# Python 2: the csv module expects the output file to be opened in binary mode. 
csv_file = open(sys.argv[2], "wb") 
writer = csv.writer(csv_file) 
# Column names may be partially uppercase, so lowercase them for the header row. 
writer.writerow(map(lambda x: x.lower(), columns)) 

for item in item_data: 
    row = [] 
    for column_name in columns: 
        if column_name.lower() == 'name_part': # lower() needed because of the casing issue above 
            row.append(" ".join(item[column_name])) 
        else: 
            row.append(item[column_name]) 
    writer.writerow(row) 
csv_file.close() 

This produces the following CSV:

comment,invoice_intern_external_ids,invoice_payments,tax,pay_to_external_id,total,pay_to_contact_id,client_external_id,is_draft,invoice_clinician_external_id,location,invoice_instance_id,invoice_clinician_id,bill_to_external_id,timestamp,client_contact_id,subtotal,invoice_id,write_off,invoice_items,invoice_intern_ids,bill_to_contact_id,balance,billing_date 
Developer test ,,"[{u'payment_id': 8, u'payment_method': u'Refund', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}, {u'payment_id': 9, u'payment_method': u'Cash', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}]",0.0,-1,0.0,13,11,False,999925,Therapy A,12,7,11,2013-03-05,16,0.0,26,0.0,"[{u'item_tax': 0.0, u'item_name': u'InitialVisit_O', u'timestamp': u'2013-03-05', u'item_unit_price': 160.0, u'tax': 0.0, u'subtotal': 0.0, u'invoice_item_id': 21, u'total': 0.0, u'invoice_instance_id': 26, u'item_description': u'Initial Assessment/hour', u'quantity': 0.0}]",[],16,0.0,2013-03-05 
,,[],0.0,-1,260.0,13,-1,False,,Sports Medicine,10,7,-1,2013-02-25,15,260.0,23,0.0,"[{u'item_tax': 0.0, u'item_name': u'CompAsses', u'timestamp': u'2013-02-25', u'item_unit_price': 260.0, u'tax': 0.0, u'subtotal': 260.0, u'invoice_item_id': 36, u'total': 260.0, u'invoice_instance_id': 23, u'item_description': u'Comp Assess Report', u'quantity': 1.0}]",[],15,260.0,2013-02-22 

Note how your invoice_payments data has been converted to a string:

"[{u'payment_id': 8, u'payment_method': u'Refund', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}, {u'payment_id': 9, u'payment_method': u'Cash', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}]",0.0,-1,0.0,13,11,False,999925,Therapy A,12,7,11,2013-03-05,16,0.0,26,0.0,"[{u'item_tax': 0.0, u'item_name': u'InitialVisit_O', u'timestamp': u'2013-03-05', u'item_unit_price': 160.0, u'tax': 0.0, u'subtotal': 0.0, u'invoice_item_id': 21, u'total': 0.0, u'invoice_instance_id': 26, u'item_description': u'Initial Assessment/hour', u'quantity': 0.0}]" 

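Nothing that imports CSV will make any practical sense of that. Your JSON data cannot simply be converted to CSV; you have to decide and specify what the CSV data should look like.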
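As one possible way of making that decision, here is a hedged sketch (Python 3) that writes one CSV row per entry in invoice_items and copies the invoice-level fields onto each row, using only the columns the question asks for. The one-row-per-item layout is an assumption, not something the question specifies, and the names `flatten_invoices`, `write_csv`, `INVOICE_FIELDS`, `ITEM_FIELDS` and `OUTPUT_COLUMNS` are made up for illustration. Under this layout an invoice with an empty invoice_items list produces no rows.

```python
import csv
import io
import json

# Fields taken from the invoice itself and from each entry in invoice_items.
INVOICE_FIELDS = ["client_external_id", "invoice_clinician_id", "invoice_id",
                  "location", "billing_date"]
ITEM_FIELDS = ["item_unit_price", "item_description", "quantity"]
# Column order as listed in the question.
OUTPUT_COLUMNS = ["client_external_id", "invoice_clinician_id", "invoice_id",
                  "location", "item_unit_price", "item_description",
                  "quantity", "billing_date"]


def flatten_invoices(lines):
    """Yield one flat dict per invoice item, with the invoice fields copied in."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        invoice = json.loads(line)
        for item in invoice.get("invoice_items", []):
            row = {name: invoice.get(name) for name in INVOICE_FIELDS}
            row.update({name: item.get(name) for name in ITEM_FIELDS})
            yield row


def write_csv(rows, out_file):
    """Write the flattened rows with a header line."""
    writer = csv.DictWriter(out_file, fieldnames=OUTPUT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Shortened sample line based on the first record in the question.
sample = [
    '{"client_external_id": 11, "invoice_clinician_id": 7, "invoice_id": 26, '
    '"location": "Therapy A", "billing_date": "2013-03-05", "invoice_items": '
    '[{"item_unit_price": 160.0, "item_description": "Initial Assessment/hour", '
    '"quantity": 0.0}]}'
]
buffer = io.StringIO()
write_csv(flatten_invoices(sample), buffer)
csv_text = buffer.getvalue()
```

To write to a real file in Python 3, pass an open file object instead of the in-memory buffer, for example `open(sys.argv[2], "w", newline="")`.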

+0

The whole idea is to import the CSV data into SQL Server. Is there a way to store these JSON objects directly as fields in a SQL table?

+0

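@AnithaYedavalli: Yes, in that case going via CSV is probably not useful. As a first step, you still need to define a mapping between the JSON data and the SQL data. After that, writing an import script is fairly trivial.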
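A rough sketch of what such a mapping and import script could look like (Python 3). It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in; SQL Server would need its own driver and connection setup, which is not shown. The two-table layout, the table names `invoices` and `invoice_items`, the choice of columns, and linking items to their invoice through `invoice_id` are all assumptions made for illustration.

```python
import json
import sqlite3


def import_invoices(conn, lines):
    """Load line-delimited invoice JSON into two related tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_id INTEGER PRIMARY KEY, location TEXT, billing_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoice_items ("
        "invoice_item_id INTEGER PRIMARY KEY, invoice_id INTEGER, "
        "item_description TEXT, item_unit_price REAL, quantity REAL)"
    )
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        invoice = json.loads(line)
        conn.execute(
            "INSERT INTO invoices VALUES (?, ?, ?)",
            (invoice["invoice_id"], invoice["location"], invoice["billing_date"]),
        )
        # Each nested item becomes its own row that points back at the invoice.
        for item in invoice.get("invoice_items", []):
            conn.execute(
                "INSERT INTO invoice_items VALUES (?, ?, ?, ?, ?)",
                (item["invoice_item_id"], invoice["invoice_id"],
                 item["item_description"], item["item_unit_price"],
                 item["quantity"]),
            )
    conn.commit()


# Shortened sample line based on the second record in the question.
sample = [
    '{"invoice_id": 23, "location": "Sports Medicine", '
    '"billing_date": "2013-02-22", "invoice_items": [{"invoice_item_id": 36, '
    '"item_description": "Comp Assess Report", "item_unit_price": 260.0, '
    '"quantity": 1.0}]}'
]
conn = sqlite3.connect(":memory:")
import_invoices(conn, sample)
item_rows = conn.execute(
    "SELECT invoice_id, item_description, item_unit_price, quantity "
    "FROM invoice_items"
).fetchall()
```

Keeping invoices and their items in separate tables preserves the hierarchy that a single flat CSV cannot express.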