2014-02-26 63 views
0

I have a file with lines like the following. How do I convert it from JSON to CSV?

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]} 

I want to turn each line into a row of a CSV-readable file with a header, as shown below:

status,message,data,addressAccessId,municipalityCode,municipalityName,streetCode,streetName,streetBuildingIdentifier,mailDeliverySublocationIdentifier,districtSubDivisionIdentifier,postCodeIdentifier,districtName,presentationString,addressSpecificCount,validCoordinates,geometryWkt,x,y 
OK,OK,data:type,addressAccessType,0a3f508f-e7c8-32b8-e044-0003ba298018,0766,Hedensted,0072,Værnegården,13,,,8000,Århus,Værnegården 13, 8000 Århus,1,true,POINT553564 6179299,553564,6179299 

How can I do this? Code and explanations are very welcome. So far, this is what I have come up with, based on this example (How can I convert JSON to CSV?):

x = json.loads(x) 

f = csv.writer(open('test.csv', 'wb+')) 

# Write CSV Header, If you dont need that, remove this line 
f.writerow(['status', 'message', 'type', 'addressAccessId', 'municipalityCode','municipalityName','streetCode','streetName','streetBuildingIdentifier','mailDeliverySublocationIdentifier','districtSubDivisionIdentifier','postCodeIdentifier','districtName','presentationString','addressSpecificCount','validCoordinates','geometryWkt','x','y']) 


for x in x: 
    f.writerow([x['status'], 
       x['message'], 
       x['data']['type'], 
       x['data']['addressAccessId'], 
       x['data']['municipalityCode'], 
       x['data']['municipalityName'], 
       x['data']['streetCode'], 
       x['data']['streetName'], 
       x['data']['streetBuildingIdentifier'], 
       x['data']['mailDeliverySublocationIdentifier'], 
       x['data']['districtSubDivisionIdentifier'], 
       x['data']['postCodeIdentifier'], 
       x['data']['districtName'], 
       x['data']['presentationString'], 
       x['data']['addressSpecificCount'], 
       x['data']['validCoordinates'], 
       x['data']['geometryWkt'], 
       x['data']['x'], 
       x['data']['y']]) 

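I have looked at and tried many other solutions, including DictWriter, and replace() and translate() to strip characters, but I have not been able to adapt them to my needs. The goal is to be able to choose which fields are written to the new file, and to convert x and y to a new coordinate system. For now, though, I am just trying to parse the lines above into a CSV file. Could anyone provide code and an explanation? Thank you very much for your time.

Here is my addresses.txt: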

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5081-e039-32b8-e044-0003ba298018","municipalityCode":"0265","municipalityName":"Roskilde","streetCode":"0831","streetName":"Brønsager","streetBuildingIdentifier":"69","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Svogerslev","postCodeIdentifier":"4000","districtName":"Roskilde","presentationString":"Brønsager 69, 4000 Roskilde","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(690026 6169309)","x":690026,"y":6169309}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5089-ecab-32b8-e044-0003ba298018","municipalityCode":"0461","municipalityName":"Odense","streetCode":"9505","streetName":"Vægtens Kvarter","streetBuildingIdentifier":"271","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Holluf Pile","postCodeIdentifier":"5220","districtName":"Odense SØ","presentationString":"Vægtens Kvarter 271, 5220 Odense SØ","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(592191 6135829)","x":592191,"y":6135829}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f507c-adc3-32b8-e044-0003ba298018","municipalityCode":"0165","municipalityName":"Albertslund","streetCode":"0445","streetName":"Skyttehusene","streetBuildingIdentifier":"33","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"2620","districtName":"Albertslund","presentationString":"Skyttehusene 33, 2620 Albertslund","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(711079 6174741)","x":711079,"y":6174741}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f509c-7f57-32b8-e044-0003ba298018","municipalityCode":"0851","municipalityName":"Aalborg","streetCode":"5205","streetName":"Løvstikkevej","streetBuildingIdentifier":"36","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"9000","districtName":"Aalborg","presentationString":"Løvstikkevej 36, 9000 Aalborg","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(552407 6322490)","x":552407,"y":6322490}]} 
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5098-32a6-32b8-e044-0003ba298018","municipalityCode":"0779","municipalityName":"Skive","streetCode":"0462","streetName":"Landevejen","streetBuildingIdentifier":"52","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Håsum","postCodeIdentifier":"7860","districtName":"Spøttrup","presentationString":"Landevejen 52, 7860 Spøttrup","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(491515 6269739)","x":491515,"y":6269739}]} 
+1

I wouldn't reuse 'x' as the iteration variable when your list is also called 'x'. –

Answers

3

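Note that in your lines the data key holds a list of dictionaries, so x['data']['type'] will not work, but x['data'][0]['type'] would. However, there may be more than one such dictionary in that list. I am assuming you want one CSV row per dictionary in x['data'].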
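To make the indexing point concrete, here is a small sketch. The `line` string is an abbreviated stand-in for one of your records, not the full data:

```python
import json

# Abbreviated sample record; the real lines carry many more keys.
line = '{"status":"OK","message":"OK","data":[{"type":"addressAccessType","x":553564,"y":6179299}]}'

entry = json.loads(line)

# entry['data'] is a list, so it has to be indexed by position first.
first = entry['data'][0]
print(first['type'])

# Indexing the list with a string key raises TypeError.
try:
    entry['data']['type']
except TypeError as exc:
    print('TypeError:', exc)
```

Looping over entry['data'] instead of hard-coding index 0 also covers the case where the list holds more than one dictionary.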

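Next, it looks like you have a UTF-8 BOM at the start of every line; whatever wrote the file did not use the UTF-8 encoding correctly. We need to strip that marker, which is the first 3 bytes of the line.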
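As a minimal sketch of that stripping step on its own, operating on raw bytes (`strip_bom` is a hypothetical helper name chosen for illustration, not a library function):

```python
import codecs


def strip_bom(raw_line):
    """Return raw_line (bytes) without a leading UTF-8 BOM, if present."""
    if raw_line.startswith(codecs.BOM_UTF8):
        # codecs.BOM_UTF8 is the 3-byte sequence EF BB BF.
        return raw_line[len(codecs.BOM_UTF8):]
    return raw_line


sample = codecs.BOM_UTF8 + b'{"status":"OK"}'
cleaned = strip_bom(sample)
print(cleaned)
```

Using len(codecs.BOM_UTF8) instead of a literal 3 only makes the intent more visible; the two are equivalent.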

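Finally, JSON strings are always Unicode data, and your data contains non-ASCII characters, so you must encode to byte strings again before passing the data to the CSV writer (the Python 2 csv module works on byte strings).

I'd use csv.DictWriter here, with a predefined list of field names: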

import codecs 
import csv 
import json 

fields = [ 
    'status', 'message', 'type', 'addressAccessId', 'municipalityCode', 
    'municipalityName', 'streetCode', 'streetName', 'streetBuildingIdentifier', 
    'mailDeliverySublocationIdentifier', 'districtSubDivisionIdentifier', 
    'postCodeIdentifier', 'districtName', 'presentationString', 'addressSpecificCount', 
    'validCoordinates', 'geometryWkt', 'x', 'y'] 


with open('test.csv', 'wb') as csvfile, open('jsonfile', 'r') as jsonfile: 
    writer = csv.DictWriter(csvfile, fields) 
    writer.writeheader() 

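    for line in jsonfile:
        # Strip the UTF-8 BOM (3 bytes) if this line starts with one
        if line.startswith(codecs.BOM_UTF8):
            line = line[3:]
        entry = json.loads(line)
        for item in entry['data']:
            # Flatten: copy the item and add the top-level status and message
            row = dict(item, status=entry['status'], message=entry['message'])
            # Python 2 csv needs byte strings, so encode keys and values
            row = {k.encode('utf8'): unicode(v).encode('utf8') for k, v in row.iteritems()}
            writer.writerow(row)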

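The row dictionary is basically a copy of each dictionary in the entry['data'] list, with the status and message keys copied in separately. This makes row a flat dictionary.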
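The flattening step can be seen in isolation in this short sketch. The `entry` dictionary is abbreviated sample data:

```python
# Abbreviated sample entry, shaped like one parsed line of the input.
entry = {
    'status': 'OK',
    'message': 'OK',
    'data': [{'type': 'addressAccessType', 'x': 553564, 'y': 6179299}],
}

item = entry['data'][0]

# dict(mapping, **kwargs) builds a new dict: a copy of item plus the extra keys.
row = dict(item, status=entry['status'], message=entry['message'])
print(sorted(row))
```

Because dict() makes a copy, the original item is left unchanged and only row gains the status and message keys.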

I also read your input file line by line, since you said each line contains a separate JSON entry.
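If you are on Python 3 rather than Python 2, the same approach needs less encoding work, because the csv module there works with text. The following is a sketch under that assumption. The `convert` function name, the `FIELDS` constant, and the way the BOM is handled (the `utf-8-sig` codec for a BOM at the start of the file, plus stripping `\ufeff` from each line in case it is repeated) are my choices for illustration:

```python
import csv
import json

FIELDS = [
    'status', 'message', 'type', 'addressAccessId', 'municipalityCode',
    'municipalityName', 'streetCode', 'streetName', 'streetBuildingIdentifier',
    'mailDeliverySublocationIdentifier', 'districtSubDivisionIdentifier',
    'postCodeIdentifier', 'districtName', 'presentationString',
    'addressSpecificCount', 'validCoordinates', 'geometryWkt', 'x', 'y']


def convert(json_path, csv_path):
    """Write one CSV row per dictionary found in each line's 'data' list."""
    with open(json_path, 'r', encoding='utf-8-sig') as jsonfile, \
            open(csv_path, 'w', encoding='utf-8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, FIELDS)
        writer.writeheader()
        for line in jsonfile:
            # Drop surrounding whitespace and any stray BOM on this line.
            line = line.strip(' \t\r\n\ufeff')
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            for item in entry['data']:
                row = dict(item, status=entry['status'], message=entry['message'])
                writer.writerow(row)
```

It could then be called as convert('addresses.txt', 'test.csv'). Skipping blank lines also avoids one common cause of JSON decoding errors when the file ends with an empty line.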

+0

Did you mean to put 'writer.writerow(row)' inside the 'for' loop? – colcarroll

+0

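Thank you very much for the detailed answer, it definitely helps a lot. Say I have a file with multiple lines, and the data I want is in 'x[data]'. However, when I try your code I get the following error: ValueError: No JSON object could be decoded. Is that because of the file containing my json-lines, or could it be because the lines are invalid JSON? – Philip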

+0

@JLLagrange: Indeed. –

0

Use csv.DictWriter() on the opened output file and define the output header fields as you specify. Use extrasaction='ignore' and restval='' as options.
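As a rough sketch of what those two options do, assuming Python 3 and writing to an in-memory buffer instead of a real file (the field selection and sample row are made up for illustration):

```python
import csv
import io

# Only these fields are wanted in the output.
fields = ['status', 'streetName', 'x', 'y']

# This row has an extra key (municipalityCode) and lacks 'y'.
row = {'status': 'OK', 'streetName': 'Skyttehusene', 'x': 711079,
       'municipalityCode': '0165'}

buffer = io.StringIO()
# extrasaction='ignore' drops keys not listed in fields;
# restval='' fills in fields that are missing from the row.
writer = csv.DictWriter(buffer, fields, extrasaction='ignore', restval='')
writer.writeheader()
writer.writerow(row)

output = buffer.getvalue()
print(output)
```

Without extrasaction='ignore', DictWriter raises ValueError for a row that contains keys outside the field list, so this option is what lets you pick a subset of fields for the output.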

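Take a look at "Opening A large JSON file in Python with no newlines for csv conversion Python 2.6.6" for help with handling large files, since I had a similar problem; also look at the questions I linked to there.

I use loops like the following to build similar structures from JSON.

For example,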

def parse_row(currdata):
    outx = {}
    # currdata is one of the dictionaries in the x['data'] list
    for eachx in currdata:
        outx[eachx] = currdata[eachx]
    return outx

Here parse_row is a function that takes currdata as its argument, and it is called with x['data'][row] as the input:

rows = len(x['data']) 
for row in range(rows): 
    outx = parse_row(x['data'][row]) 
    # process the row and create output 

This should get the parsing set up correctly. I can't copy the actual code into this answer, but this should point you toward a solution.