2017-07-17 19 views

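Extracting data from nested JSON objects in a specific format in Python

I have a dataset containing multiple nested JSON objects like the one shown below: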

{ 
"coordinates": null, 
"acoustic_features": { 
    "instrumentalness": "0.00479", 
    "liveness": "0.18", 
    "speechiness": "0.0294", 
    "danceability": "0.634", 
    "valence": "0.342", 
    "loudness": "-8.345", 
    "tempo": "125.044", 
    "acousticness": "0.00035", 
    "energy": "0.697", 
    "mode": "1", 
    "key": "6" 
}, 
"artist_id": "b2980c722a1ace7a30303718ce5491d8", 
"place": null, 
"geo": null, 
"tweet_lang": "en", 
"source": "Share.Radionomy.com", 
"track_title": "8eeZ", 
"track_id": "cd52b3e5b51da29e5893dba82a418a4b", 
"artist_name": "Dominion", 
"entities": { 
    "hashtags": [{ 
     "text": "nowplaying", 
     "indices": [0, 11] 
    }, { 
     "text": "goth", 
     "indices": [51, 56] 
    }, { 
     "text": "deathrock", 
     "indices": [57, 67] 
    }, { 
     "text": "postpunk", 
     "indices": [68, 77] 
    }], 
    "symbols": [], 
    "user_mentions": [], 
    "urls": [{ 
     "indices": [28, 50], 
     "expanded_url": "cathedral13.com/blog13", 
     "display_url": "cathedral13.com/blog13", 
     "url": "t.co/Tatf4hEVkv" 
    }] 
}, 
"created_at": "2014-01-01 05:54:21", 
"text": "#nowplaying Dominion - 8eeZ Tatf4hEVkv #goth #deathrock #postpunk", 
"user": { 
    "location": "middle of nowhere", 
    "lang": "en", 
    "time_zone": "Central Time (US & Canada)", 
    "name": "Cathedral 13", 
    "entities": null, 
    "id": 81496937, 
    "description": "I\u2019m a music junkie who is currently responsible for Cathedral 13 internet radio (goth, deathrock, post-punk)which has been online since 06/20/02."
}, 
"id": 418243774842929150 
} 

I want an output file in the following format:

user_id1 - track_id - hashtag1 
user_id1 - track_id - hashtag2 
user_id1 - track_id - hashtag3 
user_id2 - track_id - hashtag1 
user_id2 - track_id - hashtag2 
.... 

That is, for this example the output should be:

81496937 cd52b3e5b51da29e5893dba82a418a4b nowplaying 
81496937 cd52b3e5b51da29e5893dba82a418a4b goth 
81496937 cd52b3e5b51da29e5893dba82a418a4b deathrock 
81496937 cd52b3e5b51da29e5893dba82a418a4b postpunk 

I wrote the following code to do this:

import json 
import csv 
with open('final_dataset_json.json') as data_file: 
     data = json.load(data_file) 

uth = open('uth.csv','wb') 

cvwriter = csv.writer(uth) 

for entry in data: 
    text_list = [hashtag['text'] for hashtag in entry['entities']['hashtags']] 
    for line in text_list: 
     csvwriter.writerow([entry['user']['id'],entry['track_id'],line.strip()+'\n') 

uth.close() 

How can I achieve the given output?


You haven't said what problem(s) you are having with your code.

Answers


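With a csv writer, when you want to write a new row you must pass all of the column data together as a single list.

I expect it will be enough to replace that line with: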

csvwriter.writerow([entry['user']['id'],entry['track_id'],line.strip()]) 
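
For reference, here is a minimal sketch of how the whole script might look on Python 3 with that fix applied. It assumes, as in the question, that the input file holds a JSON array of tweet objects. The function names `write_hashtag_rows` and `convert` and the `delimiter` parameter are made up for illustration and are not part of the original code. No `'\n'` is appended to the hashtag, because `csv.writer` terminates each row itself.

```python
import csv
import json


def write_hashtag_rows(entries, out_file, delimiter=','):
    """Write one row (user id, track id, hashtag) per hashtag; return the row count."""
    writer = csv.writer(out_file, delimiter=delimiter)
    count = 0
    for entry in entries:
        for hashtag in entry['entities']['hashtags']:
            # Pass the whole row as one list; csv.writer adds the line ending itself.
            writer.writerow([entry['user']['id'], entry['track_id'], hashtag['text'].strip()])
            count += 1
    return count


def convert(json_path, csv_path, delimiter=','):
    """Load a JSON array of tweet objects and write the rows to csv_path."""
    with open(json_path) as data_file:
        data = json.load(data_file)
    # Python 3: text mode with newline='' instead of the Python 2 style 'wb'.
    with open(csv_path, 'w', newline='') as uth:
        return write_hashtag_rows(data, uth, delimiter=delimiter)
```

Calling `convert('final_dataset_json.json', 'uth.csv', delimiter=' ')` should then produce space-separated lines like those in the question; leaving `delimiter` at its default gives ordinary comma-separated rows.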

I get the following error and I don't understand why. The indentation is fine: csvwriter.writerow([entry['user']['id'], entry['track_id'], line.strip()]) NameError: name 'csvwriter' is not defined


Did you import it in your code? – BoboDarph


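@AsmitaPoddar In your code the name is cvwriter, so the call should be cvwriter.writerow([entry['user']['id'], entry['track_id'], line.strip()]), where cvwriter is the writer bound to the file you want the data written to.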


Simple dictionary lookups are enough (Python has a module for JSON):

import json

# json_str is assumed to hold the JSON text of a single object like the one in the question
d = json.loads(json_str)
for ht in d['entities']['hashtags']:
    # The question asks for track_id, so that key is used here
    print('{} - {} - {}'.format(d['user']['id'], d['track_id'], ht['text']))

Yields:

81496937 - cd52b3e5b51da29e5893dba82a418a4b - nowplaying
81496937 - cd52b3e5b51da29e5893dba82a418a4b - goth
81496937 - cd52b3e5b51da29e5893dba82a418a4b - deathrock
81496937 - cd52b3e5b51da29e5893dba82a418a4b - postpunk

I want to store it in a CSV file. I have multiple JSON objects that I want to do this for.
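
One possible way to handle that follow-up is to wrap the same dictionary lookups in a generator and hand it to `csv.writer.writerows`. This is only a sketch: the helper name `iter_rows` is hypothetical, the two sample objects are made up, and an in-memory buffer stands in for the real output file. Since null values do appear in the sample object (for example `coordinates`), the sketch yields no rows for an entry whose `entities` or `hashtags` is null or missing; whether that actually happens in the real dataset is an assumption.

```python
import csv
import io
import json


def iter_rows(entries):
    """Yield (user id, track id, hashtag text) for every hashtag of every entry."""
    for entry in entries:
        entities = entry.get('entities') or {}   # tolerate null or missing entities
        for hashtag in entities.get('hashtags') or []:
            yield entry['user']['id'], entry['track_id'], hashtag['text']


# Two small made-up objects standing in for the real dataset.
sample = json.loads('''[
  {"user": {"id": 1}, "track_id": "t1",
   "entities": {"hashtags": [{"text": "goth"}, {"text": "postpunk"}]}},
  {"user": {"id": 2}, "track_id": "t2", "entities": null}
]''')

buffer = io.StringIO()   # stands in for open('uth.csv', 'w', newline='')
csv.writer(buffer).writerows(iter_rows(sample))
print(buffer.getvalue())
```

With a real file, the buffer would be replaced by a file opened in text mode with `newline=''`, and `sample` by the list returned from `json.load`.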