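Reading sub-level (nested) JSON data with Pandas
Background:
I used the NYT Archive API to download a set of data and saved it to a JSON file. The file actually contains a list of JSON objects.
Steps:
I read the JSON file with the read_json method.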
import pandas as pd
pandas_df = pd.read_json("data.json")
When I look at a sample of the result with head, it looks like this:
pandas_df.head()
copyright \
0 Copyright (c) 2013 The New York Times Company....
1 Copyright (c) 2013 The New York Times Company....
2 Copyright (c) 2013 The New York Times Company....
3 Copyright (c) 2013 The New York Times Company....
4 Copyright (c) 2013 The New York Times Company....
response
0 {'docs': [{'subsection_name': None, 'slideshow...
1 {'docs': [{'subsection_name': None, 'slideshow...
2 {'docs': [{'subsection_name': None, 'slideshow...
3 {'docs': [{'subsection_name': None, 'slideshow...
4 {'docs': [{'subsection_name': None, 'slideshow...
I only need the information in response, so I changed the code as follows:
print(pandas_df["response"].head())
0 {'docs': [{'subsection_name': None, 'slideshow...
1 {'docs': [{'subsection_name': None, 'slideshow...
2 {'docs': [{'subsection_name': None, 'slideshow...
3 {'docs': [{'subsection_name': None, 'slideshow...
4 {'docs': [{'subsection_name': None, 'slideshow...
Name: response, dtype: object
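Question:
How can I get at the data in the inner docs elements, such as the subsection and slideshow fields, so that I can see it in tabular form like a DataFrame?
Please let me know if more information is needed.
Thanks.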
EDIT 1:
Adding the first element from the JSON file. The whole file is too large to post, at around 1 GB.
{
"copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
"response": {
"meta": {
"hits": 7652
},
"docs": [
{
"web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html",
"snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
"lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
"height": 346,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
"xlargeheight": "346"
},
"type": "image"
},
{
"width": 75,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
"kicker": "Tech Fix"
},
"keywords": [
{
"rank": "1",
"is_major": "N",
"name": "subject",
"value": "Video Recordings, Downloads and Streaming"
},
{
"rank": "2",
"is_major": "N",
"name": "subject",
"value": "Television Sets and Media Devices"
},
{
"rank": "1",
"is_major": "Y",
"name": "subject",
"value": "Television"
}
],
"pub_date": "2016-01-01T05:00:00Z",
"document_type": "multimedia",
"news_desk": "Technology/Personal Tech",
"section_name": "Technology",
"subsection_name": "Personal Tech",
"byline": {
"person": [
{
"firstname": "Brian",
"middlename": "X.",
"lastname": "CHEN",
"rank": 1,
"role": "reported",
"organization": ""
}
],
"original": "By BRIAN X. CHEN"
},
"type_of_material": "Interactive Feature",
"_id": "57fdfb9895d0e022439c2b57",
"word_count": null,
"slideshow_credits": null
}]}}
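One possible way to get the docs entries into a table, sketched under the assumption that the file holds a list of objects shaped like the element above: pass the parsed records to pd.json_normalize with record_path pointing at response -> docs. The helper name docs_to_frame and the trimmed-down sample record are made up for illustration; they are not part of the original data or of pandas.

```python
import pandas as pd

# Hypothetical, trimmed-down record shaped like the element shown above.
sample = [
    {
        "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
        "response": {
            "meta": {"hits": 7652},
            "docs": [
                {
                    "source": "The New York Times",
                    "headline": {
                        "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
                        "kicker": "Tech Fix",
                    },
                    "section_name": "Technology",
                    "subsection_name": "Personal Tech",
                    "slideshow_credits": None,
                    "_id": "57fdfb9895d0e022439c2b57",
                }
            ],
        },
    }
]


def docs_to_frame(records):
    """Return one row per entry found under response -> docs (illustrative helper)."""
    # record_path walks into each top-level object; nested dicts such as
    # "headline" become dotted columns like "headline.main".
    return pd.json_normalize(records, record_path=["response", "docs"])


docs_df = docs_to_frame(sample)
print(docs_df[["section_name", "subsection_name", "headline.main"]])
```

For the real file, the records could come from json.load on an open file handle instead of the inline sample. With a file of about 1 GB this loads everything into memory at once, so it may need to be processed in pieces.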
Could you post the first few lines of the raw JSON in full?
Added, please take a look.
I want to read "docs".
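Lists nested inside each doc, such as keywords, stay as Python lists in a single cell if the docs are flattened naively. A minimal sketch of one way to get one row per keyword instead, assuming the docs have already been pulled out as a list of dicts. The variable names and the shortened sample doc are illustrative only.

```python
import pandas as pd

# Hypothetical doc trimmed to the fields needed for this example.
docs = [
    {
        "_id": "57fdfb9895d0e022439c2b57",
        "section_name": "Technology",
        "keywords": [
            {"rank": "1", "is_major": "N", "name": "subject",
             "value": "Video Recordings, Downloads and Streaming"},
            {"rank": "2", "is_major": "N", "name": "subject",
             "value": "Television Sets and Media Devices"},
            {"rank": "1", "is_major": "Y", "name": "subject",
             "value": "Television"},
        ],
    }
]

# One row per keyword; "meta" copies the parent doc's fields onto each row
# so the keywords can be joined back to their article later.
keywords_df = pd.json_normalize(
    docs,
    record_path="keywords",
    meta=["_id", "section_name"],
)
print(keywords_df)
```

The same pattern should apply to the multimedia list, with record_path="multimedia".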