2017-03-31

Reading sub-level JSON data with Pandas

I am stuck trying to read sub-level data with Pandas.

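Background:

I downloaded a series of data with the NYT Archive API and saved it to a JSON file. The file actually holds a list of JSON objects.

Steps:

I read the JSON file with the read_json method.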

import pandas as pd
pandas_df = pd.read_json("data.json")

When I look at a sample of the result with head(), it looks like this:

pandas_df.head() 
    copyright \ 
0 Copyright (c) 2013 The New York Times Company.... 
1 Copyright (c) 2013 The New York Times Company.... 
2 Copyright (c) 2013 The New York Times Company.... 
3 Copyright (c) 2013 The New York Times Company.... 
4 Copyright (c) 2013 The New York Times Company.... 

              response 
0 {'docs': [{'subsection_name': None, 'slideshow... 
1 {'docs': [{'subsection_name': None, 'slideshow... 
2 {'docs': [{'subsection_name': None, 'slideshow... 
3 {'docs': [{'subsection_name': None, 'slideshow... 
4 {'docs': [{'subsection_name': None, 'slideshow... 

I only need the information in response. So I changed the code as follows:

print(pandas_df["response"].head()) 
0 {'docs': [{'subsection_name': None, 'slideshow... 
1 {'docs': [{'subsection_name': None, 'slideshow... 
2 {'docs': [{'subsection_name': None, 'slideshow... 
3 {'docs': [{'subsection_name': None, 'slideshow... 
4 {'docs': [{'subsection_name': None, 'slideshow... 
Name: response, dtype: object 

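Question:

How can I get at the data inside the docs elements, such as the subsection, the slideshow, and similar fields? Can I view it in tabular form, like a DataFrame?

Please let me know if more information is needed.

Thanks.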

EDIT 1:

Adding the first element from the JSON file. The file itself is too large to post, at around 1 GB.

{ 
    "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.", 
    "response": { 
    "meta": { 
     "hits": 7652 
    }, 
    "docs": [ 
     { 
     "web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html", 
     "snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.", 
     "lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.", 
     "abstract": null, 
     "print_page": null, 
     "blog": [], 
     "source": "The New York Times", 
     "multimedia": [ 
      { 
      "width": 190, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg", 
      "height": 126, 
      "subtype": "wide", 
      "legacy": { 
       "wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg", 
       "wideheight": "126", 
       "widewidth": "190" 
      }, 
      "type": "image" 
      }, 
      { 
      "width": 600, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg", 
      "height": 346, 
      "subtype": "xlarge", 
      "legacy": { 
       "xlargewidth": "600", 
       "xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg", 
       "xlargeheight": "346" 
      }, 
      "type": "image" 
      }, 
      { 
      "width": 75, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg", 
      "height": 75, 
      "subtype": "thumbnail", 
      "legacy": { 
       "thumbnailheight": "75", 
       "thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg", 
       "thumbnailwidth": "75" 
      }, 
      "type": "image" 
      } 
     ], 
     "headline": { 
      "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits", 
      "kicker": "Tech Fix" 
     }, 
     "keywords": [ 
      { 
      "rank": "1", 
      "is_major": "N", 
      "name": "subject", 
      "value": "Video Recordings, Downloads and Streaming" 
      }, 
      { 
      "rank": "2", 
      "is_major": "N", 
      "name": "subject", 
      "value": "Television Sets and Media Devices" 
      }, 
      { 
      "rank": "1", 
      "is_major": "Y", 
      "name": "subject", 
      "value": "Television" 
      } 
     ], 
     "pub_date": "2016-01-01T05:00:00Z", 
     "document_type": "multimedia", 
     "news_desk": "Technology/Personal Tech", 
     "section_name": "Technology", 
     "subsection_name": "Personal Tech", 
     "byline": { 
      "person": [ 
      { 
       "firstname": "Brian", 
       "middlename": "X.", 
       "lastname": "CHEN", 
       "rank": 1, 
       "role": "reported", 
       "organization": "" 
      } 
      ], 
      "original": "By BRIAN X. CHEN" 
     }, 
     "type_of_material": "Interactive Feature", 
     "_id": "57fdfb9895d0e022439c2b57", 
     "word_count": null, 
     "slideshow_credits": null 
     }]}} 
Can you post the first few lines of the entire raw JSON?

Added, please take a look.

I want to read "docs".

Answer

You should be able to extract all of the elements in the docs list, which is nested under the response dictionary, into a DataFrame.

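import json
import pandas as pd
# Parse the whole file into Python objects
with open('data.json') as f:
    data = json.load(f)
# Each element of the 'docs' list becomes one row
df = pd.DataFrame(data['response']['docs'])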
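With pd.DataFrame, nested dictionaries such as headline stay packed inside a single column. If separate columns are wanted, one option is pandas.json_normalize, which expands nested dictionaries into dotted column names. The following is only a sketch: the `data` literal is a hypothetical record trimmed from the sample in the question, and it assumes a single top-level object rather than the full file.

```python
import pandas as pd

# Hypothetical record, trimmed from the sample posted in the question.
data = {
    "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
    "response": {
        "meta": {"hits": 7652},
        "docs": [
            {
                "_id": "57fdfb9895d0e022439c2b57",
                "headline": {
                    "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
                    "kicker": "Tech Fix",
                },
                "section_name": "Technology",
                "subsection_name": "Personal Tech",
                "slideshow_credits": None,
                "keywords": [
                    {"rank": "1", "is_major": "Y", "name": "subject", "value": "Television"}
                ],
            }
        ],
    },
}

# Nested dicts become dotted columns such as 'headline.main';
# nested lists such as 'keywords' are left as Python lists in one column.
flat = pd.json_normalize(data["response"]["docs"])
print(flat[["_id", "subsection_name", "headline.main"]])
```

List-valued fields like keywords and multimedia are not expanded by this call, so they would need separate handling if they are required as rows or columns.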
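The last line gives me an error: TypeError: list indices must be integers or slices, not str. Do you know why that is? Is it because I am reading a file that contains multiple JSON objects?

I modified the JSON input by adding one closing bracket and two closing braces. Copy that exact JSON directly into a file and run my code again. It should work.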
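Python raises that TypeError when a list is indexed with a string, which would be consistent with json.load returning a list of objects, as the question describes the file. A minimal sketch for that case, assuming every top-level object has the same response/docs layout as the posted sample; `docs_to_frame` and the `sample` data are hypothetical names used only for illustration:

```python
import pandas as pd

def docs_to_frame(data):
    """Collect every 'docs' entry from the parsed JSON into one DataFrame."""
    # Accept either a single response object or a list of them.
    records = data if isinstance(data, list) else [data]
    # Flatten the per-object 'docs' lists into one list of article dicts.
    docs = [doc for rec in records for doc in rec["response"]["docs"]]
    return pd.DataFrame(docs)

# Hypothetical stand-in for the result of json.load on the real file.
sample = [
    {"response": {"docs": [{"_id": "a", "section_name": "Technology"}]}},
    {"response": {"docs": [{"_id": "b", "section_name": "World"},
                           {"_id": "c", "section_name": "Arts"}]}},
]

df = docs_to_frame(sample)
print(df)
```

This still loads the whole file into memory before building the DataFrame, so for a file of around 1 GB it may need a machine with enough RAM or a streaming parser instead.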