2017-02-22 81 views

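Python code error (Linux, web scraping)

I want to pull some data from New York Times (NYT) articles, but when I run the code below it gives me an error I'm not familiar with. I searched Google and went through previous answers on Stack Overflow, but I still don't understand the problem. Can anyone tell me how to fix this error? Thanks in advance!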

The code:

from nytimesarticle import articleAPI 
api = articleAPI('a0de895aa110431eb2344303c7105a9f') 

articles = api.search(q = 'Obama', 
    fq = {'headline':'Obama', 'source':['Reuters','AP', 'The New York Times']}, 
    begin_date = 20111231) 

def parse_articles(articles): 
    ''' 
    This function takes in a response to the NYT api and parses 
    the articles into a list of dictionaries 
    ''' 
    news = []
    for i in articles['response']['docs']:
        dic = {}
        dic['id'] = i['_id']
        if i['abstract'] is not None:
            dic['abstract'] = i['abstract'].encode("utf8")
        dic['headline'] = i['headline']['main'].encode("utf8")
        dic['desk'] = i['news_desk']
        dic['date'] = i['pub_date'][0:10]  # cutting time of day.
        dic['section'] = i['section_name']
        if i['snippet'] is not None:
            dic['snippet'] = i['snippet'].encode("utf8")
        dic['source'] = i['source']
        dic['type'] = i['type_of_material']
        dic['url'] = i['web_url']
        dic['word_count'] = i['word_count']
        # locations
        locations = []
        for kw in i['keywords']:
            if 'glocations' in kw['name']:
                locations.append(kw['value'])
        dic['locations'] = locations
        # subject
        subjects = []
        for kw in i['keywords']:
            if 'subject' in kw['name']:
                subjects.append(kw['value'])
        dic['subjects'] = subjects
        news.append(dic)
    return news

def get_articles(date,query): 
    ''' 
    This function accepts a year in string format (e.g.'1980') 
    and a query (e.g.'Amnesty International') and it will 
    return a list of parsed articles (in dictionaries) 
    for that year. 
    ''' 
    all_articles = [] 
    for i in range(0,100): #NYT limits pager to first 100 pages. But rarely will you find over 100 pages of results anyway. 
        articles = api.search(q = query,
                              fq = {'source':['Reuters','AP', 'The New York Times']},
                              begin_date = date + '0101',
                              end_date = date + '1231',
                              sort='oldest',
                              page = str(i))
        articles = parse_articles(articles)
        all_articles = all_articles + articles
    return all_articles

Amnesty_all = [] 
for i in range(1980,2014): 
    print('Processing' + str(i) + '...')
    Amnesty_year = get_articles(str(i),'Amnesty International') 
    Amnesty_all = Amnesty_all + Amnesty_year 

import csv 
keys = Amnesty_all[0].keys() 
with open('amnesty-mentions.csv', 'wb') as output_file: 
    dict_writer = csv.DictWriter(output_file, keys) 
    dict_writer.writeheader() 
    dict_writer.writerows(Amnesty_all) 

This is the error produced when I run it in the terminal:

~$ cd Desktop
~/Desktop$ python nyt.py
Processing1980...
Traceback (most recent call last):
  File "nyt.py", line 66, in <module>
    Amnesty_year = get_articles(str(i),'Amnesty International')
  File "nyt.py", line 59, in get_articles
    articles = parse_articles(articles)
  File "nyt.py", line 14, in parse_articles
    for i in articles['response']['docs']:
KeyError: 'response'

Answer


api.search is returning something you did not expect. Its code is:

r = requests.get(url)
return r.json()

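So your code only works when the API at http://api.nytimes.com/svc/search/v2/articlesearch returns a successful response whose body is the JSON you expect; api.search itself does no error checking before handing the decoded body back.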
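To make a failed request visible before the JSON is used, the request can be wrapped so the HTTP status is checked first. This is a minimal sketch, not the nytimesarticle code: fetch_json is a hypothetical helper, and it assumes the third-party requests library (requests.get, Response.raise_for_status and Response.json are real requests calls). The get parameter exists only so the function can be tried without network access.

```python
def fetch_json(url, params, get=None):
    """Request `url` and decode the JSON body, failing loudly on HTTP errors.

    Hypothetical helper. `get` defaults to requests.get; it is a parameter
    only so the function can be exercised without network access.
    """
    if get is None:
        import requests  # third-party; imported lazily so tests don't need it
        get = requests.get
    r = get(url, params=params, timeout=30)
    # Unlike a bare r.json(), stop here if the server answered with an
    # HTTP error status (requests raises requests.HTTPError for 4xx/5xx).
    r.raise_for_status()
    return r.json()
```

It would be called with the endpoint URL and a dict of query parameters (for example the search term and API key); the exact parameter names are assumptions to verify against the NYT Article Search documentation. Note that this only catches HTTP-level failures: a reply that arrives with a success status but an unexpected body still needs the checks below.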

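The exception is a KeyError, so the returned object is dict-like: the JSON was decoded, it just has no 'response' key. You may want to check: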

In [8]: print articles.keys() 
Out[8]: [u'status', u'response', u'copyright'] 

and:

In [9]: print articles['status'] 
Out[9]: u'OK' 
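The two checks above can be folded into a guard that runs before the docs are parsed. This is a minimal sketch that assumes only the reply shape shown above (status, response, docs); check_search_result is a hypothetical name, not part of nytimesarticle.

```python
def check_search_result(articles):
    """Return articles['response']['docs'], or raise a descriptive error.

    Hypothetical helper: the reply must be a dict whose 'status' is 'OK'
    and which actually contains a 'response' key.
    """
    if not isinstance(articles, dict):
        raise ValueError("expected a dict, got %r" % type(articles).__name__)
    status = articles.get("status")
    if status != "OK" or "response" not in articles:
        # Report what actually came back instead of a bare KeyError: 'response'.
        raise ValueError("unexpected NYT reply: status=%r, keys=%r"
                         % (status, sorted(articles)))
    return articles["response"]["docs"]
```

parse_articles could then loop over check_search_result(articles) instead of articles['response']['docs'], so a bad reply reports its real status and keys.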

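If articles['status'] is anything other than 'OK', I suspect the NYT API does not fill in 'response', so you may need to handle that unexpected status and retry.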
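One way to handle that is to retry a few times with a growing pause. This is a sketch under stated assumptions: search_with_retry is a hypothetical helper, it treats any reply without status 'OK' and a 'response' key as a failure, and the attempt count and delays are arbitrary placeholders, not NYT-documented limits.

```python
import time


def search_with_retry(search, max_attempts=3, delay=5.0, **params):
    """Call search(**params) until the reply has status 'OK' and a 'response'.

    Hypothetical helper. `search` is any callable returning the decoded JSON
    dict (for example api.search). Sleeps `delay` seconds between attempts,
    doubling it each time, and raises RuntimeError once attempts run out.
    """
    last = None
    for attempt in range(1, max_attempts + 1):
        last = search(**params)
        if isinstance(last, dict) and last.get("status") == "OK" and "response" in last:
            return last
        if attempt < max_attempts:
            time.sleep(delay)
            delay *= 2  # back off a little more after each failure
    raise RuntimeError("no usable reply after %d attempts; last reply: %r"
                       % (max_attempts, last))
```

In get_articles, the api.search call could be replaced by search_with_retry(api.search, q=query, page=str(i), ...), passing the same keyword arguments as before.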


Thanks! I'll fix my error :)