JSON response and Scrapy

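I want to parse the JSON response that Scrapy gets from the New York Times API into CSV, so that I have a summary of all the articles related to a particular query. I would like to output the link, publication date, summary, and title, so that I can run some keyword searches on the summary text. I am new to both Python and Scrapy, but here is my spider (I get an HTTP 400 error). I have xx'ed out my API key in the spider: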

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from nytimesAPIjson.items import NytimesapijsonItem
import json
import urllib2

class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["http://api.nytimes.com/svc/search/v2/articlesearch"]
    req = urllib2.urlopen('http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx')

    def json_parse(self, response):
        jsonresponse = json.loads(response)

        items = []
        item = NytimesapijsonItem()
        item["pubDate"] = jsonresponse["pub_date"]
        item["description"] = jsonresponse["lead_paragraph"]
        item["title"] = jsonresponse["print_headline"]
        item["link"] = jsonresponse["web_url"]
        items.append(item)
        return items

If anyone has any ideas or suggestions, including ones outside of Scrapy, please let me know. Thanks in advance.

2013-09-16 eroma934

A 400 means that the GET request in the URL is malformed. Maybe try sort=newest instead of rank=newest. http://developer.nytimes.com/docs/read/article_search_api_v2 – umeboshi
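
Another way a query like this can end up malformed is that the space and the double quotes in q="financial crime" go into the URL unencoded. A minimal sketch of building the query string with the standard library, assuming Python 3 (the thread's code is Python 2, where urlencode lives in urllib) and using the sort parameter suggested above. build_search_url is a hypothetical helper, and the API key stays a placeholder:

```python
from urllib.parse import urlencode

BASE_URL = "http://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Return the search URL with every parameter percent-encoded.

    Hypothetical helper for illustration; the parameter names are the
    ones that appear in the question, plus sort in place of rank.
    """
    params = [
        ("q", '"%s"' % query),          # keep the phrase quotes, but encoded
        ("facet_field", "day_of_week"),
        ("begin_date", "20130101"),
        ("end_date", "20130916"),
        ("page", page),
        ("sort", "newest"),
        ("api-key", api_key),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("financial crime", "xxx", page=2)
print(url)
```

The resulting string could be used as the single entry of start_urls; whether it clears the 400 for a given key and query would still need checking against the live API.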

Still the same problem, but thanks for the suggestion. – eroma934

I have actually been able to fix the HTTP error, but now the CSV file is empty. – eroma934

You should set start_urls and use the parse method:

from scrapy.spider import BaseSpider
import json


class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["api.nytimes.com"]
    start_urls = ['http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx']

    def parse(self, response):
        # json.loads needs the response text, not the Response object itself
        jsonresponse = json.loads(response.body)

        print jsonresponse

2013-09-16 19:23:44 alecxe

Thank you very much, that prints to the screen. Now, is there any way to map this to items and convert it to CSV? – eroma934

@eroma934 Yes, you actually already have that in your code. I just left it out of this example. – alecxe

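The only problem is that when I keep the item syntax, I get a GET error. That is because I am missing a loop (I think), but I am not sure what the JSON equivalent of hxs.select in a typical spider would be. I would really appreciate your input. – eroma934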
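
With JSON there is no selector step: json.loads returns plain dicts and lists, so the counterpart of looping over hxs.select(...) is an ordinary for loop over the list of articles. A minimal sketch outside Scrapy, in Python 3, assuming the articles sit under response -> docs and that the headline is a nested object (worth checking against a real response). The sample payload, docs_to_rows, and rows_to_csv are made up for illustration:

```python
import csv
import io
import json

# Made-up payload shaped like the assumed API response (not real data).
SAMPLE = json.dumps({
    "response": {
        "docs": [
            {"web_url": "http://example.com/a", "pub_date": "2013-09-01",
             "lead_paragraph": "First summary.",
             "headline": {"main": "Main A", "print_headline": "Print A"}},
            {"web_url": "http://example.com/b", "pub_date": "2013-09-02",
             "lead_paragraph": "Second summary.",
             "headline": {"main": "Main B"}},
        ]
    }
})

FIELDS = ["link", "pubDate", "description", "title"]


def docs_to_rows(body):
    """Turn one JSON response body into a list of flat dicts, one per article."""
    data = json.loads(body)
    rows = []
    for doc in data.get("response", {}).get("docs", []):
        headline = doc.get("headline") or {}
        rows.append({
            "link": doc.get("web_url"),
            "pubDate": doc.get("pub_date"),
            "description": doc.get("lead_paragraph"),
            # fall back to the main headline when there is no print headline
            "title": headline.get("print_headline") or headline.get("main"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = docs_to_rows(SAMPLE)
print(rows_to_csv(rows))
```

In the spider, the same loop would presumably go inside parse, filling one NytimesapijsonItem per doc and returning the list, so that the CSV export has rows to write. Reading pub_date and the other keys from the top level of the response, as in the question, is a likely reason for ending up with no items.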
