Scraping dynamic data with Scrapy

I am trying to get product rating information from target.com. The URL for the product is:

http://www.target.com/p/bounty-select-a-size-paper-towels-white-8-huge-rolls/-/A-15258543#prodSlot=medium_1_4&term=bounty

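After searching through response.body, I found that the rating information is not loaded statically, so I need another approach. I found some similar questions about getting dynamic data, and they say I need to:

1. Find the correct XHR and where it sends its request
2. Use FormRequest to get the correct JSON
3. Parse the JSON

(Please tell me if I have the steps wrong.)

I am stuck at step 2. I found that an XHR named 15258543 contains the rating distribution, but I don't know how to send a request to get the JSON, for example where to send it and which parameters to use.

Can someone walk me through this? Thanks!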

2016-02-10 user2628641

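The trickiest part is getting the product ID (15258543) dynamically and then using it in the URL that returns the reviews. The product ID can be found in several places on the product page; for example, there is a meta element we can use: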

<meta itemprop="productID" content="15258543">
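As a quick way to check this idea outside a spider, the sketch below pulls the productID value out of an HTML snippet using only Python's standard library. The names ProductIdParser and extract_product_id are made up for illustration, and the sample HTML is just the tag above wrapped in a minimal page; inside Scrapy, a selector would normally do this job instead.

```python
from html.parser import HTMLParser


class ProductIdParser(HTMLParser):
    """Collects the content of the first <meta itemprop="productID"> tag."""

    def __init__(self):
        super().__init__()
        self.product_id = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag != "meta" or self.product_id is not None:
            return
        attributes = dict(attrs)
        if attributes.get("itemprop") == "productID":
            self.product_id = attributes.get("content")


def extract_product_id(html):
    """Return the product ID string, or None if the tag is missing."""
    parser = ProductIdParser()
    parser.feed(html)
    return parser.product_id


# Minimal hand-written page containing only the meta tag (an assumption).
sample_html = '<html><head><meta itemprop="productID" content="15258543"></head></html>'
print(extract_product_id(sample_html))
```

Returning None when the tag is absent lets the caller decide whether to skip the page or fall back to another place where the ID appears.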

Here is a working spider that makes a separate GET request to get the reviews, loads the JSON response with json.loads(), and prints the overall product rating:

import json

import scrapy


class TargetSpider(scrapy.Spider):
    name = "target"
    allowed_domains = ["target.com"]
    start_urls = ["http://www.target.com/p/bounty-select-a-size-paper-towels-white-8-huge-rolls/-/A-15258543#prodSlot=medium_1_4&term=bounty"]

    def parse(self, response):
        # The product ID is exposed in a <meta itemprop="productID"> element.
        product_id = response.xpath("//meta[@itemprop='productID']/@content").extract_first()

        # Request the review statistics for this product; the ID is passed
        # along in meta so the callback can look up the right JSON key.
        return scrapy.Request(
            "http://tws.target.com/productservice/services/reviews/v1/reviewstats/" + product_id,
            callback=self.parse_ratings,
            meta={"product_id": product_id},
        )

    def parse_ratings(self, response):
        data = json.loads(response.body)

        # Print the overall average rating for the product.
        print(data["result"][response.meta["product_id"]]["coreStats"]["AverageOverallRating"])

This prints 4.5585.
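To make the key lookup in parse_ratings easier to follow, here is a small sketch of the same path applied to a hand-written payload. The sample JSON is an assumption that only mirrors the keys read by the spider (result, the product ID, coreStats, AverageOverallRating); the real response likely contains more fields. The helper name extract_average_rating is hypothetical.

```python
import json


def extract_average_rating(raw_json, product_id):
    """Return the average rating for product_id, or None if a key is missing."""
    data = json.loads(raw_json)
    try:
        return data["result"][product_id]["coreStats"]["AverageOverallRating"]
    except (KeyError, TypeError):
        # The payload did not have the expected shape for this product.
        return None


# Hand-written sample shaped after the keys the spider reads (an assumption).
sample = json.dumps(
    {"result": {"15258543": {"coreStats": {"AverageOverallRating": 4.5585}}}}
)

print(extract_average_rating(sample, "15258543"))
print(extract_average_rating(sample, "00000000"))
```

Guarding the lookup this way avoids a crash in the callback if the service returns an error body or a different product key.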

2016-02-10 22:23:28 alecxe

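I see, thanks! Just a follow-up question. I noticed there is another XHR request that gets the store ID, named "v1?request_type=availability&key=......", so I tried to fetch that JSON the same way, but the response says "Request method 'GET' not supported". I can clearly see the returned JSON in Chrome developer tools, so there must be a way to get it; I just don't know how. Can you give me a hint? – user2628641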
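One possible reading of that error is that the endpoint expects a method other than GET, most likely POST; the method, headers, and body the browser actually sends are visible in the Network tab of developer tools. The sketch below only builds the keyword arguments for such a request. The helper name build_post_request_kwargs, the payload field, and the JSON content type are all assumptions for illustration, not details confirmed for this endpoint.

```python
import json


def build_post_request_kwargs(payload):
    """Build keyword arguments for a JSON POST request.

    The result is meant to be unpacked into scrapy.Request, for example
    scrapy.Request(url, callback=self.parse_availability, **kwargs),
    where url and parse_availability are placeholders.
    """
    return {
        "method": "POST",
        "body": json.dumps(payload),
        "headers": {"Content-Type": "application/json"},
    }


# Hypothetical payload; copy the real one from the XHR in the Network tab.
kwargs = build_post_request_kwargs({"example_field": "example_value"})
print(kwargs["method"], kwargs["body"])
```

If the browser sends form-encoded data rather than JSON, Scrapy's FormRequest with a formdata dictionary would be the closer match.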
