2017-04-04 60 views

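Scrapy: scraping a Google Play app page together with its reviews

I want to scrape the details of some apps along with their reviews. The problem is that I don't know how to associate each app's details with its reviews. Here is a code sample: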

def parse(self, response):
    l = ItemLoader(item=GamesScraperItem(), response=response)
    # Get the details of the app here
    # ...

    url = "https://play.google.com/store/getreviews"
    # ...
    for num in range(111):
        form_data = {"id": _id, "reviewType": '0', "reviewSortOrder": '4', "pageNum": str(num), "xhr": '1'}
        sleep(5)
        yield FormRequest(url=url, headers=headers_data, formdata=form_data, callback=self.parse_reviews)

def parse_reviews(self, response):
    response_data = re.findall(r"\[\[.*", response.body)
    if response_data:
        try:
            text = json.loads(response_data[0] + ']')
            sell = Selector(text=text[0][2])
        except:
            pass
        # Get a list of reviews data
        # ...

I would like to find a way to collect all the reviews into one list and then add that list to the app's details.

Thanks.
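For reference, the payload extraction in `parse_reviews` can be isolated into a small helper. This is a minimal sketch, assuming the response body has the shape the question's regex implies (a line starting with `[[` whose closing `]` sits on a later line). The function name `extract_review_html` and the sample body are made up for illustration and are not taken from a real response.

```python
import json
import re


def extract_review_html(body):
    """Return the HTML fragment from a getreviews response body, or None."""
    # Scrapy's response.body is bytes on Python 3, so decode before matching.
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    match = re.search(r"\[\[.*", text)
    if match is None:
        return None
    try:
        # Same trick as the question: close the outer list, then parse as JSON.
        payload = json.loads(match.group(0) + "]")
        return payload[0][2]
    except (ValueError, IndexError, TypeError):
        # Malformed JSON or an unexpected structure: report "nothing found".
        return None


# Fabricated sample body, shaped the way the question's regex expects.
sample = b")]}'\n\n[[\"ecr\",1,\"<div>Great game</div>\",1]\n]"
print(extract_review_html(sample))
```

Catching only the expected exceptions, instead of a bare `except`, keeps unrelated bugs visible while still tolerating a page with no usable payload.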


You shouldn't use `time.sleep`: Scrapy is asynchronous, so sleeping just blocks everything. Try the [`download_delay` setting](https://doc.scrapy.org/en/latest/topics/settings.html#download-delay) instead. – Granitosaurus
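As a rough illustration of that comment, the delay can be configured in the project's `settings.py` rather than by calling `sleep` inside a callback. The value of 5 seconds is only an assumption that mirrors the `sleep(5)` in the question.

```python
# settings.py
# Wait roughly this many seconds between consecutive requests to the same site.
# 5 mirrors the sleep(5) in the question; pick whatever suits the target site.
DOWNLOAD_DELAY = 5

# Scrapy's default: the actual wait is a random value between
# 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True
```

The same delay can also be set for a single spider through a `download_delay` class attribute, which leaves other spiders in the project unaffected.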

Answer


You can carry your item along in the `Request.meta` attribute.

def parse(self, response): 
    l = ItemLoader(item=GamesScraperItem(), response=response) 
    #Get the details of the app here 
    url = "https://play.google.com/store/getreviews" 
    form_data = {"id": _id, "reviewType": '0', "reviewSortOrder": '4', "pageNum": "1","xhr": '1'} 
    yield FormRequest(url=url,
                      headers=headers_data,
                      formdata=form_data,
                      callback=self.parse_reviews,
                      meta={'item': l.load_item()})  # <---

def parse_reviews(self, response):  
    item = response.meta['item'] # <--- 
    l = ItemLoader(item=item, response=response) 
    # add more stuff to the loader, it will have everything that was added in parse method 
    # ... 
    # do page 2 the same way you did page 1 in parse method 
    form_data = {"id": _id, "reviewType": '0', "reviewSortOrder": '4', "pageNum": "2","xhr": '1'} 
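The chaining idea above can be followed end to end in a small simulation that runs without Scrapy or network access. This is a sketch under stated assumptions, not the answer author's code: `FakeRequest`, `FakeResponse`, `review_request`, `run`, `LAST_PAGE`, and the extra `meta` keys (`reviews`, `app_id`, `page`) are hypothetical names introduced here. In a real spider, `FakeRequest` would be `FormRequest` (with headers) and the callbacks would be spider methods.

```python
from dataclasses import dataclass, field
from typing import Callable

REVIEWS_URL = "https://play.google.com/store/getreviews"
LAST_PAGE = 2  # assumption for the demo; the question loops over range(111)


@dataclass
class FakeRequest:
    """Stand-in for scrapy.FormRequest (illustration only)."""
    url: str
    formdata: dict
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Stand-in for a response; reviews are assumed to be already extracted."""
    reviews: list
    meta: dict


def review_request(app_id, page, item, reviews):
    # The item and the reviews gathered so far travel along in meta.
    form_data = {"id": app_id, "reviewType": "0", "reviewSortOrder": "4",
                 "pageNum": str(page), "xhr": "1"}
    meta = {"item": item, "reviews": reviews, "app_id": app_id, "page": page}
    return FakeRequest(REVIEWS_URL, form_data, parse_reviews, meta)


def parse(app_id, details):
    # Build the item from the app page, then request the first review page.
    yield review_request(app_id, 0, dict(details), [])


def parse_reviews(response):
    meta = response.meta
    reviews = meta["reviews"] + response.reviews
    if response.reviews and meta["page"] < LAST_PAGE:
        # More pages may exist: pass the item on instead of yielding it now.
        yield review_request(meta["app_id"], meta["page"] + 1, meta["item"], reviews)
    else:
        # Empty page or page limit reached: attach the list and emit the item once.
        item = meta["item"]
        item["reviews"] = reviews
        yield item


def run(app_id, details, pages):
    """Tiny driver playing the role of the Scrapy engine."""
    pending, items = list(parse(app_id, details)), []
    while pending:
        out = pending.pop()
        if isinstance(out, FakeRequest):
            page = out.meta["page"]
            body = pages[page] if page < len(pages) else []
            pending.extend(out.callback(FakeResponse(body, out.meta)))
        else:
            items.append(out)
    return items


print(run("com.example.app", {"name": "Example"}, [["a", "b"], ["c"], []]))
```

Because each review page is requested only after the previous one has been parsed, the reviews stay in page order and the item is emitted exactly once, with the complete list attached. The trade-off is that review pages for a single app are fetched one after another rather than concurrently.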