Scrapy - Crawling multiple pages per item

I want to crawl a few extra pages for each item to pick up some location information.

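At the end of parsing an item, just before returning it, I check whether extra pages need to be crawled for this information. These pages contain location details and are fetched with a simple GET request.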

I.e. http://site.com.au/MVC/Offer/GetLocationDetails/?locationId=3761&companyId=206

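The link above returns either a select element listing more pages to crawl, or a dd/dt pair containing the address details. Either way, I need to extract the address information and add it to my item['location'].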

So far I have (at the end of the parse block):

return self.fetchLocations(locations_selector, company_id, item)

locations_selector contains a list of locationIds.

Then I have:

def fetchLocations(self, locations, company_id, item):  # response):
    for location in locations:
        if len(location) > 1:
            yield Request(
                "http://site.com.au/MVC/Offer/GetLocationDetails/?locationId=" + location + "&companyId=" + company_id,
                callback=self.parseLocation,
                meta={'company_id': company_id, 'item': item})

And finally:

def parseLocation(self, response):
    hxs = HtmlXPathSelector(response)
    item = response.meta['item']

    dl = hxs.select("//dl")
    if len(dl) > 0:
        address = hxs.select("//dl[1]/dd").extract()
        loc = {'address': remove_entities(replace_escape_chars(replace_tags(address[0], token=' '), replace_by=''))}
        yield loc

    locations_select = hxs.select("//select/option/@value").extract()
    if len(locations_select) > 0:
        yield self.fetchLocations(locations_select, response.meta['company_id'], item)

I can't seem to get this working...

Source

2012-06-22 AlexZ

This is your code:

def parseLocation(self, response):
    hxs = HtmlXPathSelector(response)
    item = response.meta['item']

    dl = hxs.select("//dl")
    if len(dl) > 0:
        address = hxs.select("//dl[1]/dd").extract()
        loc = {'address': remove_entities(replace_escape_chars(replace_tags(address[0], token=' '), replace_by=''))}
        yield loc

    locations_select = hxs.select("//select/option/@value").extract()
    if len(locations_select) > 0:
        yield self.fetchLocations(locations_select, response.meta['company_id'], item)

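A callback must return either requests for other pages or items. In the code above I see requests being yielded, but no items. You have yield loc, but loc is a dict, not a subclass of Item.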
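A related detail in the same callback, shown here as a minimal framework-free sketch: fetchLocations is a generator function, so yield self.fetchLocations(...) hands the caller a single generator object rather than the requests inside it. The function names and the string "requests" below are hypothetical stand-ins, not Scrapy API; they only illustrate the Python behaviour.

```python
def fetch_locations(location_ids):
    """Hypothetical stand-in for fetchLocations: one 'request' per id."""
    for location_id in location_ids:
        yield "request for location %s" % location_id


def callback_yielding_generator(location_ids):
    # Mirrors `yield self.fetchLocations(...)`: the caller receives one
    # generator object, not the individual requests.
    yield fetch_locations(location_ids)


def callback_yielding_requests(location_ids):
    # Iterating and re-yielding hands each request to the caller one by one.
    for request in fetch_locations(location_ids):
        yield request


wrapped = list(callback_yielding_generator(["3761", "3762"]))
flat = list(callback_yielding_requests(["3761", "3762"]))
```

Here wrapped holds one generator object, while flat holds the two individual requests, which is the shape a callback is expected to produce.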

Source

2012-06-22 06:41:58 warvariuc

So how should I go about it? – AlexZ

You should do 'yield item' instead of 'yield loc' – warvariuc
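Putting the advice together, the flow might look roughly like the sketch below. It is deliberately framework-free so it runs without Scrapy: Request is a hypothetical namedtuple stand-in for Scrapy's Request, a plain dict stands in for the Item, and parse_location receives already-extracted values (address, option_values) instead of a response. The sample address and item contents are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for scrapy's Request so the sketch runs on its own.
Request = namedtuple("Request", ["url", "callback", "meta"])

BASE_URL = "http://site.com.au/MVC/Offer/GetLocationDetails/"


def fetch_locations(location_ids, company_id, item):
    """Yield one follow-up request per usable location id."""
    for location_id in location_ids:
        if len(location_id) > 1:
            url = "%s?locationId=%s&companyId=%s" % (BASE_URL, location_id, company_id)
            # The partially filled item travels with the request in meta.
            yield Request(url, parse_location, {"company_id": company_id, "item": item})


def parse_location(address, option_values, meta):
    """Stand-in for parseLocation, fed with already-extracted page data."""
    item = meta["item"]
    if address is not None:
        # Attach the address to the item and yield the item, not a loose dict.
        item["location"] = {"address": address}
        yield item
    # Re-yield each nested request individually instead of the generator.
    for request in fetch_locations(option_values, meta["company_id"], item):
        yield request


meta = {"company_id": "206", "item": {"name": "example offer"}}
address_page = list(parse_location("1 Example St", [], meta))
select_page = list(parse_location(None, ["3761", "0"], meta))
```

One thing this sketch does not settle: if a select page fans out to several locations, every resulting address page writes to the same item['location'] and yields the same item again, so collecting addresses in a list or copying the item per location may be needed depending on the desired output.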
