Scrapy: crawl inside a for loop is not synchronous


for hotel in response.xpath('//div[contains(@class,"sr_item")]'):
    hotelName = hotel.xpath('.//span[contains(@class,"sr-hotel__name")]//text()')
    print(hotelName.extract())

    hotel_image = hotel.xpath('.//img[contains(@class, "hotel_image")]//@src')
    print(hotel_image.extract())

    hotelLink = hotel.xpath('.//a[contains(@class,"hotel_name_link")]//@href')

    # This only schedules the detail page; parseHotel runs later,
    # once the response has been downloaded.
    yield scrapy.Request(response.urljoin(hotelLink[0].extract()), self.parseHotel)

next_page = response.xpath('//a[contains(@class,"paging-next")]//@href')

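My code is shown above. As you can see, the request is made inside the for loop. I want Scrapy to return from the function "hotelParse" and only then continue with the next iteration of the for loop.

Right now, however, it prints all the hotel names first, meaning the for loop runs to completion, and only then does "hotelParse" start yielding.

This messes up my output as soon as I start assigning values to the item object.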


2016-08-19 panther123

There is no `hotelParse` method in your code – eLRuLL

Scrapy is [asynchronous](http://doc.scrapy.org/en/latest/topics/architecture.html#event-driven-networking), so I think you need to look for a better way to handle your items – eLRuLL
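To illustrate why all the hotel names are printed before any detail callback runs, here is a deliberately simplified, pure-Python model of the behaviour. It is not Scrapy's real engine: the `Request` class, `run` function, and the two parse functions are hypothetical stand-ins written for this sketch. The point is only that yielding a request hands it to a scheduler, and the callback is invoked later rather than inline in the loop.

```python
from collections import deque


class Request:
    """Toy stand-in for scrapy.Request (hypothetical, for illustration only)."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


def parse_list(hotels, log):
    # Mirrors the loop in the question: record the name, then yield a request.
    for name in hotels:
        log.append("list: " + name)
        yield Request("/hotel/" + name, parse_hotel, meta={"name": name})


def parse_hotel(request, log):
    # Stands in for the detail-page callback; it yields no further requests.
    log.append("detail: " + request.meta["name"])
    return []


def run(requests, log):
    # Simplification: every yielded request is queued first, and callbacks
    # only run afterwards. Real Scrapy calls them as responses arrive.
    queue = deque(requests)
    while queue:
        request = queue.popleft()
        queue.extend(request.callback(request, log))


log = []
run(parse_list(["Alpha", "Beta"], log), log)
print(log)
# ['list: Alpha', 'list: Beta', 'detail: Alpha', 'detail: Beta']
```

In real Scrapy the order in which detail callbacks fire is not guaranteed, so anything a callback needs from the listing page has to travel with the request itself, as in the `meta={"name": name}` argument here.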

What you want is almost certainly "Passing additional data to callback functions" from the Scrapy documentation. Here is how it would look in your case:

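import scrapy

# Both functions are methods of the spider class. HotelItem is assumed to be
# a scrapy.Item subclass that declares the fields used below.
def parse_item(self, response):
    for hotel in response.xpath('//div[contains(@class,"sr_item")]'):
        item = HotelItem()

        # Store the extracted strings in the item, not the selector objects.
        hotelName = hotel.xpath('.//span[contains(@class,"sr-hotel__name")]//text()')
        print(hotelName.extract())
        item["hotelName"] = hotelName.extract_first()

        hotel_image = hotel.xpath('.//img[contains(@class, "hotel_image")]//@src')
        print(hotel_image.extract())
        item["hotel_image"] = hotel_image.extract_first()

        hotelLink = hotel.xpath('.//a[contains(@class,"hotel_name_link")]//@href').extract_first()
        if hotelLink:
            # Attach the partially filled item so parseHotel can complete it.
            request = scrapy.Request(response.urljoin(hotelLink), self.parseHotel)
            request.meta['item'] = item
            yield request

    # Follow pagination only when a next-page link exists.
    next_page = response.xpath('//a[contains(@class,"paging-next")]//@href').extract_first()
    if next_page:
        yield scrapy.Request(response.urljoin(next_page), self.parse_item)

def parseHotel(self, response):
    # Pick up the item started in parse_item and add the detail-page fields.
    item = response.meta['item']
    item["extra_1"] = response.xpath('/example/text()').extract_first()
    item["extra_2"] = response.xpath('/example2/text()').extract_first()
    yield item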


2016-08-20 15:10:14 neverlastn
