2016-08-19 105 views

Scrapy: requests yielded inside a for loop do not run synchronously

for hotel in response.xpath('//div[contains(@class,"sr_item")]'):
    hotelName = hotel.xpath('.//span[contains(@class,"sr-hotel__name")]//text()')
    print(hotelName.extract())

    hotel_image = hotel.xpath('.//img[contains(@class, "hotel_image")]//@src')
    print(hotel_image.extract())

    hotelLink = hotel.xpath('.//a[contains(@class,"hotel_name_link")]//@href')

    yield scrapy.Request(response.urljoin(hotelLink[0].extract()), self.parseHotel)

next_page = response.xpath('//a[contains(@class,"paging-next")]//@href')

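My code is shown above. As you can see, inside the for loop I want Scrapy to return from the `parseHotel` callback and only then continue with the next iteration of the loop.

Right now, however, it prints all the hotel names first, meaning the for loop runs to completion, and only then does `parseHotel` start yielding.

This messes up my output once I start assigning values to the item object.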

There is no `hotelParse` method in your code – eLRuLL

Scrapy works [asynchronously](http://doc.scrapy.org/en/latest/topics/architecture.html#event-driven-networking), so I think you have to look for a better way to handle your items – eLRuLL
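To illustrate the ordering described in the question, here is a toy model in plain Python. It is only a sketch, not Scrapy's actual engine: `toy_engine`, `parse`, and `parse_hotel` are hypothetical stand-ins, and the assumption is simply that yielded requests are queued first and their callbacks run later.

```python
from collections import deque


def parse_hotel(request, log):
    # Stand-in for the detail-page callback.
    log.append("callback: " + request["url"])


def parse(hotel_urls, log):
    # Stand-in for the listing-page callback: one request per hotel.
    for url in hotel_urls:
        log.append("loop: " + url)
        yield {"url": url, "callback": parse_hotel}


def toy_engine(requests, log):
    # Collect every yielded request first, then "download" them one by one.
    queue = deque(requests)
    while queue:
        request = queue.popleft()
        request["callback"](request, log)


log = []
toy_engine(parse(["/hotel-a", "/hotel-b"], log), log)
print(log)
# ['loop: /hotel-a', 'loop: /hotel-b', 'callback: /hotel-a', 'callback: /hotel-b']
```

Yielding a request only hands it to the scheduler; it does not wait for the callback, which is why the loop output appears before any callback output.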

Answer


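Almost certainly what you want is described in the Scrapy documentation under "Passing additional data to callback functions". Here is how it would look in your case: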

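import scrapy

# Both methods belong to your spider class. HotelItem is assumed to be your
# own scrapy.Item subclass declaring the fields used below.

def parse_item(self, response):
    for hotel in response.xpath('//div[contains(@class,"sr_item")]'):
        item = HotelItem()

        # Store the extracted strings, not the selector objects.
        hotelName = hotel.xpath('.//span[contains(@class,"sr-hotel__name")]//text()').extract()
        print(hotelName)
        item["hotelName"] = hotelName

        hotel_image = hotel.xpath('.//img[contains(@class, "hotel_image")]//@src').extract()
        print(hotel_image)
        item["hotel_image"] = hotel_image

        hotelLink = hotel.xpath('.//a[contains(@class,"hotel_name_link")]//@href').extract_first()
        if hotelLink is None:
            continue  # no detail link found for this result; skip it

        # Attach the partially filled item so parseHotel can complete it.
        request = scrapy.Request(response.urljoin(hotelLink), self.parseHotel)
        request.meta['item'] = item
        yield request

    # extract_first() returns a single string, or None on the last page.
    next_page = response.xpath('//a[contains(@class,"paging-next")]//@href').extract_first()
    if next_page is not None:
        yield scrapy.Request(response.urljoin(next_page), self.parse_item)

def parseHotel(self, response):
    # Retrieve the item that travelled with this request and finish it.
    item = response.meta['item']
    item["extra_1"] = response.xpath('/example/text()').extract_first()
    item["extra_2"] = response.xpath('/example2/text()').extract_first()
    yield item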
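
Why this keeps the output consistent can be shown with a small plain-Python sketch. It does not use Scrapy; the dictionaries standing in for requests and responses, and the names `parse_listing` and `parse_detail`, are assumptions made for illustration. The point is that each item rides along with its own request, so the order in which responses arrive no longer matters.

```python
def parse_listing(hotels):
    # One partially filled item per hotel, attached to its own request.
    for name, url in hotels:
        item = {"hotelName": name}
        yield {"url": url, "meta": {"item": item}}


def parse_detail(response):
    # Finish the item that travelled with the request.
    item = response["meta"]["item"]
    item["extra_1"] = "details for " + response["url"]
    return item


requests = list(parse_listing([("Hotel A", "/a"), ("Hotel B", "/b")]))
# Deliver the responses in reverse order to mimic out-of-order downloads.
items = [parse_detail(request) for request in reversed(requests)]
print(items)
```

Even though "Hotel B" is processed first here, its item still ends up with the details for `/b`, because the pairing is carried in `meta` rather than inferred from execution order.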