Main URL = [https://www.amazon.in/s/ref=nb_sb_ss_i_1_8?url=search-alias%3Dcomputers&field-keywords=lenovo+laptop&sprefix=lenovo+m%2Cundefined%2C2740&crid=3L1Q2LMCKALCT] How do I scrape a URL obtained from another URL in Scrapy?

Sub URL obtained from the main URL = [http://www.amazon.in/Lenovo-Ideapad-15-6-inch-Integrated-Graphics/dp/B01EN6RA7W?ie=UTF8&keywords=lenovo%20laptop&qid=1479811190&ref_=sr_1_1&s=computers&sr=1-1]

import scrapy
from product.items import ProductItem
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class amazonSpider(scrapy.Spider):
    name = "amazon"
    allowed_domains = ["amazon.in"]
    start_urls = ["main url here"]

    def parse(self, response):
        item = ProductItem()
        for content in response.xpath("sample xpath"):
            url = content.xpath("a/@href").extract()
            # url is extracted from my main url
            request = scrapy.Request(str(url[0]), callback=self.page2_parse)
            item['product_Rating'] = request
        yield item

    def page2_parse(self, response):
        # here I didn't get the response for the second URL's content
        for content in response.xpath("sample xpath"):
            yield content.xpath("sample xpath").extract()

The second function is never executed for the URLs extracted here. Please help me.

Here page2_parse does not fetch the second URL, so I cannot crawl any further. –

There isn't really such a thing as "scraping the URL of a URL"; your second URL is the same kind of URL as your first. – blacksite

Hi, I only get the second URL after crawling the first one. For example, on my main URL we can see multiple products [laptops]. So after crawling the main URL, I get the detail page URL for each product. –

Answer

Finally I got it working. Follow the code below to crawl values from URLs that were themselves extracted from another URL.

# requires: from scrapy import Request

def parse(self, response):
    item = ProductItem()
    url_list = [content for content in response.xpath("//div[@class='listing']/div/a/@href").extract()]
    item['product_DetailUrl'] = url_list
    for url in url_list:
        request = Request(str(url), callback=self.page2_parse)
        # pass the partially filled item along to the second callback
        request.meta['item'] = item
        yield request

def page2_parse(self, response):
    # retrieve the item that was handed over from parse()
    item = response.meta['item']
    item['product_ColorAvailability'] = [content for content in response.xpath("//div[@id='templateOption']//ul/li//img/@color").extract()]
    yield item
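
For completeness, here is a minimal sketch of what the items.py behind `from product.items import ProductItem` might look like. The file is not shown in the question, so this definition is only inferred from the three field names used above:

# product/items.py -- hypothetical definition, inferred from the field
# names used in the spiders above (not part of the original question)
import scrapy

class ProductItem(scrapy.Item):
    product_Rating = scrapy.Field()             # set in the question's parse()
    product_DetailUrl = scrapy.Field()          # list of detail-page URLs
    product_ColorAvailability = scrapy.Field()  # filled in page2_parse()

The key point of the answer's approach is that the item is handed from parse() to page2_parse() via request.meta, so a single item can accumulate fields from both the listing page and each detail page before it is finally yielded.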