Main URL = [https://www.amazon.in/s/ref=nb_sb_ss_i_1_8?url=search-alias%3Dcomputers&field-keywords=lenovo+laptop&sprefix=lenovo+m%2Cundefined%2C2740&crid=3L1Q2LMCKALCT]
How do I scrape a URL that was itself extracted from another URL in Scrapy?
import scrapy
from product.items import ProductItem
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class amazonSpider(scrapy.Spider):
    name = "amazon"
    allowed_domains = ["amazon.in"]
    start_urls = ["main url here"]

    def parse(self, response):
        item = ProductItem()
        for content in response.xpath("sample xpath"):
            url = content.xpath("a/@href").extract()
            request = scrapy.Request(str(url[0]), callback=self.page2_parse)
            # url is extracted from my main url
            item['product_Rating'] = request
            yield item

    def page2_parse(self, response):
        # here I didn't get the response for the second URL's content
        for content in response.xpath("sample xpath"):
            yield content.xpath("sample xpath").extract()
The second function is never executed for the URLs extracted here. Please help me.
Here page2_parse never fetches the second URL, so I cannot crawl any further. –
This is not really "scraping a URL of a URL"; your second URL is the same as the first one. – blacksite
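Hi, I only get the second URL after crawling the first one. For example, my main URL lists several products [laptops]. So after crawling the main URL, I get the detail-page URL of each product. –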