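I have a link: https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP1.htm How can I increment the page number in it?
I want to generate the next link like this: https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP2.htm
and then pages 3, 4, 5, and so on. My code is: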
# -*- coding: utf-8 -*-
import scrapy


class GlassdoorSpider(scrapy.Spider):
    name = 'glassdoor'
    #allowed_domains = ['https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11.htm']
    start_urls = ['https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP1.htm']

    def parse(self, response):
        #main_url = "https://www.glassdoor.ca"
        urls = response.css('li.jl > div > div.flexbox > div > a::attr(href)').extract()
        for url in urls:
            url = "https://www.glassdoor.ca" + url
            yield scrapy.Request(url = url, callback = self.parse_details)
        next_page_url = "https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP"
        if next_page_url:
            #next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url = next_page_url, callback = self.parse)

    def parse_details(self,response):
        yield{
            'Job_Title' : response.css('div.header.cell.info > h2::text').extract()
        }
        self.log("reached22: "+ response.url)
I want to increment the page number in the next_page_url variable.
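One possible approach, as a minimal sketch: read the current page number from the URL that was just crawled, add one, and rebuild the URL. This assumes the page number always appears as a trailing `_IP<number>.htm`, as in the two links above. The helper name `next_page_url` and its `max_page` parameter are hypothetical and are not part of Scrapy.

```python
import re

# Matches the trailing "_IP<number>.htm" page marker seen in the URLs above.
# Assumption: every results page URL ends with this marker.
PAGE_RE = re.compile(r"_IP(\d+)\.htm$")


def next_page_url(current_url, max_page=None):
    """Return the URL of the following results page, or None.

    None is returned when the URL has no page marker, or when the
    optional (hypothetical) max_page limit would be exceeded.
    """
    match = PAGE_RE.search(current_url)
    if match is None:
        return None
    next_page = int(match.group(1)) + 1
    if max_page is not None and next_page > max_page:
        return None
    # Keep everything before the marker and append the incremented one.
    return current_url[:match.start()] + "_IP{}.htm".format(next_page)


if __name__ == "__main__":
    first = "https://www.glassdoor.ca/Job/canada-data-jobs-SRCH_IL.0,6_IN3_KE7,11_IP1.htm"
    print(next_page_url(first))
```

Inside parse, the result of next_page_url(response.url) could then be passed to scrapy.Request(url=..., callback=self.parse). Yielding that request only when the current page actually produced job links is one way to make the crawl stop after the last page.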
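Cool, about the link: I know this shouldn't be possible, but on every page your XPath query gives me the same result: https://www.monster.ca/jobs/search/?q=data-analyst&page=2. Even on https://www.monster.ca/jobs/search/?q=data-analyst&page=6 the XPath returns the link to page 2. Could you please check?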
@AshishKapil Are you sure? It works for me. On page 6, the Scrapy shell gives me Out[1]: u'https://www.monster.ca/jobs/search/?q=data-analyst&page=7'.
Your query is perfect. I think the problem is on my end: whatever page I give to the Scrapy shell, it only loads the first page. Thanks a lot again, Thomas :))