Why can't I scrape this website with Scrapy?
I tried some very simple Scrapy code to see whether I could pull anything from the website, but whatever I try, I get nothing back.
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.log import *
from vacatures.settings import *
from vacatures.items import *
from scrapy.http import Request


class VacaturesSpider(CrawlSpider):
    name = 'vacatures_spider'
    allowed_domains = ['www.itbanen.nl']
    start_urls = ['http://www.itbanen.nl/vacature/zoeken/overzicht/wijzigingsdatum/query//distance/30/output/html/items_per_page/15/page/1/ignore_ids']

    def parse(self, response):
        self.log('Nieuwe pagina! %s' % response.url)
        #hxs = HtmlXPathSelector(response)
        sel = Selector(response)
        # HXS to find url that goes to detail page
        test = sel.xpath('//div[@id="resultlist"]/div[@class="resultlist"]/h2/text()').extract()
        print test
        links = sel.xpath('//div[@class="container"]/h2/text()')
        print links
        for link in links:
            link_item = link.extract()
            print link_item
            #yield Request(complete_url(link_item), callback=self.parse_category)
Maybe you could first inspect `response` to find out what you are actually getting back?
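One way to act on that suggestion is to summarize what the server actually returned before running any selectors. The sketch below is only an illustration: `describe_response` is a hypothetical helper, not part of Scrapy, and the values passed to it are stand-ins. Inside a spider callback you would pass `response.status`, `response.url` and `response.body` instead.

```python
def describe_response(status, url, body, marker):
    """Summarize a fetched page: status code, size, and whether `marker` occurs in it."""
    text = body.decode("utf-8", "replace")  # the body arrives as bytes
    return "status=%d url=%s bytes=%d has_marker=%s" % (
        status, url, len(body), marker in text)


# Stand-in values for illustration only. In a Scrapy callback this would be:
#   self.log(describe_response(response.status, response.url,
#                              response.body, 'id="resultlist"'))
summary = describe_response(
    200,
    "http://example.com/",
    b"<html><body>blocked</body></html>",
    'id="resultlist"',
)
print(summary)
```

If the marker is missing from the body, the selectors have nothing to match, and the problem lies in what was downloaded rather than in the XPath.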
Are you sure your XPath expressions are correct? I don't see them matching any elements on the page. You could use CSS selectors instead, for example `sel.css('div#resultlist div.resultlist h2::text')` and `sel.css('div.container h2::text')`.
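My check was exactly that: trying to get content from the page. Even with this simple script I get nothing back, yet when I run the same script against other websites (as a test), it does work. – Beer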