Why can't Scrapy crawl/parse?

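This may be a duplicate question. I am trying to run a Scrapy spider, but it will not run. Why do I get the error message "HtmlResponse has no attribute urljoin"? And what do the Scrapy stats imply if request_count is 3 and response_count is also 3? My code is below. I would appreciate any help with this.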

import scrapy 
from scrapy.http.request import Request 
from scrapy.spiders import BaseSpider 
from scrapy.selector import HtmlXPathSelector 

class BotSpider_2(BaseSpider): 
    name = 'BotSpider_2' 
    name = "google.co.th" 
    start_urls = ["http://www.google.co.th/"] 


    def parse(self, response): 
     sel = Selector(response) 
     sites = sel.xpath('//title/text()').extract() 
     print sites


2016-09-28 Pavitra Atha

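First of all, your imports are incorrect. For example, why are you using BaseSpider instead of Spider? You also never import Selector, even though parse() uses it. As for the urljoin error you describe, I don't see how the code you posted could raise it, since it never calls urljoin. urljoin has been a method of response objects since Scrapy v1; it joins the current URL with a path to produce an absolute URL that can be used for crawling. If you really do get that error, it would usually point to an older Scrapy installation.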

$ scrapy shell "https://scrapy.org" 
In [1]: response.url 
Out[1]: 'https://scrapy.org' 

In [2]: response.urljoin('/some/cool/path') 
Out[2]: 'https://scrapy.org/some/cool/path'
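To make the joining rule in the shell session above easier to see without a running spider, here is a minimal sketch using only the standard library. It assumes that response.urljoin(path) behaves roughly like urllib.parse.urljoin(response.url, path) (for HTML responses Scrapy may also take a <base> tag into account, which this sketch ignores). The page_url value and the to_absolute helper are made up for illustration and are not part of Scrapy.

```python
from urllib.parse import urljoin

# Hypothetical page URL standing in for response.url (an assumption for this sketch).
page_url = "https://scrapy.org/download/"


def to_absolute(base_url, href):
    """Resolve href against base_url the way a browser resolves a link."""
    return urljoin(base_url, href)


# A path starting with "/" replaces the whole path of the base URL.
print(to_absolute(page_url, "/some/cool/path"))

# A relative path is resolved against the directory of the base URL.
print(to_absolute(page_url, "docs/intro.html"))

# An already absolute URL is returned unchanged.
print(to_absolute(page_url, "https://example.com/x"))
```

In a spider you would normally call response.urljoin(href) directly and pass the result to scrapy.Request, rather than going through a helper like this.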

I've cleaned up the imports, and your code works like a charm!

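import scrapy
from scrapy.selector import Selector


class BotSpider_2(scrapy.Spider):
    # Unique spider name, used as: scrapy crawl google.co.th
    name = "google.co.th"
    start_urls = ["http://www.google.co.th/"]

    def parse(self, response):
        # Wrap the response in a Selector so XPath queries can be run on it
        sel = Selector(response)
        # Extract the text of the page's <title> element as a list of strings
        sites = sel.xpath('//title/text()').extract()
        print(sites)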


2016-09-28 06:10:41 Granitosaurus
