2012-10-08 38 views
8

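Scrapy tutorial exception

I am following the Scrapy tutorial documentation http://media.readthedocs.org/pdf/scrapy/0.14/scrapy.pdf, and I have verified that items.py and dmoz_spider.py are typed in correctly (not cut & pasted).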

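The first "hmm..." moment for me was this instruction:

This is the code for our first Spider; save it in a file named dmoz_spider.py under the dmoz/spiders directory

I am using the latest version of Ubuntu, and what I have is tutorial/tutorial/spiders. (Was this my first mistake?)

So, here is my dmoz_spider.py script: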

from scrapy.spider import BaseSpider 

class DmozSpider(BaseSpider): 
    name = "dmoz" 
    allowed_domains = ["dmoz.org"] 
    start_urls = [ 
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/", 
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/" 
    ] 

def parse(self, response): 
    filename = response.url.split("/")[-2] 
    open(filename, 'wb').write(response.body) 

In my terminal I type

scrapy crawl dmoz 

And I get this:

2012-10-08 13:20:22-0700 [scrapy] INFO: Scrapy 0.12.0.2546 started (bot: tutorial) 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled item pipelines: 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 
2012-10-08 13:20:22-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 
2012-10-08 13:20:22-0700 [dmoz] INFO: Spider opened 
2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None) 
2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: <None>) 
Traceback (most recent call last): 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop 
    self.runUntilCurrent() 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent 
    call.func(*call.args, **call.kw) 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback 
    self._startRunCallbacks(result) 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks 
    self._runCallbacks() 
--- <exception caught here> --- 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse 
    raise NotImplementedError 
exceptions.NotImplementedError: 

2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None) 
2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: <None>) 
Traceback (most recent call last): 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop 
    self.runUntilCurrent() 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent 
    call.func(*call.args, **call.kw) 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback 
    self._startRunCallbacks(result) 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks 
    self._runCallbacks() 
--- <exception caught here> --- 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse 
    raise NotImplementedError 
exceptions.NotImplementedError: 

2012-10-08 13:20:22-0700 [dmoz] INFO: Closing spider (finished) 
2012-10-08 13:20:22-0700 [dmoz] INFO: Spider closed (finished) 

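While searching, I saw someone say that Twisted might not be installed... but wouldn't it have been installed if I used the Ubuntu package installer for Scrapy?

Thanks in advance!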

+0

Why not first check whether it is actually installed? Don't trust your guesses :) – Alfabravo

Answers

15

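The parse method in BaseSpider is being called instead of yours, because you did not override the parse method correctly. Your indentation is wrong, so parse is declared as a function outside the DmozSpider class. Welcome to Python :)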
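A minimal sketch of why the indentation matters, using a stand-in base class rather than Scrapy itself. The names `FakeBaseSpider`, `BrokenSpider`, `FixedSpider`, and `try_parse` are hypothetical; the stand-in only mimics the `raise NotImplementedError` visible in the traceback.

```python
class FakeBaseSpider:
    """Hypothetical stand-in for BaseSpider: its parse() raises, as in the traceback."""

    def parse(self, response):
        raise NotImplementedError


class BrokenSpider(FakeBaseSpider):
    name = "dmoz"


# Dedented to column 0, so this is a module-level function, not a method
# of BrokenSpider. The class still inherits FakeBaseSpider.parse.
def parse(self, response):
    return "parsed " + response


class FixedSpider(FakeBaseSpider):
    name = "dmoz"

    # Indented inside the class body, so it overrides FakeBaseSpider.parse.
    def parse(self, response):
        return "parsed " + response


def try_parse(spider, response):
    """Call spider.parse and report whether the base-class version ran."""
    try:
        return spider.parse(response)
    except NotImplementedError:
        return "NotImplementedError"
```

Calling `try_parse` on a `BrokenSpider` instance reaches the base-class method, while a `FixedSpider` instance reaches its own `parse`. In the question's script, the equivalent fix is to indent `def parse` and its body by one level so they sit inside `DmozSpider`.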

This has nothing to do with Twisted. I can see Twisted in the traceback, so it is clearly installed.
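If you want to check directly instead of reading it off the traceback, here is a small sketch, assuming Python 3.4 or later (the question's setup is Python 2.7, where a plain `import twisted` inside `try`/`except ImportError` would do the same job). The helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Reports whether Twisted is importable by this interpreter.
    print("twisted installed:", is_installed("twisted"))
```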

+1

Ah, that was it. Thank you! After indenting the "def parse" line, everything worked fine! Welcome to Python indeed. :) – user1729889

+0

Thanks, thanks from 2015~ –