
Scrapy "ERROR: Error processing": I have written a sample scraper (actually, I adapted it from the tutorial spider):

from scrapy.spider import Spider
from scrapy.selector import Selector
from dirbot.items import Website


class DmozSpider(Spider):
    name = "dmoz"
    allowed_domains = ["cryptocoincharts.info"]
    start_urls = [
        "http://www.cryptocoincharts.info/v2/coins/show/drk",
    ]

    def parse(self, response):
        sel = Selector(response)
        # Grab the second cell of the seventh row of the price table.
        sites = sel.xpath('//table[@class="table table-striped"]//tr[7]/td[2]')
        items = []

        for site in sites:
            item = Website()
            # Keep every run of characters that is not a tab or a newline.
            item['name'] = site.xpath('text()').re(r'[^\t\n]+')
            items.append(item)
        return items

And I get a processing error. Here is the log:

scrapy crawl dmoz -o items.json -t json

2014-05-21 22:26:54+0200 [scrapy] INFO: Scrapy 0.23.0-231-g2bf09b8 started (bot: scrapybot) 
2014-05-21 22:26:54+0200 [scrapy] INFO: Optional features available: ssl, http11 
2014-05-21 22:26:54+0200 [scrapy] INFO: Overridden settings: {'DEFAULT_ITEM_CLASS': 'dirbot.items.Website', 'FEED_FORMAT': 'json', 'SPIDER_MODULES': ['dirbot.spiders'], 'FEED_URI': 'items.json', 'NEWSPIDER_MODULE': 'dirbot.spiders'} 
2014-05-21 22:26:54+0200 [scrapy] INFO: Enabled extensions: FeedExporter, LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState 
2014-05-21 22:26:54+0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats 
2014-05-21 22:26:54+0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2014-05-21 22:26:54+0200 [scrapy] INFO: Enabled item pipelines: FilterWordsPipeline 
2014-05-21 22:26:54+0200 [dmoz] INFO: Spider opened 
2014-05-21 22:26:54+0200 [dmoz] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2014-05-21 22:26:54+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 
2014-05-21 22:26:54+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 
2014-05-21 22:26:54+0200 [dmoz] DEBUG: Crawled (200) <GET http://www.cryptocoincharts.info/v2/coins/show/drk> (referer: None) 
2014-05-21 22:26:54+0200 [dmoz] ERROR: Error processing {'name': [u'0.0160990 BTC', 
       u'7.9770495 USD', 
       u'5.7816480 EUR', 
       u'48.829847 CNY', 
       u'4.7026302 GBP', 
       u'6.9809075 CHF', 
       u'8.6828030 CAD', 
       u'811.85225 JPY', 
       u'8.5037582 AUD', 
       u'83.350117 ZAR', 
       u'0.00595524\xa0oz GOLD (= 0.17\xa0grams)', 
       u'0.37805922\xa0oz SILVER (= 10.72\xa0grams)']} 
    Traceback (most recent call last): 
     File "/usr/lib/pymodules/python2.7/scrapy/middleware.py", line 62, in _process_chain 
     return process_chain(self.methods[methodname], obj, *args) 
     File "/usr/lib/pymodules/python2.7/scrapy/utils/defer.py", line 65, in process_chain 
     d.callback(input) 
     File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback 
     self._startRunCallbacks(result) 
     File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks 
     self._runCallbacks() 
    --- <exception caught here> --- 
     File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks 
     current.result = callback(current.result, *args, **kw) 
     File "/home/me/Desktop/scrapy/dirbot-master/dirbot-master/dirbot/pipelines.py", line 13, in process_item 
     if word in unicode(item['description']).lower(): 
     File "/usr/lib/pymodules/python2.7/scrapy/item.py", line 50, in __getitem__ 
     return self._values[key] 
    exceptions.KeyError: 'description' 

2014-05-21 22:26:54+0200 [dmoz] ERROR: Error processing {'name': []} 
    Traceback (most recent call last): 
     File "/usr/lib/pymodules/python2.7/scrapy/middleware.py", line 62, in _process_chain 
     return process_chain(self.methods[methodname], obj, *args) 
     File "/usr/lib/pymodules/python2.7/scrapy/utils/defer.py", line 65, in process_chain 
     d.callback(input) 
     File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback 
     self._startRunCallbacks(result) 
     File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks 
     self._runCallbacks() 
    --- <exception caught here> --- 
     File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks 
     current.result = callback(current.result, *args, **kw) 
     File "/home/me/Desktop/scrapy/dirbot-master/dirbot-master/dirbot/pipelines.py", line 13, in process_item 
     if word in unicode(item['description']).lower(): 
     File "/usr/lib/pymodules/python2.7/scrapy/item.py", line 50, in __getitem__ 
     return self._values[key] 
    exceptions.KeyError: 'description' 

2014-05-21 22:26:54+0200 [dmoz] INFO: Closing spider (finished) 
2014-05-21 22:26:54+0200 [dmoz] INFO: Dumping Scrapy stats: 
    {'downloader/request_bytes': 254, 
    'downloader/request_count': 1, 
    'downloader/request_method_count/GET': 1, 
    'downloader/response_bytes': 4986, 
    'downloader/response_count': 1, 
    'downloader/response_status_count/200': 1, 
    'finish_reason': 'finished', 
    'finish_time': datetime.datetime(2014, 5, 21, 20, 26, 54, 390997), 
    'log_count/DEBUG': 3, 
    'log_count/ERROR': 2, 
    'log_count/INFO': 7, 
    'response_received_count': 1, 
    'scheduler/dequeued': 1, 
    'scheduler/dequeued/memory': 1, 
    'scheduler/enqueued': 1, 
    'scheduler/enqueued/memory': 1, 
    'start_time': datetime.datetime(2014, 5, 21, 20, 26, 54, 211942)} 
2014-05-21 22:26:54+0200 [dmoz] INFO: Spider closed (finished) 

I am trying to figure out what is going on, but unfortunately I cannot find any reason why the items are not exported to the JSON file. In a previous project, Scrapy exported multiple rows of data to JSON without any problem.

Answer


Look closely at the traceback. There is this line:

File "/home/me/Desktop/scrapy/dirbot-master/dirbot-master/dirbot/pipelines.py", line 13, in process_item 
    if word in unicode(item['description']).lower(): 

This means that the error is thrown inside a pipeline while it is trying to process an item.
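For reference, the pipeline in question is dirbot's FilterWordsPipeline. Here is a sketch of what pipelines.py roughly looks like, reconstructed from the traceback (the exact words_to_filter list is an assumption):

from scrapy.exceptions import DropItem

class FilterWordsPipeline(object):
    """Drops items whose description contains a forbidden word."""

    # Assumed filter list; the real one may differ.
    words_to_filter = ['politics', 'religion']

    def process_item(self, item, spider):
        for word in self.words_to_filter:
            # Line 13 in the traceback: item['description'] raises
            # KeyError when the item has no such field.
            if word in unicode(item['description']).lower():
                raise DropItem("Contains forbidden word: %s" % word)
        return item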

Then look at which fields you populate in the spider:

for site in sites:
    item = Website()
    item['name'] = site.xpath('text()').re(r'[^\t\n]+')
    items.append(item)

As you can see, no description field is ever set on the item. That is the cause of the KeyError.
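There are two straightforward ways to fix it (sketches; pick whichever fits your intent): populate the missing field in the spider, or make the pipeline tolerant of items that lack it.

# Option 1: set the field in the spider so the pipeline lookup succeeds.
# An empty string is enough; fill in real data if you have it.
for site in sites:
    item = Website()
    item['name'] = site.xpath('text()').re(r'[^\t\n]+')
    item['description'] = ''  # placeholder so process_item does not crash
    items.append(item)

# Option 2: in pipelines.py, use item.get() (Scrapy items support the
# dict interface), which returns a default instead of raising KeyError.
def process_item(self, item, spider):
    description = unicode(item.get('description', '')).lower()
    for word in self.words_to_filter:
        if word in description:
            raise DropItem("Contains forbidden word: %s" % word)
    return item

Alternatively, since this spider does not need word filtering at all, you can simply remove FilterWordsPipeline from ITEM_PIPELINES in settings.py.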