Forgive me, I'm a total programming noob. Splitting a variable in a Scrapy spider
I'm having trouble extracting a record ID from a URL with the code below. It seems to work fine when I run it in the shell (no errors), but when I run it through the Scrapy framework it throws an error.
Example:
If the URL is http://domain.com/path/to/record_id=1599
then record_link = /path/to/record_id=1599
and so record_id should = 1599
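As a sanity check in a plain Python shell, the split itself behaves as expected (this assumes record_link is the string shown above):

    >>> record_link = '/path/to/record_id=1599'
    >>> record_link.strip().split('=')[1]
    '1599'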
for site in sites:
    record_link = site.select('div[@class="description"]/h4/a/@href').extract()
    record_id = record_link.strip().split('=')[1]
    item['link'] = record_link
    item['id'] = record_id
    items.append(item)
Any help is greatly appreciated.
EDIT:
The Scrapy error looks like this (apologies for the long paste):
user@host:/home/user/spiderdir/spiderdir/spiders# sudo scrapy crawl spider
2012-02-23 09:47:16+1100 [scrapy] INFO: Scrapy 0.13.0.2839 started (bot: spider)
2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-02-23 09:47:16+1100 [scrapy] DEBUG: Enabled item pipelines:
2012-02-23 09:47:16+1100 [spider] INFO: Spider opened
2012-02-23 09:47:16+1100 [spider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-02-23 09:47:16+1100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6031
2012-02-23 09:47:16+1100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6088
2012-02-23 09:47:19+1100 [spider] DEBUG: Crawled (200) <GET http://www.domain.com/path/to/> (referer: None)
2012-02-23 09:47:21+1100 [spider] DEBUG: Crawled (200) <GET http://www.domain.com/path/to/record_id=2> (referer: http://www.domain.com/path/to/)
2012-02-23 09:47:21+1100 [spider] ERROR: Spider error processing <GET http://www.domain.com/path/to/record_id=2>
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 778, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/task.py", line 577, in _tick
    taskObj._oneWorkUnit()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/task.py", line 458, in _oneWorkUnit
    result = self._iterator.next()
  File "/usr/lib/pymodules/python2.6/scrapy/utils/defer.py", line 57, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
  --- <exception caught here> ---
  File "/usr/lib/pymodules/python2.6/scrapy/utils/defer.py", line 96, in iter_errback
    yield it.next()
  File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/offsite.py", line 24, in process_spider_output
    for x in result:
  File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/referer.py", line 14, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/urllength.py", line 32, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/pymodules/python2.6/scrapy/contrib/spidermiddleware/depth.py", line 56, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/pymodules/python2.6/scrapy/contrib/spiders/crawl.py", line 66, in _parse_response
    cb_res = callback(response, **cb_kwargs) or ()
  File "/home/nick/googledir/googledir/spiders/google_directory.py", line 36, in parse_main
    record_id = record_link.split("=")[1]
exceptions.AttributeError: 'list' object has no attribute 'split'
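For anyone hitting the same AttributeError: in Scrapy, select(...).extract() returns a list of matching strings, not a single string, so calling .strip()/.split() on it fails. A minimal sketch of the likely fix, reusing the sites/item/items names from the snippet above and taking the first match:

    for site in sites:
        # extract() returns a list, e.g. [u'/path/to/record_id=1599']
        links = site.select('div[@class="description"]/h4/a/@href').extract()
        if not links:
            continue  # no matching href on this element
        record_link = links[0].strip()
        record_id = record_link.split('=')[1]
        item['link'] = record_link
        item['id'] = record_id
        items.append(item)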
You should post your error too – goh 2012-03-04 11:12:21