2014-09-21 42 views
3

NameError: name 'hxs' is not defined when using Scrapy

I have started the Scrapy shell and successfully pinged Wikipedia.

scrapy shell http://en.wikipedia.org/wiki/Main_Page

I believe this step went correctly, judging by the verbose nature of Scrapy's response.

Next, I wanted to see what happens when I write

hxs.select('/html').extract()

At this point I get the error:

NameError: name 'hxs' is not defined

What is the problem? I know Scrapy installed fine and accepted the destination URL, so why does the hxs command fail?

Answers

6

I suspect you are using a version of Scrapy that no longer has hxs available in the shell.

Use sel instead (deprecated after 0.24, see below):

$ scrapy shell http://en.wikipedia.org/wiki/Main_Page 
>>> sel.xpath('//title/text()').extract()[0] 
u'Wikipedia, the free encyclopedia' 

Or, as of Scrapy 1.0, you should use the selector attached to the response object, through its .xpath() and .css() convenience methods:

$ scrapy shell http://en.wikipedia.org/wiki/Main_Page 
>>> response.xpath('//title/text()').extract()[0] 
u'Wikipedia, the free encyclopedia' 

FYI, a quote from "Using selectors" in the Scrapy documentation:

... after the shell loads, you’ll have the response available as response shell variable, and its attached selector in response.selector attribute.
...
Querying responses using XPath and CSS is so common that responses include two convenience shortcuts: response.xpath() and response.css() :

>>> response.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]
>>> response.css('title::text')
[<Selector (text) xpath=//title/text()>]
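
For code that has to work with both the older select() method and the newer xpath() method mentioned above, a small wrapper can hide the difference. This is only a sketch under stated assumptions: run_xpath is a hypothetical helper name, not part of Scrapy, and it assumes the object passed in exposes one of those two methods and that the result has an extract() method.

```python
def run_xpath(selector_like, query):
    """Run an XPath query via .xpath() if present, else the older .select().

    `run_xpath` is a hypothetical helper for illustration, not a Scrapy API.
    """
    # Prefer the newer method name when the object has it.
    method = getattr(selector_like, "xpath", None)
    if method is None:
        # Fall back to the older method name used by hxs-style selectors.
        method = getattr(selector_like, "select", None)
    if method is None:
        raise TypeError("object has neither .xpath() nor .select()")
    # Both styles return a result that is turned into strings by .extract().
    return method(query).extract()
```

In the shell this would be called as run_xpath(response, '//title/text()') on a version where response.xpath() exists, or as run_xpath(hxs, '//title/text()') on a version that still provides hxs.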

+0

Worked like a charm. Thank you very much for bringing this to my attention. – 2014-09-21 02:52:31

+0

@MattO'Brien Glad to help. Though, not sure why someone downvoted it.. – alecxe 2014-09-21 03:02:24

+0

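Strange, it had +2 a few minutes ago. It looks like it got 2 downvotes since then, after being up for only 10 minutes or so... and it shows only 4 views, too! – 2014-09-21 03:05:33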

0

You should go by what you call the "verbose nature of Scrapy's response".

$ scrapy shell http://en.wikipedia.org/wiki/Main_Page 

If your verbose output looks like this:

2014-09-20 23:02:14-0400 [scrapy] INFO: Scrapy 0.14.4 started (bot: scrapybot) 
2014-09-20 23:02:14-0400 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState 
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats 
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Enabled item pipelines: 
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 
2014-09-20 23:02:15-0400 [default] INFO: Spider opened 
2014-09-20 23:02:15-0400 [default] DEBUG: Crawled (200) <GET http://en.wikipedia.org/wiki/Main_Page> (referer: None) 
[s] Available Scrapy objects: 
[s] hxs  <HtmlXPathSelector xpath=None data=u'<html lang="en" dir="ltr" class="client-'> 
[s] item  {} 
[s] request <GET http://en.wikipedia.org/wiki/Main_Page> 
[s] response <200 http://en.wikipedia.org/wiki/Main_Page> 
[s] settings <CrawlerSettings module=None> 
[s] spider  <BaseSpider 'default' at 0xb5d95d8c> 
[s] Useful shortcuts: 
[s] shelp()   Shell help (print this help) 
[s] fetch(req_or_url) Fetch request (or URL) and update local objects 
[s] view(response) View response in a browser 
Python 2.7.6 (default, Mar 22 2014, 22:59:38) 
Type "copyright", "credits" or "license" for more information. 

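Your verbose output will list the Available Scrapy objects.

So whether to use hxs or sel depends on what that output shows. In your case hxs is not available, so you need to use sel (newer Scrapy version). For some people hxs is fine, and for others sel is what they need to use.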
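
The same check can be done from inside the shell instead of reading the banner. The sketch below is an assumption-laden illustration: pick_shortcut_name is a hypothetical helper, not part of Scrapy, and it only looks at which of the names mentioned in this thread (response, sel, hxs) exist in a given namespace.

```python
def pick_shortcut_name(namespace):
    """Return the name of the selector shortcut to use in this shell session.

    `pick_shortcut_name` is a hypothetical helper, not a Scrapy API.
    `namespace` is a dict of shell variables, e.g. the result of globals().
    """
    # `response` exists in old shells too, so only choose it when it
    # actually offers the .xpath() convenience method.
    response = namespace.get("response")
    if response is not None and hasattr(response, "xpath"):
        return "response"
    # Otherwise fall back to whichever older shortcut the shell provides.
    for name in ("sel", "hxs"):
        if name in namespace:
            return name
    raise NameError("no selector shortcut (response.xpath, sel, hxs) found")
```

Inside the Scrapy shell it could be called as pick_shortcut_name(globals()), assuming the shell variables live in the interactive namespace.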

+1

'0.14.4' is more than 2 years old, why not downgrade to Scrapy 0.7? :) – alecxe 2014-09-21 03:12:56

+0

Yes @alecxe, you are right :) I always use the latest version, but Scrapy 0.7, this is the version I have right now. – 2014-09-21 03:17:46

0

The "select" shortcut is deprecated; you should use response.xpath('/html').extract()

+0

Well, even if it is deprecated, it works in the shell without any problem. – GHajba 2015-08-05 08:50:21