2013-10-18

Fetch in scrapy shell not updating objects. What am I missing here?

I open the Scrapy shell as follows:

scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/" 

This gives me:

[s] Available Scrapy objects: 
[s] hxs  <HtmlXPathSelector xpath=None data=u'<html><head><meta http-equiv="Content-Ty'> 
[s] item  {} 
[s] request <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> 
[s] response <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> 
[s] settings <CrawlerSettings module=None> 
[s] spider  <BaseSpider 'default' at 0x9e1d3ec> 
[s] Useful shortcuts: 
[s] shelp()   Shell help (print this help) 
[s] fetch(req_or_url) Fetch request (or URL) and update local objects 
[s] view(response) View response in a browser 



As expected, this gives the title from the response:

In [1]: hxs.select('//title') 
Out[1]: [<HtmlXPathSelector xpath='//title' data=u'<title>Open Directory - Computers: Progr'>] 

Now I follow up with a simple fetch:

In [2]: fetch("http://www.google.com") 

The shell output indicates that the objects have been updated:

In [2]: fetch("http://www.google.com") 
2013-10-18 23:10:09+0530 [default] DEBUG: Redirecting (302) to <GET http://www.google.co.in/?gws_rd=cr&ei=eHJhUo2sOobSrQeM5ICAAg> from <GET http://www.google.com> 
2013-10-18 23:10:09+0530 [default] DEBUG: Crawled (200) <GET http://www.google.co.in/?gws_rd=cr&ei=eHJhUo2sOobSrQeM5ICAAg> (referer: None) 
[s] Available Scrapy objects: 
[s] hxs  <HtmlXPathSelector xpath=None data=u'<html itemscope="" itemtype="http://sche'> 
[s] item  {} 
[s] request <GET http://www.google.com> 
[s] response <200 http://www.google.co.in/?gws_rd=cr&ei=eHJhUo2sOobSrQeM5ICAAg> 
[s] settings <CrawlerSettings module=None> 
[s] spider  <BaseSpider 'default' at 0x9e1d3ec> 
[s] Useful shortcuts: 
[s] shelp()   Shell help (print this help) 
[s] fetch(req_or_url) Fetch request (or URL) and update local objects 
[s] view(response) View response in a browser 

However, I find that they have not been: view(response) still shows me the DMOZ page,

and extracting the title gives the same old result:

In [3]: hxs.select('//title') 
Out[3]: [<HtmlXPathSelector xpath='//title' data=u'<title>Open Directory - Computers: Progr'>] 
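A minimal Python illustration of one way this symptom can arise. It assumes, without confirmation from this thread, that fetch() rebinds names in a dictionary that the interactive prompt is not actually reading from (for example, a copy made when the shell started). The names shell_vars, fetch, live_session and copied_session are hypothetical stand-ins, not Scrapy's API.

```python
# Hypothetical stand-ins: the real Scrapy shell objects are not modeled here.
shell_vars = {"response": "<200 dmoz.org page>"}


def fetch(url, namespace):
    """Hypothetical fetch(): rebinds 'response' in the namespace it was given."""
    namespace["response"] = f"<200 {url}>"


# A prompt that shares the shell's dict sees every rebinding...
live_session = shell_vars
# ...but a prompt that was handed a copy at startup keeps the old binding.
copied_session = dict(shell_vars)

fetch("http://www.google.com", shell_vars)

print(live_session["response"])    # <200 http://www.google.com>
print(copied_session["response"])  # <200 dmoz.org page>
```

If that assumption holds, it would be consistent with the banner printed after fetch() listing the Google objects while hxs and response at the prompt still refer to the DMOZ page.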

What am I missing here?

Thanks!


What Scrapy/Python/IPython versions are you using? This works for me. – Rolando

Answer


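Even though it works for me, maybe it is related to this bug: https://github.com/scrapy/scrapy/issues/396

Can you try running the latest development version?

Update: the issue occurs with IPython 0.10; update IPython to the latest version and it should work as expected. It is also fixed in Scrapy 0.19+ (the latest dev version), so you can upgrade either IPython or Scrapy.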
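To check whether an installation matches the combination described in this answer (IPython 0.10 together with a Scrapy release older than 0.19), a small sketch using importlib.metadata (Python 3.8+) could look like this. The helper names parse_version, likely_affected and installed are hypothetical, and the rule in likely_affected encodes only what the answer states, not a confirmed list of affected versions.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.18.4' into (0, 18, 4); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # ignore suffixes such as 'dev' or 'rc1'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def likely_affected(scrapy_version, ipython_version):
    """Per the answer: the bug shows up with IPython 0.10 and is fixed in Scrapy 0.19+."""
    ipython_is_0_10 = parse_version(ipython_version)[:2] == (0, 10)
    scrapy_before_fix = parse_version(scrapy_version)[:2] < (0, 19)
    return ipython_is_0_10 and scrapy_before_fix


def installed(dist_name):
    """Installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    scrapy_v, ipython_v = installed("scrapy"), installed("ipython")
    print("Scrapy:", scrapy_v, "IPython:", ipython_v)
    if scrapy_v and ipython_v:
        print("Matches the reported combination:", likely_affected(scrapy_v, ipython_v))
```

With the versions reported in the comment below, likely_affected("0.18.4", "0.10.2") returns True.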


Looks like that's the issue. I'm using Scrapy 0.18.4 and IPython 0.10.2. I will try running the latest development version, thanks. – Cygorger


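Rho, is there a way to install both the latest stable version and the development version side by side? I'd like to stick with the stable version for running scrapers and only use the dev version for the shell. – Cygorger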


Use virtualenv: http://warpedtimes.wordpress.com/2012/09/23/a-tutorial-on-virtualenv-to-isolate-python-installations/ – Rolando
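A sketch of that setup using Python's built-in venv module (the standard-library counterpart of virtualenv, available since Python 3.3) rather than virtualenv itself. The helper names env_python and create_env, the scrapy-dev directory name, and installing Scrapy straight from its GitHub repository are assumptions for illustration; the git+ requirement needs git on the PATH and network access.

```python
import os
import subprocess
import sys
import venv


def env_python(env_dir):
    """Path of the interpreter inside an environment created by venv."""
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def create_env(env_dir, requirement):
    """Create an isolated environment and pip-install one requirement into it."""
    venv.create(env_dir, with_pip=True)  # with_pip bootstraps pip via ensurepip
    subprocess.check_call([env_python(env_dir), "-m", "pip", "install", requirement])


# Hypothetical usage (not run here, since it downloads packages):
# create_env("scrapy-dev", "git+https://github.com/scrapy/scrapy.git")
```

The stable Scrapy stays in the main environment for running scrapers, while the scrapy command installed into scrapy-dev/bin (scrapy-dev\Scripts on Windows) would run the development version's shell.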
