Fetch in the Scrapy shell does not update objects. What am I missing here?

I open the Scrapy shell as follows:

scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"

This gives me:

[s] Available Scrapy objects: 
[s] hxs  <HtmlXPathSelector xpath=None data=u'<html><head><meta http-equiv="Content-Ty'> 
[s] item  {} 
[s] request <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> 
[s] response <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> 
[s] settings <CrawlerSettings module=None> 
[s] spider  <BaseSpider 'default' at 0x9e1d3ec> 
[s] Useful shortcuts: 
[s] shelp()   Shell help (print this help) 
[s] fetch(req_or_url) Fetch request (or URL) and update local objects 
[s] view(response) View response in a browser 


As expected, I get the title from the response:

In [1]: hxs.select('//title') 
Out[1]: [<HtmlXPathSelector xpath='//title' data=u'<title>Open Directory - Computers: Progr'>]

Now I follow up with a simple fetch:

In [2]: fetch("http://www.google.com")

The shell output suggests that the objects have been updated:

In [2]: fetch("http://www.google.com") 
2013-10-18 23:10:09+0530 [default] DEBUG: Redirecting (302) to <GET http://www.google.co.in/?gws_rd=cr&ei=eHJhUo2sOobSrQeM5ICAAg> from <GET http://www.google.com> 
2013-10-18 23:10:09+0530 [default] DEBUG: Crawled (200) <GET http://www.google.co.in/?gws_rd=cr&ei=eHJhUo2sOobSrQeM5ICAAg> (referer: None) 
[s] Available Scrapy objects: 
[s] hxs  <HtmlXPathSelector xpath=None data=u'<html itemscope="" itemtype="http://sche'> 
[s] item  {} 
[s] request <GET http://www.google.com> 
[s] response <200 http://www.google.co.in/?gws_rd=cr&ei=eHJhUo2sOobSrQeM5ICAAg> 
[s] settings <CrawlerSettings module=None> 
[s] spider  <BaseSpider 'default' at 0x9e1d3ec> 
[s] Useful shortcuts: 
[s] shelp()   Shell help (print this help) 
[s] fetch(req_or_url) Fetch request (or URL) and update local objects 
[s] view(response) View response in a browser

However, I find that they have not. view(response) still shows me the DMOZ page,

and extracting the title gives the same old result:

In [3]: hxs.select('//title') 
Out[3]: [<HtmlXPathSelector xpath='//title' data=u'<title>Open Directory - Computers: Progr'>]

What am I missing here?

Thanks!

Source

2013-10-18 Cygorger

What Scrapy/Python/IPython versions are you using? This works for me. – Rolando

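Even though it works for me, it may be related to this bug: https://github.com/scrapy/scrapy/issues/396

Can you try running the latest development version?

Update: The issue occurs with IPython 0.10. Update IPython to the latest version and it should work as expected. It is also fixed in Scrapy 0.19+ (latest development), so you can upgrade either IPython or Scrapy.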
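To check whether an installation matches the combination described in this answer, something like the following might help. It is only a rough sketch: the helper names (`parse_version`, `is_affected`, `installed_version`) are made up for illustration, it assumes a Python with `importlib.metadata` (3.8+), and it treats any IPython 0.10.x together with a Scrapy older than 0.19 as affected, which is my reading of the answer rather than something confirmed by the bug report.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.18.4' into (0, 18, 4); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_affected(ipython_version, scrapy_version):
    """True for the combination reported in this thread (assumed thresholds)."""
    old_ipython = parse_version(ipython_version)[:2] == (0, 10)
    old_scrapy = parse_version(scrapy_version) < (0, 19)
    return old_ipython and old_scrapy


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report on the current environment, if both packages are installed.
ipy = installed_version("ipython")
scr = installed_version("scrapy")
if ipy and scr and is_affected(ipy, scr):
    print("fetch() may not update shell objects; upgrade IPython or Scrapy")
```

With the versions mentioned later in the comments, `is_affected("0.10.2", "0.18.4")` evaluates to true, while upgrading either package makes it false.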

Source

2013-10-19 01:27:54 Rolando

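Looks like that is the issue. I am using Scrapy 0.18.4 and IPython 0.10.2. I will try running the latest development version, thanks. – Cygorger

Rho, is there a way to have both the latest stable version and the development version installed at the same time? I want to stick with the stable version for running scrapers and only use the dev version for the shell. – Cygorger

Use 'virtualenv': http://warpedtimes.wordpress.com/2012/09/23/a-tutorial-on-virtualenv-to-isolate-python-installations/ – Rolando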
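For anyone wanting to try the isolated-environment route, here is a hedged sketch of the steps as a small Python script. Note the assumptions: it uses the standard-library `venv` module as a stand-in for the `virtualenv` tool named above, the environment name `scrapy-dev` and the helper names are hypothetical, and installing from `git+https://github.com/scrapy/scrapy.git` requires git and network access. The system-wide stable Scrapy is left untouched.

```python
import os
import subprocess
import sys


def dev_shell_commands(env_dir):
    """Return the commands that set up an isolated env with development Scrapy."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    env_python = os.path.join(env_dir, bin_dir, "python")
    env_scrapy = os.path.join(env_dir, bin_dir, "scrapy")
    return [
        # Create the isolated environment (stdlib venv instead of virtualenv).
        [sys.executable, "-m", "venv", env_dir],
        # Install development Scrapy and a current IPython into that env only.
        [env_python, "-m", "pip", "install", "--upgrade",
         "git+https://github.com/scrapy/scrapy.git", "ipython"],
        # Start the Scrapy shell from the isolated environment.
        [env_scrapy, "shell", "http://www.google.com"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command)
```

Calling `run_all(dev_shell_commands("scrapy-dev"))` would carry out the three steps. Scrapers can keep running against the stable install outside the environment.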
