Scrapy's HtmlResponse does not retrieve data from the URL

This is the code I am running in IPython:

from scrapy.selector import Selector 
from scrapy.http import HtmlResponse 

response = HtmlResponse(url='https://en.wikipedia.org/wiki/Pan_American_Games') 
datas = Selector(response=response).xpath('//div[@class="thumb tleft"]')

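When I evaluate response I get <200 https://en.wikipedia.org/wiki/Pan_American_Games>, but when I evaluate response.body I get '' (an empty string).

It looks like HtmlResponse did not retrieve any of the HTML for this page.

Does anyone know how to solve this?

FYI, if I run $ scrapy shell https://en.wikipedia.org/wiki/Pan_American_Games at the command prompt, the response body is not empty. I don't want to go the scrapy shell url route, because I will be looping over a list of URLs.

Thanks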

2015-06-30 devon

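Are you sure you want to use Scrapy for this? If you do, you really should follow the tutorial and use a spider. I'm fairly sure this is not how Scrapy is meant to be used.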

If you just want a basic scraper in Python 2, I'd suggest the following:

from urllib2 import urlopen 
from lxml import html 

response = urlopen('https://en.wikipedia.org/wiki/Pan_American_Games') 
page = html.fromstring(response.read()) 
datas = page.xpath('//div[@class="thumb tleft"]')

2015-06-30 08:22:46 Ixio

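The problem is that you are not writing a spider here. HtmlResponse does not retrieve any data from the internet. All you have is a response object whose only content is the url attribute you supplied.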

There is a great official description of Scrapy's architecture here: http://doc.scrapy.org/en/latest/topics/architecture.html?highlight=scrapy%20architecture

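However, if you want to use Scrapy features such as selectors without a Scrapy spider, you can retrieve the page with requests and then carry on with Scrapy selectors, item loaders and so on. This is not the recommended approach, though, because you miss out on all the functionality that Scrapy provides.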

The official Scrapy beginner tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html

2015-06-30 08:24:01 Granitosaurus
