scrapy.selector.Selector.__init__()
expects a Response
object as first argument。
如果你想建立一個HTTP響應主體選擇,使用text=
參數:
$ scrapy shell https://docs.python.org/2/howto/curses.html
2016-11-21 11:05:34 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-21 11:05:35 [scrapy] INFO: Spider opened
2016-11-21 11:05:35 [scrapy] DEBUG: Crawled (200) <GET https://docs.python.org/2/howto/curses.html> (referer: None)
(...)
>>>
>>> #
>>> # passing response.body (bytes) instead of a Response object fails
>>> #
>>> scrapy.Selector(response.body)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/scrapy/selector/unified.py", line 67, in __init__
text = response.text
AttributeError: 'str' object has no attribute 'text'
>>>
>>> #
>>> # use text= argument to pass response body
>>> #
>>> scrapy.Selector(text=response.body)
<Selector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
>>>
>>> scrapy.Selector(text=response.body).xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
>>>
更簡單的方法是通過直接響應對象:
>>> scrapy.Selector(response).xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
和偶更簡單的方法是使用.xpath()
method on the response
instance(這是一種方便的方法,可以爲您創建一個選擇器),前提是您的回覆爲HtmlResponse
或XmlResponse
(這通常適用於網頁抓取)
>>> response.xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
你能修復你的代碼粘貼的縮進嗎? –