2013-06-21

Can someone help me understand how to use scrapy's Request class?

I have tried the following to make a request, but it does not work:

from scrapy.selector import HtmlXPathSelector 
from scrapy.http.request import Request 
url = 'http://www.fetise.com' 
a = Request(url) 
hxs = HtmlXPathSelector(a) 

The error is:

Traceback (most recent call last): 
  File "sa.py", line 83, in <module> 
    hxs = HtmlXPathSelector(a) 
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmlsel.py", line 31, in __init__ 
    _root = LxmlDocument(response, self._parser) 
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 27, in __new__ 
    cache[parser] = _factory(response, parser) 
  File "/usr/local/lib/python2.7/dist-packages/scrapy/selector/lxmldocument.py", line 13, in _factory 
    body = response.body_as_unicode().strip().encode('utf8') or '<html/>' 
AttributeError: 'Request' object has no attribute 'body_as_unicode'

I know about callbacks. Actually, I first want to scrape some URLs from the site, and then use them as start URLs.

What do you mean by "it doesn't work"? Did you get an error? –

Could you share the xPath of the menu you are talking about? – 2013-06-21 16:47:43

Here is the xPath that generates the URL list: `lista = hxs.select('//ul[@class="categoryMenu"]/li/ul/li/a/@href').extract()` and `acb = ["http://www.fetise.com/" + i if "http://www.fetise.com/" not in i else i for i in lista] + ["http://www.fetise.com/sale"]` –

Answers


Please try this:

import urllib 
from scrapy.selector import HtmlXPathSelector 
from pprint import pprint 

url = 'http://www.fetise.com' 
# Fetch the page body directly (Python 2's urllib), then build the
# selector from the text instead of from a Request object.
data = urllib.urlopen(url).read() 
hxs = HtmlXPathSelector(text=data) 

# Extract all category-menu links.
lista = hxs.select('//ul[@class="categoryMenu"]/li/ul/li/a/@href').extract() 

# Prefix relative links with the site root, then append the sale page.
acb = ["http://www.fetise.com/" + i if "http://www.fetise.com/" not in i else i for i in lista] + [u"http://www.fetise.com/sale"] 

pprint(acb) 

This is the output:

[u'http://www.fetise.com/apparel/shirts', 
u'http://www.fetise.com/apparel/tees', 
u'http://www.fetise.com/apparel/tops-and-tees', 
u'http://www.fetise.com/accessories/belts', 
u'http://www.fetise.com/accessories/cufflinks', 
u'http://www.fetise.com/accessories/jewellery', 
u'http://www.fetise.com/accessories/lighters', 
u'http://www.fetise.com/accessories/others', 
u'http://www.fetise.com/accessories/sunglasses', 
u'http://www.fetise.com/accessories/ties-cufflinks', 
u'http://www.fetise.com/accessories/wallets', 
u'http://www.fetise.com/accessories/watches', 
u'http://www.fetise.com/footwear/boots', 
u'http://www.fetise.com/footwear/casual', 
u'http://www.fetise.com/footwear/flats', 
u'http://www.fetise.com/footwear/heels', 
u'http://www.fetise.com/footwear/loafers', 
u'http://www.fetise.com/footwear/sandals', 
u'http://www.fetise.com/footwear/shoes', 
u'http://www.fetise.com/footwear/slippers', 
u'http://www.fetise.com/footwear/sports', 
u'http://www.fetise.com/innerwear/boxers', 
u'http://www.fetise.com/innerwear/briefs', 
u'http://www.fetise.com/personal-care/deos', 
u'http://www.fetise.com/personal-care/haircare', 
u'http://www.fetise.com/personal-care/perfumes', 
u'http://www.fetise.com/personal-care/personal-care', 
u'http://www.fetise.com/personal-care/shavers', 
u'http://www.fetise.com/apparel/tees/gifts-for-her', 
u'http://www.fetise.com/footwear/sandals/gifts-for-her', 
u'http://www.fetise.com/footwear/shoes/gifts-for-her', 
u'http://www.fetise.com/footwear/heels/gifts-for-her', 
u'http://www.fetise.com/footwear/flats/gifts-for-her', 
u'http://www.fetise.com/footwear/ballerinas/gifts-for-her', 
u'http://www.fetise.com/footwear/loafers/gifts-for-her', 
u'http://www.fetise.com/sale'] 
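As an aside, the string check in the list comprehension above only approximates URL resolution. Python's standard `urljoin` handles both absolute and relative hrefs correctly. A minimal sketch (Python 3 syntax; the sample hrefs are taken from the output above):

```python
from urllib.parse import urljoin

base = "http://www.fetise.com/"

# hrefs as they might come back from the menu XPath:
# some relative, some already absolute
hrefs = ["apparel/shirts", "http://www.fetise.com/footwear/boots", "/sale"]

# urljoin leaves absolute URLs alone and resolves relative ones against base
urls = [urljoin(base, h) for h in hrefs]
print(urls)
# → ['http://www.fetise.com/apparel/shirts',
#    'http://www.fetise.com/footwear/boots',
#    'http://www.fetise.com/sale']
```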

The documentation suggests that you need to pass a callback when making the request. The callback will have access to the response object:

From the docs:

Passing additional data to callback functions: a Request's callback is a function that will be called when the response of that request is downloaded. The callback function will be called with the downloaded Response object as its first argument.

Example:

def parse_page1(self, response): 
    return Request("http://www.example.com/some_page.html", 
         callback=self.parse_page2) 

def parse_page2(self, response): 
    # this would log http://www.example.com/some_page.html 
    self.log("Visited %s" % response.url)
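The callback pattern itself is easy to simulate outside scrapy. A toy sketch (plain Python, no scrapy; `Response` and `fake_download` are hypothetical stand-ins for scrapy's response object and engine) of how the downloader hands the downloaded Response to your callback:

```python
class Response(object):
    """Hypothetical stand-in for scrapy's Response: just a url and a body."""
    def __init__(self, url, body):
        self.url = url
        self.body = body

def fake_download(url, callback):
    """Stand-in for scrapy's engine: 'download' the page, then invoke
    the callback with the resulting Response as its first argument."""
    response = Response(url, body="<html/>")
    return callback(response)

def parse_page(response):
    # In a real spider this is where you would run selectors on response.body
    return "Visited %s" % response.url

print(fake_download("http://www.example.com/some_page.html", parse_page))
# → Visited http://www.example.com/some_page.html
```

This is why `HtmlXPathSelector(a)` failed above: a Request has no body yet, so selectors can only be built once the engine has turned it into a Response.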