
I'm writing a Scrapy project. I've tested everything, but when I parse a page it returns TypeError: Argument must be bytes or unicode, got 'list'. I have tested everything in the shell using this link, and I can't seem to find where it's going wrong. All of my shell commands return only a single item (i.e. no commas).

Does anyone know why this might be happening?

from scrapy.spiders import Spider 
from scrapy.selector import HtmlXPathSelector 
from scrapy.loader import XPathItemLoader 
from scrapy.loader.processors import Join, MapCompose 
from scraper_app.items import Grailed 

class GrailedSpider(Spider): 
    name = "grailed" 
    allowed_domains = ["grailed.com"] 
    base_url = "https://www.grailed.com/listings/" 
    start_urls = ["https://www.grailed.com/listings/100"] 

    for i in range(100, 150):
        start_urls.append(base_url + str(i))

    item_fields = {
        'created': '//ul[@class = "horizontal-list listing-metadata-list clearfix"]/li[@class="horizontal-list-item listing-metadata-item"][1]/span[2]/text()',
        'title_size': '//h1[@class = "designer"]/div/text()',
        'original_price': '//ul[@class = "horizontal-list price-drops clearfix"]/li/text()',
        'followers': '//div[@class = "listing-followers"]/p/text()',
        'shipping_price': '//div[@class = "listing-shipping"]/p/text()',
        'sellers_wardrobe': '//div[@class = "user-widget medium"]/a/text()',
        'bought_and_sold': '//div[@class = "user-widget-bottom"]/p[@class= "bought-and-sold"]/text()[1]',
        'feedback_score': '//div[@class = "green seller-score-top"]/text()[2]'
    }

    def parse(self, response):
        selector = HtmlXPathSelector(response)

        # iterate over urls
        for url in selector.xpath(self.start_urls):
            loader = XPathItemLoader(Grailed(), selector=url)

            # define processors
            loader.default_input_processor = MapCompose(unicode.strip)
            loader.default_output_processor = Join()

            # iterate over fields and add xpaths to the loader
            for field, xpath in self.item_fields.iteritems():
                loader.add_xpath(field, xpath)
            yield loader.load_item()

The traceback shows:

ERROR: Spider error processing <GET https://www.grailed.com/listings/144> (referer: None) 
Traceback (most recent call last): 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback 
    yield next(it) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output 
    for x in result: 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr> 
    return (_set_referer(r) for r in result or()) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr> 
    return (r for r in result or() if _filter(r)) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr> 
    return (r for r in result or() if _filter(r)) 
    File "/Users/phillipblack/Projects/scrape_workspace/grailed/scraper_app/spiders/grailed_spider.py", line 55, in parse 
    for url in selector.xpath(self.start_urls): 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/selector/unified.py", line 97, in xpath 
    smart_strings=self._lxml_smart_strings) 
    File "lxml.etree.pyx", line 1507, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:52198) 
    File "xpath.pxi", line 295, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:151999) 
    File "apihelpers.pxi", line 1391, in lxml.etree._utf8 (src/lxml/lxml.etree.c:27100) 
TypeError: Argument must be bytes or unicode, got 'list' 

Please show the whole traceback. – ekhumoro


Your error is in your for loop, `for url in selector.xpath(self.start_urls)`. I'd suggest adding a for loop before that so you only send one item from the list to selector.xpath at a time. – Maikflow


@Maikflow Sorry, this is my first project. How would that work? – PDog

Answer


The problem is in this line:

for url in selector.xpath(self.start_urls): 

selector.xpath should receive a string containing an XPath expression. I see that you want to get the URLs, so probably something like //a/@href:

selector.xpath('//a/@href') 
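
The error in the traceback comes from lxml receiving the whole start_urls list instead of a single XPath string. As an illustration (assuming a Scrapy shell session against one of the listing pages, and Scrapy/lxml versions similar to those in the traceback), the difference looks like this:

# scrapy shell "https://www.grailed.com/listings/100"
response.xpath('//a/@href')      # fine: the argument is a single XPath string
response.xpath(['//a/@href'])    # TypeError: Argument must be bytes or unicode, got 'list'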

I'm trying to have it parse all of the URLs defined in the earlier for loop. – PDog
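
Since every listing URL is already appended to start_urls, Scrapy schedules a request for each of them and calls parse once per response, so there is no need to loop over self.start_urls inside parse. A minimal sketch of that approach (keeping the question's Python 2 idioms and field XPaths; untested against the live site):

    def parse(self, response):
        # Scrapy calls parse() once for every response from start_urls,
        # so each call only needs to load fields from the current page.
        loader = XPathItemLoader(Grailed(), response=response)

        # same processors as in the question
        loader.default_input_processor = MapCompose(unicode.strip)
        loader.default_output_processor = Join()

        # add each field's XPath and emit one item per listing page
        for field, xpath in self.item_fields.iteritems():
            loader.add_xpath(field, xpath)
        yield loader.load_item()

Passing response=response lets the loader build its own selector from the page, so the separate HtmlXPathSelector is not needed here.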
