Scrapy XPath with a UTF-8 literal

I need to check scraped fields that contain non-ASCII characters. When I use a UTF-8 literal in the spider, I get this error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Here is an example that produces the error:

# -*- coding: utf-8 -*- 
import scrapy 

class DummySpider(scrapy.Spider): 
    name = 'dummy' 
    start_urls = ['http://www.google.com'] 

    def parse(self, response): 
        dummy = response.xpath("//*[contains(.,u'café')]")

This is the traceback:

Traceback (most recent call last): 
    File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "/tmp/stack.py", line 9, in parse 
    dummy = response.xpath("//*[contains(.,u'café')]") 
    File "/usr/lib/pymodules/python2.7/scrapy/http/response/text.py", line 109, in xpath 
    return self.selector.xpath(query) 
    File "/usr/lib/pymodules/python2.7/scrapy/selector/unified.py", line 97, in xpath 
    smart_strings=self._lxml_smart_strings) 
    File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:50702) 
    File "xpath.pxi", line 306, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145829) 
    File "apihelpers.pxi", line 1395, in lxml.etree._utf8 (src/lxml/lxml.etree.c:26485) 
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters


2015-11-24 user3185563

Which version of Python? Are the characters around café single quotes or backticks? – fiacre

"//*[contains(.,u'café')]"

The u'' string literal is Python syntax, not part of XPath. Try:

u"//*[contains(.,'café')]"
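
To make the distinction concrete, here is a minimal sketch of building the query as a single Python unicode string with a plain XPath literal inside it. `build_contains_query` is a hypothetical helper name used only for illustration; it is not part of Scrapy or lxml. It assumes the search text contains no single quote, because XPath 1.0 string literals have no escape syntax.

```python
# -*- coding: utf-8 -*-


def build_contains_query(text):
    """Return an XPath query matching elements whose string value contains `text`.

    Hypothetical helper for illustration. The u prefix sits on the Python
    string; the XPath literal inside it uses plain single quotes.
    """
    if u"'" in text:
        # Assumption: reject single quotes rather than handle them,
        # since XPath 1.0 cannot escape a quote inside a literal.
        raise ValueError("text must not contain a single quote")
    return u"//*[contains(.,'{0}')]".format(text)


# The whole query is one unicode string; no u appears inside the XPath itself.
query = build_contains_query(u"café")
```

Inside the spider's `parse` method, the resulting string would then be passed as `response.xpath(query)`.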


2015-11-24 12:09:53 bobince
