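Scrapy xpath utf-8 literal
I need to check scraped fields for non-ASCII characters. When I put a UTF-8 literal in the spider's XPath expression, I get this error:
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
Here is an example that reproduces the error: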
# -*- coding: utf-8 -*-
import scrapy

class DummySpider(scrapy.Spider):
    name = 'dummy'
    start_urls = ['http://www.google.com']

    def parse(self, response):
        dummy = response.xpath("//*[contains(.,u'café')]")
Here is the traceback:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/tmp/stack.py", line 9, in parse
    dummy = response.xpath("//*[contains(.,u'café')]")
  File "/usr/lib/pymodules/python2.7/scrapy/http/response/text.py", line 109, in xpath
    return self.selector.xpath(query)
  File "/usr/lib/pymodules/python2.7/scrapy/selector/unified.py", line 97, in xpath
    smart_strings=self._lxml_smart_strings)
  File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:50702)
  File "xpath.pxi", line 306, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145829)
  File "apihelpers.pxi", line 1395, in lxml.etree._utf8 (src/lxml/lxml.etree.c:26485)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
Which version of Python? Are the characters around café single quotes or backticks? – fiacre
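A likely cause, assuming Python 2.7 as the traceback paths suggest: the XPath expression is a byte string that contains non-ASCII bytes, and lxml only accepts byte strings that are pure ASCII. The `u` inside the quotes is part of the XPath text rather than a Python prefix, and XPath has no such prefix. One possible fix is to make the whole expression a Unicode string. The sketch below shows this with a hypothetical helper, `contains_xpath` (not part of Scrapy or lxml), which assumes the source file is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# Makes every plain string literal in this file Unicode on Python 2;
# harmless on Python 3, where str is already Unicode.
from __future__ import unicode_literals


def contains_xpath(text):
    """Build a Unicode XPath expression matching elements that contain `text`.

    Hypothetical helper for illustration only.
    """
    if isinstance(text, bytes):
        # Assumption: byte strings passed in are UTF-8 encoded.
        text = text.decode("utf-8")
    if "'" in text:
        # This sketch does not handle quoting inside XPath string literals.
        raise ValueError("text must not contain a single quote")
    # The whole expression is Unicode; no u'' prefix appears inside the XPath.
    return "//*[contains(., '{0}')]".format(text)


query = contains_xpath("café")
```

Inside the spider, this would be used as `dummy = response.xpath(query)`. Without the helper, writing the literal as `u"//*[contains(., 'café')]"` should have the same effect.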