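Scrapy response.replace encoding error
I want to use response.replace() to replace the body of a response with just the search-result block from a Google search results page, but I'm running into encoding problems.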
scrapy shell "http://www.google.de/search?q=Zuckerccc"
>>> srb = hxs.select("//li[@class='g']").extract()
>>> body = '<html><body>' + srb[0] + '</body></html>' # get only 1st search result block
>>> b = response.replace(body = body)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 54, in replace
return Response.replace(self, *args, **kwargs)
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/__init__.py", line 77, in replace
return cls(*args, **kwargs)
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 31, in __init__
super(TextResponse, self).__init__(*args, **kwargs)
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/__init__.py", line 19, in __init__
self._set_body(body)
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 48, in _set_body
self._body = body.encode(self._encoding)
File "../local_1/Linux-2.6c2.5-x86_64/Python/Python-147.0-0/lib/python2.6/encodings/cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0131' in position 529: character maps to <undefined>
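The last frames show where it breaks: TextResponse._set_body calls body.encode(self._encoding), and the character u'\u0131' (LATIN SMALL LETTER DOTLESS I) has no slot in cp1252. The standalone Python 3 sketch below reproduces just that step with the standard library. The fragment string is a made-up stand-in for srb[0], not the real Google markup, and try_encode is a hypothetical helper.

```python
import unicodedata

# Made-up stand-in for srb[0]; only the U+0131 character matters here.
fragment = '<li class="g">Zuckerccc \u0131</li>'
body = "<html><body>" + fragment + "</body></html>"

print(unicodedata.name("\u0131"))  # LATIN SMALL LETTER DOTLESS I


def try_encode(text, encoding):
    """Mimic body.encode(self._encoding) from the traceback; return bytes or None."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        print(f"{encoding}: {exc}")
        return None


cp1252_bytes = try_encode(body, "cp1252")  # fails: cp1252 has no U+0131
utf8_bytes = try_encode(body, "utf-8")     # succeeds: UTF-8 covers every code point
```

The cp1252 attempt fails with the same "'charmap' codec can't encode character" message as the traceback, while UTF-8 encodes U+0131 as the two bytes C4 B1.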
I also tried creating my own response:
>>> x = HtmlResponse("http://www.google.de/search?q=Zuckerccc", body = body, encoding = response.encoding)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 31, in __init__
super(TextResponse, self).__init__(*args, **kwargs)
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/__init__.py", line 19, in __init__
self._set_body(body)
File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 48, in _set_body
self._body = body.encode(self._encoding)
File "../local_1/Linux-2.6c2.5-x86_64/Python/Python-147.0-0/lib/python2.6/encodings/cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0131' in position 529: character maps to <undefined>
However, when I pass _body_declared_encoding() as the encoding to replace(), it works:
>>> b = response.replace(body=body, encoding=response._body_declared_encoding())
I don't understand why response._body_declared_encoding() and response.encoding are different. Can anyone shed some light on this?
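One plausible explanation, offered as an assumption rather than something the question confirms: TextResponse.encoding is taken from the first declared source in the order explicit encoding argument, charset in the Content-Type header, then the <meta> declaration in the body, and a Latin-1 label is widened to its superset cp1252 (as the WHATWG Encoding Standard does for browsers; assumed here for Scrapy too). If Google's header said ISO-8859-1 while the page's meta tag said UTF-8, response.encoding would be cp1252 and _body_declared_encoding() would be UTF-8, which matches the values reported in the comments. The sketch below is a simplified, hypothetical model of that lookup that ignores body-based encoding inference; effective_encoding, resolve and LATIN1_ALIASES are illustrative names, not Scrapy APIs, and the "ascii" fallback is assumed.

```python
import codecs

# Hypothetical alias table: assumes Latin-1 labels are widened to cp1252.
LATIN1_ALIASES = {"iso8859-1": "cp1252"}


def resolve(label):
    """Normalise an encoding label via Python's codec registry, then apply aliases."""
    canonical = codecs.lookup(label).name
    return LATIN1_ALIASES.get(canonical, canonical)


def effective_encoding(explicit=None, header_charset=None, meta_charset=None,
                       default="ascii"):
    """Simplified model: the first declared source wins (explicit > header > meta)."""
    for candidate in (explicit, header_charset, meta_charset):
        if candidate:
            return resolve(candidate)
    return resolve(default)


# Assumed scenario: header says ISO-8859-1, <meta> says UTF-8.
print(effective_encoding(header_charset="ISO-8859-1", meta_charset="UTF-8"))  # cp1252
print(resolve("UTF-8"))  # utf-8
```

Under this model an explicit encoding argument always wins, which is consistent with replace(..., encoding=response._body_declared_encoding()) succeeding.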
So what is a good way to solve this problem?
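A simple workaround consistent with the observation that passing UTF-8 works: keep the response's own encoding when it can represent the new body, and fall back to UTF-8 otherwise. safe_encoding below is a hypothetical helper, not part of Scrapy; the replace() call in the comment uses the same body and encoding arguments as the question. The sketch is Python 3, where body is already a text string, just as it is a unicode object in the Python 2 session above.

```python
def safe_encoding(text, preferred, fallback="utf-8"):
    """Return `preferred` if it can encode `text`, else `fallback` (hypothetical helper)."""
    try:
        text.encode(preferred)
        return preferred
    except UnicodeEncodeError:
        return fallback


# In the Scrapy shell this would be used as (not executed here):
#   b = response.replace(body=body, encoding=safe_encoding(body, response.encoding))
print(safe_encoding("<p>Zucker</p>", "cp1252"))        # cp1252
print(safe_encoding("<p>Zucker \u0131</p>", "cp1252"))  # utf-8
```

Always passing encoding='utf-8' is simpler still; the fallback only matters if you want to keep the original encoding whenever it is sufficient.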
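You're saying it works with 'encoding = response._body_declared_encoding()' but not with 'encoding = response.encoding'? Also, try using a unicode string instead of a str, like this: body = u'<html><body>' + srb[0] + '</body></html>' –
Yes, it works with _body_declared_encoding() because its value is UTF-8, whereas response.encoding is cp1252. – kumar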
That's strange. What about the unicode string I mentioned? Does that work? –