lxml unicode output problem

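I'm new to Python and lxml, so please bear with me. I'm now stuck on what appears to be a unicode problem. I've tried .encode and Beautiful Soup's UnicodeDammit with no luck. I've searched the forums and the web, but my lack of Python skill has kept me from applying the suggested solutions to my particular code. Any help is appreciated, thanks.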

Code:

import requests 
import lxml.html 

sourceUrl = "http://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm" 

sourceHtml = requests.get(sourceUrl) 

htmlTree = lxml.html.fromstring(sourceHtml.text) 

for stockCodes in htmlTree.xpath('''/html/body/printfriendly/table/tr/td/table/tr/td/table/tr/table/tr/td'''): 
    string = stockCodes.text 
    print string

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)


2013-04-07 Om Nom

Can you provide more details about the error? Or add a line 'print type(string)' before 'print string' to see what is going on. – iceout 2013-04-07 14:46:04

When I run the code with python lx.py, I don't get this error. However, when I redirect stdout to a file with python lx.py > output.txt, it does occur. So, try this:

# -*- coding: utf-8 -*- 
import requests 
import lxml.html 
import sys 
reload(sys) 
sys.setdefaultencoding('utf-8')

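This switches the default encoding from ASCII to UTF-8, which the Python runtime will use whenever it has to implicitly convert between a byte string and unicode.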
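As a possible alternative that leaves the interpreter-wide default alone: when output is redirected to a file, Python 2 has no terminal encoding to consult and falls back to ASCII, so encoding each string explicitly before printing should sidestep the error. This is only a sketch. The helper name to_utf8 is hypothetical (not from the original code), and it assumes UTF-8 is acceptable for output.txt.

```python
# -*- coding: utf-8 -*-

def to_utf8(value):
    """Return UTF-8 bytes for a text value.

    lxml gives None for an element without text, so that case
    is mapped to empty bytes instead of raising AttributeError.
    """
    if value is None:
        return b""
    return value.encode("utf-8")


# The character from the traceback: a non-breaking space.
cell_text = u"\xa0"
encoded = to_utf8(cell_text)
```

In the question's loop this would replace the last two lines with print to_utf8(stockCodes.text).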


2013-04-07 08:06:05 iceout

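Thanks. So you don't see the error when the output goes to the screen? May I ask which Python version you're using? I'm running 2.7.3. – 2013-04-07 08:38:04

Also, I tried your suggestion, but no joy. – 2013-04-07 08:38:33

I'm using 2.6. Which operating system are you on, Linux or Windows? – iceout 2013-04-07 09:36:44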

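The text attribute returns the response body decoded to unicode, while the content attribute returns the raw bytes. You could also try sourceHtml.text.encode('utf-8') or sourceHtml.text.encode('ascii'), but I'm fairly sure the latter will raise the same exception.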
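The two encode calls can be tried without a network request. The sketch below uses a literal string as a stand-in for the response text; page_text is a hypothetical placeholder, not a name from the original code, and the lossy variant at the end is shown only as one option if ASCII output is really required.

```python
# -*- coding: utf-8 -*-

# Stand-in for the response text: unicode starting with a non-breaking space.
page_text = u"\xa0" + u"0001"

# UTF-8 can represent every unicode character, so this always succeeds.
as_utf8 = page_text.encode("utf-8")

# Strict ASCII cannot represent u'\xa0', so this raises UnicodeEncodeError.
try:
    page_text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With the "ignore" error handler, unencodable characters are dropped.
ascii_lossy = page_text.encode("ascii", "ignore")
```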


2013-04-08 17:02:35
