Removing hex characters from a unicode object

I'm trying to remove the hex characters \xef\xbb\xbf from my string, but I get the following error.

I'm not sure how to fix this.

>>> x = u'\xef\xbb\xbfHello' 
>>> x 
u'\xef\xbb\xbfHello' 
>>> type(x) 
<type 'unicode'> 
>>> print x 
ï»¿Hello 
>>> print x.replace('\xef\xbb\xbf', '') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128) 
>>>

2016-11-09 Danny Cullen

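You need to pass unicode objects to replace. Otherwise, Python 2 tries to decode the byte-string arguments to unicode with the ASCII codec before searching x, and that fails on the non-ASCII byte 0xef.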

>>> x = u'\xef\xbb\xbfHello' 
>>> x 
u'\xef\xbb\xbfHello' 
>>> print(x.replace(u'\xef\xbb\xbf',u'')) 
Hello

This is only needed in Python 2. In Python 3, both versions work.
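
A minimal Python 3 sketch of the same fix, for comparison: in Python 3 every str is Unicode, so plain and u-prefixed literals are the same type and replace works with either. The variable names are illustrative.

```python
# In Python 3, str literals are Unicode, so u'...' and '...' are the same type.
x = u'\xef\xbb\xbfHello'

# Both calls remove the three mis-decoded characters; no implicit ASCII decoding happens.
cleaned_plain = x.replace('\xef\xbb\xbf', '')
cleaned_prefixed = x.replace(u'\xef\xbb\xbf', u'')

print(cleaned_plain)     # Hello
print(cleaned_prefixed)  # Hello
```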

2016-11-09 12:29:46

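Try using either the decode method or the unicode function, like this (both expect a byte string, i.e. a Python 2 str, not a string that is already unicode):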

x.decode('utf-8')

或

unicode(string, 'utf-8')

Source: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1
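
A minimal Python 3 sketch of the decode approach, assuming the data arrives as raw UTF-8 bytes that start with a BOM (the input value is illustrative). Decoding with 'utf-8' keeps the BOM as the single character U+FEFF, while the 'utf-8-sig' codec strips it.

```python
# Raw bytes as they might be read from a UTF-8 file with a BOM (assumed input).
raw = b'\xef\xbb\xbfHello'

# Plain UTF-8 decoding keeps the BOM as the single character U+FEFF.
with_bom = raw.decode('utf-8')
print(repr(with_bom))  # '\ufeffHello'

# The utf-8-sig codec removes a leading BOM if one is present.
without_bom = raw.decode('utf-8-sig')
print(repr(without_bom))  # 'Hello'
```

When reading from a file, passing encoding='utf-8-sig' to open() has the same effect.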

2016-11-09 12:34:04

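The real problem is that your Unicode string was decoded incorrectly in the first place. Those characters are a UTF-8 byte order mark (BOM) that was decoded as (probably) Latin-1 or CP1252.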

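Ideally, find out where they were decoded and fix it there, but you can reverse the mistake by re-encoding to Latin-1 and then decoding correctly: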

>>> x = u'\xef\xbb\xbfHello' 
>>> x.encode('latin1').decode('utf8') # decode correctly, U+FEFF is a BOM. 
u'\ufeffHello' 
>>> x.encode('latin1').decode('utf-8-sig') # decode and handle BOM. 
u'Hello'
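
The same round trip can be wrapped in a small Python 3 helper. This is a sketch: the name fix_bom_mojibake is hypothetical, and it assumes the whole string was wrongly decoded as Latin-1 (for the BOM bytes 0xEF 0xBB 0xBF, CP1252 produces the same characters).

```python
def fix_bom_mojibake(text: str) -> str:
    """Undo UTF-8 text (with optional BOM) that was mis-decoded as Latin-1.

    Hypothetical helper. Encoding with Latin-1 maps each character back to
    its original byte; utf-8-sig then decodes those bytes correctly and
    drops a leading BOM if present.
    """
    return text.encode('latin-1').decode('utf-8-sig')


x = '\xef\xbb\xbfHello'
print(fix_bom_mojibake(x))  # Hello
```

Only apply this to text that actually went through the wrong decode: encode('latin-1') raises UnicodeEncodeError for any character above U+00FF, and already-correct non-ASCII text such as 'é' will usually fail to decode as UTF-8.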

2016-11-09 16:32:16
