Convert bytes to a string in Python 3.6

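I am trying to read and process a file. This works perfectly in Python 2.7, but I cannot get it to work in Python 3. In Python 2.7 it works without specifying any encoding, while in Python 3 I have tried every combination, with and without an encoding.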

After digging deeper, I found that what read() returns differs between the two versions. In Python 2.7 the code works:

>>> f = open('resource.cgn', 'r') 
>>> content = f.read() 
>>> type(content) 
<type 'str'> 
>>> content[0:20] 
'\x04#lwq \x7f`g \xa0\x03\xa3,ess to' 
>>> content[0] 
'\x04'

But in Python 3:

>>> f = open('resource.cgn','r') 
>>> content = f.read() 
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 10: ordinal not in range(128)
>>> f = open('resource.cgn','rb') 
>>> content = f.read() 
>>> type(content)     
<class 'bytes'> 
>>> content[0:20] 
b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to' 
>>> content[0] 
4 
>>> content.decode('utf8') 
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 10: invalid start byte

I want to get the same output as in Python 2.7: content should be of type str, and content[0] should be the str '\x04' rather than the int 4.

Any pointers on how I can get this? I have tried various encodings without any success.

2017-06-29 Husain Basrawala

Have you tried content.decode('unicode_escape')?

What about content[:1]? That will give you b'\x04'.

@SamChats's solution works for me. – nCessity

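3.X's str is what 2.X called unicode. In 3.X, a file opened in text mode tries, by default, to decode the data when you read from the file and to encode it when you write to the file. 2.X's str is now 3.X's bytes. There is very little difference between 3.X bytes and 2.X str: both basically hold 8-bit text.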
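To make the text-mode versus binary-mode difference concrete, here is a minimal sketch. The temporary file stands in for resource.cgn, and the choice of latin-1 is my assumption, not something the question specifies; it is used only because it maps every byte value 0-255 to the code point with the same number, so decoding cannot fail.

```python
import os
import tempfile

# Sample bytes copied from the question; the temporary file is a stand-in
# for resource.cgn (hypothetical setup, for illustration only).
data = b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Binary mode: nothing is decoded, read() returns bytes.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with an explicit encoding: read() returns str.
    # latin-1 is an assumption; newline='' disables newline translation
    # so that '\r' and '\n' bytes pass through unchanged.
    with open(path, 'r', encoding='latin-1', newline='') as f:
        text = f.read()
finally:
    os.remove(path)

print(type(raw), type(text))
print(repr(text[0]))
```

Whether a str obtained this way is meaningful depends on what the file really contains. For a binary format it is only a byte-for-byte view that happens to have type str.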

Here is a simple trick to convert b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to' to str in 3.X:

>>> content = ''.join(chr(x) for x in b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to') 
>>> content 
'\x04#lwq \x7f`g \xa0\x03£,ess to' 
>>> content[0] 
'\x04'

Decoding the bytes string fails because it contains bytes that are not valid UTF-8; decoding as ASCII fails for the same reason.
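As a sketch of why one approach fails and the other works: the chr() join maps each byte value to the code point with the same number, which is also what the latin-1 codec does. Using latin-1 here is a substitution on my part for comparison, not part of the trick as originally posted.

```python
data = b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to'

# UTF-8 rejects the byte 0xa0 at position 10, as in the question's traceback.
try:
    data.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Byte value -> code point of the same value, done by hand and via the codec.
joined = ''.join(chr(x) for x in data)
decoded = data.decode('latin-1')

print(utf8_ok)
print(joined == decoded)
```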

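However, it is worth mentioning that in 3.X bytes is meant for handling binary data, while str is for Unicode strings only. I would suggest using bytes rather than str for binary strings in 3.X: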

>>> content = b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to' 
>>> hex(content[0]) 
'0x4'
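If you stay with bytes, a length-1 slice gives behaviour close to the 2.X content[0]. A small sketch (the variable names are only for illustration):

```python
content = b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to'

first_int = content[0]       # indexing bytes yields an int
first_bytes = content[:1]    # slicing yields a bytes object of length 1

# Comparisons and prefix checks work directly on bytes.
has_marker = content.startswith(b'\x04')

# Walk the data one byte at a time while keeping each item as bytes.
pieces = [content[i:i + 1] for i in range(3)]

print(first_int, first_bytes, has_marker, pieces)
```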

2017-06-29 18:16:32 direprobs

This works. Is there a way to convert it back from str to bytes without providing an encoding?

@HusainBasrawala To convert from str to bytes without providing an encoding: bytes(ord(x) for x in content) should do it :-) – direprobs

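@HusainBasrawala: no, you can't convert unicode to bytes without using some encoding (the code in the previous comment just tries to invent its own scheme, similar to .encode('latin1')). If you want to convert from bytes to Unicode or from Unicode to bytes, you need to provide an encoding: [There Ain't No Such Thing As Plain Text.](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – jfs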
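A quick sketch of the point made in the last two comments, assuming the str was produced by the chr() trick or an equivalent latin-1 decode: the ord() generator and .encode('latin-1') give the same bytes, and both stop working once the text contains a code point above 255.

```python
data = b'\x04#lwq \x7f`g \xa0\x03\xa3,ess to'
text = data.decode('latin-1')

# Both routes recover the original bytes for code points 0-255.
via_ord = bytes(ord(x) for x in text)
via_encode = text.encode('latin-1')

# A character outside that range (the euro sign, U+20AC) breaks both routes.
try:
    bytes(ord(x) for x in '\u20ac')
    ord_failed = False
except ValueError:
    ord_failed = True

try:
    '\u20ac'.encode('latin-1')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(via_ord == via_encode == data, ord_failed, encode_failed)
```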
