bytestr.decode('utf-8') raises UnicodeDecodeError on a byte string read from a file

I read a byte string from a file, and bytestr.decode('utf-8') raises UnicodeDecodeError:

>>> s = b'------WebKitFormBoundary02jEyE1fNXSRCL7D\r\nContent-Disposition: form-data; name="fileobj"; filename="3d15ef5126d4fa6631a863c29c5a741d.jpg"\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xe1\x006Exif\x00\x00II*' 
>>> s 
b'------WebKitFormBoundary02jEyE1fNXSRCL7D\r\nContent-Disposition: form-data; name="fileobj"; filename="3d15ef5126d4fa6631a863c29c5a741d.jpg"\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xe1\x006Exif\x00\x00II*' 
>>> print(s.decode('utf8')) 
Traceback (most recent call last): 
    File "<input>", line 1, in <module> 
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 167: invalid start byte

Why do I get a UnicodeDecodeError? Shouldn't s.decode('utf8') return a str object?

Source

2015-11-04 user1356067

Your byte string contains, among other things, a binary image. 'utf-8' is a character encoding: it is meant for encoding text, not binary data such as images.
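
To illustrate the difference, here is a minimal sketch. The helper name `is_valid_utf8` and the sample strings are made up for this example and do not come from the question; only the four JPEG bytes match the start of the image data shown there.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as UTF-8 text."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Text that was encoded as UTF-8 decodes back without trouble.
text_bytes = 'naïve café'.encode('utf-8')

# The first four bytes of the JPEG payload from the question.
jpeg_magic = b'\xff\xd8\xff\xe0'

print(is_valid_utf8(text_bytes))   # the encoded text
print(is_valid_utf8(jpeg_magic))   # the raw image bytes
```

The image bytes were never produced by a text encoder, so there is no reason for them to form valid UTF-8.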

In general, to parse MIME data you can use the email package from the standard library.
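
A rough sketch of that approach follows, under several assumptions that are not in the question: the boundary would normally come from the HTTP request's Content-Type header (it is hard-coded here), the body is a shortened stand-in with made-up image bytes and a closing boundary, and the `uploads` dictionary is just an illustrative way to collect the result.

```python
import email

# Assumption: in a real request the boundary is taken from the HTTP
# Content-Type header; it is hard-coded here for illustration.
boundary = b'----WebKitFormBoundary02jEyE1fNXSRCL7D'

# Shortened stand-in for the request body (the image bytes are made up).
body = (
    b'--' + boundary + b'\r\n'
    b'Content-Disposition: form-data; name="fileobj"; filename="photo.jpg"\r\n'
    b'Content-Type: image/jpeg\r\n'
    b'\r\n'
    b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'
    b'\r\n--' + boundary + b'--\r\n'
)

# Prepend a top-level header so the parser knows the boundary.
raw = (b'Content-Type: multipart/form-data; boundary="' + boundary
       + b'"\r\n\r\n' + body)
msg = email.message_from_bytes(raw)

uploads = {}
if msg.is_multipart():
    for part in msg.get_payload():      # one sub-message per form field
        name = part.get_filename()      # None for fields without a file
        if name:
            # decode=True returns the payload as bytes, not str
            uploads[name] = part.get_payload(decode=True)

print(sorted(uploads))
```

The payload comes back as bytes, so it can be written to disk in binary mode without ever being decoded as text.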

In your case, it is enough to find where the headers end (the blank line) and save the rest as the image:

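import cgi  # deprecated since Python 3.11, removed in 3.13

# s is the byte string from the question.
# Split at the blank line that ends the headers; the rest is the image.
headers, _, image = s.partition(b'\r\n\r\n')
# Parse each header line and collect its filename parameter, if any.
L = [cgi.parse_header(h)[1].get('filename')
     for h in headers.decode('ascii', 'strict').splitlines()]
filename = next(filter(None, L))  # first header line that had a filename
with open(filename, 'wb') as file:
    file.write(image)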
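
On Python versions where the cgi module is no longer available, the filename parameter can be extracted with email.message.Message instead. This is only a sketch: `filename_from_header` is a hypothetical helper, and the os.path.basename call is an extra precaution added here (not part of the answer above) so that a client-supplied name cannot point outside the current directory.

```python
import os.path
from email.message import Message


def filename_from_header(header_value: str):
    """Return the filename parameter of a Content-Disposition value.

    header_value is the text after 'Content-Disposition:'.
    Returns None when the header carries no filename.
    """
    msg = Message()
    msg['Content-Disposition'] = header_value
    name = msg.get_filename()
    # Keep only the last path component of whatever the client sent.
    return os.path.basename(name) if name else None


value = ('form-data; name="fileobj"; '
         'filename="3d15ef5126d4fa6631a863c29c5a741d.jpg"')
print(filename_from_header(value))
```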

Source

2015-11-04 20:50:18 jfs

OK, thanks! s.partition(b'\r\n\r\n') is exactly what I needed :) – user1356067

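Because it is not a valid UTF-8 string. A UTF-8 character cannot start with the byte 0xff. You can use the errors argument to control how decoding handles such bytes. See the documentation.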

Yes, bytes.decode and bytearray.decode return str objects.
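
As a small illustration of the errors argument, the sketch below decodes a made-up byte string (`raw` is not from the question) with several of the standard error handlers. Note that 'replace' and 'ignore' lose information, so they are not a way to recover binary data such as an image.

```python
raw = b'abc\xffdef'   # made-up sample: ASCII text with one invalid byte

# The default handler, 'strict', raises UnicodeDecodeError.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = raw.decode('utf-8', errors='replace')           # 'abc\ufffddef'
ignored = raw.decode('utf-8', errors='ignore')             # 'abcdef'
escaped = raw.decode('utf-8', errors='backslashreplace')   # 'abc\\xffdef'

# 'surrogateescape' keeps the bad byte so it can be restored on encoding.
smuggled = raw.decode('utf-8', errors='surrogateescape')
restored = smuggled.encode('utf-8', errors='surrogateescape')

print(type(replaced).__name__)   # every successful decode returns a str
```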

Source

2015-11-04 20:17:00
