Reading a bytes string from a file in Python 3

In Python 3, I am reading a string of bytes from a file. The file is encoded in UTF-8 and its contents look like this:

cd232704-a46f-3d9d-97f6-67edb897d65f b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'

Here is my code:

with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])

I want to get the correct output: "this Friday, Gerda Scheuers will be excited — but she’s most excited about the merchandise the movie will bring."

Decoding a bytes literal typed directly into the code works:

print(b'\xe2\x80\x94'.decode('utf-8'))  # decode the UTF-8 bytes into a str (not ASCII)

But I can't get the bytes when reading from the file. If I open the file in binary mode, I have to decode each line before I can split it.
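A quick way to see why this happens: the second column is not raw bytes but the *text* of a bytes literal, so every `\xe2` in the file is four ordinary characters. The sketch below assumes the tab-separated layout implied by `line.split('\t')`; the sample line is a hypothetical, shortened version of the one above.

```python
# Hypothetical line, as open(file, 'r') would return it. The second column is
# the text of a bytes literal, so each \xe2 is four characters, not one byte.
line = "cd232704\tb'excited \\xe2\\x80\\x94 but she\\xe2\\x80\\x99s'\n"

field = line.rstrip('\n').split('\t')[1]
print(field)                    # b'excited \xe2\x80\x94 but she\xe2\x80\x99s'
print(field.startswith("b'"))   # True: the b'' wrapper is part of the text

# Opening the file in binary mode ('rb') does not help: decoding those bytes
# just returns the same escape characters, not the dash.
raw = line.encode('utf-8')
print(raw.split(b'\t')[1].decode('utf-8') == field + '\n')  # True
```

So the escapes have to be *interpreted*, not merely decoded.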


2017-04-11 Shaohan Huang

You can use ast.literal_eval to convert the bytes literal into a bytes object:

>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'

Then decode it to get a str object:

>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
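Because ast.literal_eval accepts any Python literal (numbers, lists, plain strings), a field that is not actually a bytes literal would slip through and fail later at `.decode`. One possible way to wrap the two steps with a type check is sketched below; `decode_bytes_literal` is a hypothetical helper name, not part of any library.

```python
import ast


def decode_bytes_literal(text: str, encoding: str = 'utf-8') -> str:
    """Turn the text of a bytes literal, e.g. "b'\\xe2\\x80\\x94'", into a str.

    Hypothetical helper for illustration; not a standard-library function.
    """
    value = ast.literal_eval(text.strip())  # parse "b'...'" into a bytes object
    if not isinstance(value, bytes):
        # literal_eval also accepts str, int, list, ...; reject those here
        raise ValueError(f'expected a bytes literal, got {type(value).__name__}')
    return value.decode(encoding)  # interpret the bytes as UTF-8 text


print(decode_bytes_literal(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"))
```

The check costs one line and turns a confusing AttributeError on `.decode` into an explicit error that names the bad field.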

import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.rstrip('\n').split('\t')
        if len(tokens) < 2:  # skip blank or malformed lines
            continue
        bytes_part = ast.literal_eval(tokens[1])  # "b'...'" text -> bytes object
        s = bytes_part.decode('utf-8')  # decode the UTF-8 bytes into a str
        print(s)
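To try the loop above end to end without the original data, the sketch below writes one line in the question's format to a temporary file and reads it back. The generator name `read_decoded_column`, the `column` parameter and the shortened sample line are assumptions made for illustration; the tab separator and column index follow the question's `split('\t')` and `tokens[1]`.

```python
import ast
import os
import tempfile


def read_decoded_column(path, column=1, encoding='utf-8'):
    """Yield one tab-separated column per line, parsed as a bytes literal and decoded.

    Hypothetical helper; mirrors the answer's loop (split on '\\t', literal_eval, decode).
    """
    with open(path, 'r', encoding='utf-8') as f_in:
        for line in f_in:
            tokens = line.rstrip('\n').split('\t')
            if len(tokens) <= column:
                continue  # skip blank or malformed lines
            yield ast.literal_eval(tokens[column]).decode(encoding)


# Throwaway file holding one line in the question's format (sample text shortened).
sample = "cd232704\tb'excited \\xe2\\x80\\x94 but she\\xe2\\x80\\x99s'\n"
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.tsv', delete=False) as tmp:
    tmp.write(sample)
try:
    results = list(read_decoded_column(tmp.name))
finally:
    os.remove(tmp.name)  # delete=False plus manual removal also works on Windows
print(results)
```

Using a generator keeps memory flat for large files; wrap it in `list()` only when every line is needed at once.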


2017-04-11 05:46:01 falsetru
