UnicodeDecodeError: 'gbk' codec can't decode byte when reading JSON that contains Chinese

In my Jupyter notebook I have this code:

import json

file = "./data/test.json"
with open(file) as data_file:  # no encoding given, so the platform default is used
    data = json.load(data_file)

This used to decode the bytes fine with Python 2, but after switching to Python 3 it gives me this error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 123: illegal multibyte sequence

The test.json file looks like this:

[{ 
    "name": "Daybreakers", 
    "detail_url": "http://www.movieinsider.com/m4120/daybreakers/", 
    "movie_tt_id": "中文" 
    }]

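If I remove the Chinese text, there is no error.

So what should I do?

There are many similar questions here, but I haven't found a good solution for my case. If you find one that applies, please let me know and I'll close this.

Thanks a lot!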

2016-12-06 cqcn1991

You need to specify the correct encoding when opening the file. If the JSON is encoded in UTF-8, you can do this:

import json 

fname = "test.json" 
with open(fname, encoding='utf-8') as data_file:  
    data = json.load(data_file) 

print(data)

Output:

[{'name': 'Daybreakers', 'detail_url': 'http://www.movieinsider.com/m4120/daybreakers/', 'movie_tt_id': '中文'}]
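As for why this happens: in Python 3, open() in text mode falls back to the platform's preferred encoding when none is given, which is probably GBK (cp936) on a Chinese-locale Windows machine. Here is a minimal sketch that reproduces the failure and the fix. It assumes your file really is UTF-8; the temporary path and the records variable are made up for illustration.

```python
import json
import locale
import os
import tempfile

# Encoding that open() falls back to when none is passed; on a
# Chinese-locale Windows install this is typically 'cp936' (GBK).
print(locale.getpreferredencoding(False))

# Illustrative data mirroring the question's file (hypothetical sample).
records = [{"name": "Daybreakers", "movie_tt_id": "中文"}]

# Write the sample as UTF-8 JSON to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False)

# Reading the UTF-8 bytes as GBK reproduces the error from the question.
gbk_error = None
try:
    with open(path, encoding="gbk") as f:
        json.load(f)
except UnicodeDecodeError as exc:
    gbk_error = exc
    print("GBK read failed:", exc)

# Reading with the encoding the file was written in works.
with open(path, encoding="utf-8") as f:
    data = json.load(f)
print(data)
```

If you don't know how a file was encoded, guessing is fragile; it is usually safer to find out from whatever produced the file and pass that encoding explicitly.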

2016-12-06 14:34:41
