How do I get the position where a UnicodeDecodeError occurred?

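How do I get the position where a UnicodeDecodeError occurred? I found some material over here and tried to implement it below, but all I get is the error NameError: name 'err' is not defined.

I have searched the internet, both here and on StackOverflow, but could not find any hint on how to use it. The Python documentation says this particular exception has a start attribute, so it must be possible.

Thanks.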

data = buffer + data
try:
    data = data.decode("utf-8")
except UnicodeDecodeError:
    # Identify where the error occurred, split off the troubled piece
    # and copy it into the buffer, decode the good part, then go back,
    # receive the next chunk of data and concatenate it to the buffer.

    buffer = err.data[err.start:]  # raises NameError: name 'err' is not defined
    data = data[0:err.start]
    data = data.decode("utf-8")


2016-06-28 Li Cooper

Voting to close as 'trivial' because the answer comes down to a syntax detail. It may still be useful for future reference.

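That information is stored in the exception itself. You can bind the exception object to a name with the as keyword and then use its start attribute: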

while True:
    try:
        data = data.decode("utf-8")
    except UnicodeDecodeError as e:
        # Cut the undecodable bytes out and try again.
        data = data[:e.start] + data[e.end:]
    else:
        break
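For the chunked case described in the question, where the trouble is usually a multi-byte character split across two reads, a minimal sketch could look like the following. The helper name decode_chunk is hypothetical and not part of any library; it assumes the only undecodable bytes are an incomplete sequence at the end of a chunk.

```python
from typing import Tuple


def decode_chunk(buffer: bytes, chunk: bytes) -> Tuple[str, bytes]:
    """Decode as much of buffer + chunk as possible.

    Returns (decoded_text, leftover_bytes). decode_chunk is a
    hypothetical helper written for this illustration.
    """
    data = buffer + chunk
    try:
        return data.decode("utf-8"), b""
    except UnicodeDecodeError as err:
        # err.object is the bytes object that was being decoded and
        # err.start is the index of the first undecodable byte, so
        # everything before err.start is valid UTF-8.
        good = err.object[:err.start].decode("utf-8")
        # Keep the troubled tail so the next chunk can complete it.
        return good, err.object[err.start:]


# "é" is the two bytes C3 A9; here it is split across two chunks.
buffer = b""
pieces = []
for chunk in (b"caf\xc3", b"\xa9!"):
    text, buffer = decode_chunk(buffer, chunk)
    pieces.append(text)
print("".join(pieces))
```

Note that a byte that is genuinely invalid, rather than merely incomplete, would stay in the buffer forever with this sketch. The standard library's codecs.getincrementaldecoder("utf-8") also handles sequences split across chunks and may be a better fit for real streams.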


2016-06-28 01:34:49 zondo

That was simple. Thank you very much :)

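The documentation says: "For example, err.object[err.start:err.end] gives the particular invalid input that the codec failed on." What does err.object[err.start:err.end] actually mean? It is not the same as e.start, not even close.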

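@Cooper: Good point! I should edit my answer. Actually, it is close. It is a [slice](https://docs.python.org/3/reference/expressions.html#slicings). It means: take everything between positions 'err.start' and 'err.end'. That includes 'err.start' but excludes 'err.end'. In most cases the end is just one character after the start, so my solution will work. However, I think there are cases where 'err.end' is more than one higher. – zondo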
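To make the slice concrete, here is a small sketch that returns exactly the bytes the codec rejected. The function name failing_bytes is made up for this illustration, and the behaviour described is what CPython 3 is expected to report.

```python
def failing_bytes(raw: bytes) -> bytes:
    """Return the slice of raw that the UTF-8 codec failed on.

    failing_bytes is a hypothetical helper, not a library function.
    Returns b"" when raw decodes cleanly.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Half-open slice: includes err.start, excludes err.end.
        return err.object[err.start:err.end]
    return b""


# A lone 0xFF can never start a UTF-8 sequence.
single = failing_bytes(b"ab\xffcd")

# E2 82 are the first two bytes of a three-byte sequence that is cut off.
truncated = failing_bytes(b"ab\xe2\x82")

print(single, truncated)
```

On CPython 3 the first call should yield a one-byte slice and the second a two-byte slice, which is a case where err.end is more than one past err.start.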

If you just want to ignore the errors and decode the rest, you can do this:

data = data.decode("utf-8", errors='ignore')
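As a rough comparison, the sketch below shows what errors='ignore' and errors='replace' do to the same input, and why 'ignore' may be a poor fit for the chunked scenario in the question. The sample byte strings are invented for the illustration.

```python
# Sample input: valid "café", a stray invalid byte 0xFF, then more text.
raw = b"caf\xc3\xa9 \xff ok"

# 'ignore' silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

# Decoding chunk by chunk with 'ignore' loses a character that was
# merely split across two chunks (C3 A9 is "é").
chunks = [b"caf\xc3", b"\xa9"]
lossy = "".join(c.decode("utf-8", errors="ignore") for c in chunks)

print(repr(ignored), repr(replaced), repr(lossy))
```

So 'ignore' is fine when the whole input is available at once and the bad bytes really are garbage, but with data arriving in chunks it can discard perfectly valid characters.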


2016-06-28 01:38:58 shiva

Thanks, I'll keep that in mind :)
