UnicodeDecodeError: 'ascii' codec can't decode byte... Python 2.7 and

I am reading text files that contain Unicode characters from many different countries. The data in the files is also in JSON format.

I am on a CentOS machine. When I open the file in the terminal, the Unicode characters display fine (so my terminal is configured for Unicode).

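When I test my code in Eclipse, it works fine. When I run the same code in the terminal, it throws an error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)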

import json
import sys

# Note: delim (the output field separator) is not defined in this snippet.
for line in open("data-01083"):
    try:
        tmp = line
        if tmp == "":
            break
        # The first 40 characters are a fixed prefix; the JSON starts at 41.
        theData = json.loads(tmp[41:])

        for loc in theData["locList"]:
            outLine = tmp[:40]
            outLine = outLine + delim + theData["names"][0]["name"]
            outLine = outLine + delim + str(theData.get("Flagvalue"))
            outLine = outLine + delim + str(loc.get("myType"))
            flatAdd = ""
            srcAddr = loc.get("Address")
            if srcAddr != None:
                flatAdd = delim + str(srcAddr.get("houseNumber"))
                flatAdd = flatAdd + delim + str(srcAddr.get("streetName"))
                flatAdd = flatAdd + delim + str(srcAddr.get("postalCode"))
                flatAdd = flatAdd + delim + str(srcAddr.get("CountryCode"))
            else:
                flatAdd = delim + "None" + delim + "None" + delim + "None" + delim + "None" + delim + "None"

            outLine = outLine + flatAdd

            sys.stdout.write(("%s\n" % (outLine)).encode('utf-8'))
    except:
        sys.stdout.write("Error Processing record\n")

So everything works until it reaches streetName, where it crashes with a UnicodeDecodeError. That is where the non-ASCII characters start to appear.

I can fix that instance by adding .encode('utf-8'):

flatAdd = flatAdd + delim + str(srcAddr.get("streetName").encode('utf-8'))

But then it crashes with a UnicodeDecodeError on the next line:

outLine = outLine + flatAdd

I have been stuck on this kind of problem for a month. Any feedback would be greatly appreciated!

Source

2013-03-26 user1826936

[How do I stop the pain?](http://nedbatchelder.com/text/unipain.html) – 2013-03-26 20:25:08

Robᵩ, thank you! After seeing the bytes I feel like Neo. – user1826936 2013-03-28 14:29:44

This might solve your problem. I say "might" because encodings sometimes make strange things happen ;)

#!/usr/bin/python 
# -*- coding: utf-8 -*- 

text_file_utf8 = text_file.encode('utf8')

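From that point on the error message should go away. If it does not, please give some feedback about your file type and language, and perhaps some of the file's header data.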

text_file.decode("ISO-8859-1") might also be a solution.
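To make the difference between the decode calls concrete, here is a minimal sketch. The sample bytes are a made-up street name, not data from the question. The byte 0xc3 from the error message is the lead byte of a two-byte UTF-8 sequence, which suggests (but does not prove) that the file is UTF-8. The sketch is written so that it should run on both Python 2.7 and Python 3.

```python
# Hypothetical sample: UTF-8 bytes for "Strasse" spelled with a sharp s
# (U+00DF), which UTF-8 encodes as the two bytes 0xc3 0x9f.
raw = b"Stra\xc3\x9fe"

# Python 2 performs an implicit ASCII decode when str and unicode are
# mixed; doing it explicitly fails on the first byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with the encoding the bytes were written in gives 6 characters.
as_utf8 = raw.decode("utf-8")

# ISO-8859-1 maps every byte to a character, so it never raises, but
# UTF-8 input comes out as 7 characters with the sharp s garbled.
as_latin1 = raw.decode("ISO-8859-1")

print(ascii_error)
print(len(as_utf8))
print(len(as_latin1))
```

Because ISO-8859-1 accepts any byte sequence, a decode that "works" with it says nothing about whether the text is correct, so it is worth checking a few known non-ASCII values after decoding.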

If all else fails, have a look at the codecs module: http://docs.python.org/2/library/codecs.html

import codecs

# codecs.open decodes the file while reading, so each line is unicode.
with codecs.open('your_file.extension', 'r', 'utf8') as indexKey:
    # Your code here
    pass
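As a rough usage sketch of codecs.open, the following writes a small UTF-8 file to a temporary directory and reads it back as decoded text. The file name and its contents are invented for illustration. It should run on both Python 2.7 and Python 3.

```python
import codecs
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
try:
    # Write raw UTF-8 bytes, the way the data would sit on disk.
    with open(path, "wb") as handle:
        handle.write(b"caf\xc3\xa9\nna\xc3\xafve\n")

    # Read it back through codecs.open: every line is decoded text.
    with codecs.open(path, "r", "utf8") as indexKey:
        lines = [line for line in indexKey]
finally:
    shutil.rmtree(tmp_dir)

print(len(lines))
```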

Source

2013-03-26 20:11:58

The presentation from Robᵩ's comment (http://nedbatchelder.com/text/unipain.html) really helped my understanding of Unicode. Highly recommended for anyone with Unicode problems.

My takeaways:

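Convert everything to unicode as you ingest it into your application.
Use only unicode strings within your code.
Specify an encoding when you output data from your application.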
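The three takeaways can be sketched roughly as follows. The key names "Address", "houseNumber" and "streetName" come from the question; the helper flatten_record, the "|" delimiter, and the sample record are assumptions made up for this example. It should run on both Python 2.7 and Python 3.

```python
import json


def flatten_record(text, delim=u"|"):
    """Hypothetical helper: flatten one decoded JSON line into delimited text."""
    record = json.loads(text)
    address = record.get("Address") or {}
    fields = [address.get("houseNumber"), address.get("streetName")]
    # u"%s" keeps the result unicode; str() would force an ASCII
    # conversion on Python 2 and fail on non-ASCII values.
    return delim.join(u"%s" % (field,) for field in fields)


# Made-up input bytes, as they would arrive from a UTF-8 file or stdin.
raw_line = b'{"Address": {"houseNumber": 12, "streetName": "Rua S\xc3\xa3o Jo\xc3\xa3o"}}'

text = raw_line.decode("utf-8")             # 1. decode on the way in
flat = flatten_record(text)                 # 2. unicode only in the middle
out_bytes = (flat + u"\n").encode("utf-8")  # 3. encode on the way out

print(repr(out_bytes))
```

The point of the design is that bytes exist only at the two edges of the program, so no implicit ASCII conversion can happen in between.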

In my case, I was reading from stdin and from files, and writing to stdout:

For stdin:

inData = codecs.getreader('utf-8')(sys.stdin)

For a file:

inData = codecs.open("myFile","r","utf-8")

For stdout (do this once, before writing anything to stdout):

sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
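A small sketch of what these wrappers do, using in-memory byte streams as stand-ins for sys.stdin and sys.stdout so the result can be inspected. The sample text and the name outData are invented for illustration. It should run on both Python 2.7 and Python 3.

```python
import codecs
import io

source = io.BytesIO(b"Stra\xc3\x9fe\n")  # stand-in for sys.stdin (bytes)
sink = io.BytesIO()                       # stand-in for sys.stdout (bytes)

inData = codecs.getreader("utf-8")(source)   # decodes bytes while reading
outData = codecs.getwriter("utf-8")(sink)    # encodes text while writing

line = inData.readline()   # unicode text, 7 characters including "\n"
outData.write(line)        # encoded back to UTF-8 on the way out

print(len(line))
print(repr(sink.getvalue()))
```

Note that on Python 3, sys.stdin and sys.stdout are already text streams, so the byte streams to wrap there would be sys.stdin.buffer and sys.stdout.buffer. The lines above as written in the answer are for Python 2.7.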

Source

2013-03-28 14:37:18 user1826936
