UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 42: ordinal not in range(128)

def main():
    client = ##client_here
    db = client.brazil
    rio_bus = client.tweets
    result_cursor = db.tweets.find()
    first = result_cursor[0]
    ordered_fieldnames = first.keys()
    with open('brazil_tweets.csv','wb') as csvfile:

        csvwriter = csv.DictWriter(csvfile, fieldnames=ordered_fieldnames, extrasaction='ignore')
        csvwriter.writeheader()
        for x in result_cursor:
            print x
            csvwriter.writerow({k: str(x[k]).encode('utf-8') for k in x})

        #[ csvwriter.writerow(x.encode('utf-8')) for x in result_cursor ]


if __name__ == '__main__':
    main()

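Basically, the problem is that the tweets contain a bunch of Portuguese characters. I tried to fix this by encoding everything to unicode values before putting them into the dictionary that gets added to the row. However, this doesn't work. Any other ideas for formatting these values so that csv reader and DictReader can read them?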


2015-12-01 Alex Marshall

Which line raises the error? – tdelaney

'str(x[k])' looks odd... if 'x[k]' is unicode outside the ASCII range, you will get the error. – tdelaney

Line 14. The csvwriter.writerow part. –

str(x[k]).encode('utf-8') is the problem.

In Python 2, str(x[k]) converts a Unicode string to a byte string using the default ascii codec:

>>> x = u'résumé' 
>>> str(x) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
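
What Python 2 does implicitly inside str(x) can be spelled out as an explicit encode with the ascii codec. A minimal sketch of that equivalence (it also runs on Python 3; the variable names are only illustrative):

```python
text = u'r\xe9sum\xe9'  # u'résumé', written with escapes to stay source-encoding neutral

try:
    text.encode('ascii')   # what Python 2's str(text) attempts behind the scenes
    error = None
except UnicodeEncodeError as exc:
    error = exc            # same kind of failure as in the traceback above

utf8_bytes = text.encode('utf-8')  # naming the codec explicitly succeeds
```

The ascii codec fails at the first character above code point 127, while an explicit UTF-8 encode handles the whole string.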

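Non-Unicode values, such as booleans, are converted to a byte string, but Python then implicitly decodes that byte string to a Unicode string before calling .encode(), because only Unicode strings can be encoded. This usually doesn't cause an error, because most non-Unicode objects have an ASCII representation. Here is an example of a custom object that returns a non-ASCII str() representation: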

>>> class Test(object):
...     def __str__(self):
...         return 'r\xc3\xa9sum\xc3\xa9'
...
>>> x=Test() 
>>> str(x) 
'r\xc3\xa9sum\xc3\xa9' 
>>> str(x).encode('utf8') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Note that the error above is a decode error, not an encode error.
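
The hidden decode step can be made visible by performing it by hand. A sketch, assuming the byte string holds UTF-8 data as in the Test example (variable names are illustrative):

```python
raw = b'r\xc3\xa9sum\xc3\xa9'  # UTF-8 bytes, like the value returned by Test.__str__

try:
    raw.decode('ascii')        # the implicit step Python 2 inserts before .encode()
    error = None
except UnicodeDecodeError as exc:
    error = exc                # fails on byte 0xc3 at position 1

text = raw.decode('utf-8')     # decoding with the matching codec works
round_trip = text.encode('utf-8')
```

Decoding with the codec the bytes were actually written in gives a Unicode string that encodes back to the same bytes.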

If str() is only there to coerce booleans to strings, coerce to a Unicode string instead:

unicode(x[k]).encode('utf-8')

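Non-Unicode values will be converted to Unicode strings, which can then be encoded correctly, while Unicode strings pass through unchanged, so they are encoded correctly as well.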

>>> x = True 
>>> unicode(x) 
u'True' 
>>> unicode(x).encode('utf8') 
'True' 
>>> x = u'résumé' 
>>> unicode(x).encode('utf8') 
'r\xc3\xa9sum\xc3\xa9'
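
Applied to a whole row, the same idea might look like the sketch below. The helper names encode_value and encode_row and the sample document are hypothetical, and the pass-through for values that are already byte strings is an assumption (it treats them as UTF-8 already), not something the question's data confirms. The text_type alias only exists so the sketch runs on both Python 2 and Python 3.

```python
import sys

# Python 2 calls its text type `unicode`; Python 3 calls it `str`.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def encode_value(value):
    """Return `value` as UTF-8 bytes without going through the ascii codec."""
    if isinstance(value, bytes):
        # Already a byte string (Python 2 `str`); assumed to be UTF-8 already.
        return value
    return text_type(value).encode('utf-8')


def encode_row(row):
    """Encode every value of a dict-like row, e.g. one document from the cursor."""
    return {key: encode_value(row[key]) for key in row}


# Hypothetical document standing in for one tweet.
sample = {'text': u'S\xe3o Paulo \xfanico', 'retweeted': True, 'count': 3}
encoded = encode_row(sample)
```

Under Python 2, the loop in the question could then call csvwriter.writerow(encode_row(x)) in place of the dict comprehension that uses str().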

P.S. Python 3 does not implicitly encode/decode between byte strings and Unicode strings, which makes these errors much easier to spot.
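
For comparison, a rough Python 3 sketch of the same CSV export. The sample row is made up, and io.StringIO stands in for a real file, which would be opened with something like open('brazil_tweets.csv', 'w', encoding='utf-8', newline=''). In Python 3 the csv module works with text, so no per-value encoding is needed.

```python
import csv
import io

# Hypothetical row standing in for one tweet document.
rows = [{'user': 'ana', 'text': 'S\xe3o Paulo \xfaltimo \xf4nibus'}]

# In-memory text buffer used instead of a file opened in text mode.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['user', 'text'], extrasaction='ignore')
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
csv_bytes = csv_text.encode('utf-8')  # the single, explicit encoding step

# Mixing bytes and text raises TypeError instead of triggering a hidden ascii decode.
try:
    b'r\xc3\xa9sum\xc3\xa9' + 'x'
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Encoding happens once, at the file boundary, and any accidental mix of bytes and text fails immediately with a TypeError.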


2015-12-01 01:35:26
