Encoding gives "'ascii' codec can't encode character ... ordinal not in range(128)"

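I am working through the Django RSS reader project here.

The RSS feed will read something like "OKLAHOMA CITY (AP) - James Harden let". The RSS feed declares encoding="UTF-8", so I believe I am passing utf-8 to markdown in the code snippet below. The dash is where it chokes.

I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line of code that is not working is: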

content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

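I am using markdown 2.0, django 1.1, and python 2.4.

What is the magic sequence of encoding and decoding that I need to do to make this work?

(In response to Prometheus's request. I agree the formatting helps.)

So, in my views I added a smart_unicode line above the parsed_feed encoding line...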

content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict')
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

This pushes the problem to my models.py, where I have

def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        self.excerpt_html = markdown(self.excerpt)
        # super save after this

If I change the save method to have...

def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        encoded_excerpt_html = (self.excerpt).encode('utf-8')
        self.excerpt_html = markdown(encoded_excerpt_html)

I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash is.

2010-03-25 user140314

Can you please post the traceback as is? – tzot 2010-03-26 12:50:20

Basically, what is the value of 'parsed_feed.encoding'? Is it 'ascii' by any chance? (That would explain your error.) – tzot 2010-03-26 12:52:30

Django provides a couple of useful functions for converting back and forth between Unicode and bytestrings:

from django.utils.encoding import smart_unicode, smart_str

2010-03-25 04:24:12 nikola

Using ... content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict') content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") pushes the problem to my models.py, where I have def save(self, force_insert=False, force_update=False): if self.excerpt: self.excerpt_html = markdown(self.excerpt) # super save after this. If I change the save method to have encoded_excerpt_html = (self.excerpt).encode('utf-8') self.excerpt_html = markdown(encoded_excerpt_html) – user140314 2010-03-25 05:00:55

Part 2: I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash is. – user140314 2010-03-25 05:01:21

Could you edit your original post with the above? Without proper formatting it is very hard to read. – nikola 2010-03-25 08:00:51

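If the data you are receiving is in fact encoded in UTF-8, then it should arrive as a sequence of bytes: a Python 'str' object, in Python 2.X.

You can verify this with an assertion: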

assert isinstance(content, str)

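Once you know that is true, you can move on to the actual encoding. Python does not do transcoding, that is, converting directly from UTF-8 to ASCII, for instance. You first need to turn your sequence of bytes into a Unicode string by decoding it: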

unicode_content = content.decode('utf-8')

(If you can trust parsed_feed.encoding, then use that instead of the literal 'utf-8'. Either way, be prepared for errors.)
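As a small illustration of the decoding step, here is a sketch that is not part of the original answer. The `raw` value is made-up sample data standing in for the feed content, and the syntax assumes Python 2.6 or later (it also runs on Python 3, where the byte-string type is called `bytes`); on Python 2.4 the `b"..."` literal and `except ... as` are not available.

```python
# Hypothetical sample: the UTF-8 bytes a feed might deliver for the headline.
raw = b"OKLAHOMA CITY (AP) \xe2\x80\x94 James Harden"

# Decoding with the right codec yields a Unicode string containing U+2014.
text = raw.decode("utf-8")

# Decoding with the wrong codec fails on the first non-ASCII byte (0xe2).
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```

The `ascii_error` captured here is the same kind of "'ascii' codec can't decode byte 0xe2" failure described in the question, only triggered explicitly instead of by an implicit conversion.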

You can then take that string and encode it as ASCII, replacing the high characters with their XML character reference equivalents:

xml_content = unicode_content.encode('ascii', 'xmlcharrefreplace')
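To make the effect of 'xmlcharrefreplace' concrete, here is a minimal sketch using the headline from the question as assumed sample text. It is an illustration only, written so that it runs on Python 2.6+ and Python 3.

```python
# Sample Unicode text with an em dash (U+2014), taken from the question.
text = u"OKLAHOMA CITY (AP) \u2014 James Harden"

# ord(u"\u2014") is 8212, so the em dash becomes the reference "&#8212;".
xml_content = text.encode("ascii", "xmlcharrefreplace")

# With the default 'strict' error handler the same call raises instead.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The result is a pure-ASCII byte string, so later code that assumes ASCII has nothing left to trip over.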

The full approach, then, looks something like this:

try:
    content = content.decode(parsed_feed.encoding).encode('ascii', 'xmlcharrefreplace')
except UnicodeDecodeError:
    # Couldn't decode the incoming string -- possibly not encoded in utf-8
    # Do something here to report the error
    raise
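One possible way to package this is as a small helper. The sketch below is an assumption about how you might structure it, not code from the original answer: the name `to_ascii_xml` is hypothetical, and returning `None` on failure is just one choice for "reporting" the error. It assumes Python 2.6+ or Python 3.

```python
def to_ascii_xml(raw, encoding="utf-8"):
    """Decode raw bytes, then re-encode as ASCII with character references.

    Hypothetical helper: returns None when the bytes cannot be decoded,
    so the caller can decide how to report or skip the entry.
    """
    try:
        text = raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in `encoding`.
        # LookupError: `encoding` is not a codec name Python knows.
        return None
    return text.encode("ascii", "xmlcharrefreplace")
```

For example, `to_ascii_xml(b"caf\xc3\xa9")` gives the ASCII bytes `caf&#233;`, while `to_ascii_xml(b"\xff\xfe")` returns `None` because those bytes are not valid UTF-8. In the question's code you would pass `parsed_feed.encoding` as the second argument.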

2011-12-30 23:14:10

I ran into this error while writing file names with zipfile. The following failed

ZipFile.write(root+'/%s'%file, newRoot + '/%s'%file)

and the following worked

ZipFile.write(str(root+'/%s'%file), str(newRoot + '/%s'%file))

2012-09-07 02:33:20 highvelcty

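Calling 'str()' on a unicode value that contains non-ASCII characters would result in the exact same error the OP is seeing. – 2012-09-25 15:00:53

@MartijnPieters: Hi, that is a very important point you make. I could not find in the [fine manual](http://docs.python.org/2/library/functions.html#str) what 'str()' actually does, but I attribute that more to my being a Python noob than to a fault of the manual. Where is it documented what 'str()' does to its argument, and what 'str()' returns? Thanks! – dotancohen 2013-06-12 07:59:36

'str()' returns a *byte string*; characters with values between 0 and 255, usually interpreted and displayed as ASCII characters for 0-127. 'unicode()' values, on the other hand, can represent any code point in the Unicode standard, between 0 and 1114111. So converting unicode to a byte string using 'str(unicodevalue)' will involve *some* conversion. – 2013-06-12 12:29:27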
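The ranges mentioned in that last comment can be checked with a short sketch. This is an added illustration, not part of the thread; the variable names are made up, and the code is written to run on Python 2.6+ and Python 3.

```python
# The em dash from the question, as a Unicode string.
em_dash = u"\u2014"

# A Unicode code point is a single integer, here well above 255.
code_point = ord(em_dash)

# Encoded as UTF-8 it becomes three bytes, each in the range 0-255.
byte_values = list(bytearray(em_dash.encode("utf-8")))

# The highest code point in the Unicode standard is U+10FFFF.
MAX_CODE_POINT = 0x10FFFF
```

Here `code_point` is 8212 and `byte_values` is `[226, 128, 148]`, that is 0xe2, 0x80, 0x94, which are the bytes named in the question's second error message.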
