UnicodeEncodeError: 'ascii' codec can't encode character

I am trying to run regular expressions over large strings of arbitrary HTML, and my Python 2.6 script chokes with this error:

UnicodeEncodeError: 'ascii' codec can't encode character

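I traced it back to a trademark superscript at the end of this word: Protection™. I expect to run into other characters like it in the future.

Is there a module for processing non-ASCII characters? Or, what is the best way to handle or escape non-ASCII text in Python?

Thanks! Full error: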

E 
====================================================================== 
ERROR: test_untitled (__main__.Untitled) 
---------------------------------------------------------------------- 
Traceback (most recent call last): 
    File "C:\Python26\Test2.py", line 26, in test_untitled 
    ofile.write(Whois + '\n') 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 1005: ordinal not in range(128)

Full script:

from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.BaseDomain.com/")
        self.selenium.start()
        self.selenium.set_timeout("90000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            time.sleep(10)
            Test = sel.get_text("//html/body/div/table/tbody/tr/td/form/div/table/tbody/tr[7]/td")
            Test = Test.replace(",","")
            Test = Test.replace("\n", "")
            ofile = open('TestOut.csv', 'ab')
            ofile.write(Test + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

2009-10-31 KenBurnsFan1

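Please post the Python version and the traceback that came with the exception. – gahooa 2009-10-31 00:09:28

Which version of Python are you using? Python's Unicode support has evolved considerably over the past few versions. – 2009-10-31 00:10:11

Here is the version: Python 2.6. Thanks! – KenBurnsFan1 2009-10-31 00:17:54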

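You're trying to pass a bytestring to something, but it's impossible (from the scarcity of info you provide) to tell what you're trying to pass it to. You start with a Unicode string that cannot be encoded as ASCII (the default codec), so you'll have to encode it with some different codec (or transliterate it, as @R.Pate suggests). It's impossible for us to say which codec you should use, though, because we don't know what you're passing the bytestring to, and therefore don't know which codecs that unknown subsystem will be able to accept and process correctly. In such total darkness as you leave us in, utf-8 is a reasonable blind guess (it's a codec that can represent any Unicode string exactly as a bytestring, and it's the standard codec for many purposes, such as XML). But it can't be any more than a blind guess until and unless you tell us more about what you're trying to pass that bytestring to, and for what purpose.

Passing thestring.encode('utf-8') rather than bare thestring will definitely avoid the particular error you're seeing right now, but it may result in peculiar displays (or whatever it is you're trying to do with that bytestring!) unless the recipient is ready, willing and able to accept utf-8 encoding (and how could WE know, having absolutely zero idea about what the recipient could possibly be?!-)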
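
As a rough illustration of the utf-8 suggestion, here is a minimal sketch that writes the text to a file and reads it back. It assumes the consumer of the file expects utf-8, which nothing in the question confirms. The temporary directory and the sample string are assumptions for illustration; only the file name TestOut.csv comes from the question. It is written to run on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

text = u'Protection\u2122'  # sample value standing in for sel.get_text(...)

# Hypothetical location: a scratch directory so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'TestOut.csv')

# Option 1: encode explicitly, then write bytes to a binary-mode file.
with open(path, 'ab') as ofile:
    ofile.write((text + u'\n').encode('utf-8'))

# Option 2: let io.open do the encoding for a text-mode file.
with io.open(path, 'a', encoding='utf-8') as ofile:
    ofile.write(text + u'\n')

# Whoever reads the file must decode with the same codec.
with io.open(path, 'r', encoding='utf-8') as ifile:
    lines = ifile.read().splitlines()

print(lines == [text, text])
```

Either option avoids the implicit ASCII encode that raised the error; whether utf-8 is the right choice still depends on what reads the file afterwards.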

2009-10-31 01:12:21

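Updated the info per your notes. I'll start researching how to use utf-8. Thanks! – KenBurnsFan1 2009-10-31 05:11:22

So, now we know your error occurs when writing to a file. Moving to utf-8 will definitely fix that... but when the file gets read back again, how is it processed? We still have absolutely no idea of the real purpose of your unicode -> bytestring conversion!-) – 2009-10-31 05:24:03

Full script provided; general advice is welcome too. Thanks! – KenBurnsFan1 2009-10-31 06:04:30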

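The "best" way always depends on your requirements; so, what are yours? Is ignoring non-ASCII appropriate? Should you replace ™ with "(tm)"? (That looks nice for this example but quickly breaks down for other code points, though it may be exactly what you want.) Or is the exception exactly what you need, and now you just have to handle it in some way?

Only you can really answer this question.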
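
To make the options above concrete, here is a minimal sketch of one possible policy: substitute the characters you have decided how to handle, encode strictly, and catch the exception for anything else. The function name encode_ascii and the replacement table are hypothetical, not something from the question. It runs on Python 2.6+ and Python 3.

```python
def encode_ascii(text, replacements):
    """Apply explicit substitutions, then encode strictly to ASCII."""
    for char, substitute in replacements.items():
        text = text.replace(char, substitute)
    # Strict mode still raises UnicodeEncodeError for anything unhandled.
    return text.encode('ascii')

# Hypothetical policy: the only character we have decided on so far.
replacements = {u'\u2122': u'(tm)'}

handled = encode_ascii(u'Protection\u2122', replacements)

try:
    encode_ascii(u'caf\u00e9', replacements)
    unhandled = None
except UnicodeEncodeError as err:
    # The exception records exactly which characters could not be encoded.
    unhandled = err.object[err.start:err.end]

print(handled)
print(repr(unhandled))
```

Keeping the strict encode means a new, unplanned character shows up as an error you can log rather than being silently dropped.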

2009-10-31 00:31:55

Updated the info above. – KenBurnsFan1 2009-10-31 05:12:13

You're trying to convert Unicode to ASCII in "strict" mode:

>>> help(str.encode) 
Help on method_descriptor: 

encode(...) 
    S.encode([encoding[,errors]]) -> object 

    Encodes S using the codec registered for encoding. encoding defaults 
    to the default encoding. errors may be given to set a different error 
    handling scheme. Default is 'strict' meaning that encoding errors raise 
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 
    'xmlcharrefreplace' as well as any other name registered with 
    codecs.register_error that is able to handle UnicodeEncodeErrors.

You probably want one of the following:

s = u'Protection™' 

print s.encode('ascii', 'ignore') # removes the ™ 
print s.encode('ascii', 'replace') # replaces with ? 
print s.encode('ascii','xmlcharrefreplace') # turn into xml entities 
print s.encode('ascii', 'strict') # throw UnicodeEncodeErrors
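
The help text also mentions handlers registered with codecs.register_error. As a sketch of that route, the handler below substitutes a few known characters and falls back to '?' for everything else. The handler name 'transliterate' and the REPLACEMENTS table are made up for this example. It runs on Python 2.6+ and Python 3.

```python
import codecs

# Hypothetical table; extend it with the characters you actually meet.
REPLACEMENTS = {u'\u2122': u'(tm)', u'\u00ae': u'(R)', u'\u00a9': u'(c)'}

def transliterate_errors(exc):
    """Encode-error handler: known characters get a substitute, others '?'."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = u''.join(REPLACEMENTS.get(ch, u'?') for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end

codecs.register_error('transliterate', transliterate_errors)

s = u'Protection\u2122'
encoded = s.encode('ascii', 'transliterate')
print(encoded)
```

Once registered, the name can be passed anywhere an errors argument is accepted, just like 'ignore' or 'replace'.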

2009-10-31 00:58:40 Seth

Thanks for your effort. I updated my question and am trying to get it working with your info. -KBF1 – KenBurnsFan1 2009-10-31 05:19:16

First, try installing the language pack for English (or any other language, if needed):

sudo apt-get install language-pack-en

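This provides translation data updates for all supported packages (including Python).

Also make sure you use the right encoding in your code.

For example: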

open(foo, encoding='utf-8')

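Then double-check your system configuration, such as the value of LANG or the locale settings in /etc/default/locale, and don't forget to log out and back in to your session.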
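
To see which settings Python actually picks up from the system, a small inspection sketch like the one below may help. It only reads values and changes nothing, and it is meant to run on Python 2.6+ and Python 3. Note that on Python 2.6 the built-in open() has no encoding argument; io.open and codecs.open accept one.

```python
import locale
import os
import sys

# Environment variable consulted by the locale machinery (may be unset).
lang = os.environ.get('LANG')

# Encoding the locale settings suggest for text data; False avoids
# temporarily changing the process locale while asking.
preferred = locale.getpreferredencoding(False)

# Codec used for implicit str/unicode conversions.
default = sys.getdefaultencoding()

# Encoding of standard output; may be None when output is redirected.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('LANG=%r preferred=%r default=%r stdout=%r'
      % (lang, preferred, default, stdout_encoding))
```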

2015-08-13 12:11:57 kenorb
