Python - Unicode - 優文庫

Running a simple script doesn't behave the way I expected.

notAllowed = {"â":"a", "à":"a", "é":"e", "è":"e", "ê":"e",
              "î":"i", "ô":"o", "ç":"c", "û":"u"}

word = "dôzerté"
print word

for char in word:
    if char in notAllowed.keys():
        print "hooray"
        word = word.replace(char, notAllowed[char])

print word
print "finished"

The output is the word unchanged, even though the script should have replaced "ô" and "é" with "o" and "e" and printed dozerte ...

Any ideas?

Source

2012-03-08 Ervis Ilikeyoutoo

How about:

# -*- coding: utf-8 -*-
# Python 2.7: every literal is a unicode string (u"..."), not a byte string
notAllowed = {u"â":u"a", u"à":u"a", u"é":u"e", u"è":u"e", u"ê":u"e",
              u"î":u"i", u"ô":u"o", u"ç":u"c", u"û":u"u"}

word = u"dôzerté"
print word

# Iterating over a unicode string yields whole characters
for char in word:
    if char in notAllowed:
        print "hooray"
        word = word.replace(char, notAllowed[char])

print word
print "finished"

Basically, if you want to assign a unicode string to a variable, you need to write:

u"..."
# instead of just
"..."

The u prefix marks the literal as a unicode string.

Source

2012-03-08 14:22:52 kgr

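It might (I'm not very familiar with Py3), but I tried it in 2.7 and, after adding the unicode markers, it worked for me :) – kgr 2012-03-08 14:26:34

Useful to know, thanks for the tip :) – kgr 2012-03-08 14:38:58

Thanks kgr. Your fix works great! :) Edit: sorry, I'm on Python 2.7 – 2012-03-08 14:39:09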

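In Python 2, iterating over a plain string iterates over its bytes, not necessarily its characters. If your Python source file is encoded as UTF-8, then len(word) will be 9 instead of 7, because each of the two accented characters is encoded as two bytes. Iterating over a unicode string (u"dôzerté") iterates over characters, so that should work.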
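The byte/character difference can be checked directly. A minimal sketch, assuming UTF-8 as the encoding; the word is written with escapes and encoded explicitly so the result does not depend on how the file is saved, and the variable names are only for illustration. It should behave the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# u"d\u00f4zert\u00e9" is u"dôzerté" spelled with escapes
word = u"d\u00f4zert\u00e9"        # text (unicode) string
encoded = word.encode("utf-8")     # the bytes a plain "..." literal holds in a UTF-8 Python 2 file

char_count = len(word)             # counts characters
byte_count = len(encoded)          # counts bytes; ô and é take two bytes each

print(char_count)
print(byte_count)
```

The two extra bytes are why a lookup of each single byte in the notAllowed dictionary never matches.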

May I also suggest using unidecode for what you are trying to accomplish?
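unidecode is a third-party package. Where installing it is not an option, a similar effect for accented Latin letters can be sketched with the standard library's unicodedata module. Note that this is a different technique, not what unidecode does internally, and the helper name strip_accents is made up for illustration:

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(text):
    # Decompose each accented character into a base letter plus combining marks,
    # then drop the combining marks and keep the base letters.
    decomposed = unicodedata.normalize("NFKD", text)
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))

result = strip_accents(u"d\u00f4zert\u00e9")   # u"dôzerté" written with escapes
print(result)
```

Unlike the notAllowed dictionary, this does not need a hand-written list of characters, but it only handles characters that decompose into a base letter plus accents.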

Source

2012-03-08 14:24:55 Simon
