"if string in list" fails with diacritics

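Since I'm French, I am trying to write a small function that adds the correct definite article in front of a country name. It works fine except for the few countries whose names start with an accented character. Here is my code: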

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 
def article(nomPays):
    voyelles = ['A','E','É','I','O','U','Y']
    if nomPays == 'Mexique':
        return 'du'
    elif nomPays[0] in voyelles:
        return 'de l\''
    elif nomPays[-1] == 'e':  # negative index counts from the last letter
        return 'de la'
    else:
        return 'du'

print article('Érythrée')

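If I pass Allemagne instead of Érythrée, the behaviour is correct: it returns "de l'". But Érythrée returns "de la". This means my function does not recognise the character É as part of the voyelles list.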

Can anyone explain why this happens and how to fix it?

Source

2012-08-02 hyogapag

Mandatory reading: [Python Unicode HOWTO](http://docs.python.org/howto/unicode.html). – 2012-08-02 10:10:46

And of course [Joel Spolsky's classic on Unicode](http://www.joelonsoftware.com/articles/Unicode.html). – 2012-08-02 10:12:47

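The problem is that you are using str in Python 2, where str is a sequence of bytes, so nomPays[0] gives you the first byte of the string, not the first character. In a single-byte encoding this isn't a problem, but in a multi-byte encoding such as UTF-8, the first byte of "Érythrée" is a lead byte, not the whole character "É".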

You need to go through unicode to grab the first character:

firstChar = unicode(nomPays, 'UTF-8')[0].encode('UTF-8')

In fact, it is probably easier to use startswith:

if any(nomPays.startswith(voyelle) for voyelle in voyelles):

Alternatively, you can use unicode throughout your application, or switch to Python 3, which handles all of this much better.
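As a rough sketch of the Python 3 route, assuming the source file is saved as UTF-8 (the Python 3 default): in Python 3, str holds Unicode characters, so the accented first letter is matched as a whole. The function keeps the question's names (article, nomPays, voyelles); turning voyelles into a tuple and passing it to str.startswith is an adaptation for this example, not the asker's code.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: str is a sequence of Unicode characters, not bytes,
# so 'É' is one element of the string and prefix tests see it whole.

# Tuple rather than list (an adaptation): str.startswith accepts a tuple of prefixes.
voyelles = ('A', 'E', 'É', 'I', 'O', 'U', 'Y')


def article(nomPays):
    """Return the French article to put in front of a country name."""
    if nomPays == 'Mexique':          # masculine despite the final 'e'
        return 'du'
    if nomPays.startswith(voyelles):  # true if any vowel is a prefix
        return "de l'"
    if nomPays.endswith('e'):
        return 'de la'
    return 'du'


print(article('Érythrée'))
print(article('Allemagne'))
print(article('France'))
print(article('Canada'))
```

With this version, Érythrée and Allemagne both get "de l'", France gets "de la" and Canada gets "du".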

Source

2012-08-02 10:05:13 ecatmur

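Very clear and accurate answer. For now I will use 'startswith', but I will consider switching to Python 3. Thanks also to @martjin for the reading suggestions (I have already read the first one). – hyogapag 2012-08-02 10:24:30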

Add a u prefix before each string literal:

voyelles = [u'A',u'E',u'É',u'I',u'O',u'U',u'Y'] 
... 
print article(u'Érythrée')

Example:

>>> voyelles = [u'A',u'E',u'É',u'I',u'O',u'U',u'Y'] 
>>> s=u'Érythrée' 
>>> s[0] in voyelles 
True

Source

2012-08-02 10:06:58

It is a byte string, not a Unicode string, so the first element of the string is:

>>> 'Érythrée'[0] 
'\xc3'

This is because of the UTF-8 encoding.
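The byte-level behaviour can be reproduced in Python 3 by encoding the string explicitly. This is only an illustrative sketch: the names nom and octets are made up for the example, and the accented letters are written as escapes (\u00c9, \u00e9) so the result does not depend on how the file is saved.

```python
# Python 3 sketch reproducing what a Python 2 byte string holds.
nom = '\u00c9rythr\u00e9e'    # 'Érythrée' as Unicode text
octets = nom.encode('utf-8')  # the bytes a UTF-8 Python 2 str would contain

print(len(nom), len(octets))  # 8 characters, 10 bytes: each accented letter takes 2 bytes
print('\u00c9'.encode('utf-8'))  # b'\xc3\x89': lead byte 0xC3, continuation byte 0x89

# Slicing keeps the bytes type (indexing a bytes object in Python 3 returns an int).
premier_octet = octets[:1]
print(premier_octet)          # b'\xc3', only half of the encoded character

# The vowel list as UTF-8 byte strings, like the Python 2 literals in the question.
voyelles = [v.encode('utf-8') for v in ['A', 'E', '\u00c9', 'I', 'O', 'U', 'Y']]

# The lone lead byte never equals the two-byte entry for the accented E ...
print(premier_octet in voyelles)                     # False
# ... whereas a prefix test compares both bytes of the character.
print(any(octets.startswith(v) for v in voyelles))   # True
```

The membership test fails because b'\xc3' is compared against the full two-byte entry b'\xc3\x89', while startswith checks the whole encoded prefix.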

Source

2012-08-02 10:07:16 mhawke
