Is there an equivalent of string.ascii_letters for unicode strings in Python 2.x?

In the "string" module of the standard library,

string.ascii_letters ## Same as string.ascii_lowercase + string.ascii_uppercase

is

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

Is there a similar constant that would include everything considered a letter in Unicode?

Source

2010-01-24 emm

You can build your own constant of Unicode uppercase and lowercase letters:

import unicodedata as ud 
all_unicode = ''.join(unichr(i) for i in xrange(65536)) 
unicode_letters = ''.join(c for c in all_unicode 
          if ud.category(c)=='Lu' or ud.category(c)=='Ll')

This gives a string of 2153 characters (on a narrow Unicode build of Python). For code like letter in unicode_letters, it would be faster to use a set instead:

unicode_letters = set(unicode_letters)
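
The constant above contains only the 'Lu' and 'Ll' categories. Unicode also classifies titlecase ('Lt'), modifier ('Lm') and other ('Lo') characters as letters, so a broader set could be sketched as follows. This is a minimal sketch rather than part of the original answer: the name all_letters is made up for illustration, the unichr fallback is there only so the snippet also runs on Python 3, and like the code above it covers only the Basic Multilingual Plane.

```python
import unicodedata as ud

try:
    unichr            # exists on Python 2
except NameError:
    unichr = chr      # Python 3 has no unichr; chr does the same job

# Every code point in the Basic Multilingual Plane whose general category
# starts with 'L' (Lu, Ll, Lt, Lm, Lo).
all_letters = set(
    c for c in (unichr(i) for i in range(0x10000))
    if ud.category(c).startswith('L')
)
```

Membership tests such as u'ф' in all_letters then work the same way as with the set above.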

Source

2010-01-24 15:58:49

A good answer to the question I asked. However, I found another solution that suits my needs better (see my own answer below) – emm 2010-01-24 17:25:42

ud.category(c) in ('Lu', 'Ll') – jsbueno 2015-05-15 14:06:14

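That would be a very large constant. Unicode currently covers more than 100,000 different characters. So the answer is no.

The question is why you would need it. There may be some other way of solving your problem, with the unicodedata module for example.

Update: You can download files with all Unicode code point names and other information from ftp://ftp.unicode.org/ and do lots of interesting things with them.

Source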
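
As a rough illustration of what can be done with those files: assuming the file meant here is UnicodeData.txt from the Unicode Character Database, each line is a semicolon-separated record whose first three fields are the code point in hex, the character name, and the general category. The sketch below parses a few such lines pasted in as a string instead of downloading anything; parse_unicode_data and SAMPLE are hypothetical names.

```python
# Three records in the UnicodeData.txt layout, embedded so no download is needed.
SAMPLE = """\
0035;DIGIT FIVE;Nd;0;EN;;5;5;5;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""

def parse_unicode_data(text):
    """Map each code point (int) to a (name, general category) tuple."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(';')
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table

table = parse_unicode_data(SAMPLE)
# Code points whose general category is one of the letter categories (L*).
letters = sorted(cp for cp, (name, cat) in table.items() if cat.startswith('L'))
```

With the real file in place of SAMPLE, the same filter would list every letter code point, including those outside the range a narrow Python build can hold in a single character.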

2010-01-24 09:44:36

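There is no such string, but you can check whether a character is a letter with the unicodedata module, in particular its category() function.

>>> import unicodedata
>>> unicodedata.category(u'a') 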
'Ll' 
>>> unicodedata.category(u'A') 
'Lu' 
>>> unicodedata.category(u'5') 
'Nd' 
>>> unicodedata.category(u'ф') # Cyrillic f. 
'Ll' 
>>> unicodedata.category(u'٢') # Arabic-indic numeral for 2. 
'Nd'

Ll means "letter, lowercase". Lu means "letter, uppercase". Nd means "number, decimal digit".

Source
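
Since every letter category starts with 'L', a per-character check needs no precomputed constant. A minimal sketch, with is_letter and only_letters as illustrative names that are not part of unicodedata:

```python
import unicodedata

def is_letter(ch):
    """True if the single character ch is in a letter category (L*)."""
    return unicodedata.category(ch).startswith('L')

def only_letters(text):
    """Keep only the letter characters of a unicode string."""
    return u''.join(ch for ch in text if is_letter(ch))
```

For example, only_letters(u'a1ф-B') keeps the Latin and Cyrillic letters and drops the digit and the hyphen.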

2010-01-24 10:05:15

Just to make the answer complete, here is a list of all Unicode categories: http://www.fileformat.info/info/unicode/category/index.htm – 2010-01-24 11:54:56


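As mentioned in the previous answers, the string would indeed be far too long. So you have to target one or more specific languages.
[EDIT: I realize that this was the case for my original intended use, and for most uses, I suppose. In the meantime, however, Mark Tolonen gave a good answer to the question as it was asked, so I accepted his answer, although I use the following solution.]

This is easy to do with the "locale" module: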

import locale 
import string 
code = 'fr_FR' ## Do NOT specify encoding (see below) 
locale.setlocale(locale.LC_CTYPE, code) 
encoding = locale.getlocale()[1] 
letters = string.letters.decode(encoding)

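"letters" is a unicode string 117 characters long.

Apparently, string.letters depends on the default encoding for the selected language code rather than on the language itself. Setting the locale to fr_FR, de_DE or es_ES updates string.letters to the same value, since all of them are encoded in ISO8859-1 by default.

If you add an encoding to the language code (de_DE.UTF-8), the default encoding will be used instead for string.letters. That would cause a UnicodeDecodeError if you used the rest of the code above.

Source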

2010-01-24 11:08:34 emm
