Python 2.7: test whether all characters in a string are Chinese characters

The following code tests whether all characters in a string are Chinese characters. It works in Python 3 but not in Python 2.7. How can I do this in Python 2.7?

for ch in name:
    if ord(ch) < 0x4e00 or ord(ch) > 0x9fff:
        return False


2013-05-08 Sugar Tang

Is 'name' a unicode string or a byte string? You don't have to use 'ord' here, btw: 'if ch < u'\u4e00' or ch > u'\u9fff':' works too. – 2013-05-08 13:11:35

Related: http://stackoverflow.com/questions/16027450/is-there-a-way-to-know-whether-a-unicode-string-contains-any-chinese-japanese-ch/16028174#16028174 – Daenyth 2013-05-08 13:14:22

# byte str (you probably get from GAE) 
In [1]: s = """Chinese (漢語/漢語 Hànyǔ or 中文 Zhōngwén) is a group of related 
     language varieties, several of which are not mutually intelligible,""" 

# unicode str 
In [2]: us = u"""Chinese (漢語/漢語 Hànyǔ or 中文 Zhōngwén) is a group of related 
     language varieties, several of which are not mutually intelligible,""" 

# convert to unicode using str.decode('utf-8')  
In [3]: print ''.join(c for c in s.decode('utf-8') 
        if u'\u4e00' <= c <= u'\u9fff') 
漢語漢語中文 

In [4]: print ''.join(c for c in us if u'\u4e00' <= c <= u'\u9fff') 
漢語漢語中文

To make sure all the characters are Chinese, something like this should work:

all(u'\u4e00' <= c <= u'\u9fff' for c in name.decode('utf-8'))
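The one-liner above assumes name is a UTF-8 byte string. A minimal sketch of a helper that accepts either kind of string might look like the following. The name is_all_chinese and the decision to reject empty input are assumptions made for illustration, not part of the original answer. Note that U+4E00 to U+9FFF covers only the basic CJK Unified Ideographs block, so characters from the extension blocks would be rejected.

```python
def is_all_chinese(name):
    """Return True if name is non-empty and every character lies in U+4E00..U+9FFF."""
    # Hypothetical helper: byte strings are assumed to be UTF-8 encoded.
    if isinstance(name, bytes):  # bytes is an alias of str on Python 2.7
        name = name.decode('utf-8')
    # all() is True for an empty sequence, so reject empty input explicitly.
    return bool(name) and all(u'\u4e00' <= c <= u'\u9fff' for c in name)


print(is_all_chinese(u'\u4e2d\u6587'))                  # True
print(is_all_chinese(u'\u4e2d\u6587'.encode('utf-8')))  # True
print(is_all_chinese(u'\u4e2d\u6587 abc'))              # False
print(is_all_chinese(u''))                              # False
```

The explicit type check keeps the function usable both before and after the input has been decoded.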

In your Python application, use unicode internally: decode early and encode late, creating a unicode sandwich.
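As a rough illustration of that sandwich, the sketch below decodes bytes once at the input boundary, does all the work on unicode text, and encodes only when producing output. The function name keep_chinese and the default encoding parameter are assumptions for the example.

```python
def keep_chinese(raw_bytes, encoding='utf-8'):
    """Decode at the input boundary, filter as unicode, encode at the output boundary."""
    text = raw_bytes.decode(encoding)  # decode early
    # Work on unicode text only: keep characters in U+4E00..U+9FFF.
    kept = u''.join(c for c in text if u'\u4e00' <= c <= u'\u9fff')
    return kept.encode(encoding)  # encode late


raw = u'Chinese (\u6f22\u8a9e or \u4e2d\u6587)'.encode('utf-8')
result = keep_chinese(raw)
print(result.decode('utf-8') == u'\u6f22\u8a9e\u4e2d\u6587')  # True
```

In a larger application the decode and encode steps would normally live at the request and response edges rather than inside one function.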


2013-05-08 13:32:51 root

Just one comment: rather than decoding to a one-off value, it is better to store the decoded unicode object and work in unicode internally. – Marcin 2013-05-08 13:49:31

@Marcin: you're right, I'll add a note, thanks. – root 2013-05-08 13:50:37

This works fine for me in Python 2.7, provided name is a unicode() value:

>>> ord(u'\u4e00') < 0x4e00 
False 
>>> ord(u'\u4dff') < 0x4e00 
True

You don't have to use ord here if you compare the character against unicode values directly:

>>> u'\u4e00' < u'\u4e00' 
False 
>>> u'\u4dff' < u'\u4e00' 
True

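Data from the incoming request will not have been decoded to unicode yet; you need to do that first. Explicitly set the accept-charset attribute on your form tag to make sure the browser uses the correct encoding: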

<form accept-charset="utf-8" action="...">

Then decode the data on the server side:

name = self.request.get('name').decode('utf8')
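Decoding can fail if a client sends bytes that are not valid UTF-8. A hedged sketch of a defensive decode is shown below; decode_form_value is a hypothetical helper name, and it assumes that a value which is not a byte string is already unicode text and can be returned unchanged.

```python
def decode_form_value(raw, encoding='utf-8'):
    """Return raw as unicode text, or None if it is not valid in the given encoding."""
    if not isinstance(raw, bytes):
        return raw  # assumed to be unicode text already
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None  # caller decides how to report invalid input


print(decode_form_value(u'\u4e2d\u6587'.encode('utf-8')) == u'\u4e2d\u6587')  # True
print(decode_form_value(b'\xff\xfe') is None)                                # True
```

Returning None instead of raising lets the request handler reject the input with its own error message.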


2013-05-08 13:14:43

I am writing for Google App Engine in Python. 'name' is obtained from a form via 'name = self.request.get('name')', and users should only enter Chinese characters. Do I need to convert 'name' to unicode? How? – 2013-05-08 13:26:20

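@Tang: Yes, you have to convert the data to Unicode first. Browsers usually use the encoding of the HTML page, so if you serve the page with 'Content-Type: text/html; charset=utf8', you can assume the form data can be decoded as UTF-8 too. – 2013-05-08 14:42:03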
