Determine whether a character in a word is a digit or a Unicode character in Python

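I want to find out whether a word contains both digits and characters, and if so, separate the digit part from the character part. I want to check Tamil text, for example ரூ.100 or ரூ100, and split them into ரூ. and 100, and ரூ and 100. How can I do this in Python? I tried something like this: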

# Python 2 code; f is a file opened earlier (not shown)
for word in f.read().strip().split():
    for word1, word2, word3 in zip(word, word[1:], word[2:]):
        if word1 == "ர" and word2 == "ூ" and word3.isdigit():
            print word1
            print word2
        if word1.decode('utf-8') == unichr(0xbb0) and word2.decode('utf-8') == unichr(0xbc2):
            print word1
            print word2


2014-03-30 charvi

What have you tried? –

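I tried checking whether the first characters are ரூ and whether they are followed by a digit, but the problem is that I can't match against the Unicode values; it raises an error. – charvi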

This is what I tried:
for word in f.read().strip().split():
    for word1, word2, word3 in zip(word, word[1:], word[2:]):
        if word1 == "ர" and word2 == "ூ": # then word3.isdigit():
            print word1
            print word2
        if word1.decode('utf-8') == unichr(0xbb0) and word2.decode('utf-8') == unichr(0xbc2):
            print word1
            print word2
– charvi
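A likely cause of the error described in the comments: in Python 2, a file read with the built-in open() gives a byte string, so zip() walks over single bytes. Each of these Tamil characters takes three bytes in UTF-8, so a single byte never equals "ர", and decoding a lone lead byte raises UnicodeDecodeError. Below is a minimal sketch, assuming Python 3 (where str holds code points), that decodes first and then compares characters; the helper name starts_with_ru_then_digit is hypothetical.

```python
# Sketch assuming Python 3: decode first, so indexing works on characters
# (code points) instead of UTF-8 bytes.
RA = "\u0bb0"       # ர  TAMIL LETTER RA
UU_SIGN = "\u0bc2"  # ூ  TAMIL VOWEL SIGN UU

def starts_with_ru_then_digit(word):
    """Hypothetical helper: True if word is ர + ூ, an optional '.', then a digit."""
    if word[:2] != RA + UU_SIGN:
        return False
    rest = word[2:]
    if rest.startswith("."):
        rest = rest[1:]
    # rest[:1] is "" for an empty rest, and "".isdigit() is False
    return rest[:1].isdigit()

# Bytes as they would come from a file opened in binary mode.
raw = "ரூ.100 ரூ100 ரூபாய்".encode("utf-8")
for word in raw.decode("utf-8").split():
    print(word, starts_with_ru_then_digit(word))
```

Comparing against the named code points (U+0BB0, U+0BC2) is the Python 3 counterpart of the unichr(0xbb0) and unichr(0xbc2) checks in the question.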

You can use the (.*?)(\d+)(.*) regular expression, which captures 3 groups: everything before the digits, the digits, and everything after them:

>>> import re 
>>> pattern = ur'(.*?)(\d+)(.*)' 
>>> s = u"ரூ.100" 
>>> match = re.match(pattern, s, re.UNICODE) 
>>> print match.group(1) 
ரூ. 
>>> print match.group(2) 
100
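The session above is Python 2 (the ur'' prefix is a syntax error in Python 3, and print is a statement). A sketch of the same pattern in Python 3, where str literals are already Unicode and str patterns are Unicode-aware by default, so neither the u prefix nor re.UNICODE is needed:

```python
import re

# Three groups: everything before the first run of digits, the digits, the rest.
pattern = r"(.*?)(\d+)(.*)"
s = "ரூ.100"
match = re.match(pattern, s)
print(match.group(1))  # ரூ.
print(match.group(2))  # 100
```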

Alternatively, you can unpack the match groups into variables, like this:

>>> s = u"100ஆம்" 
>>> match = re.match(pattern, s, re.UNICODE) 
>>> before, digits, after = match.groups() 
>>> print before 

>>> print digits 
100 
>>> print after 
ஆம்

Hope that helps.


2014-03-30 07:25:20 alecxe

I tried the first pattern match you suggested, and it works... Thank you. I'll try the other one too. – charvi

Thank you very much! The second one you suggested works too! – charvi
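Two edge cases are worth handling when this pattern is applied to every word of a file. First, re.match returns None for a word with no digits, so calling match.groups() on it raises AttributeError. Second, in Python 3 \d matches any Unicode decimal digit, including Tamil digits such as ௧. The Python 3 sketch below assumes only ASCII digits should be split off and therefore uses [0-9]; the helper name split_word is hypothetical.

```python
import re

# [0-9] restricts the middle group to ASCII digits; \d would also match
# Tamil digits such as U+0BE7 (௧). Use \d instead if those should split too.
PATTERN = re.compile(r"(.*?)([0-9]+)(.*)")

def split_word(word):
    """Hypothetical helper: (before, digits, after), or None if word has no digits."""
    match = PATTERN.match(word)
    return match.groups() if match else None

for word in "ரூ.100 ரூ100 100ஆம் ரூபாய்".split():
    print(word, split_word(word))
```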

Use Unicode properties:

\pL stands for a letter in any language
\pN stands for a numeric character in any language

In your case it could be:

(\pL+\.?)(\pN+)


2014-03-30 11:06:44 Toto

This doesn't work – crorella
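The comment above is consistent with Python's built-in re module not supporting \p property escapes: in current Python 3 versions a pattern containing \pL raises re.error ("bad escape"). The third-party regex package does support them. Even there, the vowel sign ூ in ரூ is a combining mark (general category M), not a letter, so \pL+ alone cannot cover ரூ. As a stdlib-only sketch of the same Unicode-property idea, the code below groups a word's characters by whether their general category is N (the set \pN denotes), using unicodedata.category and itertools.groupby; the helper names are hypothetical.

```python
import unicodedata
from itertools import groupby

def is_number_char(ch):
    # Unicode general category N* (Nd, Nl, No): the set \pN refers to
    # in regex engines that support property escapes.
    return unicodedata.category(ch).startswith("N")

def split_runs(word):
    """Hypothetical helper: split word into alternating non-number / number runs."""
    return ["".join(run) for _, run in groupby(word, key=is_number_char)]

for word in "ரூ.100 ரூ100 100ஆம்".split():
    print(split_runs(word))
```

Because the split is on the number/non-number boundary only, punctuation such as the dot stays with the prefix, giving ரூ. and 100 as asked in the question.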
