Unicode re.sub() doesn't work with \g<0> (group 0)

Why doesn't \g<0> work with a Unicode regex?

When I use \g<0> with an ordinary (byte) string regex to insert a space before and after each match, it works:

>>> import re
>>> punct = """,.:;!@#$%^&*(){}{}|\/?><"'"""
>>> rx = re.compile('[%s]' % re.escape(punct)) 
>>> text = '''"anständig"''' 
>>> rx.sub(r" \g<0> ",text) 
' " anst\xc3\xa4ndig " ' 
>>> print rx.sub(r" \g<0> ",text) 
" anständig "

But with a Unicode regex, the spaces are not added:

>>> punct = u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|"""
>>> rx = re.compile("["+"".join(punct)+"]", re.UNICODE)
>>> text = """„anständig“"""
>>> rx.sub(ur" \g<0> ", text)
'\xe2\x80\x9eanst\xc3\xa4ndig\xe2\x80\x9c'
>>> print rx.sub(ur" \g<0> ", text)
„anständig“

1. How do I get \g<0> to work with a Unicode regex?
2. If (1) is not possible, how can I get the Unicode regex to put a space before and after each character from punct?

Source

2013-10-17 alvas

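I think you have two errors. First, you don't escape punct with re.escape as you did in the first example, and it contains characters that need escaping inside a character class, such as [ and ]. Second, the text variable is not a unicode string, so the Unicode pattern is matched against UTF-8 bytes. For example: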

>>> punct = re.escape(u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|""")
>>> rx = re.compile("["+"".join(punct)+"]", re.UNICODE)
>>> text = u"""„anständig“"""
>>> print rx.sub(ur" \g<0> ", text)
„ anständig “
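
For anyone reading this on Python 3, here is a rough sketch of the same fix. It assumes Python 3, where every str is already Unicode, so the second error (a byte-string text) cannot occur and only the escaping matters. The names broken, fixed, unchanged and spaced are mine, not from the question, and I am assuming the closing quote in text is U+201C, which is what the \xe2\x80\x9c bytes in the question's output suggest.

```python
import re

# Python 3: str is Unicode, so no u"" / ur"" prefixes are needed.
# The backslash is doubled so the literal holds a single "\" before "«".
punct = """,–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\\«¿¨{}|"""
text = "„anständig“"

# Unescaped: the "]" after "˚" closes the character class early, and the
# rest of punct is parsed as ordinary pattern syntax, so nothing in text matches.
broken = re.compile("[" + punct + "]")
unchanged = broken.sub(r" \g<0> ", text)

# Escaped: every character of punct becomes a literal member of the class.
fixed = re.compile("[" + re.escape(punct) + "]")
spaced = fixed.sub(r" \g<0> ", text)

print(repr(unchanged))
print(repr(spaced))
```

The first result should be the text unchanged and the second should have a space on each side of both quotation marks, so \g<0> itself was never the problem.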

Source

2013-10-17 13:22:38 moliware
