2012-08-09 128 views
0

Python keeps returning a string with a broken character: python re.sub regular expression

python

import re 

test = re.sub('handle(.*?)', '<verse osisID="lol">\1</verse>', 'handle a bunch of random text here.') 
print test 

What I want:

<verse osisID="lol">a bunch of random text here.</verse> 

What I get:

<verse osisID="lol">*broken character*</verse>a bunch of random text here. 

Answers

8

You should either escape the \ character or use an r'' raw string:

>>> re.sub('handle(.*?)', r'<verse osisID="lol">\1</verse>', 'handle a bunch of random text here.') 
'<verse osisID="lol"></verse> a bunch of random text here.' 

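Without the r'' raw string literal, the backslash is interpreted as the start of an escape code, so '\1' becomes the single character '\x01'. You can also double the backslash instead: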

>>> '\1' 
'\x01' 
>>> '\\1' 
'\\1' 
>>> r'\1' 
'\\1' 
>>> print r'\1' 
\1 
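
The interpreter session above is Python 2. As a minimal sketch of the same point in Python 3 syntax (the variable names `broken` and `fixed` are only illustrative), the following compares the three spellings of the replacement and shows what each does inside re.sub:

```python
import re

text = 'handle a bunch of random text here.'

# '\1' is an octal escape: one control character, chr(1).
# r'\1' and '\\1' are both two characters: a backslash followed by '1'.
print(len('\1'), len(r'\1'), '\\1' == r'\1')

# Non-raw replacement: the control character is inserted literally.
broken = re.sub('handle(.*?)', '<verse osisID="lol">\1</verse>', text)

# Raw replacement: re.sub sees the backreference and inserts group 1.
fixed = re.sub('handle(.*?)', r'<verse osisID="lol">\1</verse>', text)

print(repr(broken))
print(repr(fixed))
```

Group 1 is still empty here because the pattern is unchanged from the question; only the handling of the backslash differs.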

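Note that you are only replacing the word handle: the lazy .*? pattern matches as few characters as possible, which here is 0. Remove the question mark and it will match your expected output: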

>>> re.sub('handle(.*)', r'<verse osisID="lol">\1</verse>', 'handle a bunch of random text here.') 
'<verse osisID="lol"> a bunch of random text here.</verse>' 
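
To see why the question mark matters, here is a small sketch (Python 3 syntax, illustrative variable names) that inspects what group 1 captures in each case. The anchored variant with `$` is not from the answer; it is included only to show that a lazy quantifier expands when something after it forces it to:

```python
import re

text = 'handle a bunch of random text here.'

# Lazy: nothing follows the group, so matching zero characters already succeeds.
lazy = re.search('handle(.*?)', text)

# Greedy: the group takes everything up to the end of the line.
greedy = re.search('handle(.*)', text)

# Lazy but anchored: the group must grow until '$' can match.
anchored = re.search('handle(.*?)$', text)

print(repr(lazy.group(1)))
print(repr(greedy.group(1)))
print(repr(anchored.group(1)))
```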
+0

You are a beautiful person :) – user1442957 2012-08-09 19:48:28

+0

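You probably want to match the space after 'handle' but before the next word, since otherwise the captured text starts with a space right after the '>'. You can do this with 'handle *(.*)', assuming you only have spaces (not other whitespace) – 2012-08-09 19:51:16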

+0

@AndrewCox: I would use '\s*' to match the whitespace there; why restrict it to spaces only? – 2012-08-09 19:54:14
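
A short sketch of the two patterns suggested in the comments above, in Python 3 syntax. The tab-separated input `tabbed` is a made-up example, used only to show where the two patterns differ:

```python
import re

repl = r'<verse osisID="lol">\1</verse>'
text = 'handle a bunch of random text here.'
tabbed = 'handle\ta bunch of random text here.'  # hypothetical input with a tab

# 'handle *' skips spaces only; r'handle\s*' skips any whitespace.
spaces_only = re.sub('handle *(.*)', repl, text)
any_space = re.sub(r'handle\s*(.*)', repl, text)

spaces_only_tab = re.sub('handle *(.*)', repl, tabbed)
any_space_tab = re.sub(r'handle\s*(.*)', repl, tabbed)

print(spaces_only)
print(any_space)
print(repr(spaces_only_tab))
print(repr(any_space_tab))
```

With ordinary spaces both patterns give the same result; with the tab, only the `\s*` version strips the separator from the captured text.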

0

The following code was tested on Python 3.6:

import re 

test = 'a bunch of random text here.' 
resp = re.sub(r'(.*)',r'<verse osisID="lol">\1</verse>',test) 
print (resp) 

<verse osisID="lol">a bunch of random text here.</verse>
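
One caveat about the snippet above: the output shown is for Python 3.6. Since Python 3.7, re.sub also replaces an empty match that is adjacent to a previous non-empty match, so the bare (.*) pattern matches once more at the end of the string and appends an empty verse element. A minimal sketch of two ways to avoid that, assuming the goal is to wrap the whole string exactly once (the names `once` and `nonempty` are illustrative):

```python
import re

test = 'a bunch of random text here.'
repl = r'<verse osisID="lol">\1</verse>'

# On Python 3.7+ this yields the wrapped text plus an extra empty element.
unanchored = re.sub(r'(.*)', repl, test)

# Option 1: limit the substitution to the first match.
once = re.sub(r'(.*)', repl, test, count=1)

# Option 2: require at least one character, so no empty match is possible.
nonempty = re.sub(r'(.+)', repl, test)

print(unanchored)
print(once)
print(nonempty)
```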