Python regular expressions: matching characters across multiple lines?

I'm working through a challenge on pythonchallenge.com and have run into a general regular-expression problem.

For example, suppose we have the following text:

hello world 
<!-- 
%%[email protected]_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[([email protected]%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{[email protected]#_^{* 
@##&{#&{&)*%(]{{([*}@[@&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((@^@}$[*a*$&^{[email protected]#$%)[email protected](&bc

I want to get the characters a, b and c as a string (from the text above), but not "hello world". How can I do this?

I know I can do the following in Python:

x = "".join(re.findall("regex", data))

However, I'm having trouble with the regular expression itself. I'm testing it in a regex tester, and it doesn't seem to do what I want.

Here is my regex:

<!--[a-z]*

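From my understanding (after reading the regular-expressions.info tutorial), this expression should find all the characters following the specified string, so the output should be abc.

However, this doesn't work. As I understand it, the "<!--" prefix contains no special characters, since none of them appear in the list [\^$.|?*+().

How do I make this regex work the way I want, so that it includes abc but not "hello world"?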

Asked 2013-07-07 by Jason

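"<!--[a-z]* this expression should find all the characters following the specified string: outputting abc" No. It means: the character sequence "<!--" immediately followed by a sequence of letters, or by nothing at all (that is what the star allows). To get from "<!--" to the next sequence of letters in the analyzed text, you must write "<!--.*?[a-z]+" and compile it with re.DOTALL so that the dot also matches newlines. The part ".*?" means "consume any characters, as few as possible, until what follows it (here, a sequence of letters) can match". – eyquem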
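The point of the comment above can be checked with a small sketch. The sample string "data" below is a shortened, hypothetical stand-in for the page source in the question, and slicing the text after the marker is just one possible way to collect every run of letters; it is not necessarily what the commenter had in mind.

```python
import re

# Hypothetical, shortened stand-in for the text in the question (an assumption).
data = "hello world\n<!--\n%%$@_$^__#)^)&!_+]!*@&\n@##&{#*a*$&^{(&bc"

# The asker's pattern: "<!--" immediately followed by zero or more letters.
# The character right after "<!--" is a newline, so [a-z]* matches nothing.
asker_result = re.findall(r"<!--[a-z]*", data)
print(asker_result)  # ['<!--']

# Lazily skip any characters (newlines included, thanks to re.DOTALL)
# until the first run of lowercase letters after the marker.
first_run = re.search(r"<!--.*?([a-z]+)", data, re.DOTALL)
print(first_run.group(1))  # a

# To collect every run of letters after the marker, search only the text
# that follows the marker and join the pieces.
marker = re.search(r"<!--", data)
letters = "".join(re.findall(r"[a-z]+", data[marker.end():]))
print(letters)  # abc
```

A single findall call cannot both skip the marker once and keep returning letter runs, which is why this sketch slices the text first.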

import re 

su = '''hello world 
xxxx hello world yyyy 
<!-- 
_+]!yuyu*@&^}@?!hello world[@%]^@}$[*a*$&^[email protected](&bc??,=hello''' 

print(su)

# Every pattern below has the same shape: a run of lowercase letters,
# (?![a-z]) to force the run to end at the last letter of a word,
# then one or more negative lookbehinds that reject unwanted endings.
pat = r'([a-z]+)(?![a-z])(?<!world)'
print("\nexcluding all the words 'world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello world)'
print("\nexcluding the word 'world' of the starting string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!hello world)'
print("\nexcluding all the words 'world' of a string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

print('\n-----------')

pat = r'([a-z]+)(?![a-z])(?<!hello)'
print("\nexcluding all the words 'hello'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello)'
print("\nexcluding the starting word 'hello'\n%s" % pat)
print(re.findall(pat, su))

# A lookahead nested inside the lookbehind: reject 'hello' only when ' world' follows it.
pat = r'([a-z]+)(?![a-z])(?<!hello(?= world))'
print("\nexcluding all the words 'hello' of a string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

print('\n-----------')

pat = r'([a-z]+)(?![a-z])(?<!hello|world)'
print("\nexcluding all the words 'hello' and 'world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!hello(?= world))(?<!hello world)'
print("\nexcluding all the words of a string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello(?= world))(?<!\Ahello world)'
print("\nexcluding all the words of the starting string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

Result

hello world 
xxxx hello world yyyy 
<!-- 
_+]!yuyu*@&^}@?!hello world[@%]^@}$[*a*$&^[email protected](&bc??,=hello 

excluding all the words 'world' 
([a-z]+)(?![a-z])(?<!world) 
['hello', 'xxxx', 'hello', 'yyyy', 'yuyu', 'hello', 'a', 'bc', 'hello'] 

excluding the word 'world' of the starting string 'hello world' 
([a-z]+)(?![a-z])(?<!\Ahello world) 
['hello', 'xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 

excluding all the words 'world' of a string 'hello world' 
([a-z]+)(?![a-z])(?<!hello world) 
['hello', 'xxxx', 'hello', 'yyyy', 'yuyu', 'hello', 'a', 'bc', 'hello'] 

----------- 

excluding all the words 'hello' 
([a-z]+)(?![a-z])(?<!hello) 
['world', 'xxxx', 'world', 'yyyy', 'yuyu', 'world', 'a', 'bc'] 

excluding the starting word 'hello' 
([a-z]+)(?![a-z])(?<!\Ahello) 
['world', 'xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 

excluding all the words 'hello' of a string 'hello world' 
([a-z]+)(?![a-z])(?<!hello(?= world)) 
['world', 'xxxx', 'world', 'yyyy', 'yuyu', 'world', 'a', 'bc', 'hello'] 

----------- 

excluding all the words 'hello' and 'world' 
([a-z]+)(?![a-z])(?<!hello|world) 
['xxxx', 'yyyy', 'yuyu', 'a', 'bc'] 

excluding all the words of a string 'hello world' 
([a-z]+)(?![a-z])(?<!hello(?= world))(?<!hello world) 
['xxxx', 'yyyy', 'yuyu', 'a', 'bc', 'hello'] 

excluding all the words of the starting string 'hello world' 
([a-z]+)(?![a-z])(?<!\Ahello(?= world))(?<!\Ahello world) 
['xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello']
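Every pattern above pairs "(?![a-z])" with a negative lookbehind. The following is a minimal sketch of why the lookahead is needed, using a short hypothetical string ("text") rather than the answer's data. It also shows one limitation: the lookbehind only inspects the last five letters of a word.

```python
import re

# Hypothetical sample text, chosen only to illustrate the technique.
text = "hello world xxxx worldly"

# Lookbehind alone: [a-z]+ backtracks from "world" to "worl", which no longer
# ends in "world", so a fragment of the unwanted word slips through.
without_lookahead = re.findall(r"([a-z]+)(?<!world)", text)
print(without_lookahead)  # ['hello', 'worl', 'xxxx', 'worldly']

# With (?![a-z]) the match must stop at the last letter of a word, so the
# lookbehind always inspects the end of the whole word.
with_lookahead = re.findall(r"([a-z]+)(?![a-z])(?<!world)", text)
print(with_lookahead)  # ['hello', 'xxxx', 'worldly']

# The lookbehind only checks the last five letters, so any word that merely
# ends in "world" is rejected as well.
ends_in_world = re.findall(r"([a-z]+)(?![a-z])(?<!world)", "underworld")
print(ends_in_world)  # []
```

If only the exact word should be excluded, a word-boundary form such as r"\b(?!world\b)[a-z]+\b" is one alternative to consider; it is not part of the answer above.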

If you want to start analyzing the string only after a particular pattern has been encountered:

print(su)

# First alternative: swallow everything up to and including the first "<!--"
# (group 1 stays empty). Second alternative: capture a run of letters.
print("\ncatching all the lettered strings after <!--")
print("re.compile('^.+?<!--|([a-z]+)',re.DOTALL)")
rgx = re.compile(r'^.+?<!--|([a-z]+)', re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after <!--\n"
      "excluding all the words 'world'")
print("re.compile('^.+?<!--|([a-z]+)(?<!world)',re.DOTALL)")
rgx = re.compile(r'^.+?<!--|([a-z]+)(?![a-z])(?<!world)', re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after <!--\n"
      "excluding all the words 'hello'")
print("re.compile('^.+?<!--|([a-z]+)(?<!hello)',re.DOTALL)")
rgx = re.compile(r'^.+?<!--|([a-z]+)(?![a-z])(?<!hello)', re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after <!--\n"
      "excluding all the words 'hello' belonging to a string 'hello world'")
print("re.compile('^.+?<!--|([a-z]+)(?<!hello(?= world))',re.DOTALL)")
rgx = re.compile(r'^.+?<!--|([a-z]+)(?![a-z])(?<!hello(?= world))', re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

Result

hello world 
xxxx hello world yyyy 
<!-- 
_+]!yuyu*@&^}@?!hello world[@%]^@}$[*a*$& <!-- ^[email protected](&bc??,=hello 

catching all the lettered strings after first <!-- 
re.compile('.+?<!--|([a-z]+)',re.DOTALL) 
['yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 

catching all the lettered strings after first <!-- 
excluding all the words 'world' 
re.compile('.+?<!--|([a-z]+)(?<!world)',re.DOTALL) 
['yuyu', 'hello', 'a', 'bc', 'hello'] 

catching all the lettered strings after first <!-- 
excluding all the words 'hello' 
re.compile('.+?<!--|([a-z]+)(?<!hello)',re.DOTALL) 
['yuyu', 'world', 'a', 'bc'] 

catching all the lettered strings after first <!-- 
excluding all the words 'hello' belonging to a string 'hello world' 
re.compile('.+?<!--|([a-z]+)(?<!hello(?= world))',re.DOTALL) 
['yuyu', 'world', 'a', 'bc', 'hello']
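The alternation used above can be looked at in isolation. This is a minimal sketch on a hypothetical string ("text") that contains the marker twice, to show what the leading "^" contributes. It assumes the marker is present: if "<!--" is missing, the first alternative never matches and every word in the text is returned.

```python
import re

# Hypothetical sample text containing the marker twice (an assumption).
text = "hello world\n<!--\n_+]!yuyu*@&hello world[@%]*a*$& <!-- ^(&bc"

# With ^ (and no re.MULTILINE) the first alternative can match only at the very
# start of the string, so it consumes the text up to the FIRST marker exactly once.
anchored = re.compile(r"^.+?<!--|([a-z]+)", re.DOTALL)
after_first = [m.group(1) for m in anchored.finditer(text) if m.group(1)]
print(after_first)  # ['yuyu', 'hello', 'world', 'a', 'bc']

# Without ^ the first alternative matches again, swallowing everything up to
# the next marker, so only the words after the LAST marker survive.
unanchored = re.compile(r".+?<!--|([a-z]+)", re.DOTALL)
after_last = [m.group(1) for m in unanchored.finditer(text) if m.group(1)]
print(after_last)  # ['bc']
```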

Answered 2013-07-07 12:12:27 by eyquem

>>> import re 
>>> strs = """hello world 
<!-- 
%%[email protected]_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[([email protected]%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{[email protected]#_^{* 
@##&{#&{&)*%(]{{([*}@[@&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((@^@}$[*a*$&^{[email protected]#$%)[email protected](&bc""" 
>>> re.findall(r'[a-zA-Z]+',strs.split('<!--')[-1]) 
['a', 'bc']
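Note that split('<!--')[-1] keeps the text after the last marker. A minimal sketch of how that differs from taking the text after the first marker with str.partition, using a hypothetical string ("text") that contains the marker twice:

```python
import re

# Hypothetical sample text with two markers (an assumption for illustration).
text = "hello world\n<!-- first *a*\n<!-- second (&bc"

# split(...)[-1] keeps only what follows the LAST marker.
after_last = re.findall(r"[a-zA-Z]+", text.split("<!--")[-1])
print(after_last)  # ['second', 'bc']

# partition(...)[2] keeps everything that follows the FIRST marker.
after_first = re.findall(r"[a-zA-Z]+", text.partition("<!--")[2])
print(after_first)  # ['first', 'a', 'second', 'bc']

# When the marker is missing, split returns the whole text, while
# partition returns an empty tail.
no_marker = "hello world"
print(no_marker.split("<!--")[-1])          # hello world
print(repr(no_marker.partition("<!--")[2]))  # ''
```

Which behaviour is preferable depends on whether text without the marker should be searched at all.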

Answered 2013-07-07 10:37:18
