2013-07-07 60 views

Python regex: matching characters across multiple lines?

I'm working through the challenges on pythonchallenge.com and have run into a general regex problem.

For example, suppose we have the following text:

hello world 
<!-- 
%%[email protected]_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[([email protected]%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{[email protected]#_^{* 
@##&{#&{&)*%(]{{([*}@[@&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((@^@}$[*a*$&^{[email protected]#$%)[email protected](&bc 

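I want to get the characters a, b and c as a single string (from the text above), but not "hello world". How do I do this?

I know I can do the following in Python: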

x = "".join(re.findall("regex", data)) 

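However, I'm having trouble with the regular expression itself. I've been testing it in a regex tester, and it doesn't seem to do what I want.

Here is my regex: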

<!--[a-z]* 

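From my understanding (after reading the regular-expressions.info tutorial), this expression should find all the letters that come after the specified string, so the output would be abc.

However, this doesn't work. As I understand it, the prefix <!-- contains no special characters, since none of them is one of [\^$.|?*+().

How can I make this regex work the way I want, so that it includes abc but not "hello world"?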


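"<!--[a-z]* this expression should find all the characters after the specified string: output abc" No. It means: the character sequence '<!--' immediately followed by a run of letters, or by none at all (that is what the star allows). To get from '<!--' to the next run of letters in the analyzed text, you must write '<!--.*?[a-z]+' and compile it with re.DOTALL so that the dot also matches line breaks. The part '.*?' means "consume any characters until reaching what follows", that is, a run of letters. – eyquem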
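The point made in the comment above can be checked with a short sketch. The text in data is a shortened, hypothetical stand-in for the question's text (not the original), and re.DOTALL is an added assumption so that the dot also crosses line breaks.

```python
import re

# Hypothetical stand-in for the question's text: symbols plus a few letters after "<!--".
data = "hello world\n<!--\n%%$@_$^__#)^)\n@##&{*a*$&^{(&bc\n"

# "<!--" must be followed *immediately* by letters. A newline follows here,
# so the only match is "<!--" with an empty run of letters.
print(re.findall(r"<!--[a-z]*", data))   # ['<!--']

# A lazy ".*?" with DOTALL skips symbols and newlines up to the first run of letters.
m = re.search(r"<!--.*?([a-z]+)", data, re.DOTALL)
print(m.group(1))                        # prints: a

# To collect every letter after the marker, cut the text at "<!--" first.
_, _, tail = data.partition("<!--")
x = "".join(re.findall(r"[a-z]", tail))
print(x)                                 # prints: abc
```

Cutting the text with str.partition keeps the regex itself trivial, which is one possible way to get abc without "hello world"; it is not necessarily what the commenter had in mind.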

Answers

import re

su = '''hello world
xxxx hello world yyyy
<!--
_+]!yuyu*@&^}@?!hello world[@%]^@}$[*a*$&^[email protected](&bc??,=hello'''

print(su)

# Every pattern below has the same shape:
#   ([a-z]+)    a run of lowercase letters
#   (?![a-z])   the run must stop at the end of a word (no partial words)
#   (?<!...)    reject the run if the text just before its end is '...'
# Raw strings keep \A (start of string) from being read as a string escape.

pat = r'([a-z]+)(?![a-z])(?<!world)'
print("\nexcluding all the words 'world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello world)'
print("\nexcluding the word 'world' of the starting string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!hello world)'
print("\nexcluding all the words 'world' of a string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

print('\n-----------')

pat = r'([a-z]+)(?![a-z])(?<!hello)'
print("\nexcluding all the words 'hello'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello)'
print("\nexcluding the starting word 'hello'\n%s" % pat)
print(re.findall(pat, su))

# The look-ahead nested in the look-behind checks what FOLLOWS the word 'hello'.
pat = r'([a-z]+)(?![a-z])(?<!hello(?= world))'
print("\nexcluding all the words 'hello' of a string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

print('\n-----------')

# Both alternatives have the same length, as a look-behind requires.
pat = r'([a-z]+)(?![a-z])(?<!hello|world)'
print("\nexcluding all the words 'hello' and 'world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!hello(?= world))(?<!hello world)'
print("\nexcluding all the words of a string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello(?= world))(?<!\Ahello world)'
print("\nexcluding all the words of the starting string 'hello world'\n%s" % pat)
print(re.findall(pat, su))

Result

hello world 
xxxx hello world yyyy 
<!-- 
_+]!yuyu*@&^}@?!hello world[@%]^@}$[*a*$&^[email protected](&bc??,=hello 

excluding all the words 'world' 
([a-z]+)(?![a-z])(?<!world) 
['hello', 'xxxx', 'hello', 'yyyy', 'yuyu', 'hello', 'a', 'bc', 'hello'] 

excluding the word 'world' of the starting string 'hello world' 
([a-z]+)(?![a-z])(?<!\Ahello world) 
['hello', 'xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 

excluding all the words 'world' of a string 'hello world' 
([a-z]+)(?![a-z])(?<!hello world) 
['hello', 'xxxx', 'hello', 'yyyy', 'yuyu', 'hello', 'a', 'bc', 'hello'] 

----------- 

excluding all the words 'hello' 
([a-z]+)(?![a-z])(?<!hello) 
['world', 'xxxx', 'world', 'yyyy', 'yuyu', 'world', 'a', 'bc'] 

excluding the starting word 'hello' 
([a-z]+)(?![a-z])(?<!\Ahello) 
['world', 'xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 

excluding all the words 'hello' of a string 'hello world' 
([a-z]+)(?![a-z])(?<!hello(?= world)) 
['world', 'xxxx', 'world', 'yyyy', 'yuyu', 'world', 'a', 'bc', 'hello'] 

----------- 

excluding all the words 'hello' and 'world' 
([a-z]+)(?![a-z])(?<!hello|world) 
['xxxx', 'yyyy', 'yuyu', 'a', 'bc'] 

excluding all the words of a string 'hello world' 
([a-z]+)(?![a-z])(?<!hello(?= world))(?<!hello world) 
['xxxx', 'yyyy', 'yuyu', 'a', 'bc', 'hello'] 

excluding all the words of the starting string 'hello world' 
([a-z]+)(?![a-z])(?<!\Ahello(?= world))(?<!\Ahello world) 
['xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 
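
The role of (?![a-z]) in the patterns above may not be obvious, so here is a minimal sketch of what happens without it. The sample string text is a hypothetical, much shorter input than the answer's test string.

```python
import re

# Hypothetical sample, much shorter than the answer's test string.
text = "hello world yuyu"

# Without (?![a-z]) the engine backtracks: "world" is rejected by the
# look-behind, so [a-z]+ gives back one letter and "worl" is accepted.
loose = re.findall(r"([a-z]+)(?<!world)", text)
print(loose)    # ['hello', 'worl', 'yuyu']

# With (?![a-z]) a match has to end where the run of letters ends,
# so "world" is dropped as a whole instead of being shortened.
strict = re.findall(r"([a-z]+)(?![a-z])(?<!world)", text)
print(strict)   # ['hello', 'yuyu']
```

Note that the look-behind only inspects the last five characters before the end of the match, so a longer word that ends in world, such as underworld, is rejected too.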

If you want to analyze the string only after a particular pattern has been encountered:

# Continues the script above (reuses re and su).
print(su)

# The first alternative, anchored by ^, swallows everything up to the first
# <!-- and captures nothing. After that only ([a-z]+) can match, so group(1)
# is set exactly for the words that come after the marker.
print("\ncatching all the lettered strings after <!--")
pat = r'^.+?<!--|([a-z]+)'
print("re.compile(%r, re.DOTALL)" % pat)
rgx = re.compile(pat, re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after <!--\n"
      "excluding all the words 'world'")
pat = r'^.+?<!--|([a-z]+)(?![a-z])(?<!world)'
print("re.compile(%r, re.DOTALL)" % pat)
rgx = re.compile(pat, re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after <!--\n"
      "excluding all the words 'hello'")
pat = r'^.+?<!--|([a-z]+)(?![a-z])(?<!hello)'
print("re.compile(%r, re.DOTALL)" % pat)
rgx = re.compile(pat, re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after <!--\n"
      "excluding all the words 'hello' belonging to a string 'hello world'")
pat = r'^.+?<!--|([a-z]+)(?![a-z])(?<!hello(?= world))'
print("re.compile(%r, re.DOTALL)" % pat)
rgx = re.compile(pat, re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

Result

hello world 
xxxx hello world yyyy 
<!-- 
_+]!yuyu*@&^}@?!hello world[@%]^@}$[*a*$& <!-- ^[email protected](&bc??,=hello 

catching all the lettered strings after first <!-- 
re.compile('.+?<!--|([a-z]+)',re.DOTALL) 
['yuyu', 'hello', 'world', 'a', 'bc', 'hello'] 

catching all the lettered strings after first <!-- 
excluding all the words 'world' 
re.compile('.+?<!--|([a-z]+)(?<!world)',re.DOTALL) 
['yuyu', 'hello', 'a', 'bc', 'hello'] 

catching all the lettered strings after first <!-- 
excluding all the words 'hello' 
re.compile('.+?<!--|([a-z]+)(?<!hello)',re.DOTALL) 
['yuyu', 'world', 'a', 'bc'] 

catching all the lettered strings after first <!-- 
excluding all the words 'hello' belonging to a string 'hello world' 
re.compile('.+?<!--|([a-z]+)(?<!hello(?= world))',re.DOTALL) 
['yuyu', 'world', 'a', 'bc', 'hello']
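
As an alternative to the alternation trick, the marker can be located with plain string methods and the filtering done in Python rather than with look-behinds. This is only a sketch of a different technique, not the answer's method: the helper words_after, its parameters, and the shortened sample string are all hypothetical names and data introduced here.

```python
import re

WORD = re.compile(r"[a-z]+")

def words_after(text, marker="<!--", excluded=()):
    """Return the runs of lowercase letters found after the first marker,
    leaving out any run that is exactly equal to an excluded word."""
    start = text.find(marker)
    if start == -1:              # marker absent: nothing to analyze
        return []
    # finditer(text, pos) starts scanning at pos without copying the string.
    found = (m.group() for m in WORD.finditer(text, start + len(marker)))
    return [w for w in found if w not in excluded]

# Hypothetical sample modeled on the answer's test string, with fewer symbols.
sample = ("hello world\nxxxx hello world yyyy\n<!--\n"
          "_+]!yuyu*@&^}@?!hello world[@%]*a*$&(&bc??,=hello")

print(words_after(sample))
# ['yuyu', 'hello', 'world', 'a', 'bc', 'hello']
print(words_after(sample, excluded={"world"}))
# ['yuyu', 'hello', 'a', 'bc', 'hello']
```

Unlike the look-behind versions, this compares whole words, so it cannot express conditions such as "hello only when followed by world"; for those, the patterns above remain the better fit.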