2014-07-19 124 views

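Python: printing regex matches produces empty lists

I am using regular expressions to match the names that follow "Dr." (and "As "). However, when I print the matches, they are printed as lists, and some of those lists are empty. I only want to print the names. Code: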

import re 

f = open('qwert.txt', 'r') 

lines = f.readlines() 
for x in lines: 
     p=re.findall(r'(?:Dr[.](\w+))',x) 
     q=re.findall(r'(?:As (\w+))',x) 
     print p 
     print q 

qwert.txt:

Dr.John and Dr.Keel 
Dr.Tensa 
Dr.Jees 
As John winning Nobel prize 
As Mary wins all prize 
car 
tick me 3 
python.hi=is good 
dynamic 
and precise 

tickme 2 and its in it 
its rapid 
its best 
well and easy 

Expected output:

John 
Keel 
Tensa 
Jees 
John 
Mary 

Actual output:

['John', 'Keel'] 
[] 
['Tensa'] 
[] 
['Jees'] 
[] 
[] 
['John'] 
[] 
['Mary'] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 
[] 

Comment: What happened to "Jees" in the expected output?

Answers

2

re.findall() always returns a list of matches, and that list can be empty. Loop over the result and print each element separately:

p = re.findall(r'(?:Dr[.](\w+))', x) 
for match in p: 
    print match 
q = re.findall(r'(?:As (\w+))', x) 
for match in q: 
    print match 

An empty list simply means the loop body never runs, so nothing is printed.
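
To make the return value concrete, here is a minimal sketch. It is written for Python 3, so `print` is a function, unlike the Python 2 code in this thread. The helper name `names_after_dr` is hypothetical and not part of the question's code. It shows that `re.findall()` with a single capture group returns a list of the captured strings, and an empty list when nothing matches:

```python
import re


def names_after_dr(line):
    """Return every word that directly follows 'Dr.' in line (hypothetical helper)."""
    # With exactly one capture group, findall returns a list of that group's text.
    return re.findall(r'Dr\.(\w+)', line)


print(names_after_dr('Dr.John and Dr.Keel'))  # ['John', 'Keel']
print(names_after_dr('car'))                  # []

# Iterating over an empty list runs the loop body zero times, so nothing prints.
for name in names_after_dr('car'):
    print(name)
```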

You could even do this:

for match in re.findall(r'(?:Dr[.](\w+))', x): 
    print match 
for match in re.findall(r'(?:As (\w+))', x): 
    print match 

and drop the p and q variables altogether.

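Last but not least, you can combine the two regular expressions into one (note the space after "As", which must stay inside the group):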

for match in re.findall(r'(?:Dr\.|As )(\w+)', x): 
    print match 

Demo:

>>> import re 
>>> lines = '''\ 
... Dr.John and Dr.Keel 
... Dr.Tensa 
... Dr.Jees 
... As John winning Nobel prize 
... As Mary wins all prize 
... car 
... tick me 3 
... python.hi=is good 
... dynamic 
... and precise 
... 
... tickme 2 and its in it 
... its rapid 
... its best 
... well and easy 
... '''.splitlines(True) 
>>> for x in lines: 
...  for match in re.findall(r'(?:Dr\.|As )(\w+)', x): 
...   print match 
... 
John 
Keel 
Tensa 
Jees 
John 
Mary 
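
The combined pattern can also be compiled once and reused for every line. The following is a sketch for Python 3, not the answerer's own code. The names `NAME_RE` and `extract_names` are hypothetical, and the `\b` word boundary is an added assumption that keeps the prefix from matching in the middle of a longer word:

```python
import re

# Prefix is either 'Dr.' or 'As ' (note the trailing space); only the name is captured.
NAME_RE = re.compile(r'\b(?:Dr\.|As )(\w+)')


def extract_names(lines):
    """Collect captured names from an iterable of lines, in order of appearance."""
    names = []
    for line in lines:
        names.extend(NAME_RE.findall(line))
    return names


sample = ['Dr.John and Dr.Keel\n', 'As Mary wins all prize\n', 'car\n']
for name in extract_names(sample):
    print(name)
```

Lines without a match contribute nothing to the list, so no empty lists ever reach the output.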
2

The [] you see is there because findall returns a list of strings. If you want the strings themselves, iterate over the result of findall.

p=re.findall(r'(?:Dr[.](\w+))',x) 
q=re.findall(r'(?:As (\w+))',x) 
for name in p+q:    # avoid naming the loop variable str, which shadows the built-in 
    print name 
2

Simply test the findall result before printing:

import re 

with open('qwert.txt', 'r') as fh: 
    for line in fh: 
        res = re.findall(r'(?:Dr[.](\w+))', line) 
        if res: 
            print '\n'.join(res) 
        res = re.findall(r'(?:As (\w+))', line) 
        if res: 
            print '\n'.join(res) 

This won't scale well if the number of regular expressions grows beyond a couple. A more useful approach might be:

import re 
from functools import partial 


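def parseNames(regexs, line): 
    """ 
    Returns a newline-separated string of matches given a 
    list of regular expressions and a string to search 
    """ 
    matches = [] 
    for regex in regexs: 
        # Collect into one list so matches from different patterns 
        # are not glued together without a separator 
        matches.extend(re.findall(regex, line)) 
    return '\n'.join(matches) 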


regexs = [r'(?:Dr[.](\w+))', r'(?:As (\w+))'] 
match = partial(parseNames, regexs) 

with open('qwert.txt', 'r') as fh: 
    names = map(match, fh.readlines()) 
    print '\n'.join(filter(None, names)) 

Output:

John 
Keel 
Tensa 
Jees 
John 
Mary 
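
A variation on the same idea, sketched for Python 3 (where `map` and `filter` return iterators rather than lists): compile the patterns once and yield the names lazily. The names `PATTERNS` and `iter_names` are hypothetical, and this is one possible arrangement rather than the answer's own code.

```python
import re

# Each pattern has exactly one capture group holding the name.
PATTERNS = [re.compile(r'Dr\.(\w+)'), re.compile(r'As (\w+)')]


def iter_names(lines, patterns=PATTERNS):
    """Yield each captured name, line by line and pattern by pattern."""
    for line in lines:
        for pattern in patterns:
            for name in pattern.findall(line):
                yield name


sample = ['Dr.John and Dr.Keel\n', 'As John winning Nobel prize\n', 'dynamic\n']
print('\n'.join(iter_names(sample)))
```

Supporting another prefix only requires appending a pattern to the list. Within a single line, names come out in pattern order rather than in order of position.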
1

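You need to iterate over your results.

Consider calling findall() only once per line with a combined pattern, so you don't have to repeat the call for each prefix.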

>>> import re 
>>> f = open('qwert.txt', 'r') 
>>> for line in f: 
...  matches = re.findall(r'(?:Dr\.|As )(\w+)', line) 
...  for x in matches: 
...   print x 

John 
Keel 
Tensa 
Jees 
John 
Mary
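
If the file is small enough to read into memory, `findall()` can be called just once on the whole text, since the pattern never needs to span a line break. This is a sketch for Python 3 that uses an in-memory string in place of `qwert.txt`; the variable names are illustrative only:

```python
import re

# Stand-in for the contents of qwert.txt (shortened for illustration).
text = """Dr.John and Dr.Keel
Dr.Tensa
As Mary wins all prize
car
"""

# One call scans the entire text; matches come back in order of appearance.
all_names = re.findall(r'(?:Dr\.|As )(\w+)', text)
print('\n'.join(all_names))
```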