2014-05-01 34 views
0

Regular expression for a repeated group of digits

I have the following text:

LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514 
192 generic St. 
Newton MA 02471 
Status: Attender Marital: Married Adult: M/F: Env.No.: 


LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888 
10 generic St. 
Newton MA 02471 

    E-mail : [email protected] 
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5 

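I need to get the text that follows the phone numbers, but a record may list a home phone, cell phone, emergency phone, fax, or work phone, in any order. Is there a regular expression that gives me the text after the last phone number? In the second block, for example, I want the text that comes after Cell Phone: 888-888-8888.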

+1

Is there always a newline after the last phone number? – Amazingred

+0

Yes. In fact, what sets the last number apart is the '\n' that follows it. – Leonardo

Answers

2
In [1]: import re 

In [2]: s=""" LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
    ...: 192 generic St.
    ...: Newton MA 02471 
    ...: Status: Attender Marital: Married Adult: M/F: Env.No.: 
    ...: 
    ...: 
    ...: LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888 
    ...: 10 generic St. 
    ...: Newton MA 02471 
    ...: 
    ...:  E-mail : [email protected] 
    ...: Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5""" 

In [3]: 

In [4]: re.findall('[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)', s, re.MULTILINE) 
Out[4]: ['192 generic St. ', '10 generic St. '] 

NODE   EXPLANATION 
----------------------------------------------------- 
    [0-9]{3}  any character of: '0' to '9' (3 times) 
----------------------------------------------------- 
    -   '-' 
----------------------------------------------------- 
    [0-9]{3}  any character of: '0' to '9' (3 times) 
----------------------------------------------------- 
    -   '-' 
----------------------------------------------------- 
    [0-9]{4}  any character of: '0' to '9' (4 times) 
----------------------------------------------------- 
    \n   '\n' (newline) 
----------------------------------------------------- 
    (   group and capture to \1: 
----------------------------------------------------- 
    .*   any character except \n (0 or more times 
       (matching the most amount possible)) 
-----------------------------------------------------
    )   end of \1
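
A small sketch of why this pattern only picks up the last phone number on a line: the number must be followed directly by the newline, so an earlier number followed by more text (such as `Cell Phone:`) cannot match. The `[ \t]*` part is an addition not in the answer above, assumed here so that stray trailing spaces after the number do not break the match; the shortened `text` sample is also only illustrative.

```python
import re

# Shortened sample: two phone numbers on one line, with trailing spaces.
text = (
    "LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888 \n"
    "10 generic St. \n"
    "Newton MA 02471\n"
)

# Phone number, optional trailing spaces/tabs (assumed addition), newline,
# then capture the rest of the following line.
pattern = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}[ \t]*\n(.*)")

# Only the number that ends the line matches; strip() drops trailing spaces.
next_lines = [m.group(1).strip() for m in pattern.finditer(text)]
print(next_lines)
```

Without `re.DOTALL`, `.` stops at the newline, so `group(1)` holds exactly one line.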
+0

I don't want the numbers; I want the text that follows the last phone number in each block. – Leonardo

+0

OK, I misunderstood your question. – Amit

+0

@Leonardo - take a look now. – Amit

1

Is this what you want?

doc = '''LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514 
192 generic St. 
Newton MA 02471 
Status: Attender Marital: Married Adult: M/F: Env.No.: 


LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888 
10 generic St. 
Newton MA 02471 

    E-mail : [email protected] 
Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5''' 

import re 

p = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)') 

for x in p.finditer(doc): 
    print(x.group(1))

The output is

192 generic St. 
10 generic St. 

Explanation

[0-9]{3}-[0-9]{3}-[0-9]{4}\n(.*)
__________________________        <- phone number
                          __      <- newline
                            ____  <- this part is group(1)
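
If the goal is the whole remainder of each record rather than just the next line, one possible variation is sketched below. It assumes every record ends with a line starting with `Status:`, which holds for the sample data but is not stated in the question. The name `record_tail` and the `[ \t]*` tolerance for trailing spaces are illustrative additions, and the sample is shortened (the e-mail line is left out).

```python
import re

doc = """LAST_NAME_1, Firs_name_1 Home Phone: 333-336-6514
192 generic St.
Newton MA 02471
Status: Attender Marital: Married Adult: M/F: Env.No.:


LAST_NAME_2, Firs_name_2 Home Phone: 777-777-2205 Cell Phone: 888-888-8888
10 generic St.
Newton MA 02471

Status: Member Marital: Married Adult: Y M/F: M Env.No.: 5"""

# Assumption: every record ends with a line that starts with "Status:".
record_tail = re.compile(
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}[ \t]*\n"  # last phone number on the header line
    r"(.*?^Status:[^\n]*)",                 # lazily capture through the Status line
    re.DOTALL | re.MULTILINE,
)

tails = [m.group(1) for m in record_tail.finditer(doc)]
for tail in tails:
    print(tail)
    print("---")
```

Here `re.DOTALL` lets `.` cross newlines, the lazy `.*?` stops at the first `Status:` line, and `re.MULTILINE` makes `^` match at the start of each line.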