Capture repeated groups in Python regex

I have a mail log file that looks like this:

Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff

(The local parts of the addresses and the whole sendmail address are placeholders; the originals were redacted.)

What I want is a list of the mail hosts on all lines that contain "sm-mta". In this case, that would be: ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']

re.findall(r'sm-mta.*?@(.*?)[>, ]') returns only one host per matching line (['gmail.com', 'gmail.com']).

re.findall(r'@(.*?)[>, ]') returns a list with just the hosts, as I want, but I still need to filter it to the sm-mta lines. Is there a way to solve this?
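To see the difference between the two attempts, here is a small sketch with the standard re module on a hypothetical copy of the log (the LOG name and the placeholder addresses are assumptions, not from the question). A match of the first pattern has to begin at "sm-mta", so findall can return at most one host per sm-mta line; the second pattern finds every host, including the one on the sendmail line.

```python
import re

# Hypothetical copy of the log: the local parts and the sendmail address
# are placeholders; only the sm-mta domains come from the question.
LOG = (
    "Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff\n"
    "Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff\n"
    "Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"
)

# Attempt 1: each match must begin at "sm-mta", and "." does not cross a
# newline, so findall yields at most one host per sm-mta line.
per_line = re.findall(r'sm-mta.*?@(.*?)[>, ]', LOG)

# Attempt 2: finds every host, but the sendmail line is not filtered out.
unfiltered = re.findall(r'@(.*?)[>, ]', LOG)

print(per_line)    # ['gmail.com', 'gmail.com']
print(unfiltered)  # ['gmail.com', 'yahoo.com', 'aol.com', 'example.com', 'gmail.com', 'gmail.com']
```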


2017-10-06 Daqol

You can try this: https://eval.in/875159

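If you cannot use the PyPI regex library, you will have to do it in two steps: 1) grab the lines containing sm-mta, and 2) grab the values you need from them, with something like: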

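import re

# Sample log; the local parts and the sendmail address are placeholders (the originals were redacted)
txt = """Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""
rx = r'@([^\s>,]+)'
# Step 1: keep only the lines that mention sm-mta
filtered_lines = [x for x in txt.split('\n') if 'sm-mta' in x]
# Step 2: pull the host after each "@" out of those lines
print(re.findall(rx, " ".join(filtered_lines)))
# ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']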

See the Python demo online. The @([^\s>,]+) pattern matches @ and then captures and returns 1 or more characters other than whitespace, > and ,.
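If the log is large, the same two steps can be applied one line at a time instead of joining the filtered lines into one string. A minimal sketch, assuming a hypothetical helper named sm_mta_hosts that accepts any iterable of lines (an open file object works); the sample log uses placeholder addresses:

```python
import io
import re

# Same pattern as above: after "@", capture 1+ chars that are not whitespace, ">" or ","
HOST_RX = re.compile(r'@([^\s>,]+)')

def sm_mta_hosts(lines):
    """Return the hosts from every line that contains 'sm-mta'.

    `lines` can be any iterable of strings, e.g. an open log file,
    so the whole file never has to be held in a single string.
    """
    hosts = []
    for line in lines:
        if 'sm-mta' in line:                      # step 1: keep only sm-mta lines
            hosts.extend(HOST_RX.findall(line))   # step 2: grab the hosts on it
    return hosts

# Hypothetical log; the local parts and the sendmail address are placeholders.
sample = io.StringIO(
    "Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff\n"
    "Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff\n"
    "Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff\n"
)
print(sm_mta_hosts(sample))  # ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```

With a real file this would be called as sm_mta_hosts(open(path)), where path is whatever your log file is.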

If you can use the PyPI regex library, you can get the list of strings you need with

>>> import regex
>>> # placeholder addresses; the originals were redacted
>>> x = """Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
... Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff
... Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""
>>> rx = r'(?:^(?=.*sm-mta)|\G(?!^)).*?@\K[^\s>,]+'
>>> print(regex.findall(rx, x, regex.M))
['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']

See the Python online demo and the regex demo.

Pattern details

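(?:^(?=.*sm-mta)|\G(?!^)) - either the start of a line that contains the sm-mta substring after any 0+ characters other than line break chars, or the end of the previous match (the (?!^) lookahead stops \G from matching at the start of a line, including the very beginning of the string where no previous match exists)
.*?@ - any 0+ characters other than line break chars, as few as possible, up to the first @, and the @ itself
\K - the match reset operator, which discards all text matched so far in the current iteration
[^\s>,]+ - 1 or more characters other than whitespace, ',' and '>'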
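The standard re module has no \G or \K, but the chaining idea can be approximated: a compiled pattern's match(string, pos) only tries a match that starts exactly at pos, so passing in the end of the previous match plays the role of \G, and a capture group replaces \K. A rough sketch with a hypothetical helper sm_mta_hosts_chained and placeholder addresses; it checks the sm-mta condition per line instead of using ^ with MULTILINE:

```python
import re

# Rough stdlib stand-in for the regex-module pattern
#   (?:^(?=.*sm-mta)|\G(?!^)).*?@\K[^\s>,]+
# Pattern.match(string, pos) only tries a match starting exactly at pos,
# so feeding it the previous match's end chains the matches like \G.
SM_MTA_LINE = re.compile(r'(?=.*sm-mta)')   # the ^(?=.*sm-mta) check
NEXT_HOST = re.compile(r'.*?@([^\s>,]+)')   # .*?@ followed by the host (group 1 instead of \K)

def sm_mta_hosts_chained(text):
    hosts = []
    for line in text.splitlines():
        # match() is anchored at position 0, so this plays the role of ^
        if not SM_MTA_LINE.match(line):
            continue
        pos = 0
        while True:
            m = NEXT_HOST.match(line, pos)
            if m is None:          # no more "@host" on the rest of this line
                break
            hosts.append(m.group(1))
            pos = m.end()          # the next attempt starts where this one ended (\G)
    return hosts

# Hypothetical log; the local parts and the sendmail address are placeholders.
LOG = (
    "Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff\n"
    "Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff\n"
    "Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"
)
print(sm_mta_hosts_chained(LOG))  # ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```

Each NEXT_HOST match consumes at least "@" plus one host character, so pos always advances and the inner loop terminates.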


2017-10-06 11:19:45

Try the regex module.

# Placeholder addresses; the originals were redacted
x="""Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""
import regex
# V1 is a flag, so pass it via flags= (Python 3 print)
print(regex.findall(r"sm-mta.*to=\K|\G(?!^).*?@(.*?)[>, ]", x, flags=regex.V1))

Output: ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

Just ignore the empty matches; the sm-mta.*to=\K branch produces one for each sm-mta line.

https://regex101.com/r/7zPc6j/1
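A related "ignore the empty matches" trick also works in the standard re module, though it is a different technique from the \G pattern above: one alternative swallows every line that does not contain sm-mta without capturing anything (so findall returns '' for it), the other captures hosts, and the empty strings are filtered out afterwards. A sketch, with RX and sm_mta_hosts_one_pass as hypothetical names and placeholder addresses in the sample:

```python
import re

# Two alternatives, tried left to right at each position (re.M makes ^ and $ work per line):
#   1) ^(?!.*sm-mta).*$  consumes a whole line that does NOT contain sm-mta;
#      group 1 does not take part, so findall yields '' for it
#   2) @([^\s>,]+)       captures a host on any line that was not consumed
RX = re.compile(r'^(?!.*sm-mta).*$|@([^\s>,]+)', re.M)

def sm_mta_hosts_one_pass(text):
    # Drop the '' entries produced by alternative 1
    return [host for host in RX.findall(text) if host]

# Hypothetical log; the local parts and the sendmail address are placeholders.
SAMPLE = (
    "Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff\n"
    "Aug 16 13:16:09 **** sendmail*** to=<user4@example.com>, some_more_stuff\n"
    "Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"
)
print(RX.findall(SAMPLE))             # ['gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']
print(sm_mta_hosts_one_pass(SAMPLE))  # ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```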


2017-10-06 10:49:57 vks
