
Solution: [a-zA-Z0-9.!#$%&'*+-/=?\^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)* is a good choice.

I am using the regular expression below to match email addresses in a file:

email = re.search('(\w+-*[.|\w]*)*@(\w+[.])*\w+',line)

When I use it on a file like the following, my regex works well:

[email protected] huofenggib wrong in get_gsid 
[email protected] rouni816161 wrong in get_gsid

But when I use it on a file like the following, my regex runs unacceptably slowly:

9b871484d3af90c89f375e3f3fb47c41e9ff22 [email protected] 
e9b845f2fd3b49d4de775cb87bcf29cc40b72529e [email protected]

When I use the regex from this website instead, it still runs very slowly.

I need a solution, and I would like to know what is wrong.


2012-06-15 young001

Define "slow". Are you sure the regular expression is the bottleneck? –

Here is my code: http://pastebin.com/6vUBxLZV. You can try it; when it runs, it stalls in the regex match. – young001

When a regex is slow, it is usually due to catastrophic backtracking. That can happen in the following part of your regex, because of the nested repetition:

(\w+-*[.|\w]*)*

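If you can rework this part of the regex to remove the repetition inside the parentheses, you should see a substantial speed increase.

However, you would probably be better off simply searching for an email regex and seeing how other people have solved this problem.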
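To illustrate the nested-repetition point, here is a rough timing sketch. The `FLAT` pattern is my own assumed rewrite, not something from this thread, and it is only approximately equivalent to the original local part. `user@example.com` is a made-up address, since the real ones in the question are redacted. The input lengths are kept small on purpose, because the nested pattern gets slow very quickly when no match is found.

```python
import re
import time

# Pattern from the question: the outer (...)* wraps \w+ and [.|\w]*, which can
# both consume the same characters, so a failed match is retried in many ways.
NESTED = re.compile(r'(\w+-*[.|\w]*)*@(\w+[.])*\w+')

# Assumed rewrite (not from the thread): one flat character class for the
# local part, so there is no repetition inside a repeated group.
FLAT = re.compile(r'[\w.-]+@(\w+[.])*\w+')


def seconds_to_search(regex, text):
    """Return how long regex.search(text) takes, in seconds."""
    start = time.perf_counter()
    regex.search(text)
    return time.perf_counter() - start


if __name__ == '__main__':
    # Short runs of hex-like characters with no '@', so every attempt fails.
    for n in (6, 10, 14):
        text = 'a1' * (n // 2)
        print(n, seconds_to_search(NESTED, text), seconds_to_search(FLAT, text))

    # Made-up address after a hash, shaped like the slow file in the question.
    line = '9b871484d3af90c89f375e3f3fb47c41e9ff22 user@example.com'
    print(FLAT.search(line).group(0))
```

On my reading, the time for `NESTED` should climb steeply as `n` grows while `FLAT` stays negligible, but the exact numbers depend on your machine and Python version.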


2012-06-15 17:09:42

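Thanks for telling me about catastrophic backtracking :-) – young001

This is a backtracking problem. Read this article for more information.

You might want to split the line and work only with the part that contains an @: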

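import re

# Same pattern as in the question; here it only ever sees one short token.
pattern = r'(\w+-*[.|\w]*)*@(\w+[.])*\w+'
line = '9b871484d3af90c89f375e3f3fb47c41e9ff22 [email protected]'
for element in line.split():
    if '@' in element:
        g = re.match(pattern, element)
        if g:  # re.match returns None when the token does not match
            print(g.groups())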


2012-06-15 17:08:44 Matthias

It is always a good idea to search StackOverflow to see whether your question has already been discussed.

Using a regular expression to validate an email address

This one, from that discussion, looks like a good one to me:

[a-zA-Z0-9.!#$%&'*+-/=?\^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*
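A minimal sketch of how this pattern might be used with `re.search`. It assumes the `+@` between the two character classes, which appears to have been lost to email obfuscation on the page. The helper name `find_email` and the address `user@example.com` are placeholders of mine, not from the thread.

```python
import re

# Pattern quoted in the answer, with '+@' assumed where the page shows
# obfuscation residue. Compiled once so it can be reused for every line.
EMAIL_RE = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+-/=?\^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*"
)


def find_email(line):
    """Return the first email-like substring in line, or None."""
    match = EMAIL_RE.search(line)
    return match.group(0) if match else None


if __name__ == '__main__':
    # user@example.com is a placeholder; the real addresses are redacted.
    samples = [
        'user@example.com huofenggib wrong in get_gsid',
        '9b871484d3af90c89f375e3f3fb47c41e9ff22 user@example.com',
        'no address on this line',
    ]
    for sample in samples:
        print(find_email(sample))
```

The local part here is a single character class with no group repeated around it, which is presumably why it does not suffer from the blow-up that the nested version does.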


2012-06-15 17:13:13 steveha

Yes, yours works. Mine probably has, as the others said, a catastrophic backtracking problem. – young001
