Regex to extract all URLs from a string

I have a string like this:

http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/

I want to extract all of the URLs/web addresses in it into an array. For example:

urls = ['http://example.com/path/topage.html','http://twitter.com/p/xyan',.....]

Here is my approach, which did not work.

import re
strings = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"
links = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', strings)

print(links)
# the result is always the same as strings

2016-08-02 hayes robin

This should help you: http://regex101.com. You can play with your regex there and see what the problem might be. – idjaw

Do you have to keep the leading 'http(s)'? – Bahrom

Yes, I have to, @Bahrom –

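The problem is that your regex pattern is too inclusive: it matches all of the URLs as one. You can fix this with a lookahead, (?=...).

Try this: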

re.findall(r"((www\.|http://|https://)(www\.)*.*?(?=(www\.|http://|https://|$)))", strings)
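
Because the pattern contains capturing groups, re.findall returns one tuple of groups per match rather than the matched text. A minimal sketch of the same lookahead idea with non-capturing groups, so that the result is a flat list (the names pattern and urls are illustrative, not from the answer):

```python
import re

strings = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"

# Start at "www." or a scheme, then lazily consume characters until the
# lookahead sees the start of the next URL (or the end of the string).
pattern = re.compile(r"(?:www\.|https?://)(?:www\.)*.*?(?=www\.|https?://|$)")

urls = pattern.findall(strings)
print(urls)
# ['http://example.com/path/topage.html', 'http://twitter.com/p/xyanhs',
#  'http://httpget.org/get.zip', 'www.google.com/privacy.html',
#  'https://goodurl.net/']
```

This only works if a new URL never starts with anything other than "www." or "http(s)://", which holds for the sample string but is an assumption in general.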

2016-08-02 21:39:14 Munchhausen

Doesn't capture 'www.google.com/privacy.html', otherwise it's fine –

Good point. Working on it. – Munchhausen

Hi @Munchhausen, thanks, it almost works, except for this URL: 'http://httpget.org/get.zipwww.google.com/privacy.html'. –

Your problem is that http:// is being accepted as a valid part of a URL. This is because of this token right here:

[$-_@.&+]

Or more specifically:

$-_

This matches every character in the range from $ to _, which includes many more characters than you probably intended.
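
To see what the range actually covers, here is a small sketch that lists its characters, assuming plain ASCII (in_range and range_only are illustrative names):

```python
import re

# Inside a character class, $-_ is a range from "$" (0x24) to "_" (0x5F).
in_range = "".join(chr(c) for c in range(ord("$"), ord("_") + 1))
print(in_range)
# $%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_

range_only = re.compile(r"[$-_]")

# ":" and "/" are both inside the range, so "://" is accepted mid-URL.
print(bool(range_only.fullmatch(":")), bool(range_only.fullmatch("/")))
# True True
```

Since both ":" and "/" fall inside the range, the original pattern never stops at the next "http://".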

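You could change it to [$\-_@.&+], but this causes a problem, because / characters are then no longer matched. So add it back by using [$\-_@.&+/]. However, this causes a problem again, because http://example.com/path/topage.htmlhttp would then be considered a valid match.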
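
A quick sketch of that intermediate step, assuming the character class is [$\-_@.&+/] and nothing else in the question's pattern changes (intermediate and found are illustrative names):

```python
import re

strings = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"

# The hyphen is escaped, so the class no longer contains ":" and the match
# stops there, but only after swallowing the next URL's "http" or "https".
intermediate = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$\-_@.&+/]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)

found = intermediate.findall(strings)
print(found)
# ['http://example.com/path/topage.htmlhttp',
#  'http://httpget.org/get.zipwww.google.com/privacy.htmlhttps']
```

Each match eats the scheme of the URL that follows it, so every second URL is lost entirely.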

The final addition is a lookahead to make sure you are not matching http:// or https://, which happens to be exactly the first part of your regex!

http[s]?://(?:(?!http[s]?://)[a-zA-Z]|[0-9]|[$\-_@.&+/]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

Tested here
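
For reference, a sketch of running the final pattern against the string from the question (final and links are illustrative names):

```python
import re

strings = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"

# The negative lookahead stops a letter from matching when it begins
# the next "http://" or "https://".
final = re.compile(
    r"http[s]?://(?:(?!http[s]?://)[a-zA-Z]|[0-9]|[$\-_@.&+/]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)

links = final.findall(strings)
print(links)
# ['http://example.com/path/topage.html', 'http://twitter.com/p/xyanhs',
#  'http://httpget.org/get.zipwww.google.com/privacy.html',
#  'https://goodurl.net/']
```

The scheme-less www.google.com/privacy.html is not split off, because this pattern only treats http:// and https:// as the start of a URL.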

2016-08-02 21:46:35

A simple answer that doesn't get into many complications:

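import re

# l is the input string from the question
l = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"
url_list = []

# split on "http://" first, then split each piece on "https://"
for x in re.split("http://", l):
    url_list.append(re.split("https://", x))

# flatten the list of lists (the schemes are dropped; empty strings may remain)
url_list = [item for sublist in url_list for item in sublist]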

If you want to add the http:// and https:// strings back onto the URLs, modify the code accordingly. I hope I've conveyed the idea.
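
One possible way to keep the schemes, sketched here as an assumption rather than what the answer intended, is to split on a capturing group so that re.split returns the separators too (parts is an illustrative name):

```python
import re

l = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"

# With a capturing group, re.split keeps each separator in the result:
# ['', 'http://', 'example.com/path/topage.html', 'http://', ...]
parts = re.split(r"(https?://)", l)

# Pair each scheme (odd indices) with the text that follows it (even indices).
url_list = [scheme + rest for scheme, rest in zip(parts[1::2], parts[2::2])]
print(url_list)
# ['http://example.com/path/topage.html', 'http://twitter.com/p/xyanhs',
#  'http://httpget.org/get.zipwww.google.com/privacy.html',
#  'https://goodurl.net/']
```

As with the original split, www.google.com/privacy.html stays attached to the previous URL because it has no scheme to split on.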

2016-08-02 22:02:54

Not all of the URLs have 'http://' –

Here is mine:

(r'http[s]?://[a-zA-Z]{3}\.[a-zA-Z0-9]+\.[a-zA-Z]+')
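
A short sketch of what this pattern does and does not match. The second test string is a made-up example, not from the question, and the variable names are illustrative:

```python
import re

pattern = re.compile(r'http[s]?://[a-zA-Z]{3}\.[a-zA-Z0-9]+\.[a-zA-Z]+')

# The pattern needs a scheme, then exactly three letters and a dot
# (for example "www."), then a name, a dot, and a letters-only TLD.
with_www = pattern.findall("see http://www.google.com/privacy.html")
print(with_www)
# ['http://www.google.com']

strings = "http://example.com/path/topage.htmlhttp://twitter.com/p/xyanhshttp://httpget.org/get.zipwww.google.com/privacy.htmlhttps://goodurl.net/"
from_question = pattern.findall(strings)
print(from_question)
# []
```

So it returns only the scheme and host, and it finds nothing in the question's string, where no scheme is followed by a three-letter label.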

2017-05-04 05:00:23 user3567030

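While this code snippet is welcome and may provide some help, it would be [greatly improved](//meta.stackexchange.com/q/114762) if it included an explanation of *how* it addresses the question. Without that, your answer has much less educational value. Remember that you are answering the question for future readers, not just the person asking now! Please edit your answer to add an explanation and to indicate which limitations and assumptions apply. –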
