Regex to split URLs in text that has no delimiters

I have a string of URLs with no delimiters between them:

 
https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZghttps://console.developers.google.com/project/reducted/?authuser=1\n

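This example contains only two URLs, but there could be more. The input text consists only of URLs, all on one line.

I'm trying to do this in Python.

I looked for a way to split the URLs into a list and tried a few solutions, but couldn't get any of them to work fully, because they greedily consume all of the following URLs. https://stackoverflow.com/a/6883094/659346

I realise this is probably because https://... may legitimately be allowed in the query part of a URL, but in my case I'm willing to assume that it can't, and that when it occurs it marks the start of the next URL.

I also tried (http[s]://.*?), but with or without the ? it either matches the whole text or just https://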
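The two failure modes described above can be reproduced with a small sketch. The shortened example.* URLs are hypothetical stand-ins for the real ones, not the original input:

```python
import re

# Hypothetical, shortened input: two URLs back to back, then a newline.
s = "https://a.example/x?q=1https://b.example/y\n"

# Lazy quantifier with nothing after it: matches as few characters as
# possible, which is zero, so only the scheme is captured.
lazy = re.findall(r'(http[s]://.*?)', s)
print(lazy)    # ['https://', 'https://']

# Greedy quantifier: runs to the end of the line and swallows the second URL.
greedy = re.findall(r'(http[s]://.*)', s)
print(greedy)  # ['https://a.example/x?q=1https://b.example/y']
```

The lazy form needs something after it that tells the regex where a URL ends, otherwise it stops immediately.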

Source

2015-01-15 GP89

You need to use a positive lookahead assertion.

>>> import re
>>> s = "https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZghttps://console.developers.google.com/project/reducted/?authuser=1\n"
>>> re.findall(r'https?://.*?(?=https?://|$|\s)', s)
['https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZg', 'https://console.developers.google.com/project/reducted/?authuser=1']
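For reuse, the same lookahead pattern could be wrapped in a small helper. This is only a sketch: the names URL_RE and split_urls and the example.* URLs are made up for illustration, and it relies on the question's assumption that http:// or https:// never appears inside a URL.

```python
import re

# Lazily match from one scheme up to (but not including) the next scheme,
# a whitespace character, or the end of the string.
URL_RE = re.compile(r'https?://.*?(?=https?://|\s|$)')

def split_urls(text):
    """Return the URLs in text, assuming a scheme always starts a new URL."""
    return URL_RE.findall(text)

urls = split_urls("http://a.example/1https://b.example/2?x=yhttp://c.example/3\n")
print(urls)
# ['http://a.example/1', 'https://b.example/2?x=y', 'http://c.example/3']
```

The lookahead is zero-width, so the scheme that ends one match is still available as the start of the next.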

Source

2015-01-15 15:26:47

(https?:\/\/(?:(?!https?:\/\/).)*)

Try this. See the demo.

https://regex101.com/r/tX2bH4/15

import re

# Match a scheme, then any characters that do not begin another scheme.
p = re.compile(r'(https?:\/\/(?:(?!https?:\/\/).)*)')
test_str = "https://00e9e64bac25fa94607-apidata.googleusercontent.com/download/redacted?qk=AD5uMEnaGx-JIkLyJmEF7IjjU8bQfv_hZTkH_KOeaGZySsQCmdSPZEPHHAzUaUkcDAOZghttps://console.developers.google.com/project/reducted/?authuser=1\n"

# "." does not match the trailing newline, so it is left out of the last URL.
print(p.findall(test_str))
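The pattern is easier to read when spelled out with re.VERBOSE. The sketch below is the same idea without the optional \/ escapes and the outer capture group; the name URL_RE and the example.* URLs are hypothetical:

```python
import re

URL_RE = re.compile(r"""
    https?://            # the scheme that starts a URL
    (?:                  # then repeat, one character at a time:
        (?!https?://)    #   stop if another scheme starts here
        .                #   otherwise consume any character except a newline
    )*
""", re.VERBOSE)

urls = URL_RE.findall("https://a.example/x?q=1https://b.example/y\n")
print(urls)  # ['https://a.example/x?q=1', 'https://b.example/y']
```

The negative lookahead is checked before every single character, which is what keeps the match from running into the next URL.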

Source

2015-01-15 15:22:23 vks

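This won't work if the URL string contains "http" in the middle. – mbomb007 2015-01-15 15:24:01

An example where it won't work: http://golang.org/pkg/net/http/ – mbomb007 2015-01-15 15:24:48

Yeah, I'd rather have 'http[s]?://' as the lookahead test to make it a bit more robust. Can't seem to work out how to add the '://' to your answer though :S – GP89 2015-01-15 15:26:44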
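The difference the comments point at can be checked with a short sketch. The "loose" pattern, whose lookahead tests only for http, is an assumption about what the comments refer to, and https://example.com/ is a made-up second URL:

```python
import re

s = "http://golang.org/pkg/net/http/https://example.com/\n"

# Lookahead on bare "http" (assumed earlier form): the first URL is cut
# short at the "/http/" path segment.
loose = re.findall(r'https?://(?:(?!http).)*', s)
print(loose)   # ['http://golang.org/pkg/net/', 'https://example.com/']

# Lookahead on the full "https?://": only a real scheme starts a new URL.
strict = re.findall(r'https?://(?:(?!https?://).)*', s)
print(strict)  # ['http://golang.org/pkg/net/http/', 'https://example.com/']
```

With '://' included in the lookahead, a path segment named http no longer ends the match early.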
