Detect and extract URLs from a string?

This is an easy question, but I just don't get it. I want to detect the URLs in a string and replace them with shortened ones.

I found this expression on Stack Overflow, but the result is just http

Pattern p = Pattern.compile("\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
boolean result = m.find();
while (result) {
    for (int i = 1; i <= m.groupCount(); i++) {
        String url = m.group(i);
        str = str.replace(url, shorten(url));
    }
    result = m.find();
}
return html;

Are there any better ideas?


2011-04-19 Shisoft

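m.group(1) gives you the first matching group, that is, the first pair of capturing parentheses. Here that is (https?|ftp|file).

You should check whether m.group(0) contains something, or surround the whole pattern with parentheses and use m.group(1) again.

You need to call find() again to match the next URL and then read the groups of that new match.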
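
To make the difference between group(0) and group(1) concrete, here is a minimal sketch using the pattern from the question. The class name GroupDemo and the helper describeMatches are made up for illustration. Each find() call moves on to the next match; group(0) holds the whole matched URL, while group(1) holds only the scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Pattern from the question: only the scheme sits inside capturing parentheses.
    static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects "whole match -> group 1" for every match found.
    static List<String> describeMatches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) { // each call advances to the next match
            out.add(m.group(0) + " -> " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        describeMatches("see http://example.com/a and ftp://example.org/b")
                .forEach(System.out::println);
    }
}
```

Looping over group indexes starting at 1, as the question does, therefore only ever sees the scheme.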


2011-04-19 08:30:58

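With some extra parentheses around the whole thing (except the word boundary at the start), it should match the whole domain name:

"\\b((https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|])"

I don't think the regex matches the whole URL, though.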


2011-04-19 08:32:23

Works even with trailing commas and whitespace. Great – 2016-08-22 08:46:19

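Detecting URLs is not an easy task. If it is enough for you to get a string that starts with https?|ftp|file, then this pattern may be fine. Your problem here is that you have a capturing group, the (), and it only surrounds the first part, http...

I would make this part a non-capturing group using (?:) and put parentheses around the whole thing.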

"\\b((?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|])"

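
For the replacement step the question asks about, one possible sketch is below. It assumes the pattern above and uses Matcher.appendReplacement so that each URL is replaced at its own position instead of through a global str.replace. The class name UrlReplacer and the method replaceUrls are hypothetical, and the shortener is passed in as a function because the question's shorten(url) is not shown.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlReplacer {
    // Scheme is a non-capturing group; group 1 is the whole URL.
    static final Pattern URL = Pattern.compile(
            "\\b((?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|])",
            Pattern.CASE_INSENSITIVE);

    // Replaces every matched URL with whatever the supplied shortener returns.
    static String replaceUrls(String text, Function<String, String> shorten) {
        Matcher m = URL.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = shorten.apply(m.group(1));
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in shortener, purely for illustration.
        Function<String, String> fake = url -> "https://short.example/" + url.length();
        System.out.println(replaceUrls(
                "Go to http://example.com/some/long/path, then ftp://example.org/file.", fake));
    }
}
```

Because the last character class in the pattern excludes the comma and the dot, punctuation that directly follows a URL stays in the surrounding text.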

2011-04-19 08:37:28 stema

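Let me preface this by saying that I'm not a big advocate of regular expressions for complex cases. Trying to write the perfect expression for something like this is very difficult. That said, I do happen to have one for detecting URLs, and it is backed by a 350-line unit test case class. Someone started with a simple regex, and over the years we have grown the expression and the test cases to handle the issues we found. It's definitely not trivial: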

// Pattern for recognizing a URL, based off RFC 3986
private static final Pattern urlPattern = Pattern.compile(
        "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

Here's an example of using it:

Matcher matcher = urlPattern.matcher("foo bar http://example.com baz"); 
while (matcher.find()) { 
    int matchStart = matcher.start(1); 
    int matchEnd = matcher.end(); 
    // now you have the offsets of a URL match 
}


2011-04-19 08:53:08 WhiteFang34

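This should be the accepted answer. Brilliant. – Reginald 2015-06-08 13:04:34

Unfortunately, this one also matches a dot after the URL. – 2015-09-03 15:31:47

Doesn't handle URLs in text correctly. The preceding whitespace is handled incorrectly (it swallows line breaks), and it accepts colons, dots, etc. after the URL. – 2016-05-16 09:02:17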
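
One way to work around the trailing dot and colon mentioned in the comments above is to trim sentence punctuation from each match afterwards. This is only a heuristic sketch and not part of the answer: the class UrlTrim, the method trimTrailingPunctuation, and the chosen character set are all assumptions, and a URL that legitimately ends in one of these characters would be cut short.

```java
class UrlTrim {
    // Characters that usually belong to the sentence rather than the URL
    // when they appear at the very end of a match (an assumed set).
    private static final String TRAILING = ".,:;!?";

    // Drops any run of trailing punctuation from a matched URL candidate.
    static String trimTrailingPunctuation(String url) {
        int end = url.length();
        while (end > 0 && TRAILING.indexOf(url.charAt(end - 1)) >= 0) {
            end--;
        }
        return url.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(trimTrailingPunctuation("http://example.com/page."));
        System.out.println(trimTrailingPunctuation("http://example.com/a?b=c"));
    }
}
```

When working with offsets, as in the usage example above, the same trimming can be applied to the substring taken between matcher.start(1) and matcher.end().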

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Returns a list with all links contained in the input
 */
public static List<String> extractUrls(String text)
{
    List<String> containedUrls = new ArrayList<String>();
    String urlRegex = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
    Pattern pattern = Pattern.compile(urlRegex, Pattern.CASE_INSENSITIVE);
    Matcher urlMatcher = pattern.matcher(text);

    while (urlMatcher.find())
    {
        containedUrls.add(text.substring(urlMatcher.start(0),
                urlMatcher.end(0)));
    }

    return containedUrls;
}

Example:

List<String> extractedUrls = extractUrls("Welcome to https://stackoverflow.com/ and here is another link http://www.google.com/ \n which is a great search engine"); 

for (String url : extractedUrls) 
{ 
    System.out.println(url); 
}

Prints:

https://stackoverflow.com/ 
http://www.google.com/


2015-02-01 23:17:30 BullyWiiPlaza

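Downvoted because there should be eight backslashes instead of four. Putting them inside double quotes reduces the number of backslashes in the string to four, and the regex interpretation, where \\ matches a single \, reduces the number to two. (?://|\\\\) – 2017-05-04 18:43:21

I just made the same mistake. I added (?://|\\\\\\\\) – 2017-05-08 18:43:05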
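
The backslash counting in these comments can be checked directly. A small sketch follows (the class and method names are made up for illustration): four backslashes in Java source become the regex \\, which matches one literal backslash, while eight become \\\\, which matches two.

```java
import java.util.regex.Pattern;

class BackslashLevels {
    // Thin wrapper around Pattern.matches, which requires the whole input to match.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String one = "\\";   // Java literal for a single backslash character
        String two = "\\\\"; // Java literal for two backslash characters

        // Four backslashes in source: the regex matches exactly one backslash.
        System.out.println(matches("\\\\", one)); // true
        System.out.println(matches("\\\\", two)); // false

        // Eight backslashes in source: the regex matches exactly two backslashes.
        System.out.println(matches("\\\\\\\\", two)); // true

        // The alternation suggested in the comment accepts "//" or two backslashes.
        System.out.println(matches("(?://|\\\\\\\\)", "//")); // true
        System.out.println(matches("(?://|\\\\\\\\)", two));  // true
    }
}
```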
