How to find website links like these with a regular expression

I want to find website links like the following with a regular expression:

www.yahoo.com 
yahoo.com 
http://www.yahoo.com 
http://yahoo.com 
yahoo.jp (or any domain) 
http://yahoo.fr

Is there any way to catch all of them with a regular expression?


2010-08-01 pedram

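This regular expression, from daringfireball.net, should be able to do what you want. I'm not sure about a bare domain.tld, because that form is very ambiguous.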

(?xi)
\b
(                                   # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+:                    # URL protocol and colon
    (?:
      /{1,3}                        # 1-3 slashes
      |                             #   or
      [a-z0-9%]                     # Single letter or digit or '%'
                                    # (Trying not to match e.g. "URI::Escape")
    )
    |                               #   or
    www\d{0,3}[.]                   # "www.", "www1.", "www2." … "www999."
    |                               #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/      # looks like domain name followed by a slash
  )
  (?:                               # One or more:
    [^\s()<>]+                      # Run of non-space, non-()<>
    |                               #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
  )+
  (?:                               # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
    |                               #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]  # not a space or one of these punct chars
  )
)

For details on what it does, see http://daringfireball.net/2010/07/improved_regex_for_matching_urls
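To check the caveat about a bare domain.tld against the examples in the question, here is a minimal C# sketch, assuming .NET 6 or later with top-level statements. It runs the same pattern, rewritten in its compact single-line form, over each of the six inputs; the helper name `FirstUrl` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Compact, single-line form of the daringfireball.net pattern quoted above.
// Inside a C# verbatim string a literal double quote is written as "";
// the guillemets and curly quotes are written as regex \u escapes.
const string UrlPattern =
    @"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)" +
    @"(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+" +
    @"(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?\u00AB\u00BB\u201C\u201D\u2018\u2019]))";

var urlRegex = new Regex(UrlPattern);

// Hypothetical helper: returns the first URL (capture group 1) in the text, or "" if none.
string FirstUrl(string text)
{
    Match m = urlRegex.Match(text);
    return m.Success ? m.Groups[1].Value : "";
}

// The six forms listed in the question.
string[] samples =
{
    "www.yahoo.com", "yahoo.com", "http://www.yahoo.com",
    "http://yahoo.com", "yahoo.jp (or any domain)", "http://yahoo.fr",
};

foreach (string s in samples)
{
    string url = FirstUrl(s);
    Console.WriteLine($"{s,-26} -> {(url.Length > 0 ? url : "(no match)")}");
}
```

With this pattern, www.yahoo.com and the three http:// forms are matched whole, while bare yahoo.com and yahoo.jp are not: the pattern only accepts a scheme, a "www" prefix, or a domain followed by a slash.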


2010-08-01 11:29:41 adamse

I used that and it works. But there is one problem: how do I get the matched text back? I used MatchCollection mc18 = Regex.Matches(text, regexOption, RegexOptions.IgnoreCase); What should I do to get the text? Regards – pedram 2010-08-01 11:39:21
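A minimal sketch of reading the matches back, assuming .NET 6 or later with top-level statements and a made-up input string: Regex.Matches returns a MatchCollection, and each Match in it exposes the matched text as Value and its start position as Index. The names mc18 and regexOption are reused from the comment; the pattern is the compact single-line form of the answer's regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Compact, single-line form of the daringfireball.net pattern from the answer.
string regexOption =
    @"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)" +
    @"(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+" +
    @"(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?\u00AB\u00BB\u201C\u201D\u2018\u2019]))";

// Hypothetical input text for illustration.
string text = "Visit http://yahoo.fr or www.yahoo.com (Yahoo) today.";

// Regex.Matches returns a MatchCollection; each element is a Match.
MatchCollection mc18 = Regex.Matches(text, regexOption, RegexOptions.IgnoreCase);

var found = new List<string>();
foreach (Match m in mc18)
{
    // m.Value is the matched text and m.Index its start position in `text`.
    // Group 1 spans the whole URL in this pattern, so m.Groups[1].Value is the same string.
    Console.WriteLine($"{m.Index}: {m.Value}");
    found.Add(m.Value);
}
```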

Do you want to replace these occurrences, or just find them? – adamse 2010-08-01 11:42:15
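If the goal is to replace the occurrences rather than just find them, Regex.Replace accepts a MatchEvaluator that builds each replacement from the Match. The sketch below assumes .NET 6 or later and uses a hypothetical helper `Linkify` that wraps each URL in an anchor tag; prefixing scheme-less matches with http:// is an illustrative choice, not something the thread specifies.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Compact, single-line form of the daringfireball.net pattern from the answer.
const string UrlPattern =
    @"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)" +
    @"(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+" +
    @"(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?\u00AB\u00BB\u201C\u201D\u2018\u2019]))";

var urlRegex = new Regex(UrlPattern);

// Hypothetical helper: wraps every URL found in `input` in an HTML anchor tag.
// Assumption for illustration: matches without a scheme (e.g. "www.yahoo.com")
// get "http://" prepended in the href.
string Linkify(string input) =>
    urlRegex.Replace(input, m =>
    {
        string url = m.Groups[1].Value;
        bool hasScheme = Regex.IsMatch(url, @"^[a-z][\w-]+:", RegexOptions.IgnoreCase);
        string href = hasScheme ? url : "http://" + url;
        // HTML-encode both the attribute value and the link text.
        return $"<a href=\"{WebUtility.HtmlEncode(href)}\">{WebUtility.HtmlEncode(url)}</a>";
    });

Console.WriteLine(Linkify("Visit http://yahoo.fr or www.yahoo.com today."));
```

Text outside the matches is copied through unchanged, so a string without URLs comes back as-is.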

One more question: how can I catch a link when it sits between {}, like {} www.yahooo.com or {} www.yahooo.com? Regards – pedram 2010-08-01 11:52:11

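I'll throw out an alternative here that doesn't use regular expressions at all. Take a look at the HTML Agility Pack; for your case it would look something like this: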

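using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null, not an empty collection, when nothing matches
var links = doc.DocumentNode.SelectNodes("//a[contains(@href, 'yahoo')]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        var href = link.GetAttributeValue("href", string.Empty);
        // href is a URL that contains the word "yahoo"; do something with it
    }
}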

This doesn't really answer the question as you wrote it; it's just to keep your options open, since RegEx can have many other problems when applied against HTML.


2010-08-01 11:44:46
