2015-09-07

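Replace all hyperlinks in a string with VB.NET

I want a VB.NET regular expression that removes every hyperlink from a string, including the protocol (http and https), the full document name, subdomains, and query string parameters.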

Here is the string from which I need all links removed:

Dim description As String 

description = "Deep purples blanket/wrap. It is gorgeous" & _ 
"in newborn photography. " & _ 
"layer" & _ 
"beneath the baby.....the possibilities are endless!" & _ 
"You will get this prop! " & _ 
"Gorgeous images using Lavender as a basket filler " & _ 
"Photo by Benbrook, TX" & _ 
"Imaging, Ontario" & _ 
"http://www.photo.com?t=3" & _ 
" www.photo.com" & _ 
" http://photo.com" & _ 
" https://photo.com" & _ 
" http://www.photo.nl?t=1&url=5" & _ 
"Photography Cameron, NC" & _ 
"Thank you so much ladies!!" & _ 
"The flower halos has beautiful items!" & _ 
"http://www.enchanting.etsy.com" & _ 
"LIKE me on FACEBOOK for coupon codes, and to see my full product line!" & _ 
"http://www.facebook.com/byme" 

Here is what I have so far:

description = Regex.Replace(description, _ 
        "((http|https|ftp)\://[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*)", "") 

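It removes most of the links, but not links without a protocol, such as www.example.com.

How can I change my expression so that it also covers those links?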

+3

Reason for the downvote: with over 1,600 reputation, you should know [ask]. Hint: show what you have tried so far.

Answers

4

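You can split the string with Split() and then check each element. If an element can be parsed as an absolute URI, discard it from the array, then rebuild the string: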

Dim urlStr As String 
Dim resultUri As Uri 
urlStr = "Beautiful images using Lavender, see https://www.foo.com" & vbCrLf & _ 
    "Plent of links http://www.foo.com/page.html?t=7 Oshawa, Ontario" & vbCrLf & _ 
    "http://www.example.com" & vbCrLf & "Photography, NC" 

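Dim resNoURL = String.Join(" ", urlStr.Split().Select(Function(m As String) 
         ' Keep tokens that do not parse as an absolute URI 
         If Uri.TryCreate(m, UriKind.Absolute, resultUri) = False Then 
          Return m 
         End If 
         Return Nothing ' URL tokens are marked for removal 
         End Function).Where(Function(m) m IsNot Nothing)) 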

Alternatively, check whether m starts with http:// or https://. You can even use a regular expression for that check:

Dim rx As Regex = New Regex("(?i)^(?:https?|ftps?)://") 

Then, in the callback:

If rx.IsMatch(m) = False Then 
    Return m 
End If 
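
The split-and-filter idea above can be put together into one small program. This is only a sketch under stated assumptions: the module name TokenFilter and the helper DropUrlTokens are hypothetical, and the `www\.` alternative in the prefix pattern is an addition that the answer's own check does not have.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TokenFilter
    ' Prefix check from the answer; the "www\." alternative is an added
    ' assumption so that protocol-less links are caught as well.
    Private ReadOnly UrlPrefix As New Regex("(?i)^(?:(?:https?|ftps?)://|www\.)")

    ' Hypothetical helper: drops every whitespace-separated token that
    ' starts like a URL and joins the remaining tokens with single spaces.
    Public Function DropUrlTokens(input As String) As String
        ' A null Char() separator makes Split use any whitespace as delimiter.
        Dim separators As Char() = Nothing
        Dim tokens = input.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(" ", tokens.Where(Function(t) Not UrlPrefix.IsMatch(t)))
    End Function

    Sub Main()
        Dim sample As String = "Lavender, see https://www.foo.com" & vbCrLf &
            "www.example.com Photography, NC"
        Console.WriteLine(DropUrlTokens(sample))
    End Sub
End Module
```

Because the input is split on whitespace, this only removes links that stand alone as a token; a link glued directly to neighbouring text is kept or dropped together with that text.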

UPDATE

Here is sample code that removes URLs from a string:

Dim urlStr As String 
urlStr = "YOUR STRING" 
Dim MyRegex As Regex = New Regex("(?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*") 
Console.WriteLine(MyRegex.Replace(urlStr, "")) 
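
As a rough illustration of how the pattern above behaves on text that mixes prose and links, here is a minimal self-contained sketch. The module name UrlStripper and the helper RemoveUrls are hypothetical, and the whitespace clean-up step is an addition that is not part of the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlStripper
    ' Pattern taken from the answer: a protocol prefix or a bare "www."
    ' prefix, followed by host, optional port, path and query string.
    Private ReadOnly UrlPattern As New Regex(
        "(?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*")

    ' Hypothetical helper: removes every match, then collapses the
    ' whitespace runs that the removed links leave behind (added step).
    Public Function RemoveUrls(input As String) As String
        Dim stripped As String = UrlPattern.Replace(input, "")
        Return Regex.Replace(stripped, "\s+", " ").Trim()
    End Function

    Sub Main()
        Console.WriteLine(RemoveUrls("Photo by www.photo.com and https://photo.com today"))
    End Sub
End Module
```

Two caveats follow from the character classes in the pattern: "." and "," are allowed inside a match, so punctuation that directly follows a link is removed with it, and any letters glued to the end of a link (for example a following word with no space in between) are treated as part of the link.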
+0

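According to this answer, it should not be necessary to split the string first: http://stackoverflow.com/a/6811780/769449 I have updated the question with the regex I am using now, but it still misses links that have no protocol... can you help? – Flo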

+1

Do you want the regex you are using to also detect links like 'www.something.com'? Try '((?:(http|https|ftp)://|www\.)[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9._?,'/\\+&%$#=~-])*)'. It is the same regex; I only added www and removed the unnecessary escapes.

+0

Yes, I want to detect those links too, but your regex does not replace the links in my sample string, which I just added... – Flo