2012-09-03 110 views
1

Replace relative URLs with absolute URLs

I have the HTML source of a page as a string:

<html> 
    <head> 
      <link rel="stylesheet" type="text/css" href="/css/all.css" /> 
    </head> 
    <body> 
     <a href="/test.aspx">Test</a> 
     <a href="http://mysite.com">Test</a> 
     <img src="/images/test.jpg"/> 
     <img src="http://mysite.com/images/test.jpg"/> 
    </body> 
</html> 

I want to convert all relative paths to absolute ones. The output I want is:

<html> 
    <head> 
      <link rel="stylesheet" type="text/css" href="http://mysite.com/css/all.css" /> 
    </head> 
    <body> 
     <a href="http://mysite.com/test.aspx">Test</a> 
     <a href="http://mysite.com">Test</a> 
     <img src="http://mysite.com/images/test.jpg"/> 
     <img src="http://mysite.com/images/test.jpg"/> 
    </body> 
</html> 

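Note: I only want the relative paths converted to absolute ones. URLs in the string that are already absolute should not be touched; they are fine as they are. Can this be done with a regular expression or in some other way?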

+2

Sounds like a find-and-replace would be enough? – kush

+0

Could you show sample code for this? –

+0

Are you trying to modify this string in C# on the server, or in JavaScript? – jfriend00

Answers

2

Add

<base href="http://mysite.com/images/" /> 

to the page.

+0

I have my HTML source in a string –

+0

Then insert the tag – mplungjan

+0

I have a string. Where should I insert the tag? –
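The `<base>` answer does not say where the tag goes when the page only exists as a string. A minimal sketch, assuming the HTML has a `<head>` element (`BaseTagInserter` and `InsertBaseTag` are hypothetical names): insert the tag immediately after the opening `<head>` tag. Keep in mind that `<base>` does not rewrite anything in the string; the URLs stay relative and are only resolved by a browser when it renders the page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BaseTagInserter
{
    // Matches the opening <head> tag, with or without attributes; "<header>" is not matched.
    private static readonly Regex HeadTag =
        new Regex(@"<head(\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Returns html with <base href="baseHref" /> inserted right after the first <head> tag.
    // If there is no <head> tag, the string is returned unchanged.
    // baseHref is assumed to be a trusted URL; it is not HTML-escaped here.
    public static string InsertBaseTag(string html, string baseHref)
    {
        string baseTag = "<base href=\"" + baseHref + "\" />";
        return HeadTag.Replace(html, m => m.Value + baseTag, 1);
    }
}
```

A regex is used instead of a plain `IndexOf("<head")` so that a `<header>` element is not mistaken for `<head>`.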

0

Have a look at this, it may help you.

It has the format: http(s)://domain(:port)/AppPath

HttpContext.Current.Request.Url.Scheme + "://" + HttpContext.Current.Request.Url.Authority + HttpContext.Current.Request.ApplicationPath; 

Or you can use:

Page.ResolveUrl("img/yourFile"); 
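`HttpContext.Current` is only available while an ASP.NET request is being processed. If you already have the page URL as a `Uri`, a sketch like the following builds the same kind of base URL with `Uri.GetLeftPart`, which is part of `System.Uri`. `SiteRoot` is a hypothetical helper, and `appPath` stands in for `Request.ApplicationPath`.

```csharp
using System;

public static class SiteRoot
{
    // "scheme://host[:port]" of an absolute page URL. For ordinary URLs without a
    // user-info part this equals Request.Url.Scheme + "://" + Request.Url.Authority.
    public static string FromPageUrl(Uri pageUrl)
    {
        return pageUrl.GetLeftPart(UriPartial.Authority);
    }

    // Appends an application path such as "/" or "/myapp" and always ends with "/",
    // so that path-relative URLs ("img/x.png") resolve inside the application folder.
    public static string WithAppPath(Uri pageUrl, string appPath)
    {
        string trimmed = appPath.Trim('/');
        return FromPageUrl(pageUrl) + "/" + (trimmed.Length == 0 ? "" : trimmed + "/");
    }
}
```

The trailing "/" matters when the result is later used as a base for `new Uri(baseUri, relative)`: without it, the last path segment of the base is replaced rather than extended.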
0

Use a regular expression for this. Here is a simple example:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args) 
    { 
        string input = "<html>\n<head>\n<link rel=\"stylesheet\" type=\"text/css\" href=\"/css/all.css\" /> \n</head>\n<body>\n<a href=\"/test.aspx\">Test</a>\n<a href=\"http://mysite.com\">Test</a>\n<img src=\"/images/test.jpg\"/>\n<img src=\"http://mysite.com/images/test.jpg\"/>\n</body>\n</html>"; 
        string pattern = "((?:src|href)[\\s]*?)(?:\\=[\\s]*?[\\\"\\\'])[\\/*\\\\*]?(?!..+[s]?\\:[\\/]*)(.*?)(?:[\\s\\\"\\\'])"; 
        var reg = new Regex(pattern, RegexOptions.IgnoreCase); 
        // The pattern consumes the leading "/" of a relative URL, so the prefix must end with "/"
        string prefix = @"http://mysite.com/"; 
        var result = reg.Replace(input, "$1=\"" + prefix + "$2\""); 
        Console.WriteLine(result);
    } 
}

The result is:

<html> 
<head> 
<link rel="stylesheet" type="text/css" href="http://mysite.com/css/all.css" /> 
</head> 
<body> 
<a href="http://mysite.com/test.aspx">Test</a> 
<a href="http://mysite.com">Test</a> 
<img src="http://mysite.com/images/test.jpg"/> 
<img src="http://mysite.com/images/test.jpg"/> 
</body> 
</html> 
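The pattern above depends on the layout of the HTML: its negative lookahead `(?!..+[s]?\:[\/]*)` rejects a relative URL whenever a colon appears later on the same line, so with `<img src="/a.jpg"/><a href="http://x">` on one line, `/a.jpg` would be left unconverted. Below is a sketch of a less layout-sensitive variant. It is still regex-based and so still subject to the caveats raised in the next answer. It matches each quoted `src`/`href` value, leaves anything with a URI scheme (`http:`, `mailto:`, `javascript:` ...) or a bare `#fragment` untouched, and resolves the rest with `System.Uri`. The class and method names are hypothetical, and only quoted attribute values are handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUrlRewriter
{
    // A quoted src/href attribute: group 1 = name, group 2 = quote char, group 3 = value.
    private static readonly Regex AttrRegex =
        new Regex(@"\b(src|href)\s*=\s*(""|')(.*?)\2", RegexOptions.IgnoreCase);

    // A leading URI scheme such as "http:", "https:", "mailto:" or "javascript:".
    private static readonly Regex SchemeRegex = new Regex(@"^[a-zA-Z][a-zA-Z0-9+.\-]*:");

    // Rewrites relative src/href values as absolute URLs based on baseUri and
    // returns every other value (absolute, empty or "#fragment") exactly as it was.
    public static string MakeAbsolute(string html, Uri baseUri)
    {
        return AttrRegex.Replace(html, m =>
        {
            string value = m.Groups[3].Value;
            if (value.Length == 0 || value[0] == '#' || SchemeRegex.IsMatch(value))
                return m.Value;
            string absolute = new Uri(baseUri, value).AbsoluteUri;
            string quote = m.Groups[2].Value;
            return m.Groups[1].Value + "=" + quote + absolute + quote;
        });
    }
}
```

Because the `MatchEvaluator` decides per value, absolute URLs are kept byte-for-byte instead of being normalized by `Uri` (which would turn `http://mysite.com` into `http://mysite.com/`).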
16

Don't try to do this with regular expressions, as explained here: https://stackoverflow.com/a/1732454/932418 and https://stackoverflow.com/a/1758162/932418

Use an HTML parser such as HtmlAgilityPack to parse the HTML instead:

// Requires the HtmlAgilityPack NuGet package
using System;
using System.IO;

string html = 
@"<html> 
    <head> 
      <link rel=""stylesheet"" type=""text/css"" href=""/css/all.css"" /> 
    </head> 
    <body> 
     <a href=""/test.aspx"">Test</a> 
     <a href=""http://example.com"">Test</a> 
     <img src=""/images/test.jpg""/> 
     <img src=""http://example.com/images/test.jpg""/> 
    </body> 
</html>"; 

StringWriter writer = new StringWriter(); 
Uri baseUri = new Uri("http://example.com"); 
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); 
doc.LoadHtml(html); 

// (element, attribute) pairs to rewrite; <link> is included so the stylesheet URL is converted too
var targets = new[] { ("img", "src"), ("a", "href"), ("link", "href") };
foreach (var (element, attribute) in targets)
{
    foreach (var node in doc.DocumentNode.Descendants(element))
    {
        var attr = node.Attributes[attribute];
        if (attr == null) continue; // e.g. <a name="top"> has no href
        // Relative values are resolved against baseUri; absolute ones stay absolute,
        // although Uri normalizes them ("http://example.com" becomes "http://example.com/")
        attr.Value = new Uri(baseUri, attr.Value).AbsoluteUri;
    }
}

doc.Save(writer); 

string newHtml = writer.ToString(); 
+0

Thank you so much! This saved me hours! – CodeMilian

+1

That's the way to do it! Many thanks –

0

Take a look at this function:

' Requires: Imports System.Text.RegularExpressions
' Resolves every double-quoted href="..." and src="..." value in html against PageURL.
Private Function ConvertALLrelativeLinksToAbsoluteUri(ByVal html As String, ByVal PageURL As String) As String
    Dim MainURL As New Uri(PageURL)
    ' Group 1 = attribute name plus =", group 2 = URL, group 3 = closing quote
    Dim XpAttr As New Regex("((?:href|src)="")(.*?)("")", RegexOptions.IgnoreCase)
    Return XpAttr.Replace(html,
        Function(m As Match)
            ' Relative URLs are resolved against the page URL; absolute ones stay absolute,
            ' although Uri normalizes them (e.g. a bare host gets a trailing "/")
            Dim NEWURL As New Uri(MainURL, m.Groups(2).Value)
            Return m.Groups(1).Value & NEWURL.AbsoluteUri & m.Groups(3).Value
        End Function)
End Function
0

This works great for me. I use it on email templates. I start every link with "~/" in MVC/Razor.

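' Parse HTML and make relative links absolute with p_basePath
' p_basePath (a site root ending in "/", e.g. "http://mysite.com/") and ErrorCodes are class members not shown here
Public Function ParseHTMLLinks(ByVal MailBodyHTML As String) As String 
    ' Declare & initialize variables 
    Dim strHTMLBody As String = MailBodyHTML 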

    ' Set regex variables 
    Dim strSrcSubMatch As String = "" 
    Dim strSrcFullUrl As String = "" 
    Dim srcPattern As String = "[=""]\/?([^""\s]*(\.gif|\.jpg|\.jpeg|\.png|\.css|\.js))[""\s]" 
    Dim srcOptions As RegexOptions = RegexOptions.IgnoreCase 
    Dim regex As Regex = New Regex(srcPattern, srcOptions) 
    Dim regexSub As Regex = New Regex(srcPattern, srcOptions) 
    Dim Matches As MatchCollection = regex.Matches(strHTMLBody) 

    Try 
     For Each Match As Match In Matches 
      ' filter out absolute links 
      If InStr(Match.ToString, "://") = 0 AndAlso InStr(LCase(Match.ToString), "mailto:") = 0 AndAlso InStr(LCase(Match.ToString), "javascript:") = 0 Then 
       ' Remove the " at each end of relative path 
       strSrcSubMatch = regexSub.Replace(Match.ToString, "$1") 
       ' Concatenate the FullPath 
       strSrcFullUrl = p_basePath & strSrcSubMatch 
       ' Execute the replace 
       strHTMLBody = Replace(strHTMLBody, "/" & strSrcSubMatch, strSrcFullUrl) 
      End If 
     Next 

    Catch e As WebException 
     'Add errors to List(Of WebException), if any. 
     ErrorCodes.Add(e) 
    End Try 

    Return strHTMLBody 'MailBodyHTML 
End Function 