2012-08-14 41 views
1

Facebook feed - remove extra Facebook JS from anchor

Please help me strip all the extra Facebook information from the markup below using the C#/.NET Regex.Replace method.

<a href="/l.php?u=http%3A%2F%2Fon.fb.me%2FOE6gnB&amp;h=yAQFjL0pt&amp;s=1" target="_blank" rel="nofollow nofollow" onmouseover="LinkshimAsyncLink.swap(this, &quot;http:\/\/on.fb.me\/OE6gnB&quot;);" onclick="LinkshimAsyncLink.swap(this, &quot;\/l.php?u=http\u00253A\u00252F\u00252Fon.fb.me\u00252FOE6gnB&amp;h=yAQFjL0pt&amp;s=1&quot;);">http://on.fb.me/OE6gnB</a>somehtml 

Expected output:

somehtml <a href="http://on.fb.me/OE6gnB">on.fb.me/OE6gnB</a> somehtml 

I tried the following regular expression, but it didn't work for me:

searchPattern = "<a([.]*)?/l.php([.]*)?(\">)?([.]*)?(</a>)?"; 
replacePattern = "<a href=\"$3\" target=\"_blank\">$3</a>"; 

Thanks


See http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg. There are several libraries for extracting data from HTML/XML; one of them is http://htmlagilitypack.codeplex.com/. Why try to reinvent the wheel? – 2012-08-14 08:08:33

Answers

2

I was able to do this with a regex, using the following code:

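// Requires: using System.Text.RegularExpressions; using System.Web;
// Group 2 captures the URL-encoded target after "l.php?u="; group 4 captures the link text.
string searchPattern = "<a(.*?)href=\"/l\\.php\\?u=(.*?)&amp;?(.*?)>(.*?)</a>";
string html1 = Regex.Replace(html, searchPattern, delegate(Match oMatch)
{
    return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>",
        HttpUtility.UrlDecode(oMatch.Groups[2].Value),
        oMatch.Groups[4].Value);
});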
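
For context, the pattern in the question likely never matched because `[.]` is a character class containing a literal dot, so `[.]*` can only consume a run of dots, whereas a bare `.` matches any character except a newline. The sketch below contrasts the two on a shortened anchor; the class name `DotDemo` and the sample string are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class DotDemo
{
    // Shortened stand-in for the Facebook anchor (illustrative only).
    public const string Sample = "<a href=\"/l.php?u=x\">";

    // [.]* matches only literal dots, so it cannot skip over ` href="`
    // to reach /l.php and the overall match fails.
    public static bool LiteralDots()
    {
        return Regex.IsMatch(Sample, "<a([.]*)?/l.php");
    }

    // .*? matches any characters lazily, so it bridges the gap
    // between <a and /l.php.
    public static bool AnyChars()
    {
        return Regex.IsMatch(Sample, "<a(.*?)/l.php");
    }

    static void Main()
    {
        Console.WriteLine(LiteralDots()); // False
        Console.WriteLine(AnyChars());    // True
    }
}
```

This is why the working pattern above uses `(.*?)` for the parts of the tag that should be skipped.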
1

You can try this (a reference to the System.Web assembly must be added in order to use System.Web.HttpUtility):

// Requires: using System; using System.Linq; using System.Web; using System.Xml.Linq;
string input = @"<a href=""/l.php?u=http%3A%2F%2Fon.fb.me%2FOE6gnB&amp;h=yAQFjL0pt&amp;s=1"" target=""_blank"" rel=""nofollow nofollow"" onmouseover=""LinkshimAsyncLink.swap(this, &quot;http:\/\/on.fb.me\/OE6gnB&quot;);"" onclick=""LinkshimAsyncLink.swap(this, &quot;\/l.php?u=http\u00253A\u00252F\u00252Fon.fb.me\u00252FOE6gnB&amp;h=yAQFjL0pt&amp;s=1&quot;);"">http://on.fb.me/OE6gnB</a>somehtml";

// Wrap the fragment in a root element so it parses as a single XML document.
// This only works while the markup is well-formed XML.
string rootedInput = String.Format("<root>{0}</root>", input);
XDocument doc = XDocument.Parse(rootedInput, LoadOptions.PreserveWhitespace);

// Snapshot the anchors first, because the loop replaces nodes in the tree.
var anchors = doc.Descendants("a").ToArray();
for (int i = anchors.Length - 1; i >= 0; i--)
{
    // The href is "/l.php?u=<encoded url>&h=...", so the first parsed value is the decoded target URL.
    string href = HttpUtility.ParseQueryString(anchors[i].Attribute("href").Value)[0];

    XElement newAnchor = new XElement("a");
    newAnchor.SetAttributeValue("href", href);
    newAnchor.SetValue(href.Replace(@"http://", String.Empty));

    anchors[i].ReplaceWith(newAnchor);
}

// Serialize without reformatting, then strip the temporary root wrapper.
string output = doc.Root.ToString(SaveOptions.DisableFormatting)
    .Replace("<root>", String.Empty)
    .Replace("</root>", String.Empty);