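RegEx.Replace but exclude matches inside HTML tags?

I have a helper method called HighlightKeywords that I use on a forum when viewing search results, to highlight the keywords the user searched for within the posts.

The problem I'm having is that when a user searches for a keyword such as 'hotmail', the HighlightKeywords method finds the matches for that keyword and wraps them in a span tag specifying the style to apply, but it also finds matches inside HTML anchor tags and, in some cases, image tags. As a result, when I render the highlighted post to the screen, the HTML tags are broken (because the span has been inserted inside them).

Here is my function: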

public static string HighlightKeywords(this string s, string keywords, string cssClassName)
{
    if (s == string.Empty || keywords == string.Empty)
    {
        return s;
    }

    string[] sKeywords = keywords.Split(' ');
    foreach (string sKeyword in sKeywords)
    {
        try
        {
            s = Regex.Replace(s, @"\b" + sKeyword + @"\b", string.Format("<span class=\"" + cssClassName + "\">{0}</span>", "$0"), RegexOptions.IgnoreCase);
        }
        catch {}
    }
    return s;
}

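What would be the best way to prevent this breakage? Even if I could simply exclude any matches that occur inside anchor tags (whether web or email addresses) or image tags?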


2011-05-24 marcusstarnes

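You really need to use some kind of HTML parser and check each element for matches. If the containing element can accept a style wrapped around part of its text, apply it. – 2011-05-24 15:16:42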

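No, you can't do this. At least, not in a way that won't break. Regular expressions cannot parse HTML. Really sorry. You'll want to read this rant: RegEx match open tags except XHTML self-contained tags

So you will probably need to parse the HTML (I hear HtmlAgilityPack is good) and then match only within certain parts of the document, excluding anchor tags and so on.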
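
To make the "parse first, then match only in text" idea concrete, here is a minimal sketch. Note that it does not use HtmlAgilityPack: it swaps in the standard library's System.Xml.Linq so that it runs without extra packages, which means it only works if the post is well-formed XML (closed tags, no undefined entities such as &nbsp;). For real forum HTML a tolerant HTML parser would be the better fit. The class name XmlHighlighter and the method Highlight are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XmlHighlighter
{
    // Hypothetical helper: highlights whole-word matches of `keyword` in text nodes only,
    // skipping any text that sits inside an <a> element. Attribute values (href, alt, ...)
    // are never text nodes, so they are left alone automatically.
    public static string Highlight(string html, string keyword, string cssClassName)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(keyword)) return html;

        // Wrap in a dummy root so a fragment with several top-level nodes still parses.
        // Assumption: the input is well-formed XML; otherwise Parse throws.
        XElement root = XElement.Parse("<root>" + html + "</root>", LoadOptions.PreserveWhitespace);
        var pattern = new Regex(@"\b" + Regex.Escape(keyword) + @"\b", RegexOptions.IgnoreCase);

        // Snapshot the text nodes first, because the loop below modifies the tree.
        List<XText> textNodes = root.DescendantNodes()
            .OfType<XText>()
            .Where(t => !t.Ancestors("a").Any())
            .ToList();

        foreach (XText text in textNodes)
        {
            var pieces = new List<object>();
            int last = 0;
            foreach (Match m in pattern.Matches(text.Value))
            {
                // Plain text before the match, then the match wrapped in a span element.
                pieces.Add(new XText(text.Value.Substring(last, m.Index - last)));
                pieces.Add(new XElement("span", new XAttribute("class", cssClassName), m.Value));
                last = m.Index + m.Length;
            }
            if (last == 0) continue; // no match in this text node

            pieces.Add(new XText(text.Value.Substring(last)));
            text.ReplaceWith(pieces.ToArray());
        }

        // Concatenate the children of the dummy root to get the fragment back.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Calling XmlHighlighter.Highlight(post, "hotmail", "hl") would then wrap only the occurrences that appear as visible text outside links. The same loop structure should carry over to an HTML parser's text nodes, but that part is not shown here.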


2011-05-24 15:07:11

I ran into the same problem and came up with this workaround:

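public static string HighlightKeyWords(string s, string[] KeyWords)
{
    // Nothing to highlight: return the input unchanged
    // (string.Format would throw on a null format string).
    if (string.IsNullOrEmpty(s) || KeyWords == null || KeyWords.Length == 0)
    {
        return s;
    }

    foreach (string word in KeyWords)
    {
        // Wrap each match in {0}/{1} placeholders rather than real tags, so later
        // keywords cannot match inside markup inserted for earlier ones.
        // Regex.Escape keeps characters such as '.' or '+' in a keyword literal.
        s = System.Text.RegularExpressions.Regex.Replace(s, System.Text.RegularExpressions.Regex.Escape(word), "{0}$0{1}", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    }

    // Swap the placeholders for the real tags in a single pass.
    // Note: string.Format throws if s contains other literal braces.
    s = string.Format(s, "<mark class='hightlight_text_colour'>", "</mark>");

    return s;
}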

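It looks a bit scary, but I delay adding the HTML tags until the regex has matched all the keywords, inserting {0} and {1} as placeholders for the opening and closing HTML tags instead of the tags themselves. I then use those placeholders to add the HTML tags at the end, after the loop.

It will still break if {0} or {1} is passed in as a keyword.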
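
One way to sidestep the placeholder limitation is to match all keywords in a single pass, so the inserted markup is never scanned again. This is a different technique from the placeholder trick above, sketched under my own assumptions: the class name SinglePassHighlighter, the method HighlightAll, and the bare <mark> tag are illustrative only, and like the workaround above it does nothing to protect existing HTML tags.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassHighlighter
{
    // Hypothetical helper: wraps every keyword occurrence in <mark>...</mark>
    // using one combined regex, so no placeholders are needed.
    public static string HighlightAll(string s, string[] keywords)
    {
        if (string.IsNullOrEmpty(s) || keywords == null) return s;

        // Escape each keyword so it is matched literally, and put longer
        // keywords first so they win over shorter ones they contain.
        string[] escaped = keywords
            .Where(k => !string.IsNullOrEmpty(k))
            .OrderByDescending(k => k.Length)
            .Select(Regex.Escape)
            .ToArray();
        if (escaped.Length == 0) return s;

        string pattern = string.Join("|", escaped);

        // The evaluator runs once per match in the original text; the replacement
        // text itself is never searched, so "mark" or "{0}" are safe as keywords.
        return Regex.Replace(s, pattern, m => "<mark>" + m.Value + "</mark>", RegexOptions.IgnoreCase);
    }
}
```

Because the input is scanned only once, a keyword such as 'a' or 'mark' cannot match inside a tag that was added for another keyword.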


2013-10-17 13:24:21 R2D2

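This is a good idea. My problem (probably similar to yours rather than the original poster's) was that my initial text was not HTML, but some of the keywords I wanted to highlight matched the markup produced by replacing earlier keywords (for example, if the keyword is 'a', it would match ' – 2017-08-09 17:36:35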

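Marcus, resurrecting this question because it has a simple solution that wasn't mentioned. This situation sounds very similar to Match (or replace) a pattern except in situations s1, s2, s3 etc.

With all the disclaimers about using regex to parse HTML, here is a simple way to do it.

Taking hotmail as an example to show the technique in its simplest form, here is our simple regex: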

<a.*?</a>|(hotmail)

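The left side of the alternation matches complete <a ... </a> tags. We will ignore these matches. The right side matches hotmail and captures it to Group 1, and we know these are the right hotmail occurrences because they were not matched by the expression on the left.

This program shows how to use the regex (see the results at the bottom of the online demo):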

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Left alternative: a whole <a ...>...</a> element, matched but left untouched.
        // Right alternative: a bare "hotmail", captured in Group 1.
        var myRegex = new Regex(@"<a.*?</a>|(hotmail)");
        string s1 = @"replace this=> hotmail not that => <a href=""http://hotmail.com"">hotmail</a>";

        string replaced = myRegex.Replace(s1, delegate(Match m)
        {
            // Group 1 is set only when the right-hand alternative matched.
            if (m.Groups[1].Success) return "<span something>" + m.Groups[1].Value + "</span>";
            else return m.Value;
        });
        Console.WriteLine("\n" + "*** Replacements ***");
        Console.WriteLine(replaced);

        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...
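
For completeness, here is a rough sketch of how the same alternation trick might be folded back into the question's extension method. It is my own adaptation, not code from the answer: the method name HighlightKeywordsOutsideTags, the extra `<[^>]*>` alternative for standalone tags such as img, and the Singleline option are all assumptions. It will still be fooled by edge cases such as a literal > inside an attribute value.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HighlightExtensions
{
    // Hypothetical variant of HighlightKeywords that skips <a>...</a> elements
    // and the inside of any other tag, and highlights keywords everywhere else.
    public static string HighlightKeywordsOutsideTags(this string s, string keywords, string cssClassName)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrWhiteSpace(keywords)) return s;

        // Split on spaces, drop empty entries, and escape regex metacharacters.
        string[] escaped = keywords
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Regex.Escape)
            .ToArray();

        // Left side: whole <a>...</a> elements, then any other single tag (e.g. <img ...>).
        // Right side: a whole-word keyword, captured in group 1.
        string pattern = @"<a\b.*?</a>|<[^>]*>|\b(" + string.Join("|", escaped) + @")\b";

        return Regex.Replace(
            s,
            pattern,
            // Only matches that landed in group 1 are wrapped; tags are returned unchanged.
            m => m.Groups[1].Success
                ? "<span class=\"" + cssClassName + "\">" + m.Groups[1].Value + "</span>"
                : m.Value,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Usage would look like post.HighlightKeywordsOutsideTags("hotmail outlook", "highlight"). All keywords are handled in one pass, so a span inserted for one keyword is never rescanned for another.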


2014-06-03 01:59:17 zx81
