Remove HTML tags except <a> in ASP.NET

How can I clean up a string so that only plain text and <a> elements remain?

Example:

<table><tr><td>Hello my web is <a href="http://www.myweb.com">Myweb</a>, <span>Follow my blog!</span></td></tr></table>

Result:

Hello my web is <a href="http://www.myweb.com">Myweb</a>, Follow my blog!

Thanks,

Source

2014-04-24 user2170407

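If you want to do this with regular expressions (going by your tag), keep this in mind: Rule 1: Don't use RegEx to parse HTML. Rule 2: If you still want to parse HTML with RegEx, see Rule 1. [RegEx can only match regular languages, and HTML is not a regular language](http://stackoverflow.com/a/590789/930393) – freefaller

@freefaller Looks like you got in ahead of me with the "for the love of God, no" advice. :) –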

Very, very hacky (and it really shouldn't be used in production), but:

C#

Regex.Replace(input, @"<[^>]+?\/?>", m => { 
    // here you can exclude specific tags such as `<a>` or maybe `<b>`, etc. 
    return Regex.IsMatch(m.Value, @"^<a\b|\/a>$") ? m.Value : String.Empty; 
});

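Basically, it strips out every HTML tag, with the exception of <a ...>...</a>.

Note: this does not:

Validate whether tags are opened/closed/nested correctly.
Validate whether <> is actually an HTML tag (maybe your input has < or > in the text itself?)
Handle "nested" <> tags. (e.g. <img src="http://placeholde.it/100" alt="foo<Bar>"/> would leave "/> behind in the output string)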
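To make the last two caveats concrete, here is a small sketch that wraps the same replacement in a method and feeds it the problematic inputs. The class and method names (`StripDemo`, `StripAllButAnchors`) are made up for this illustration and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripDemo
{
    // Same replacement as in the answer: drop every tag except <a ...> and </a>.
    public static string StripAllButAnchors(string input)
    {
        return Regex.Replace(input, @"<[^>]+?\/?>", m =>
            Regex.IsMatch(m.Value, @"^<a\b|\/a>$") ? m.Value : String.Empty);
    }

    public static void Main()
    {
        // The '>' inside the alt attribute ends the match early,
        // so the tail of the tag survives in the output.
        string tricky = "<img src=\"http://placeholde.it/100\" alt=\"foo<Bar>\"/>";
        Console.WriteLine(StripAllButAnchors(tricky)); // prints: "/>

        // A '<' followed later by a '>' in ordinary text is treated as a tag,
        // so the text between them is lost.
        Console.WriteLine(StripAllButAnchors("if a < b then b > a"));
    }
}
```

Both cases are a consequence of matching on angle brackets alone, which is why an HTML parser is the safer choice for untrusted input.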

Here is the same replacement turned into a helper method:

using System;
using System.Linq;
using System.Text.RegularExpressions;

// Mimics http://www.php.net/strip_tags

/// <summary>
/// Removes all HTML tags from the string and returns the purified result.
/// If supplied, tags matching <paramref name="allowedTags"/> are left untouched.
/// </summary>
/// <param name="input">The input string.</param>
/// <param name="allowedTags">Tags to remain in the original input.</param>
/// <returns>Transformed input string.</returns>
static String StripTags(String input, params String[] allowedTags)
{
    if (String.IsNullOrEmpty(input)) return input;
    // By default every matched tag is replaced with the empty string.
    MatchEvaluator evaluator = m => String.Empty;
    if (allowedTags != null && allowedTags.Length > 0)
    {
        // Matches a tag that starts with <name or ends with /name> for any allowed name.
        Regex reAllowed = new Regex(String.Format(@"^<(?:{0})\b|\/(?:{0})>$", String.Join("|", allowedTags.Select(x => Regex.Escape(x)).ToArray())));
        evaluator = m => reAllowed.IsMatch(m.Value) ? m.Value : String.Empty;
    }
    return Regex.Replace(input, @"<[^>]+?\/?>", evaluator);
}

// StripTags(input) -- all tags are removed 
// StripTags(input, "a") -- all tags but <a> are removed 
// StripTags(input, new[]{ "a" }) -- same as above

Source

2014-04-24 12:02:01

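Better than my answer. –

First of all, you can't use regexes to parse HTML.

Just do a global replace on something like </?table>|</?tr>|</?td>, along with any other tags you don't want, and replace them with the empty string "".

Source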
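As a rough sketch of that suggestion in C#, the listed tags can be folded into one alternation and replaced in a single pass. The helper name `RemoveListedTags` and the choice of tags are assumptions for illustration; the `\b[^>]*` part is an addition so that tags carrying attributes are matched too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitTagRemoval
{
    // Hypothetical helper: removes only the tags named in the pattern.
    public static string RemoveListedTags(string input)
    {
        // Matches an opening or closing table/tr/td/span tag, with optional attributes.
        const string pattern = @"</?(?:table|tr|td|span)\b[^>]*>";
        return Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<table><tr><td>Hello my web is <a href=\"http://www.myweb.com\">Myweb</a>, <span>Follow my blog!</span></td></tr></table>";
        // prints: Hello my web is <a href="http://www.myweb.com">Myweb</a>, Follow my blog!
        Console.WriteLine(RemoveListedTags(html));
    }
}
```

The drawback of a blocklist like this is that any tag not named in the pattern passes through unchanged.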

2014-04-24 11:59:56

This code will remove all tags except the <a> tag.

Regex r = new Regex(@"(?!</a>)(<\w+>|</\w+>)");
var removedTags = r.Replace(inputString, "");

Source

2014-04-24 12:01:30 leskovar

FYI, you can condense that with '(?!)'. But your regex removes '<a>', which I don't believe it should. – Robin
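One possible way to address that comment, sketched here as an assumption rather than as the answerer's fix, is to put the exclusion in a negative lookahead right after the opening `<`, so that both `<a ...>` and `</a>` are skipped while tags with attributes are still removed. The names `AnchorOnlyFilter` and `KeepAnchors` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorOnlyFilter
{
    // <(?!/?a\b) : a '<' that does not start <a ...> or </a>
    // [^>]*>     : everything up to and including the next '>'
    private static readonly Regex NonAnchorTag =
        new Regex(@"<(?!/?a\b)[^>]*>", RegexOptions.IgnoreCase);

    public static string KeepAnchors(string input)
    {
        return NonAnchorTag.Replace(input, string.Empty);
    }

    public static void Main()
    {
        string html = "<table><tr><td>Hello my web is <a href=\"http://www.myweb.com\">Myweb</a>, <span>Follow my blog!</span></td></tr></table>";
        // prints: Hello my web is <a href="http://www.myweb.com">Myweb</a>, Follow my blog!
        Console.WriteLine(KeepAnchors(html));
    }
}
```

The `\b` after `a` keeps tags such as `<abbr>` from being mistaken for anchors. The caveats listed for the earlier regex answer (stray `<` or `>` in text, `>` inside attribute values) apply here as well.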
