HTML attribute stripper

I want to strip all attributes (and their values) from an HTML string using C# and regular expressions.

For example:

<p>This is a text</p><span class="cls" style="background-color: yellow">This is another text</span>

should become

<p>This is a text</p><span>This is another text</span>

I also need to remove every attribute whether or not its value is enclosed in quotes.

That is,

<p class="cls">Some content</p> 
<p class='cls'>Some content</p> 
<p class=cls>Some content</p>

should all result in

<p>Some content</p>

I can't use HTMLAgilityPack for security reasons, so I need to do this with regular expressions.
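For reference, a minimal regex sketch of what the question asks for, assuming the input is reasonably well-formed HTML. It matches each opening tag, keeps the tag name (plus a trailing / on self-closing tags) and drops every attribute, whether its value is double-quoted, single-quoted, or unquoted. The class name AttributeStripper is hypothetical; Regex.Replace with $1/$2 group substitutions is standard .NET. Closing tags, comments and doctypes do not match the pattern and pass through unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Group 1: the tag name. Then zero or more attributes, each an attribute name
    // optionally followed by =value, where value is "double-quoted", 'single-quoted'
    // or unquoted. Group 2: an optional '/' before '>' (self-closing tags).
    // Quoted values may contain '>', so they cannot end the tag early.
    private static readonly Regex OpeningTag = new Regex(
        @"<([A-Za-z][A-Za-z0-9-]*)(?:\s+[^\s""'>/=]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'=<>`]+))?)*\s*(/?)>",
        RegexOptions.Compiled);

    // Rewrites every opening tag as "<name>" (or "<name/>"), discarding its attributes.
    public static string StripAttributes(string html) =>
        OpeningTag.Replace(html, "<$1$2>");
}
```

With this pattern the three variants in the question all become <p>Some content</p>, and a quoted value containing > (for example title="a>b") is consumed as part of the attribute instead of ending the tag.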


2013-09-05 user2751130

Possible duplicate of RegEx match open tags except XHTML self-contained tags (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) –

You may find your answer on this page: http://stackoverflow.com/questions/2994448/regex-strip-html-attributes-except-src – pardeew

"I can't use HTMLAgilityPack for security reasons": can you explain more about this? – aloisdg

I have a solution that doesn't use regex. It combines Substring() and IndexOf(). There is little error checking; this is just an idea.


C#:

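using System;

class Program
{
    private static void Main(string[] args)
    {
        string s = @"<p>This is a text</p><span class=""cls"" style=""background-color: yellow"">This is another text</span>";

        // Split on '<' so each piece starts with a tag; the '<' is re-added before processing.
        var list = s.Split(new[] { "<" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var item in list)
            Console.Write(ClearAttributes('<' + item));
        Console.ReadLine();
    }

    // Keeps only the name of the leading tag and drops everything else up to its '>'.
    private static string ClearAttributes(string source)
    {
        int startindex = source.IndexOf('<');
        int endindex = source.IndexOf('>');
        if (endindex < 0)
            return source; // no '>' in this piece: return it unchanged instead of throwing
        string tag = source.Substring(startindex + 1, endindex - startindex - 1);
        // Attributes can be separated from the tag name by any whitespace, not just a space.
        int spaceindex = tag.IndexOfAny(new[] { ' ', '\t', '\r', '\n' });
        if (spaceindex > 0)
            tag = tag.Substring(0, spaceindex);
        return String.Concat('<', tag, source.Substring(endindex));
    }
}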

Output:

<p>This is a text</p><span>This is another text</span>
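The Substring/IndexOf idea above treats the first '>' as the end of the tag, so a quoted value that contains '>' (for example title="a>b") would leak part of the attribute into the output. A minimal sketch of a quote-aware variant, still without regex, scans the string character by character and skips attributes until it reaches an unquoted '>'. The class name TagScanner is hypothetical, and the sketch assumes '<' starts a tag only when it is followed by a letter or '/'.

```csharp
using System;
using System.Text;

public static class TagScanner
{
    // Copies text verbatim; inside each tag keeps only "<name" or "</name"
    // (plus a trailing '/' for self-closing tags) and skips the attributes.
    public static string StripAttributes(string html)
    {
        var sb = new StringBuilder(html.Length);
        int i = 0;
        while (i < html.Length)
        {
            char c = html[i];
            // Treat '<' as a tag start only when followed by a letter or '/'.
            if (c != '<' || i + 1 >= html.Length || !(char.IsLetter(html[i + 1]) || html[i + 1] == '/'))
            {
                sb.Append(c);
                i++;
                continue;
            }

            // Copy '<', an optional '/', and the tag name.
            int j = i + 1;
            if (html[j] == '/') j++;
            while (j < html.Length && !char.IsWhiteSpace(html[j]) && html[j] != '>' && html[j] != '/')
                j++;
            sb.Append(html, i, j - i);

            // Skip attributes up to the closing '>', ignoring any '>' inside quotes.
            char quote = '\0';
            bool selfClosing = false;
            while (j < html.Length)
            {
                char d = html[j];
                if (quote != '\0') { if (d == quote) quote = '\0'; }
                else if (d == '"' || d == '\'') quote = d;
                else if (d == '>') break;
                else if (d == '/') selfClosing = true;
                else if (!char.IsWhiteSpace(d)) selfClosing = false;
                j++;
            }
            if (selfClosing) sb.Append('/');
            if (j < html.Length) sb.Append('>');
            i = j + 1;
        }
        return sb.ToString();
    }
}
```

Unlike splitting on '<', this also leaves text such as "3 < 4" untouched, because a '<' followed by a space is copied as ordinary text.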



2015-02-01 20:07:37 aloisdg
