Removing attributes with a whitelist

I need to remove attributes from a string that contains tags.

Here is the C# code:

strContent = Regex.Replace(strContent, @"<(\w+)[^>]*(?<=( ?/?))>", "<$1$2>", 
RegexOptions.IgnoreCase);

For example, the code above will replace

This is some <div id="div1" class="cls1">content</div>. This is some more <span 
id="span1" class="cls1">content</span>. This is <input type="readonly" id="input1" 
value="further content"></input>.

with

This is some <div>content</div>. This is some more <span>content</span>. This is 
<input></input>.

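However, I need a "whitelist" when removing attributes. In the example above, I do not want the attributes of the "input" tag to be removed, so the desired output is: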

This is some <div>content</div>. This is some more <span>content</span>. This is 
<input type="readonly" id="input1" value="further content"></input>.

Thanks for your help with this.

Source

2013-11-27 user2751130

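Trying to parse HTML with regex is DOOMED. Have you considered the HTML Agility Pack (which loads HTML into a DOM, much like XmlDocument) or something similar? Mandatory reading: http://stackoverflow.com/a/1732454/23354 –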

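I know regex is doomed when it comes to parsing HTML, but this application of regex doesn't really care that the input is HTML. You could replace the tag delimiter '<' with '"' and say: "I want to cull every quoted string to its first word, unless that first word is 'input'". – OGHaza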

For your example, you can use:

(<(?!input)[^\s>]+)[^>]*(>)

and replace with $1$2.
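As a minimal sketch of how this pattern could be applied from C# (the names AttributeStripper and StripExceptInput are hypothetical, not from the question):

```csharp
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Pattern from the answer: group 1 captures "<tagname" for any tag whose
    // name does not start with "input"; group 2 captures the closing ">".
    // Everything between the two groups (the attributes) is dropped.
    private static readonly Regex NonInputTag =
        new Regex(@"(<(?!input)[^\s>]+)[^>]*(>)", RegexOptions.IgnoreCase);

    public static string StripExceptInput(string html)
    {
        // Opening "<input ...>" tags fail the lookahead and are left untouched.
        return NonInputTag.Replace(html, "$1$2");
    }
}
```

Calling AttributeStripper.StripExceptInput(strContent) on the sample from the question should strip the div and span attributes and leave the input tag as it is. Two caveats: (?!input) also spares any tag whose name merely starts with "input", and an attribute value containing ">" would end the match early.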

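I don't know how you plan to specify the whitelist. If you can hard-code it, you can simply add more (?!whitelistTag) lookaheads to the pattern above, and generating them programmatically from an array is easy.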
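Generating the lookaheads from an array might look like the sketch below. WhitelistStripper, BuildPattern, and Strip are hypothetical names, and the \b after each tag name is an addition beyond the pattern above, so that whitelisting "a" does not also protect "abbr".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitelistStripper
{
    // Builds "(<(?!tag1\b)(?!tag2\b)...[^\s>]+)[^>]*(>)" from the whitelist.
    public static Regex BuildPattern(params string[] whitelistTags)
    {
        // One negative lookahead per whitelisted tag; Regex.Escape guards
        // against regex metacharacters in the supplied names.
        string lookaheads = string.Concat(
            whitelistTags.Select(tag => "(?!" + Regex.Escape(tag) + @"\b)"));

        return new Regex(
            "(<" + lookaheads + @"[^\s>]+)[^>]*(>)",
            RegexOptions.IgnoreCase);
    }

    // Strips attributes from every tag that is not on the whitelist.
    public static string Strip(string html, params string[] whitelistTags)
    {
        return BuildPattern(whitelistTags).Replace(html, "$1$2");
    }
}
```

For instance, WhitelistStripper.Strip(strContent, "input", "span") would keep attributes on input and span tags and remove them everywhere else. With an empty whitelist the pattern strips attributes from every tag.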

Working on RegExr

In response to the usual "You should not parse HTML with regex", the question can be rephrased as:

This is a "quoted string", cull each "quoted string to its" first word unless the "string starts with" the word "string, like these last two".

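Would you claim that regex should not be used to solve this problem? Because it is exactly the same problem. Sure, an HTML parser could be used for the job, but that hardly invalidates using regex to achieve the same thing.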

Source

2013-11-27 09:23:52 OGHaza
