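Sanitize html encoded text (#decimal notation) from AntiXSS v3 output
I am building XSS-safe comments for a blog engine. I have tried many different approaches but found it very difficult.
When I display a comment, I first HTML-encode it with Microsoft AntiXss 3.0. Then I try to decode the safe HTML tags again using a whitelist approach.
I have been looking at Atwood's "Sanitize HTML" thread on refactormycode.
My problem is that the AntiXss library encodes values to &#DECIMAL; notation, and I don't know how to rewrite Steve's example, since my regex knowledge is limited.
I tried the code below, where I simply replaced the named entities with their decimal form, but it does not work properly.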
&lt; with &#60;
&gt; with &#62;
My rewrite:
using System.Text.RegularExpressions;
using System.Web;

class HtmlSanitizer
{
    /// <summary>
    /// A regex that matches things that look like a HTML tag after HtmlEncoding. Splits the input so we can get discrete
    /// chunks that start with < and end with either end of line or >
    /// </summary>
    private static Regex _tags = new Regex("&#60;(?!&#62;).+?(&#62;|$)", RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    /// <summary>
    /// A regex that will match tags on the whitelist, so we can run them through
    /// HttpUtility.HtmlDecode
    /// FIXME - Could be improved, since this might decode > etc in the middle of
    /// an a/link tag (i.e. in the text in between the opening and closing tag)
    /// </summary>
    private static Regex _whitelist = new Regex(@"
        ^&#60;/?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)&#62;$
        |^&#60;(b|h)r\s?/?&#62;$
        |^&#60;a(?!&#62;).+?&#62;$
        |^&#60;img(?!&#62;).+?/?&#62;$",
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace |
        RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    /// <summary>
    /// HtmlDecode any potentially safe HTML tags from the provided HtmlEncoded HTML input using
    /// a whitelist based approach, leaving the dangerous tags as encoded HTML
    /// </summary>
    public static string Sanitize(string html)
    {
        string tagname = "";
        Match tag;
        MatchCollection tags = _tags.Matches(html);
        string safeHtml = "";

        // iterate through all HTML tags in the input, last to first,
        // so that earlier match indexes stay valid while we edit the string
        for (int i = tags.Count - 1; i > -1; i--)
        {
            tag = tags[i];
            tagname = tag.Value.ToLowerInvariant();

            if (_whitelist.IsMatch(tagname))
            {
                // If we find a tag on the whitelist, run it through
                // HtmlDecode, and re-insert it into the text
                safeHtml = HttpUtility.HtmlDecode(tag.Value);
                html = html.Remove(tag.Index, tag.Length);
                html = html.Insert(tag.Index, safeHtml);
            }
        }

        return html;
    }
}
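My input test HTML is:
<p><script language="javascript">alert('XSS')</script><b>bold should work</b></p>
After AntiXss it becomes:
&#60;p&#62;&#60;script language&#61;&#34;javascript&#34;&#62;alert&#40;&#39;XSS&#39;&#41;&#60;&#47;script&#62;&#60;b&#62;bold should work&#60;&#47;b&#62;&#60;&#47;p&#62;
When I run the version of Sanitize(string html) above, it gives me:
<p><script language="javascript">alert&#40;&#39;XSS&#39;&#41;</script><b>bold should work</b></p>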
The regex matches the script tag against the whitelist, which I don't want. Any help would be much appreciated.
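A minimal repro may help isolate the whitelist behaviour. This is a hedged sketch, not a confirmed diagnosis: it assumes the whitelist pattern contains literal `&#60;` and `&#62;` sequences (as the description of replacing the entities with their decimal form suggests) and it uses a shortened, hypothetical tag list. With `RegexOptions.IgnorePatternWhitespace`, .NET treats an unescaped `#` as the start of a comment that runs to the end of the line, so a pattern line beginning `^&#60;...` is effectively reduced to `^&`. The names `WhitelistRepro`, `Unescaped` and `Escaped` are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitelistRepro
{
    const RegexOptions Options =
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace |
        RegexOptions.ExplicitCapture | RegexOptions.Compiled;

    // Same shape as the first whitelist alternative, with a bare '#'.
    // Under IgnorePatternWhitespace everything after the '#' is a comment,
    // so the effective pattern is just ^& .
    public static readonly Regex Unescaped = new Regex(@"
        ^&#60;/?(b|i|p)&#62;$", Options);

    // Hypothetical variant: identical, but each '#' is escaped as \#
    // so it is matched literally.
    public static readonly Regex Escaped = new Regex(@"
        ^&\#60;/?(b|i|p)&\#62;$", Options);

    static void Main()
    {
        string script = "&#60;script language&#61;&#34;javascript&#34;&#62;";
        string bold = "&#60;b&#62;";

        Console.WriteLine(Unescaped.IsMatch(script)); // True: anything starting with '&' matches
        Console.WriteLine(Escaped.IsMatch(script));   // False
        Console.WriteLine(Escaped.IsMatch(bold));     // True
    }
}
```

If this is what happens in the full whitelist, escaping each `#` as `\#` (or dropping `IgnorePatternWhitespace` and writing the pattern on one line) would be one way to test it. Note also that AntiXss appears to encode `/` as well, so closing tags may need separate handling.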
Just keep this in mind: http://www.codinghorror.com/blog/archives/001171.html – some 2008-12-28 16:01:50
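I have been all over those links for the last 24 hours. I can't believe it has to be this complicated. The remark quoted in the comments on the CSRF article, that web development is horribly lacking, rings very true. – jesperlind 2008-12-28 16:33:48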
Beware of whitelisting the IMG tag. The onerror attribute can be used to insert script. – PEZ 2008-12-28 16:37:29
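To illustrate the caution about img tags above, here is a small sketch. It is written against decoded markup to keep it readable, which is an assumption, since the code in the question works on encoded text. `LooseImg` mirrors the shape of the img alternative in the whitelist. `StrictImg` is a hypothetical tighter pattern that only accepts an http(s) `src` and an optional `alt`. Neither is presented as a complete defence.

```csharp
using System;
using System.Text.RegularExpressions;

static class ImgWhitelistCheck
{
    // Same shape as the img alternative in the whitelist: any attributes are accepted.
    public static readonly Regex LooseImg = new Regex(
        @"^<img(?!>).+?/?>$",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture);

    // Hypothetical stricter pattern: an http(s) src, optionally followed by alt text.
    public static readonly Regex StrictImg = new Regex(
        @"^<img\ssrc=""https?://[-a-z0-9+&@#/%?=~_|!:,.;()]+""(\salt=""[^""<>]*"")?\s?/?>$",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture);

    static void Main()
    {
        string hostile = "<img src=\"x\" onerror=\"alert(1)\">";
        string plain = "<img src=\"http://example.com/a.png\" alt=\"logo\" />";

        Console.WriteLine(LooseImg.IsMatch(hostile));  // True: the onerror attribute gets through
        Console.WriteLine(StrictImg.IsMatch(hostile)); // False
        Console.WriteLine(StrictImg.IsMatch(plain));   // True
    }
}
```

The design choice in the stricter pattern is to spell out every attribute that is allowed, so that anything else (including event handlers such as onerror) fails the match and stays encoded.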