I want to remove any scripts from my input using HtmlAgilityPack. However, HtmlAgilityPack parses a `<` in the input text incorrectly.
My input:
<div>If the amount<500 show results. Else do not show results.<mytag1>This is an xml element</mytag1></div><script>alert("welcome");</script>
Expected result:
<div>If the amount<500 show results. Else do not show results.<mytag1>This is an xml element</mytag1></div>
Actual result:
<div>If the amount<500 show="" results.="" else="" do="" not="" /><mytag1>This is an xml element</mytag1></div>
Here is my code:
public HashSet<string> BlackList = new HashSet<string>()
{
    { "script" },
    { "iframe" },
    { "form" },
    { "head" },
    { "meta" },
    { "comment" }
};

public static string GetSafeHtmlString(string sInputString)
{
    HtmlDocument doc = new HtmlDocument();
    doc.OptionFixNestedTags = true;
    //doc.OptionAutoCloseOnEnd = true;
    doc.OptionDefaultStreamEncoding = System.Text.Encoding.UTF8;
    doc.LoadHtml(HttpUtility.HtmlDecode(sInputString));
    HtmlSanitizer sanitizer = new HtmlSanitizer();
    sanitizer.SanitizeHtmlNode(doc.DocumentNode);
    string output = null;
    using (StringWriter sw = new StringWriter())
    {
        XmlWriter writer = new XmlTextWriter(sw);
        doc.DocumentNode.WriteTo(writer);
        output = sw.ToString();
        if (!string.IsNullOrEmpty(output))
        {
            int at = output.IndexOf("?>");
            output = output.Substring(at + 2);
        }
        writer.Close();
    }
    doc = null;
    return output;
}

private void SanitizeHtmlNode(HtmlNode node)
{
    if (node.NodeType == HtmlNodeType.Element)
    {
        // check for blacklist items and remove
        if (BlackList.Contains(node.Name))
        {
            node.Remove();
            return;
        }
    }
    if (node.HasChildNodes)
    {
        for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
        {
            SanitizeHtmlNode(node.ChildNodes[i]);
        }
    }
}
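How can I get the expected result? The HTML parser treats `<` as the start of a new HTML tag. How can I include a `<` (less-than) character in the input that is not the start of an HTML tag?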
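One possible workaround, offered only as a rough sketch: escape every `<` that cannot begin a tag before the string reaches the parser. The sketch assumes that a `<` starts markup only when it is followed by a letter, `/`, `!`, or `?`; the class name `StrayLessThanEscaper` and the method `EscapeStrayLessThan` are hypothetical and are not part of HtmlAgilityPack. It uses only `System.Text.RegularExpressions`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrayLessThanEscaper
{
    // Matches a '<' that is NOT followed by a letter, '/', '!' or '?',
    // i.e. one that cannot begin a tag, closing tag, comment or declaration.
    // Assumption: this heuristic is good enough for the input at hand.
    private static readonly Regex StrayLessThan =
        new Regex(@"<(?![A-Za-z/!?])", RegexOptions.Compiled);

    // Hypothetical helper: replaces stray '<' characters with the entity "&lt;".
    public static string EscapeStrayLessThan(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }
        return StrayLessThan.Replace(html, "&lt;");
    }
}
```

With this helper, `StrayLessThanEscaper.EscapeStrayLessThan("<div>If the amount<500 show results.</div>")` returns `<div>If the amount&lt;500 show results.</div>`. In the code above it would presumably be applied to the decoded string just before `doc.LoadHtml(...)`. Two caveats: a `<` that is directly followed by a letter (for example `a<b`) is still treated as a tag start, and how the `XmlTextWriter` step then serializes the `&lt;` entity has not been checked here.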
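The input is read from a textarea as InnerHtml. When I debug, this is what I see being passed to the GetSafeHtmlString method: <div>If the amount<500 show results. Else do not show results.<mytag1>This is an xml element</mytag1></div><script>alert("welcome");</script>. That means it has already been escaped, right? – sreddy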
If I use HttpUtility.HtmlEncode on the original input, then