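Problem parsing a node's child nodes with HtmlAgilityPack

I'm having trouble parsing the input tags that are children of a form in HTML. I can find them from the root with //input[@type], but not as children of a specific node.
Below is some code that illustrates the problem: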
using System.Diagnostics;
using System.IO;
using System.Text;
using HtmlAgilityPack;

public class ParserTests
{
    private const string HTML_CONTENT =
        "<html>" +
        "<head>" +
        "<title>Test Page</title>" +
        "<link href='site.css' rel='stylesheet' type='text/css' />" +
        "</head>" +
        "<body>" +
        "<form id='form1' method='post' action='http://www.someplace.com/input'>" +
        "<input type='hidden' name='id' value='test' />" +
        "<input type='text' name='something' value='something' />" +
        "</form>" +
        "<a href='http://www.someplace.com'>Someplace</a>" +
        "<a href='http://www.someplace.com/other'><img src='http://www.someplace.com/image.jpg' alt='Someplace Image'/></a>" +
        "<form id='form2' method='post' action='/something/to/do'>" +
        "<input type='text' name='secondForm' value='this should be in the second form' />" +
        "</form>" +
        "</body>" +
        "</html>";

    public void Parser_Test()
    {
        var htmlDoc = new HtmlDocument
        {
            OptionFixNestedTags = true,
            OptionUseIdAttribute = true,
            OptionAutoCloseOnEnd = true,
            OptionAddDebuggingAttributes = true
        };
        byte[] byteArray = Encoding.UTF8.GetBytes(HTML_CONTENT);
        var stream = new MemoryStream(byteArray);
        htmlDoc.Load(stream, Encoding.UTF8, true);

        var nodeCollection = htmlDoc.DocumentNode.SelectNodes("//form");
        if (nodeCollection != null && nodeCollection.Count > 0)
        {
            foreach (var form in nodeCollection)
            {
                var id = form.GetAttributeValue("id", string.Empty);
                if (!form.HasChildNodes)
                    Debug.WriteLine(string.Format("Form {0} has no children", id));

                // Relative XPath: <input> elements with a type attribute that are direct children of this form
                var childCollection = form.SelectNodes("input[@type]");
                if (childCollection != null && childCollection.Count > 0)
                {
                    Debug.WriteLine("Got some child nodes");
                }
                else
                {
                    Debug.WriteLine("Unable to find input nodes as children of Form");
                }
            }

            // Absolute XPath: every <input> in the document, searched from the root
            var inputNodes = htmlDoc.DocumentNode.SelectNodes("//input");
            if (inputNodes != null && inputNodes.Count > 0)
            {
                Debug.WriteLine(string.Format("Found {0} input nodes when parsed from root", inputNodes.Count));
            }
        }
        else
        {
            Debug.WriteLine("Found no forms");
        }
    }
}
The output is:
Form form1 has no children
Unable to find input nodes as children of Form
Form form2 has no children
Unable to find input nodes as children of Form
Found 3 input nodes when parsed from root
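I expected both form1 and form2 to have children, and input[@type] to find 2 nodes for form1 and 1 node for form2.
Is there a specific configuration setting or method that I'm not using but should be? Any ideas?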
Thanks,
Steve
For those who don't want to change the Html Agility Pack source code: HtmlNode.ElementsFlags.Remove("form"); – Doug 2011-09-10 16:20:15
More here: http://stackoverflow.com/questions/4218847/htmlagilitypack-does-form-close-itself-for-some-reason – 2012-05-07 07:09:49
I wish I could upvote this more than once. This "feature" encourages falling into the pit of failure. – MatthewMartin 2013-05-08 21:48:44
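A minimal sketch of Doug's HtmlNode.ElementsFlags.Remove("form") workaround applied to markup like the question's. It assumes an HtmlAgilityPack version in which "form" is registered in the static HtmlNode.ElementsFlags dictionary with the Empty flag, which would explain the "has no children" output: the parser closes <form> immediately and the inputs end up as siblings rather than children. The class and method names (FormInputCounter, CountInputsPerForm) are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class FormInputCounter
{
    // Returns, for each <form> in the markup, its id and the number of
    // <input> elements with a type attribute that are direct children of it.
    public static (string Id, int Inputs)[] CountInputsPerForm(string html)
    {
        // Assumption: "form" is flagged as Empty in ElementsFlags, so the parser
        // closes it right away. Removing the entry lets <form> contain children.
        // Dictionary.Remove is a no-op if the key is not present.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var forms = doc.DocumentNode.SelectNodes("//form");
        if (forms == null)
            return Array.Empty<(string, int)>();

        return forms
            .Select(f => (
                f.GetAttributeValue("id", string.Empty),
                // Relative path: direct children of this form only.
                f.SelectNodes("input[@type]")?.Count ?? 0))
            .ToArray();
    }
}
```

With the question's HTML_CONTENT, this should report form1 with 2 inputs and form2 with 1. Note that ElementsFlags is static, so removing the entry changes parsing for every HtmlDocument loaded afterwards in the same process; it is usually done once at startup, before any document is loaded.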