How to use XDocument and extension methods to get inner text from an XML document

I am trying to get the "NAME" and "EMAIL" text from the HTML document below:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<ol>
<li>
<font class="normal">
<b>NAME</b> <a href="/member/mail_compose.aspx?id=name"><img src="/images/mailbox.gif" border="0" alt="Send Mail" /></a> <a href="/photos/member_viewphoto.aspx?id=name"><img src="/images/icons/member_photos.gif" border="0" alt="View Photos" /></a> <br />
ADDRESS<br />
PHONE<br />
<a href="mailto:[email protected]" class="redlink">EMAIL</a><br />
<br />
</font>
</li>
</ol>
</body>
</html>
Here is the code I am using:
// Load the xml document
XDocument xDoc = XDocument.Load(@"..\..\Directory.html");

// Parse document
var names = xDoc.Root.DescendantsAndSelf()
    .Where(x => x.Name.LocalName == "ol").DescendantsAndSelf()
    .Where(x => x.Name.LocalName == "li").DescendantsAndSelf()
    .Select(x => new
    {
        name = x.Elements().Where(y => y.Name.LocalName == "b").Select(y => y.Value),
        email = x.DescendantsAndSelf().Where(y => y.Name.LocalName == "a" && x.FirstAttribute.Name == "href" && x.Attribute("href").Value.Contains("mailto")).Select(y => y.Value ?? "No Email")
    });

// Print text to console
for (int i = 0; i < names.Count(); i++)
{
    Console.WriteLine("{0}: {1}", names.ElementAt(i).name, names.ElementAt(i).email);
}
For some reason, the code above prints this:
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Xml.Linq.XElement,System.String]: System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Xml.Linq.XElement,System.String]
Could someone please tell me why this is happening? Also, if there is a better way to do this, suggestions would be very welcome.
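
A likely reading of the output above: each `.Where(...).Select(...)` in the anonymous type yields a lazy sequence rather than a string, so `Console.WriteLine` prints the iterator's type name. Below is a minimal sketch of one way to collapse each sequence to a single value with `FirstOrDefault`. The `DirectoryParser` class, the `ExtractEntries` helper, the inline sample markup, and the `someone@example.com` address are all hypothetical stand-ins, not part of the original code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DirectoryParser
{
    // Hypothetical helper (not from the original post): returns one "name: email" string per <li>.
    public static string[] ExtractEntries(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "li")
            .Select(li =>
            {
                // FirstOrDefault collapses the lazy sequence to one element, or null if none matches.
                XElement b = li.Descendants().FirstOrDefault(e => e.Name.LocalName == "b");
                XElement a = li.Descendants().FirstOrDefault(e =>
                    e.Name.LocalName == "a" &&
                    ((string)e.Attribute("href") ?? "").StartsWith("mailto:", StringComparison.Ordinal));
                string name = b != null ? b.Value : "No Name";
                string email = a != null ? a.Value : "No Email";
                return name + ": " + email;
            })
            .ToArray();
    }

    public static void Main()
    {
        // Trimmed, well-formed stand-in for Directory.html; the address is a placeholder.
        string sample =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><ol><li><font class=\"normal\">" +
            "<b>NAME</b><br />ADDRESS<br />" +
            "<a href=\"mailto:someone@example.com\" class=\"redlink\">EMAIL</a><br />" +
            "</font></li></ol></body></html>";

        foreach (string line in ExtractEntries(sample))
        {
            Console.WriteLine(line); // prints "NAME: EMAIL"
        }
    }
}
```

Note that the lambda tests the attribute on the `<a>` element itself. In the code above, the `mailto` check is applied to `x` (the outer element) rather than `y`, which may be a second issue worth checking.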
This answer also works, but I marked the other correct answer as accepted because it was posted first. Thank you for your answer. – Tom 2014-11-10 16:33:28