This is a follow-up to a previous question I had. I've run into a problem trying to extract links from an HtmlNode using HtmlAgilityPack.
I got the very excellent link parsing code from here.
So I have HTML of the following form:
<html>
<head>
RANDOM JAVASCRIPT AND CSS AHHHHHH!!!!!!!!
</head>
<body>
<a href="/Random/link/here">Random</a>
<a href="/Random/link/here">Random</a>
<a href="/Random/link/here">Random</a>
<a href="/Random/link/here">Random</a>
<a href="/Random/link/here">Random</a>
<a href="/Random/link/here">Random</a>
<table class="table">
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
<tr><a href="/subdir/members/Name">Name</a></tr>
</table>
</body>
</html>
And I have the following code, which is meant to pull out the content of the table and then extract the links from that content:
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public class MainClass
{
    public static void Main(String[] args)
    {
        string url = args[1];
        Extractinfo pageScrape = new Extractinfo();
        pageScrape.RenderPage(url);
    }
}

public class Extractinfo
{
    // Downloads the page, runs the link extraction, and returns the parsed document.
    public HtmlDocument RenderPage(string url)
    {
        try
        {
            HtmlDocument pageSource = new HtmlDocument();
            var webGet = new HtmlWeb();
            pageSource = webGet.Load(url);
            ExtractLinks(pageSource);
            return pageSource;
        }
        catch (WebException e)
        {
            Console.WriteLine(e.Message + ": " + e.StackTrace);
            return null;
        }
    }

    // Collects the href value of every anchor matched by the XPath query.
    private List<string> ExtractHrefTags(HtmlNode htmlSnippet)
    {
        List<string> hrefTags = new List<string>();
        foreach (HtmlNode link in htmlSnippet.SelectNodes("//a[@href]"))
        {
            HtmlAttribute att = link.Attributes["href"];
            hrefTags.Add(att.Value);
        }
        return hrefTags;
    }

    // Finds every <table class="table"> and extracts links from each one.
    public void ExtractLinks(HtmlDocument pagesource)
    {
        var elements = pagesource.DocumentNode.SelectNodes("//table[@class='table']");
        List<string> hrefTags = new List<string>();
        foreach (var ele in elements)
        {
            hrefTags = ExtractHrefTags(ele);
        }
    }
}
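Now, instead of getting only the links inside <table class="table">*****</table>, this code puts every link on the page into the List hrefTags. What am I doing wrong here? How can I fix this so that the only links extracted are the ones that live inside <table class="table">*****</table>?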
Thanks for your help!
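One likely cause, sketched here rather than taken from any posted answer: in XPath, an expression that starts with `//` is evaluated from the document root, even when `SelectNodes` is called on a child node such as the table. Prefixing the expression with a dot (`.//a[@href]`) makes it relative to the node it is called on. The class name `TableLinkSketch`, the method name `ExtractTableLinks`, and the inline sample HTML below are all made up for illustration. The sketch also guards against `SelectNodes` returning null when nothing matches, and it accumulates results across tables instead of overwriting the list on each iteration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableLinkSketch
{
    // Hypothetical helper: returns hrefs only for anchors nested under <table class="table">.
    public static List<string> ExtractTableLinks(HtmlDocument doc)
    {
        var hrefs = new List<string>();

        var tables = doc.DocumentNode.SelectNodes("//table[@class='table']");
        if (tables == null)
        {
            return hrefs; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode table in tables)
        {
            // The leading "." keeps the search inside this table node.
            var links = table.SelectNodes(".//a[@href]");
            if (links == null)
            {
                continue;
            }

            foreach (HtmlNode link in links)
            {
                hrefs.Add(link.GetAttributeValue("href", ""));
            }
        }

        return hrefs;
    }

    public static void Main()
    {
        // Made-up sample markup: one link outside the table, one inside it.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<a href='/Random/link/here'>Random</a>" +
            "<table class='table'><tr><td><a href='/subdir/members/Name'>Name</a></td></tr></table>" +
            "</body></html>");

        foreach (string href in ExtractTableLinks(doc))
        {
            Console.WriteLine(href);
        }
    }
}
```

With the relative expression, the `/Random/link/here` anchor outside the table should no longer show up in the list. If only the hrefs are needed, a single query such as `//table[@class='table']//a[@href]` on `DocumentNode` is another option that avoids the nested loop.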
I feel like a complete idiot... I stared at this for about 30 minutes and couldn't figure it out. Thanks! – gfppaste