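Html Agility Pack - Filter Href Value Results
I am working on a web scraper. The text below shows the output of the code given at the end of this question, which collects the value of every href on the page.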
I only want the values that contain docid=
index.php?pageid=a45475a11ec72b843d74959b60fd7bd64556e8988583f
#
summary_of_documents.php
index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579b861c1d7b
#
index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579e0509c7f0&apform=judiciary
decisions.php?doctype=Decisions / Signed Resolutions&docid=1263778435388003271#sam
decisions.php?doctype=Decisions / Signed Resolutions&docid=12637789021669321156#sam
?doctype=Decisions / Signed Resolutions&year=1986&month=January#head
?doctype=Decisions / Signed Resolutions&year=1986&month=February#head
Here is the code:
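string url = urlTextBox.Text;
string sourceCode = Extractor.getSourceCode(url);

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sourceCode);

List<string> links = new List<string>();

// By default SelectNodes returns null (not an empty collection) when nothing
// matches, so the null check belongs on its result, not on the new list.
HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
if (nodes != null)
{
    foreach (HtmlAgilityPack.HtmlNode nd in nodes)
    {
        links.Add(nd.Attributes["href"].Value);
    }
}
else
{
    MessageBox.Show("No Links Found");
}

// The list itself is never null here; check whether it holds anything.
if (links.Count > 0)
{
    foreach (string str in links)
    {
        richTextBox9.Text += str + "\n";
    }
}
else
{
    MessageBox.Show("No Link Values Found");
}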
How can I do this?
I made some edits here. Please double-check :) – 2012-04-18 18:26:25
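One possible way to keep only the docid= links is to filter the collected href strings before displaying them. The following is a minimal sketch, not a definitive answer: the class name HrefFilter and the method WithDocId are hypothetical names introduced here for illustration, and the case-insensitive match is an assumption about what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Html Agility Pack).
static class HrefFilter
{
    // Returns only the hrefs that contain "docid=" somewhere in the string.
    // Null entries are skipped; the comparison ignores case (an assumption).
    public static List<string> WithDocId(IEnumerable<string> hrefs)
    {
        return hrefs
            .Where(h => h != null &&
                        h.IndexOf("docid=", StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

In the question's code this could be applied as links = HrefFilter.WithDocId(links); just before the loop that fills richTextBox9. Alternatively, the filtering could probably be pushed into the XPath expression itself, for example "//a[contains(@href, 'docid=')]", with the caveat that XPath's contains() is case-sensitive.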