HTML Agility Pack - Filter Href Value Results

I'm working on a web scraper. The text below shows the output of the code given at the end of this question, which gets the values of all hrefs from a page.

I only want to get the values that contain docid=

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64556e8988583f

#

summary_of_documents.php

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579b861c1d7b

#

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579e0509c7f0&apform=judiciary

decisions.php?doctype=Decisions/Signed Resolutions&docid=1263778435388003271#sam

decisions.php?doctype=Decisions/Signed Resolutions&docid=12637789021669321156#sam

?doctype=Decisions/Signed Resolutions&year=1986&month=January#head

doctype=Decisions/Signed Resolutions&year=1986&month=month#head

The code is below:

    string url = urlTextBox.Text;
    string sourceCode = Extractor.getSourceCode(url);

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(sourceCode);
    List<string> links = new List<string>();

    // SelectNodes returns null (not an empty collection) when nothing matches,
    // so the null check belongs on its result rather than on the list.
    HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");

    if (nodes != null)
    {
        foreach (HtmlAgilityPack.HtmlNode nd in nodes)
        {
            links.Add(nd.Attributes["href"].Value);
        }
    }
    else
    {
        MessageBox.Show("No Links Found");
    }

    // The list itself is never null; check whether anything was collected.
    if (links.Count > 0)
    {
        foreach (string str in links)
        {
            richTextBox9.Text += str + "\n";
        }
    }
    else
    {
        MessageBox.Show("No Link Values Found");
    }

How can I do this?

Source

2012-04-11 guitarPH

I made some edits here. Please double-check :) – 2012-04-18 18:26:25

Why not just replace this:

links.Add(nd.Attributes["href"].Value);

with this:

if (nd.Attributes["href"].Value.Contains("docid=")) 
    links.Add(nd.Attributes["href"].Value);
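
To see the effect of that check in isolation, here is a minimal sketch that applies the same `Contains("docid=")` test to a plain list of strings. The class `HrefFilter` and the method `KeepContaining` are hypothetical names made up for this illustration, and the sample hrefs are modeled on the output shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HrefFilter
{
    // Hypothetical helper: keeps only the hrefs that contain the marker.
    // string.Contains(string) is an ordinal, case-sensitive comparison.
    public static List<string> KeepContaining(IEnumerable<string> hrefs, string marker)
    {
        return hrefs
            .Where(h => h != null && h.Contains(marker))
            .ToList();
    }

    public static void Main()
    {
        // Sample values modeled on the question's output.
        var hrefs = new List<string>
        {
            "#",
            "summary_of_documents.php",
            "index.php?pageid=a45475a11ec72b843d74959b60fd7bd64556e8988583f",
            "decisions.php?doctype=Decisions/Signed Resolutions&docid=1263778435388003271#sam",
        };

        // Only the decisions.php entry contains "docid=".
        foreach (string href in KeepContaining(hrefs, "docid="))
        {
            Console.WriteLine(href);
        }
    }
}
```

Because the comparison is case-sensitive, an href written with `DocID=` would not match. If that matters, `h.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0` is one way to relax the check.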

Source

2012-04-11 08:38:04 McGarnagle

It works perfectly! Thank you very much! :) – guitarPH 2012-04-11 08:57:27
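
Another option, not taken from the answer above, is to push the filter into the XPath expression so that only matching anchors are selected in the first place. The following is a rough sketch that assumes the HtmlAgilityPack NuGet package is referenced. `DocIdLinks` and `Extract` are hypothetical names, and the HTML in `Main` is made-up sample input.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DocIdLinks
{
    // Hypothetical helper: returns the href of every <a> whose href contains "docid=".
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // contains() is a standard XPath 1.0 function; the match is case-sensitive.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//a[contains(@href, 'docid=')]");

        // SelectNodes returns null when no node matches.
        if (nodes == null)
        {
            return links;
        }

        foreach (HtmlNode nd in nodes)
        {
            links.Add(nd.GetAttributeValue("href", string.Empty));
        }

        return links;
    }

    public static void Main()
    {
        // Made-up sample markup for illustration only.
        string html =
            "<a href='index.php?pageid=1'>page</a>" +
            "<a href='decisions.php?docid=2#sam'>decision</a>";

        foreach (string href in Extract(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Both approaches should give the same list here. The XPath version just avoids collecting hrefs that are discarded afterwards.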
