2012-04-11 57 views

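HTML Agility Pack - Filter Href Value Results

I am working on a web scraper. The text below shows the output of the code given at the end of this question, which collects the values of all hrefs on the page.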

I only want to get the values that contain docid=

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64556e8988583f

summary_of_documents.php

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579b861c1d7b

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579e0509c7f0&apform=judiciary

decisions.php?DOCTYPE=Decisions / Signed Resolutions&docid=1263778435388003271#SAM

decisions.php?DOCTYPE=Decisions / Signed Resolutions&docid=12637789021669321156#SAM

?DOCTYPE=Decisions / Signed Resolutions&year=1986&month=January#head

DOCTYPE=Decisions / Signed Resolutions&year=1986&month=...#head

The code is below:

    string url = urlTextBox.Text;
    string sourceCode = Extractor.getSourceCode(url);

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(sourceCode);
    List<string> links = new List<string>();

    // SelectNodes returns null (not an empty collection) when nothing matches,
    // so check the node collection rather than the freshly created list.
    HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
    if (nodes != null)
    {
        foreach (HtmlAgilityPack.HtmlNode nd in nodes)
        {
            links.Add(nd.Attributes["href"].Value);
        }
    }
    else
    {
        MessageBox.Show("No Links Found");
    }

    // The list itself is never null here; test whether anything was collected.
    if (links.Count > 0)
    {
        foreach (string str in links)
        {
            richTextBox9.Text += str + "\n";
        }
    }
    else
    {
        MessageBox.Show("No Link Values Found");
    }

How can I do this?


I made some edits here. Please double-check them :) – 2012-04-18 18:26:25

Answers


Why not just replace this:

links.Add(nd.Attributes["href"].Value); 

with this:

if (nd.Attributes["href"].Value.Contains("docid=")) 
    links.Add(nd.Attributes["href"].Value); 
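
As an alternative, the filter can probably be pushed into the XPath expression itself, so that only matching anchors are selected in the first place. The following is a minimal sketch, not a definitive implementation: the class name `DocIdLinkFilter` and the method `GetDocIdLinks` are hypothetical names introduced for illustration, and it assumes the HtmlAgilityPack package is referenced. Note that XPath `contains()` is case-sensitive, so it matches `docid=` but not `DocID=`.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DocIdLinkFilter
{
    // Hypothetical helper: returns the href of every <a> whose href contains "docid=".
    public static List<string> GetDocIdLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // The XPath 1.0 contains() function does the filtering,
        // so no per-node string check is needed afterwards.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//a[contains(@href, 'docid=')]");

        // SelectNodes returns null when nothing matches.
        if (nodes == null)
        {
            return links;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the given default if the attribute is missing.
            links.Add(node.GetAttributeValue("href", string.Empty));
        }

        return links;
    }
}
```

With the variables from the question, it could be called as `List<string> links = DocIdLinkFilter.GetDocIdLinks(sourceCode);`. An empty list then means no href on the page contained `docid=`.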

It works perfectly! Thank you very much! :) – guitarPH 2012-04-11 08:57:27