2011-01-07 52 views
5

C#: Getting data from an HTML table using HtmlAgilityPack

I want to extract the information in an HTML table by parsing the HTML with HtmlAgilityPack.

Here is what the HTML looks like:

... 
... 
... 
<tbody> 
        <tr> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_18">AA00857</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div></div> 
          <div class="style_20">TPRCF</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_21"></div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_21">16908/2</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_18">&nbsp;ETG_C</div> 
         </td> 
        </tr> 
        <tr> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_18">AA</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div></div> 
          <div class="style_20">TPRCF</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_21"></div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_21">16909/19</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_18">&nbsp;ETG_C</div> 
         </td> 
        </tr> 
        <tr> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_18">AA</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div></div> 
          <div class="style_20">TPRCF</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_21"></div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_21">16907/7</div> 
         </td> 
         <td class="style_19" style="vertical-align: baseline;"> 
          <div class="style_18">&nbsp;ETG_C</div> 
         </td> 
        </tr> 
... 
... 

I need to extract these values from the markup above:

AA00857, TPRCF, 16908/2, ETG_C 

So far, all I have is this:

HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"http://www.some123123site.com/index");

if (htmlDoc.DocumentNode != null)
{
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//tbody");

    if (bodyNode != null)
    {
        // Do something with bodyNode
    }
}

Please help!

Answers

2

Try this:

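HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"http://www.some123123site.com/index");

// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection textNodes = htmlDoc.DocumentNode.SelectNodes("//tr/td/div/text()");
if (textNodes != null)
{
    foreach (HtmlNode text in textNodes)
    {
        // DeEntitize decodes entities such as &nbsp;; Trim drops the leftover padding.
        Console.WriteLine(HtmlEntity.DeEntitize(text.InnerText).Trim());
    }
}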
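
The question asks for the values grouped per table row (for example AA00857, TPRCF, 16908/2, ETG_C). As a rough sketch of one way to do that, the code below walks each `tr`, collects the non-empty text of its `td` cells, and joins them. It assumes the HtmlAgilityPack package is referenced, and it parses an inline sample (a trimmed copy of the first row from the question) with `LoadHtml` instead of fetching the real page. The class name `TableRowsSketch` is only illustrative.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class TableRowsSketch
{
    static void Main()
    {
        // Inline sample trimmed from the question's markup (an assumption for
        // this sketch), so the example does not need network access.
        const string html = @"
<table><tbody>
  <tr>
    <td><div class=""style_18"">AA00857</div></td>
    <td><div></div><div class=""style_20"">TPRCF</div></td>
    <td><div class=""style_21""></div></td>
    <td><div class=""style_21"">16908/2</div></td>
    <td><div class=""style_18"">&nbsp;ETG_C</div></td>
  </tr>
</tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tbody/tr");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            // "./td" is evaluated relative to the current row.
            HtmlNodeCollection cells = row.SelectNodes("./td");
            if (cells == null) continue;

            var values = new List<string>();
            foreach (HtmlNode cell in cells)
            {
                // Decode entities such as &nbsp;, turn non-breaking spaces into
                // ordinary spaces, then trim surrounding whitespace.
                string value = HtmlEntity.DeEntitize(cell.InnerText)
                                         .Replace('\u00A0', ' ')
                                         .Trim();
                if (value.Length > 0) values.Add(value);
            }

            // One comma-separated line per table row.
            Console.WriteLine(string.Join(", ", values));
        }
    }
}
```

Selecting rows first and then cells relative to each row keeps the values of one row together, which the flat `//tr/td/div/text()` query does not. To run it against the live page, the `LoadHtml` call could be swapped for `new HtmlWeb().Load(url)` as in the snippet above.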
+0

Error 1: 'HtmlAgilityPack.HtmlDocument' does not contain a definition for 'DocumentElement' and no extension method 'DocumentElement' accepting a first argument of type 'HtmlAgilityPack.HtmlDocument' could be found – 2011-01-07 21:49:06