2017-05-14

HtmlAgilityPack - Unable to get the innerText of nested elements from a table

I am using HtmlAgilityPack to get data from a table with this structure:

<table> 
    <tbody class="border_tbody"> 
     <tr style="height:55px;"> 
      <th class="heading_one" colspan="2">Heading 1</th> 
      <th class="heading_two">Heading 2</th> 
      <th class="heading_three">heading 3</th> 
     </tr> 
     <tr> 
      <td class="ro"> 
       <a href="go/a/a.com" target="_blank"> 
        <img src="images/vendors_images/vendors_ficon/a.png" height="17px" width="17px" alt="a" title="a"> 
       </a> 
      </td> 
      <td td="" class="l no_border"> 
       <a href="go/a/a.com" target="_blank"> 
        Vendor name 
       </a> 
      </td> 
      <td class="l lo" style="text-align: center;"><a href="go/a/a.com" target="_blank">15%</a></td> 
      <td class="l bonus_amount"> 
       <a href="go/a/a.com" class="apply_text" target="_blank"> 
        <div class="coupon_div"> 
         <span class="coupon_span"> 
          <span class="card_secondary_text">$10</span> 
         </span> 
        </div> 
       </a> 
      </td> 
     </tr> 

     <tr> 
      <td class="ro"> 
       <a href="go/a/a.com" target="_blank"> 
        <img src="images/vendors_images/vendors_ficon/a.png" height="17px" width="17px" alt="a" title="a"> 
       </a> 
      </td> 
      <td td="" class="l no_border"> 
       <a href="go/a/a.com" target="_blank"> 
        Vendor name 
       </a> 
      </td> 
      <td class="l lo" style="text-align: center;"><a href="go/a/a.com" target="_blank">6%</a></td> 
      <td class="l" style="text-align: center;"></td> 
     </tr> 

     <tr> 
      <td class="ro"> 
       <a href="go/a/a.com" target="_blank"> 
        <img src="images/vendors_images/vendors_ficon/a.png" height="17px" width="17px" alt="a a" title="a a"> 
       </a> 
      </td> 
      <td td="" class="l no_border"> 
       <a href="go/a/a.com" target="_blank"> 
        Vendor name 
       </a> 
      </td> 
      <td class="l lo" style="text-align: center;"><a href="go/a/a.com" target="_blank">5%</a></td> 
      <td class="l bonus_amount"> 
       <a href="apply/a" class="apply_text" target="_blank"> 
        <div class="coupon_div"> 
         <span class="coupon_span"> 
          <span class="card_secondary_text">$50</span> - Apply 
         </span> 
        </div> 
       </a> 
      </td> 
     </tr> 

    </tbody> 
</table> 

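I can get the inner text of the second td[2] (vendor name) and the third td[3] (percentage). Where I run into trouble is getting the inner text of the fourth td[4], because its nested elements change depending on whether the cell contains text or not.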

The table above shows the three variations. Here is my code so far:

foreach (var table in webDoc.DocumentNode.SelectNodes("//table/tbody"))
{
    // skip the first tr since they are headings.
    foreach (var tr in table.SelectNodes("tr[position() > 1]"))
    {
        if (tr != null)
        {
            var vendorName = tr.SelectSingleNode("td[2]/a").InnerText.Trim();
            var rateOne = tr.SelectSingleNode("td[3]/a").InnerText.Trim();

            // Unable to get the inner text at this point
            // var rateTwo = tr.SelectSingleNode("td[4]/a/div/span/span").InnerText.Trim();
        }
    }
}

You can't get text from that cell because there is no text in it... OK, so what is the problem? What is your question? – Andersson

Answer


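In the HTML sample given in the question, the class name of the span inside the fourth cell appears to be always the same. If it is not, you can iterate over all the descendant nodes looking for text that starts with a dollar sign: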

using System;
using System.Linq;
using HtmlAgilityPack;

// html holds the markup shown in the question
HtmlDocument webDoc = new HtmlDocument();
webDoc.LoadHtml(html);
foreach (var table in webDoc.DocumentNode.SelectNodes("//table/tbody"))
{
    foreach (var tr in table.SelectNodes("tr[position() > 1]"))
    {
        if (tr != null)
        {
            // [1] class name in HTML sample always the same
            var rateTwo = tr.SelectSingleNode("td[4]//span[@class='card_secondary_text']");
            Console.WriteLine("Method 1 Coupon: {0}",
                rateTwo != null ? rateTwo.InnerText : "none"
            );

            // [2] brute force - all descendants
            var rateTwo2 = tr.SelectSingleNode("td[4]").Descendants();
            if (rateTwo2.Any())
            {
                foreach (var child in rateTwo2)
                {
                    // Only element nodes whose text starts with "$"; the outer wrappers'
                    // InnerText begins with whitespace, so only the inner span matches here.
                    if (child.InnerText.StartsWith("$") && child.NodeType == HtmlNodeType.Element)
                        Console.WriteLine("Method 2 Coupon: {0}", child.InnerText);
                }
            }
            else
            {
                Console.WriteLine("Method 2: No coupon");
            }
        }
    }
}

Output:

Method 1 Coupon: $10 
Method 2 Coupon: $10 
Method 1 Coupon: none 
Method 2: No coupon 
Method 1 Coupon: $50 
Method 2 Coupon: $50
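A third option, not part of the answer above, is to skip the nested structure entirely: read the whole fourth cell's InnerText and pull the first dollar amount out with a regular expression. This is a minimal sketch, assuming coupon amounts always look like `$10` or `$12.50`. `CouponParser` and `ExtractCoupon` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts a coupon amount from the text of the fourth cell.
public static class CouponParser
{
    // Assumed format: "$" followed by digits, optionally with cents ("$10", "$12.50").
    private static readonly Regex DollarAmount = new Regex(@"\$\d+(?:\.\d{1,2})?");

    // Returns the first dollar amount in cellText, or null when the cell
    // is missing (null), empty, whitespace-only, or contains no amount.
    public static string ExtractCoupon(string cellText)
    {
        if (string.IsNullOrWhiteSpace(cellText))
            return null;

        Match match = DollarAmount.Match(cellText);
        return match.Success ? match.Value : null;
    }
}
```

Inside the loop it could be called as `CouponParser.ExtractCoupon(tr.SelectSingleNode("td[4]")?.InnerText)`. The null-conditional `?.` covers a row with no fourth cell, the empty cell in the second row yields `null`, and the trailing " - Apply" text in the third row is ignored because only the matched amount is returned.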