2014-09-26 54 views

Skip <!DOCTYPE HTML> with HtmlAgilityPack

I am building an app for WP 8.1, and I have to parse a page like this:

<!DOCTYPE html> 
<html> 
<body> 
    <table cellspacing="0" cellpadding="0" border="0" style="border-style:none; padding:0; margin:0;" id="ctl00_ContentPlaceHolder1_ListView1_groupPlaceholderContainer">    
     <tbody> 
      <tr style="border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer">   
       <td style="border-style:none;padding:0; margin:0; width:22%;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3"> 
        <div class="photo"> 
         <a target="_self" title="PH1" href="fumetto.aspx?Fumetto=279277">PH1_1</a> 
        </div> 
       </td> 
      </tr> 
      <tr style="border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer">   
       <td style="border-style:none;padding:0; margin:0; width:22%;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3"> 
        <div class="photo"> 
         <a target="_self" title="PH2" href="fumetto.aspx?Fumetto=279277">PH2_1</a> 
        </div> 
       </td> 
      </tr> 
      <tr style="border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer">   
       <td style="border-style:none;padding:0; margin:0; width:22%;" id="ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3"> 
        <div class="photo"> 
         <a target="_self" title="PH3" href="fumetto.aspx?Fumetto=279277">PH3_1</a> 
        </div> 
       </td> 
      </tr> 
     </tbody> 
    </table> 
</body> 
</html> 

When I use this code, I always get the first node (the doctype one) inside htmlDoc.DocumentNode, and I lose the html node. Is there a way to skip the doctype node?

string filePath = "..."; 
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); 
htmlDoc.OptionFixNestedTags = true; 
htmlDoc.LoadHtml(filePath); 

Answers

String html = "<!DOCTYPE html><html><body><table cellspacing='0' cellpadding='0' border='0' style='border-style:none; padding:0; margin:0;' id='ctl00_ContentPlaceHolder1_ListView1_groupPlaceholderContainer'><tbody><tr style='border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;' id='ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer'>   <td style='border-style:none;padding:0; margin:0; width:22%;' id='ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3'><div class='photo'><a target='_self' title='PH1' href='fumetto.aspx?Fumetto=279277'>PH1_1</a></div></td></tr><tr style='border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;' id='ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer'><td style='border-style:none;padding:0; margin:0; width:22%;' id='ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3'><div class='photo'><a target='_self' title='PH2' href='fumetto.aspx?Fumetto=279277'>PH2_1</a></div></td></tr><tr style='border-style:none;padding:0; margin:0; background-image:none; vertical-align:top;' id='ctl00_ContentPlaceHolder1_ListView1_ctrl0_itemPlaceholderContainer'><td style='border-style:none;padding:0; margin:0; width:22%;' id='ctl00_ContentPlaceHolder1_ListView1_ctrl0_ctl01_Td3'><div class='photo'><a target='_self' title='PH3' href='fumetto.aspx?Fumetto=279277'>PH3_1</a></div></td></tr></tbody></table></body></html>"; 
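// Parse the markup string (LoadHtml takes HTML text, not a file path).
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Element("html") returns the first child element named "html",
// so the doctype node that precedes it is skipped.
HtmlNode htmlnode = doc.DocumentNode.Element("html");
System.Diagnostics.Debug.WriteLine(htmlnode.OuterHtml);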

This works for me and shows only the content of the html tag.
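
Building on the same idea, here is a minimal sketch of how the links inside the page could be read once the html element has been picked out. It assumes the markup is already available as a string: `LoadHtml` expects the HTML text itself, so if the `filePath` variable in the question really holds a path, the file contents would need to be read into a string first. The class name `DoctypeSkipExample` and the shortened sample markup are hypothetical, and `Descendants` with a loop is used here instead of XPath on the assumption that XPath helpers may not be available in every HtmlAgilityPack build for Windows Phone.

```csharp
using System.Diagnostics;
using HtmlAgilityPack;

public static class DoctypeSkipExample
{
    public static void Run()
    {
        // Shortened stand-in for the page in the question (illustrative only).
        string html =
            "<!DOCTYPE html><html><body><div class='photo'>" +
            "<a title='PH1' href='fumetto.aspx?Fumetto=279277'>PH1_1</a>" +
            "</div></body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // takes markup text, not a file path

        // The doctype is a sibling of <html> under DocumentNode;
        // Element("html") returns the <html> element, or null if there is none.
        HtmlNode root = doc.DocumentNode.Element("html");
        if (root == null)
        {
            Debug.WriteLine("No <html> element found.");
            return;
        }

        // Walk every <a> element below <html> and print its title and href.
        foreach (HtmlNode link in root.Descendants("a"))
        {
            string title = link.GetAttributeValue("title", "");
            string href = link.GetAttributeValue("href", "");
            Debug.WriteLine(title + " -> " + href);
        }
    }
}
```

The null check matters because `Element` returns null when no matching child exists, which is what would happen if a file path rather than markup were passed to `LoadHtml`.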