2013-12-12

Parsing HTML tables with HTML Agility Pack: no ID on the tables

This is my HTML code:

<center> 
    <table cellspacing="0" cellpadding="0" border="0"> 
    <tbody><tr><td><img src="/someimages/images/dot_t.gif" hspace="20"></td> 
    <td><font face="Arial Rounded MT Bold, Arial, Helvetica" size="5" color="#000088"> 
    Marks Sheet Page</font></td> 
    </tr> 
    </tbody></table> 
    <table> 
    <tbody><tr> 
    <td> 
    <table border="0" cellpadding="0" cellspacing="0"> 
    <tbody><tr><td><img src="/someimages/images/dot_t.gif" hspace="15"></td><td align="CENTER"> 

    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"> 

    <tbody><tr bgcolor="#FF6600"> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Name</font></th> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Account</font></th> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Postal&nbsp;Address</font></th> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Town</font></th> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Zip&nbsp;Code</font></th> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Weather&nbsp;Turn-Off</font></th> 
    <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Next&nbsp;Weather&nbsp;Sample&nbsp;Date</font></th> 

    <!-- IF NOT ValidateDB2Acct(request("AcctNo")) THEN   response.write "<font FACE='Arial Rounded MT Bold, Arial, Helvetica' SIZE='3' COLOR='#000088'></font>"  ELSE // --> 
    </tr>  
    <tr><td align="CENTER">Company Name</td><td align="CENTER">1212121212121212</td><td align="CENTER">Street Addr Ln&amp;P</td><td align="CENTER">NEW YORK NY</td><td align="CENTER">10075</td><td align="CENTER">N</td><td align="CENTER">12/19/2013</td></tr></tbody></table><br> 
    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"><tbody><tr bgcolor="#FF6600"><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Break Code</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Local Variable</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">PLPLPL</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">MOM CODE</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Exam %</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Exam Area</font></th></tr><tr><td align="CENTER">L</td><td align="CENTER">20.5</td><td align="CENTER">&nbsp; 21.5629</td><td align="CENTER">&nbsp; --</td><td align="CENTER">100</td><td align="CENTER">&nbsp; J</td></tr></tbody></table> 

    <br> 

    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"><tbody><tr bgcolor="#FF6600"><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Route Number</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Airline Class</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Earlier Account Number</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">On Monthly Xfin</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">WHO Code</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Profile</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">ITIN</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Municipal</font></th></tr><tr><td align="CENTER">13</td><td align="CENTER">9</td><td align="CENTER">00000000000000</td><td align="CENTER">21</td><td align="CENTER">50</td><td align="CENTER">N</td><td align="CENTER">Fully Taxable</td><td align="CENTER">--</td></tr></tbody></table></td></tr><tr><td colspan="2"><img src="/someimages/images/dot_t.gif" vspace="10"></td></tr><tr><td></td><td align="CENTER"> 

    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"><tbody><tr bgcolor="#FF6600"><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">From Date</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">To Date</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">Bytes</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">KBB</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">Bill Amt</font></th></tr><tr><td align="CENTER">10/18/2013</td><td align="CENTER">11/18/2013</td><td align="RIGHT">7160</td><td align="RIGHT">17.60</td><td align="RIGHT">$671.46</td></tr><tr><td align="CENTER">9/18/2013</td><td align="CENTER">10/18/2013</td><td align="RIGHT">6800</td><td align="RIGHT">15.60</td><td align="RIGHT">$654.78</td></tr><tr><td align="CENTER">8/19/2013</td><td align="CENTER">9/18/2013</td><td align="RIGHT">8120</td><td align="RIGHT">18.00</td><td align="RIGHT">$811.63</td></tr><tr><td align="CENTER">7/19/2013</td><td align="CENTER">8/19/2013</td><td align="RIGHT">8320</td><td align="RIGHT">19.60</td><td align="RIGHT">$856.76</td></tr><tr><td align="CENTER">6/19/2013</td><td align="CENTER">7/19/2013</td><td align="RIGHT">9480</td><td align="RIGHT">21.60</td><td align="RIGHT">$988.60</td></tr><tr><td align="CENTER">5/20/2013</td><td align="CENTER">6/19/2013</td><td align="RIGHT">7680</td><td align="RIGHT">20.40</td><td align="RIGHT">$854.82</td></tr><tr><td align="CENTER">4/19/2013</td><td align="CENTER">5/20/2013</td><td align="RIGHT">7040</td><td align="RIGHT">17.60</td><td align="RIGHT">$746.32</td></tr><tr><td align="CENTER">3/21/2013</td><td align="CENTER">4/19/2013</td><td align="RIGHT">6800</td><td align="RIGHT">18.00</td><td align="RIGHT">$688.43</td></tr><tr><td align="CENTER">1/18/2013</td><td align="CENTER">3/21/2013</td><td 
align="RIGHT">15360</td><td align="RIGHT">18.00</td><td align="RIGHT">$1,456.56</td></tr><tr><td align="CENTER">12/19/2012</td><td align="CENTER">1/18/2013</td><td align="RIGHT">7280</td><td align="RIGHT">16.40</td><td align="RIGHT">$718.47</td></tr><tr><td align="CENTER">11/16/2012</td><td align="CENTER">12/19/2012</td><td align="RIGHT">8040</td><td align="RIGHT">17.60</td><td align="RIGHT">$848.67</td></tr><tr><td align="CENTER">10/18/2012</td><td align="CENTER">11/16/2012</td><td align="RIGHT">6800</td><td align="RIGHT">16.80</td><td align="RIGHT">$681.44</td></tr><tr><td align="CENTER">9/18/2012</td><td align="CENTER">10/18/2012</td><td align="RIGHT">7120</td><td align="RIGHT">18.40</td><td align="RIGHT">$757.94</td></tr><tr><td align="CENTER">8/17/2012</td><td align="CENTER">9/18/2012</td><td align="RIGHT">9160</td><td align="RIGHT">20.40</td><td align="RIGHT">$1,000.89</td></tr><tr><td align="CENTER">7/19/2012</td><td align="CENTER">8/17/2012</td><td align="RIGHT">9040</td><td align="RIGHT">20.00</td><td align="RIGHT">$884.61</td></tr><tr><td align="CENTER">6/19/2012</td><td align="CENTER">7/19/2012</td><td align="RIGHT">9320</td><td align="RIGHT">18.80</td><td align="RIGHT">$928.98</td></tr><tr><td align="CENTER">5/18/2012</td><td align="CENTER">6/19/2012</td><td align="RIGHT">7520</td><td align="RIGHT">16.40</td><td align="RIGHT">$788.95</td></tr><tr><td align="CENTER">4/19/2012</td><td align="CENTER">5/18/2012</td><td align="RIGHT">6280</td><td align="RIGHT">14.80</td><td align="RIGHT">$665.93</td></tr><tr><td align="CENTER">3/21/2012</td><td align="CENTER">4/19/2012</td><td align="RIGHT">6240</td><td align="RIGHT">17.20</td><td align="RIGHT">$725.73</td></tr><tr><td align="CENTER">2/21/2012</td><td align="CENTER">3/21/2012</td><td align="RIGHT">6640</td><td align="RIGHT">16.80</td><td align="RIGHT">$1,213.52</td></tr><tr><td align="CENTER">1/20/2012</td><td align="CENTER">2/21/2012</td><td align="RIGHT">7640</td><td align="RIGHT">18.40</td><td 
align="RIGHT">$1,347.25</td></tr><tr><td align="CENTER">12/20/2011</td><td align="CENTER">1/20/2012</td><td align="RIGHT">7600</td><td align="RIGHT">16.00</td><td align="RIGHT">$1,353.32</td></tr><tr><td align="CENTER">11/17/2011</td><td align="CENTER">12/20/2011</td><td align="RIGHT">7880</td><td align="RIGHT">17.60</td><td align="RIGHT">$1,307.75</td></tr></tbody></table><br> 
    </td></tr></tbody></table> 
<!-- END PAGE CONTENT AREA --> 
<!-- ***** to here ***** --> 

<!--include this footer on every page--> 
<!-- BEGIN PAGE FOOTER AREA --> 

<table width="100%"> 
<tbody><tr><td width="40"><img src="/someimages/images/dot_t.gif" hspace="20" vspace="20"></td> 
<td valign="top" align="CENTER"><br><br> 
<font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088"> 
Contact Us at <a href="mailto:[email protected]">[email protected]</a></font></td></tr> 
</tbody></table> 
<!-- END PAGE FOOTER AREA --> 
</td></tr></tbody></table></center> 

The same table is repeated:

<table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"> 

with its <tbody>. That <tbody> has several <tr> tags: the first <tr> is the header/title row, and the <tr> tags after it hold the content/values.

I want to convert these tables into DataTables.

How do I write an XPath expression so that I can split out the 4 HTML tables that have actual content?
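One way to isolate the content tables is to filter on attributes that only they carry. This minimal sketch assumes that the bordercolordark/bordercolorlight pair appears only on the four content tables, which holds for the HTML above. So that the sketch runs without the HtmlAgilityPack package, it uses System.Xml's XmlDocument on a hypothetical, trimmed, well-formed stand-in for the page. The XPath string is plain XPath 1.0, so it should work unchanged with HtmlNode.SelectNodes. The class and member names are illustrative only.

```csharp
using System.Xml;

// Sketch only: XmlDocument stands in for HtmlAgilityPack's HtmlDocument so this
// runs without extra packages. The XPath string is plain XPath 1.0.
public static class DataTableSelector
{
    // Matches the bordered content tables. The layout tables (border="0",
    // cellpadding="0") carry no bordercolordark/bordercolorlight attributes.
    public const string DataTableXPath =
        "//table[@bordercolordark='#993300' and @bordercolorlight='#FF6600']";

    // Hypothetical, trimmed, well-formed stand-in for the page: one title table
    // plus one layout table that wraps two content tables.
    public const string Sample =
        "<center>" +
        "<table cellspacing='0' cellpadding='0' border='0'><tbody>" +
        "<tr><td>Marks Sheet Page</td></tr></tbody></table>" +
        "<table border='0' cellpadding='0' cellspacing='0'><tbody><tr><td>" +
        "<table border='1' cellpadding='3' bordercolordark='#993300' bordercolorlight='#FF6600'><tbody>" +
        "<tr><th>Name</th><th>Account</th></tr>" +
        "<tr><td>Company Name</td><td>1212121212121212</td></tr></tbody></table>" +
        "<table border='1' cellpadding='3' bordercolordark='#993300' bordercolorlight='#FF6600'><tbody>" +
        "<tr><th>Break Code</th><th>Exam %</th></tr>" +
        "<tr><td>L</td><td>100</td></tr></tbody></table>" +
        "</td></tr></tbody></table>" +
        "</center>";

    // Returns how many content tables the XPath selects in a well-formed document.
    public static int CountDataTables(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNodeList tables = doc.SelectNodes(DataTableXPath);
        return tables == null ? 0 : tables.Count;
    }
}
```

On the stand-in, the expression skips both layout tables and returns only the two content tables. With HtmlAgilityPack, the equivalent call would be myHtml.DocumentNode.SelectNodes(DataTableSelector.DataTableXPath). By default that method returns null rather than an empty collection when nothing matches, so check for null first.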

So far I have tried the following, but I get errors with this code:

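using System.Collections.Generic;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument myHtml = new HtmlDocument(); 
myHtml.LoadHtml(stringHTML); 

ParseAllTables(myHtml); 

// Converts every <table> in the document into a DataTable.
// Note: Descendants("table") also returns the layout tables (cellpadding="0"),
// not just the four bordered content tables.
private static DataTable[] ParseAllTables(HtmlDocument doc) 
{ 
    var result = new List<DataTable>(); 
    foreach (var table in doc.DocumentNode.Descendants("table")) 
    { 
        result.Add(ParseTable(table)); 
    } 
    return result.ToArray(); 
} 

// Treats the first <tr> as column headers and every later <tr> as a data row.
private static DataTable ParseTable(HtmlNode table) 
{ 
    var result = new DataTable(); 

    // Descendants("tr") also includes rows of tables nested inside this one.
    var rows = table.Descendants("tr"); 

    var header = rows.Take(1).First(); 
    // The content tables use <th> for their header cells, so this finds no <td> there.
    // In layout tables it also picks up nested <td>s, whose texts can repeat
    // and produce duplicate column names.
    foreach (var column in header.Descendants("td")) 
    { 
        result.Columns.Add(new DataColumn(column.InnerText, typeof(string))); 
    } 

    foreach (var row in rows.Skip(1)) 
    { 
        var data = new List<string>(); 
        foreach (var column in row.Descendants("td")) 
        { 
            data.Add(column.InnerText); 
        } 
        // Throws if a row has more cells than the header produced columns.
        result.Rows.Add(data.ToArray()); 
    } 
    return result; 
} 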

I only care about the <TR> tags inside the tables <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600">.

The first <TR> is the DataTable header; from the second <TR> on, the rows are DataTable data.
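The header/data rule above can be sketched separately from the HTML parsing. The hypothetical helper DataTableBuilder below takes the header texts and the cell texts of each later row, already extracted, and builds a DataTable. It renames duplicate or blank header texts, because a DataTable refuses two columns with the same name. That is one way to avoid the "column already exists" error mentioned in the comments. The helper assumes every cell should be stored as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableBuilder
{
    // First row = column names, remaining rows = data, all stored as strings.
    public static DataTable Build(IList<string> header, IEnumerable<IList<string>> rows)
    {
        var table = new DataTable();
        var used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (string raw in header)
        {
            // Blank header cells get a generic name; repeats get a numeric suffix
            // (N, N_2, N_3, ...) so Columns.Add never sees a duplicate name.
            string baseName = string.IsNullOrWhiteSpace(raw) ? "Column" : raw.Trim();
            string name = baseName;
            for (int i = 2; !used.Add(name); i++)
                name = baseName + "_" + i;
            table.Columns.Add(name, typeof(string));
        }

        foreach (IList<string> cells in rows)
        {
            DataRow row = table.NewRow();
            // Copy at most one cell per column, so a row with extra cells does not
            // throw "Input array is longer than the number of columns".
            for (int i = 0; i < cells.Count && i < table.Columns.Count; i++)
                row[i] = cells[i];
            table.Rows.Add(row);
        }
        return table;
    }
}
```

For example, headers { "N", "N", " " } become the columns N, N_2 and Column. Any cells beyond the third in a data row are dropped rather than raising an error. Whether silently dropping extra cells is acceptable depends on the page; throwing instead would surface malformed rows sooner.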

What is the error you're getting? – Jacob

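There are several. First, I got "DataTable column already exists". Also, the logic loads the first table as well, the one with the title "Marks Sheet Page", which isn't needed. I'll rerun the program and give you the exact error. – CoolArchTek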

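There is some improvement. I'm using this XPath to get the tables: SelectNodes("//table[@bordercolordark='#993300' and @bordercolorlight='#FF6600']"); Since the HTML has the same table multiple times, how do I get the first instance of the above table, with only the TH and TD cells inside it? – CoolArchTek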
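For "the first instance", XPath 1.0 distinguishes between (//table[...])[1] and //table[...][1]. The first form collects every match in document order and keeps the first. The second applies [1] per parent, so it keeps the first matching table inside each cell and can still return several tables. The minimal sketch below shows both forms and then reads only the th cells of the header row and the td cells of the data rows. As with plain XPath elsewhere, it uses System.Xml's XmlDocument on a hypothetical well-formed stand-in, so it runs without HtmlAgilityPack. The expressions themselves are standard XPath 1.0. FirstTableReader and its members are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class FirstTableReader
{
    const string MatchStep =
        "table[@bordercolordark='#993300' and @bordercolorlight='#FF6600']";

    // Hypothetical stand-in: two content tables in one cell, a third in another cell.
    public const string Sample =
        "<table><tbody><tr><td>" +
        "<table bordercolordark='#993300' bordercolorlight='#FF6600'><tbody>" +
        "<tr><th>Name</th><th>Town</th></tr>" +
        "<tr><td>Company Name</td><td>NEW YORK NY</td></tr></tbody></table>" +
        "<table bordercolordark='#993300' bordercolorlight='#FF6600'><tbody>" +
        "<tr><th>Break Code</th></tr><tr><td>L</td></tr></tbody></table>" +
        "</td><td>" +
        "<table bordercolordark='#993300' bordercolorlight='#FF6600'><tbody>" +
        "<tr><th>From Date</th></tr><tr><td>10/18/2013</td></tr></tbody></table>" +
        "</td></tr></tbody></table>";

    // Parentheses gather all matches in document order; [1] then keeps the first.
    public static XmlNode FirstDataTable(XmlDocument doc) =>
        doc.SelectSingleNode("(//" + MatchStep + ")[1]");

    // Without parentheses, [1] applies per parent: first match in each cell.
    public static int CountFirstPerParent(XmlDocument doc) =>
        doc.SelectNodes("//" + MatchStep + "[1]").Count;

    // Header texts: the th cells of the table's first row.
    public static List<string> HeaderTexts(XmlNode table) =>
        Texts(table.SelectNodes("(.//tr)[1]/th"));

    // Data texts: the td cells of every row that has td cells, one list per row.
    public static List<List<string>> RowTexts(XmlNode table) =>
        table.SelectNodes(".//tr[td]").Cast<XmlNode>()
             .Select(tr => Texts(tr.SelectNodes("td")))
             .ToList();

    static List<string> Texts(XmlNodeList cells) =>
        cells.Cast<XmlNode>().Select(c => c.InnerText.Trim()).ToList();
}
```

On the stand-in, FirstDataTable returns the Name/Town table, while the unparenthesized form matches two tables, one per cell. With HtmlAgilityPack, the same strings can be passed to HtmlNode.SelectSingleNode and HtmlNode.SelectNodes. Real page text may also need HtmlEntity.DeEntitize to turn &nbsp; and &amp; into plain characters.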

Answer

This works:

//table[@cellpadding='0' and @cellspacing='0']/tr[1]