2012-05-09 283 views
5

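Getting data from an HTML table into a DataTable

OK, so I need to query a live website to get data from a table, put this HTML table into a DataTable, and then use that data. So far I have managed to use Html Agility Pack and XPath to reach each row of the table I need, but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is: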

string htmlCode = "";
using (WebClient client = new WebClient())
{
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

//My attempt at LINQ to solve the issue (not sure where to go from here)
//GetAttributeValue avoids a NullReferenceException on tables without a summary attribute
var myTable = doc.DocumentNode
    .Descendants("table")
    .Where(t => t.GetAttributeValue("summary", "") == "Table One")
    .FirstOrDefault();

//Finds all the odd rows (which are the ones I actually need, but I would prefer a
//DataTable containing all the rows!)
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
    string test = cell.InnerText;
    //Have not gone further than this yet!
}

The HTML table on the website I am querying looks like this:

<table summary="Table One">
    <tbody>
        <tr class="odd">
            <td>Some Text</td>
            <td>Some Value</td>
        </tr>
        <tr class="even">
            <td>Some Text1</td>
            <td>Some Value1</td>
        </tr>
        <tr class="odd">
            <td>Some Text2</td>
            <td>Some Value2</td>
        </tr>
        <tr class="even">
            <td>Some Text3</td>
            <td>Some Value3</td>
        </tr>
        <tr class="odd">
            <td>Some Text4</td>
            <td>Some Value4</td>
        </tr>
    </tbody>
</table>

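I don't know whether it is better or easier to use LINQ + HAP or XPath + HAP to get the desired result; I have tried both with limited success, as you can probably see. This is the first time I have written a program that queries a website, or interacts with a website in any way, so I am quite unsure at the moment! Thanks in advance for any help :)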

+0

Does this help? http://weblogs.asp.net/grantbarrington/archive/2009/10/15/screen-scraping-in-c.aspx – iwayneo

Answers

4

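HTML Agility Pack has no such method, but creating one should not be too hard. There are samples out there that convert XML from Linq-to-XML into a DataTable. These can be reworked into what you need.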

If needed I can help create the whole method, but not today :).
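
For what it's worth, a rough sketch of such a method might look like the following. It is only an illustration, not something that ships with Html Agility Pack: the class name `HtmlTableConverter`, the method name `ToDataTable` and the generated column names (`Column1`, `Column2`, ...) are made up for this example, and every cell is stored as a string. It also assumes the table contains no nested tables, since `.//tr` would pick up their rows as well.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack; // NuGet package, not part of the .NET base class library

public static class HtmlTableConverter // hypothetical helper name
{
    // Builds a DataTable of string columns from the first table matching tableXPath.
    public static DataTable ToDataTable(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new DataTable();
        HtmlNode table = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (table == null)
            return result; // no matching table: return an empty DataTable

        // ".//tr" also finds rows inside <tbody>.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null)
            return result;

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("th|td");
            if (cells == null)
                continue;

            // Add columns as needed so the widest row still fits.
            while (result.Columns.Count < cells.Count)
                result.Columns.Add("Column" + (result.Columns.Count + 1), typeof(string));

            // Decode entities such as &nbsp; and trim surrounding whitespace.
            object[] values = cells
                .Select(c => (object)HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToArray();
            result.Rows.Add(values);
        }
        return result;
    }
}
```

With the table from the question it could be called as `DataTable dt = HtmlTableConverter.ToDataTable(htmlCode, "//table[@summary='Table One']");`. Typed columns (for example decimals) would need an extra conversion step on top of this.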

See also:

+0

Thanks, after looking at these resources and a few others I managed to come up with a way to do it :D –

+0

Would you be willing to share your solution, for the benefit of others? – jessehouwing

+0

Thanks for the prompt, I have added the solution below! –

3

Here is my solution. It may be a bit messy, but it does the job perfectly :D

string htmlCode = "";
using (WebClient client = new WebClient())
{
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

DataTable dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Value", typeof(decimal));

int count = 0;
decimal rowValue = 0;
bool isDecimal = false;
foreach (var row in doc.DocumentNode.SelectNodes("//table[@summary='Table Name']/tbody/tr"))
{
    DataRow dr = dt.NewRow();
    foreach (var cell in row.SelectNodes("td"))
    {
        if ((count % 2 == 0))
        {
            //First cell of each row: the name
            dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
        }
        else
        {
            //Second cell: a value written like "1.234,56". Drop the thousands separators,
            //turn the decimal comma into a point, then parse with the invariant culture
            //so the result does not depend on the machine's regional settings.
            isDecimal = decimal.TryParse(
                (cell.InnerText.Replace(".", "")).Replace(",", "."),
                System.Globalization.NumberStyles.Number,
                System.Globalization.CultureInfo.InvariantCulture,
                out rowValue);
            if (isDecimal)
            {
                dr["Value"] = rowValue;
            }
            //The row is complete once its second cell has been read
            dt.Rows.Add(dr);
        }
        count++;
    }
}
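
As a side note, the two Replace calls used for the decimal conversion could probably be swapped for an explicit number format. This is only a sketch under the assumption that the site writes numbers with "." as the thousands separator and "," as the decimal separator; the class name `DecimalParsing` and the method name `ParseCell` are invented for the example.

```csharp
using System;
using System.Globalization;

public static class DecimalParsing // hypothetical helper name
{
    // Assumption: source numbers look like "1.234,56"
    // ("." groups thousands, "," marks the decimals).
    private static readonly NumberFormatInfo SourceFormat = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    // Returns the parsed value, or null when the text is not a number.
    public static decimal? ParseCell(string text)
    {
        decimal value;
        bool ok = decimal.TryParse(text.Trim(), NumberStyles.Number, SourceFormat, out value);
        return ok ? value : (decimal?)null;
    }
}
```

For example, `DecimalParsing.ParseCell("1.234,56")` yields 1234.56, while non-numeric text yields null, so the caller can decide whether to leave the Value column empty.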
8

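Using some of Jack Eker's code above and some code from Marc Gravell (see post here), I managed to come up with a solution. At the time of writing, this code is used to fetch the South African public holidays for 2012.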

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;

namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string htmlCode = "";
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }
            // Fully qualified because System.Windows.Forms also defines an HtmlDocument type
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

            int count = 0;

            foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
            {
                // Only the table with id "table2" is of interest
                if (table.Id != "table2")
                    continue;

                // SelectNodes returns null (not an empty collection) when nothing matches
                var rows = table.SelectNodes("tr");
                if (rows == null)
                    continue;

                foreach (HtmlNode row in rows)
                {
                    var cells = row.SelectNodes("td");
                    if (cells == null)
                        continue;

                    DataRow dr = dt.NewRow();

                    foreach (var cell in cells)
                    {
                        if ((count % 2 == 0))
                        {
                            // First cell of the row
                            dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
                        }
                        else
                        {
                            // Second cell of the row: the row is now complete
                            dr["Value"] = cell.InnerText.Replace("&nbsp;", " ");

                            dt.Rows.Add(dr);
                        }
                        count++;
                    }
                }
            }

            // Bind once, after all rows have been collected
            dataGridView1.DataSource = dt;
        }
    }
}
1

Simple logic to convert an HtmlTable to a DataTable:

//Define your webtable
//(HtmlTable, HtmlCell and UITestControlCollection are Coded UI Test types;
//parent is the containing UI test control and is not defined in this snippet)
public static HtmlTable table
{
    get
    {
        HtmlTable htmlTable = new HtmlTable(parent);
        htmlTable.SearchProperties.Add("id", "searchId");
        return htmlTable;
    }
}

//Convert a webtable to datatable
public static DataTable getTable
{
    get
    {
        DataTable dtTable = new DataTable("TableName");
        UITestControlCollection rows = table.Rows;

        //The first row holds the header cells, which become the column names
        UITestControlCollection headers = rows[0].GetChildren();
        foreach (HtmlHeaderCell header in headers)
        {
            if (header.InnerText != null)
                dtTable.Columns.Add(header.InnerText);
        }

        //Every remaining row becomes one DataRow
        for (int i = 1; i < rows.Count; i++)
        {
            UITestControlCollection cells = rows[i].GetChildren();
            string[] data = new string[cells.Count];
            int counter = 0;
            foreach (HtmlCell cell in cells)
            {
                if (cell.InnerText != null)
                    data[counter] = cell.InnerText;
                counter++;
            }
            dtTable.Rows.Add(data);
        }
        return dtTable;
    }
}
0

You can try

DataTable.Rows[i].Cells[j].InnerText; 

where DataTable is the id of your table, i is the row index, and j is the cell index.