2011-05-03 64 views
3

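I want to extract text from an HTML source. I am using C# and the HtmlAgilityPack DLL. How can I use HtmlAgilityPack to extract the text from the HTML in this example?

The source is: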

<table> 
    <tr> 
    <td class="title"> 
     <a onclick="func1">Here 2</a> 
    </td> 
    <td class="arrow"> 
     <img src="src1" width="9" height="8" alt="Down"> 
    </td> 
    <td class="percent"> 
     <span>39%</span> 
    </td> 
    <td class="title"> 
     <a onclick="func2">Here 1</a> 
    </td> 
    <td class="arrow"> 
     <img src="func3" width="9" height="8" alt="Up"> 
    </td> 
    <td class="percent"> 
     <span>263%</span> 
    </td> 
    </tr> 
</table> 

How can I get the text "Here 1" and "Here 2" from this table?

Answers

7
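using System.Linq; 
using HtmlAgilityPack; 

HtmlDocument htmlDoc = new HtmlDocument(); 
htmlDoc.LoadHtml("web page string"); // pass the page's HTML source here 
// Select the inner text of every <td> whose class attribute is exactly "title" 
var xyz = from x in htmlDoc.DocumentNode.DescendantNodes() 
        where x.Name == "td" && x.Attributes.Contains("class") 
        where x.Attributes["class"].Value == "title" 
        select x.InnerText; 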

Not that pretty, but it should work.
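As a minimal sketch of the same idea in method syntax, the query can be wrapped in a helper that also trims the whitespace surrounding each cell's text. The names `TitleExtractor` and `ExtractTitles` are hypothetical and chosen only for illustration; the sketch assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

public static class TitleExtractor
{
    // Hypothetical helper: returns the trimmed text of every <td class="title"> cell.
    public static List<string> ExtractTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("td")
            // GetAttributeValue falls back to "" when the cell has no class attribute,
            // so cells without a class are skipped instead of throwing.
            .Where(td => td.GetAttributeValue("class", "") == "title")
            // InnerText includes the line breaks and indentation around the <a> element.
            .Select(td => td.InnerText.Trim())
            .ToList();
    }
}
```

Calling `TitleExtractor.ExtractTitles(html)` with the table from the question should give a list containing "Here 2" followed by "Here 1".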

3

The XPath version:

HtmlDocument doc = new HtmlDocument(); 
doc.LoadHtml(t); // t holds the HTML source string 

// This works because InnerText recursively concatenates the text of all child nodes 
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td[@class='title']"); 
// To be more precise, you can select the <a> elements directly instead: 
//HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td[@class='title']/a"); 

string result = ""; // must be initialized before += can be used 
if (nodes != null) // SelectNodes returns null when nothing matches 
{ 
    foreach (HtmlNode item in nodes) 
        result += item.InnerText; 
} 

For the LINQ version, just replace the nodes = ... line with:

var nodes = from x in doc.DocumentNode.DescendantNodes() 
        where x.Name == "td" && x.GetAttributeValue("class", "") == "title" 
        select x; 

How do you display the cell text? – 2012-08-21 20:17:15


Use InnerText, or you can use text() in the XPath like this: "//td[@class='title']/a/text()" – 2012-08-22 14:27:17
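The text() suggestion from the comment could look roughly like the sketch below. `XPathTitleExtractor` and `ExtractLinkTexts` are hypothetical names used for illustration, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

public static class XPathTitleExtractor
{
    // Hypothetical helper: returns the text of each <a> inside a <td class="title">.
    public static List<string> ExtractLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // text() selects the text nodes themselves rather than the <a> elements.
        HtmlNodeCollection textNodes =
            doc.DocumentNode.SelectNodes("//td[@class='title']/a/text()");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (textNodes == null)
            return new List<string>();

        return textNodes.Select(n => n.InnerText.Trim()).ToList();
    }
}
```

Because the XPath targets the link text directly, the whitespace between the `<td>` and `<a>` tags is never part of the result, although trimming is kept as a precaution.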
