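How do I extract an HTML tag attribute?

I am developing my first RSS news aggregator. I can easily extract the link, title, and publication date from the RSSItem object, but I am having a hard time extracting the image from a feed item. Unfortunately, my reputation is too low to post images, so instead of helping me extract the value of the src attribute of an <img>, please show me how to get the value of the href attribute of an <a> tag. Highly appreciated!

Here is the string: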

<div style="text-align: center;">
    <a href="http://www.engadget.com/2011/07/10/element5s-mini-l-solarbag-brings-eco-friendly-energy-protectio/"></a> 
</div>

Edit:

Maybe the whole title is wrong. Is there a way to find the value using XPath?

Source

2011-07-10 Dragan

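And where is your string? I think it should be in the "Here is the string" part. – grapkulec

This doesn't look like RSS. Where did you get it? –

John, it's just some random HTML. I don't have enough reputation to embed images and links, so :) – Dragan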

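Use HtmlAgilityPack, as described in the answer to this post:

How can I get values from Html Tags?

More information:

HTML is often not well-formed, so we need a parser that is more fault-tolerant than the XML one that ships with .NET. That is where HtmlAgilityPack comes in.

Getting started:

Create a new console application.
Right-click References and choose Manage NuGet Packages (install NuGet first if you don't have it).
Add HtmlAgilityPack.

Working example: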

    using System;
    using HtmlAgilityPack;

    namespace ConsoleApplication4
    {
        class Program
        {
            // Sample markup: a <div> with a custom attribute, followed by some XML-style content.
            private const string html =
    @"<?xml version=""1.0"" encoding=""ISO-8859-1""?>
    <div class='linkProduct' id='link' anattribute='abc'/>
    <bookstore>
    <book>
     <title lang=""eng"">Harry Potter</title>
     <price>29.99</price>
    </book>
    <book>
     <title lang=""eng"">Learning XML</title>
     <price>39.95</price>
    </book>
    </bookstore>
    ";

            static void Main(string[] args)
            {
                // Parse the markup directly from the string; no stream is needed.
                HtmlDocument doc = new HtmlDocument();
                doc.LoadHtml(html);

                // Select the top-level <div> with XPath and read one of its attributes.
                var root = doc.DocumentNode;
                var tag = root.SelectSingleNode("/div");
                if (tag == null)
                {
                    Console.WriteLine("No <div> found.");
                    return;
                }
                var attrib = tag.Attributes["anattribute"];
                Console.WriteLine(attrib.Value);
            }
        }
    }
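
Applied to the markup from the question, the same approach might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced and that the opening <div> tag in the question was meant to be closed; the names HrefExample and FirstHref are only illustrative, not part of any library.

```csharp
using System;
using HtmlAgilityPack;

static class HrefExample
{
    // The snippet from the question (assumption: the opening <div> tag is closed).
    private const string Html =
        @"<div style=""text-align: center;"">
            <a href=""http://www.engadget.com/2011/07/10/element5s-mini-l-solarbag-brings-eco-friendly-energy-protectio/""></a>
          </div>";

    // Returns the href of the first <a> element that has one, or null if there is none.
    public static string FirstHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a[@href]" matches any <a> element, at any depth, that carries an href attribute.
        // SelectSingleNode returns null when nothing matches.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@href]");
        return link == null ? null : link.GetAttributeValue("href", "");
    }

    static void Main()
    {
        Console.WriteLine(FirstHref(Html) ?? "(no link found)");
    }
}
```

Swapping the XPath for "//img[@src]" and reading "src" instead of "href" should give the image URL in the same way.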

Taking it further:

Get good at XPath. Here is a good place to start:

http://www.w3schools.com/xpath/xpath_syntax.asp
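
As a rough illustration of what XPath buys you, the sketch below collects the src of every <img> in a fragment. The markup and the example.com URLs are made up for the demonstration, and ImageSources.Extract is a hypothetical helper name; only the HtmlAgilityPack calls (LoadHtml, SelectNodes, GetAttributeValue) come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class ImageSources
{
    // Collects the src value of every <img> element that has a src attribute.
    public static List<string> Extract(string html)
    {
        var result = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images == null)
        {
            return result;
        }

        foreach (HtmlNode img in images)
        {
            result.Add(img.GetAttributeValue("src", ""));
        }
        return result;
    }

    static void Main()
    {
        // Hypothetical feed item description; deliberately not well-formed XML.
        const string html =
            "<p>Story text<img src='http://example.com/a.jpg'><br>" +
            "<img alt='no source'><img src=\"http://example.com/b.png\"></p>";

        foreach (string src in Extract(html))
        {
            Console.WriteLine(src);
        }
    }
}
```

The [@src] predicate filters out images without a source in the query itself, so the loop does not need a separate check.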

Source

2011-07-10 15:29:15 sgtz

I played around with HtmlAgilityPack and now I am able to extract the images. Thanks for the tip, sir! – Dragan
