2015-07-12 164 views
1

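HtmlAgilityPack: parsing attributes

I want to parse HTML, but I don't know how to apply a condition (for example, the class name must be X). I know there are many topics about the Agility Pack, but I couldn't find anything useful.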

<div class="main-class"> 
<a href="LINK"> 
<img src="IMAGELINK" alt="SOMETEXT" class="image-class"> 
</a> 
</div> 

<p> bla bla </p> 

<div class="main-class"> 
<a href="LINK"> 
<img src="IMAGELINK" alt="SOMETEXT" class="image-class"> 
</a> 
</div> 

<div class="main-class"> 
<a href="LINK"> 
<img src="IMAGELINK" alt="SOMETEXT" class="image-class"> 
</a> 
<p> asd sadh awww </p> 
</div> 

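I want the href, src, and alt for every div whose class name is "main-class". Here is my code, but it only prints the "p" elements, because that is the only thing I know how to do.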

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(dataString);
foreach (HtmlNode nodeItem in doc.DocumentNode.Descendants("p").ToArray())
{
    Debug.WriteLine(nodeItem.InnerText);
}

I'm working on a Windows Phone (WP) app, where "SelectNodes" is not supported.

Answers

0

Use the traditional, non-XPath approach.

Note: null checks are omitted.

string dataString = "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div><p> bla bla </p><div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div><div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a><p> asd sadh awww </p></div>"; 

var doc = new HtmlDocument(); 
doc.LoadHtml(dataString); 

var elements = doc.DocumentNode.Descendants("div").Where(o => o.GetAttributeValue("class", "") == "main-class"); 
foreach (var nodeItem in elements) 
{ 
    var aTag = nodeItem.Descendants("a").First(); 
    var aTagHrefValue = aTag.Attributes["href"]; 

    var imgTag = nodeItem.Descendants("img").First(); 
    var imgTagSrcValue = imgTag.Attributes["src"]; 
    var imgTagAltValue = imgTag.Attributes["alt"]; 

    Console.WriteLine("a href value: {0}", aTagHrefValue.Value); 
    Console.WriteLine("img src value: {0}", imgTagSrcValue.Value); 
    Console.WriteLine("img alt value: {0}", imgTagAltValue.Value); 
    Console.WriteLine(); 
} 
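
Because the snippet above omits null checks, `First()` throws an `InvalidOperationException` for any "main-class" div that has no `a` or `img` descendant. A minimal sketch of a more defensive variant, assuming the same HtmlAgilityPack types; the `LinkInfo` class and the `ExtractLinks` method are hypothetical names introduced only for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical container for the three values the question asks for.
public sealed class LinkInfo
{
    public string Href { get; set; }
    public string Src { get; set; }
    public string Alt { get; set; }
}

public static class MainClassParser
{
    // Returns one LinkInfo per <div class="main-class">. Missing tags or
    // attributes yield empty strings instead of exceptions.
    public static List<LinkInfo> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<LinkInfo>();
        var divs = doc.DocumentNode.Descendants("div")
            .Where(d => d.GetAttributeValue("class", "") == "main-class");

        foreach (var div in divs)
        {
            // FirstOrDefault returns null when no matching descendant exists.
            var a = div.Descendants("a").FirstOrDefault();
            var img = div.Descendants("img").FirstOrDefault();

            results.Add(new LinkInfo
            {
                Href = a != null ? a.GetAttributeValue("href", "") : "",
                Src = img != null ? img.GetAttributeValue("src", "") : "",
                Alt = img != null ? img.GetAttributeValue("alt", "") : ""
            });
        }
        return results;
    }
}
```

Calling `MainClassParser.ExtractLinks(dataString)` with the string above should give three entries; a div without a link simply produces empty strings.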
0

@Orel Eraki - Thanks. I worked it out myself three minutes ago, but I'll use your solution because it has only one foreach loop. Anyway, here is my solution:

foreach (HtmlNode nodeItem in doc.DocumentNode.Descendants("div").Where(p => p.GetAttributeValue("class", "def").Equals("main-class")))
{
    foreach (HtmlNode nodeAItem in nodeItem.Descendants("a"))
    {
        Debug.WriteLine(nodeAItem.GetAttributeValue("href", "def"));
        foreach (HtmlNode nodeIMAGEitem in nodeAItem.Descendants("img"))
        {
            Debug.WriteLine(nodeIMAGEitem.GetAttributeValue("src", "def"));
            Debug.WriteLine(nodeIMAGEitem.GetAttributeValue("alt", "def"));
        }
    }
}
0

You can use LINQ as follows:

var attrs = doc.DocumentNode
    .Descendants("div")
    .Where(d => d.Attributes != null &&
                d.Attributes.Contains("class") &&
                d.Attributes["class"].Value.Contains("main-class"))
    .Select(d => new
    {
        anchor = d.SelectSingleNode("a"),
        img = d.SelectSingleNode("a") != null
            ? d.SelectSingleNode("a").SelectSingleNode("img")
            : null
    })
    .Select(d => new
    {
        href = d.anchor != null
            ? d.anchor.GetAttributeValue("href", string.Empty)
            : string.Empty,
        imgsrc = d.img != null
            ? d.img.GetAttributeValue("src", string.Empty)
            : string.Empty,
        imgalt = d.img != null
            ? d.img.GetAttributeValue("alt", string.Empty)
            : string.Empty
    })
    .ToList();
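
The question says "SelectNodes" is not supported on WP, and `SelectSingleNode` belongs to the same XPath-based API, so it may be missing there as well (an assumption, not verified against a specific build). A hedged sketch of the same projection using only `Descendants`, `Element`, and LINQ follows. It also treats "main-class" as a whole whitespace-separated token, since `Value.Contains("main-class")` would also accept a class such as "not-main-class". `HasClass` and `Project` are hypothetical helper names, and the result is returned as plain string arrays (href, src, alt) rather than anonymous types so it can leave the method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class XPathFreeQuery
{
    // True when `name` is one of the whitespace-separated class tokens.
    public static bool HasClass(HtmlNode node, string name)
    {
        return node.GetAttributeValue("class", string.Empty)
                   .Split(new[] { ' ', '\t', '\r', '\n' },
                          StringSplitOptions.RemoveEmptyEntries)
                   .Contains(name);
    }

    // Returns { href, src, alt } for each matching div; absent nodes or
    // attributes become empty strings.
    public static List<string[]> Project(HtmlDocument doc)
    {
        return doc.DocumentNode
            .Descendants("div")
            .Where(d => HasClass(d, "main-class"))
            .Select(d =>
            {
                // Element(name) returns the first direct child element
                // with that name, or null if there is none.
                var anchor = d.Element("a");
                var img = anchor != null ? anchor.Element("img") : null;
                return new[]
                {
                    anchor != null ? anchor.GetAttributeValue("href", string.Empty) : string.Empty,
                    img != null ? img.GetAttributeValue("src", string.Empty) : string.Empty,
                    img != null ? img.GetAttributeValue("alt", string.Empty) : string.Empty
                };
            })
            .ToList();
    }
}
```

Like the `SelectSingleNode("a")` calls above, `Element("a")` only looks at direct children, so a link nested deeper inside the div would need `Descendants("a").FirstOrDefault()` instead.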