How to get content from an HTML string into an array

I am working with some HTML content. The HTML is formatted as shown below.

<li> 
    <ul> 
    <li>Test1</li> 
    <li>Test2</li> 
    </ul> 
    Odd string 1 
    <ul> 
    <li>Test3</li> 
    <li>Test4</li> 
    </ul> 
    Odd string 2 
    <ul> 
    <li>Test5</li> 
    <li>Test6</li> 
    </ul> 
</li>

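There can be multiple "odd strings" in the HTML content, so I want all of the "odd strings" in an array. Is there an easy way to do this? (I am using C# and HtmlAgilityPack.)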

Asked 2013-07-05 by Debajit Mukhopadhyay

Will they always be between </ul> and <ul>? – Jonesopolis

@Jonesy Yes, they will always be between </ul> and <ul>.

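Select the ul elements and look at each one's next sibling node, which will be your text:

using System;
using System.Linq;
using HtmlAgilityPack;

// Load the document from a file; html_file is the path to the HTML file.
HtmlDocument html = new HtmlDocument();
html.Load(html_file);
// For each <ul>, take the node that immediately follows it and keep it
// only if it is a non-blank text node.
var odds = from ul in html.DocumentNode.Descendants("ul")
           let sibling = ul.NextSibling
           where sibling != null &&
                 sibling.NodeType == HtmlNodeType.Text && // check if text node
                 !String.IsNullOrWhiteSpace(sibling.InnerHtml)
           select sibling.InnerHtml.Trim();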

Answered 2013-07-05 12:17:48

Works like a charm.

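Something like:

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
List<string> oddStrings = new List<string>();
// Singleline lets '.' match newlines, so text spanning several lines is captured.
MatchCollection matches = Regex.Matches(HTMLString, "</ul>.*?<ul>", RegexOptions.Singleline);
foreach (Match match in matches)
{
    // Strip the surrounding tags and whitespace, leaving only the odd string.
    string oddString = match.Value.Replace("</ul>", "").Replace("<ul>", "").Trim();
    oddStrings.Add(oddString);
}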

Answered 2013-07-05 12:14:04 by Jonesopolis

By the way, the OP probably needs to use HtmlAgilityPack (note the tag and the last sentence of the question).

Oh, good call, I missed that. – Jonesopolis

Get all ul descendants and check whether each one's next sibling node is of type HtmlNodeType.Text and is not empty:

List<string> oddStrings = new List<string>();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
foreach (HtmlNode ul in doc.DocumentNode.Descendants("ul"))
{
    HtmlNode nextSibling = ul.NextSibling;
    if (nextSibling != null && nextSibling.NodeType == HtmlNodeType.Text)
    {
        string trimmedText = nextSibling.InnerText.Trim();
        if (!String.IsNullOrEmpty(trimmedText))
        {
            oddStrings.Add(trimmedText);
        }
    }
}

Answered 2013-07-05 12:26:29

HtmlAgilityPack can already query these text nodes directly:

var nodes = doc.DocumentNode.SelectNodes("/html[1]/body[1]/li[1]/text()");

Answered 2013-07-05 12:31:44 by rajeemcariazo

Use this XPath:

//body/li[1]/text()

Answered 2013-07-05 12:40:45 by JunoPatch
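
The two XPath answers assume the fragment sits directly under body, and a bare text() step also returns the whitespace-only nodes between tags. A minimal sketch of one way to tighten this up follows; it is not taken from any of the answers. The class and method names (OddStringExtractor, Extract) are hypothetical, and the sketch assumes the HtmlAgilityPack NuGet package is referenced and that SelectNodes returns null when nothing matches, so it guards against that case.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named here for illustration only.
public static class OddStringExtractor
{
    // Returns the trimmed text nodes that sit directly inside any <li>
    // that also contains a <ul>, i.e. the "odd strings" from the question.
    public static string[] Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // li[ul] keeps only list items that have a <ul> child, which
        // excludes the inner items such as <li>Test1</li>.
        // [normalize-space()] drops whitespace-only text nodes.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//li[ul]/text()[normalize-space()]");

        // Guard against a null result when the query matches nothing.
        if (nodes == null)
        {
            return new string[0];
        }

        // Decode HTML entities, trim, and materialize as an array.
        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                    .ToArray();
    }
}
```

Calling OddStringExtractor.Extract(htmlString) on markup shaped like the question's should return the odd strings in document order. Because the query is relative (//li[ul]), it does not depend on whether the parser wraps the fragment in html and body elements.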
