2016-11-11

How do I remove repeated br tags beyond a certain number?

I want to keep no more than 2 <br> tags after each paragraph.

string html = @"paragraph 1 a dkahdk ahkdhadk.<br><br><br> 
<br> 
paragraph 2 adshkad hkasdhkasdh.<br> 
<br> 
paragraph 3 akdash dkjahiewry iwery.<br> 
<br><br> 
paragraph 4 ljsdlfjsldfj.<br> 
<br> 
<br> 
<br>";  

HtmlAgilityPack.HtmlDocument doc = new HtmlDocument(); 

doc.LoadHtml(html); 
var xpath = "//text()[not(normalize-space())]"; 
var emptyNodes = doc.DocumentNode.SelectNodes(xpath); 
foreach (HtmlNode emptyNode in emptyNodes) 
{ 
    emptyNode.Remove(); // remove \r\n 
} 
var nodes = doc.DocumentNode.SelectNodes("//br[following-sibling::br[3]]").ToList(); 
foreach (var node in nodes) 
{ 
    node.Remove(); 
} 

For some reason, the output has all of the br tags removed. The correct output should be:

paragraph 1 a dkahdk ahkdhadk.<br><br> 
paragraph 2 adshkad hkasdhkasdh.<br><br> 
paragraph 3 akdash dkjahiewry iwery.<br><br> 
paragraph 4 ljsdlfjsldfj.<br><br> 

Answer


A simple regular expression replacement is enough here; HtmlAgilityPack is not needed. For example, using a multi-step process:

// Use a regex to match <br>, <br > or <br /> tags:
// var toNewLines = new Regex(@"<br\s?/?>");
// var onlyNewLines = toNewLines.Replace(html, Environment.NewLine);
// Or, since every br tag in the sample is exactly <br>:
var onlyNewLines = html.Replace("<br>", Environment.NewLine); 

var regex = new Regex(@"([" + Environment.NewLine + "\t])+"); 

var result = regex.Replace(onlyNewLines, Environment.NewLine); 

var finalResult = result.Replace(Environment.NewLine, "<br /><br />" + Environment.NewLine);
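
The same idea can probably be collapsed into a single pass. The following is only a sketch, not part of the original answer: the class name `BrCollapser` and the method `Collapse` are hypothetical, and it assumes that br tags separated only by whitespace (spaces, tabs, line breaks) count as one run. Each run is replaced by at most two `<br>` tags, so a lone `<br>` stays single instead of being doubled.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, introduced only for illustration.
public static class BrCollapser
{
    // Matches a run of one or more <br>, <br > or <br /> tags, including any
    // whitespace before the run and after each tag. Every tag in the run is
    // recorded as a separate capture of the "tag" group.
    private static readonly Regex BrRun =
        new Regex(@"\s*(?<tag><br\s*/?>\s*)+", RegexOptions.IgnoreCase);

    public static string Collapse(string html)
    {
        return BrRun.Replace(html, match =>
        {
            // Number of br tags found in this run.
            int count = match.Groups["tag"].Captures.Count;

            // Keep one tag for a lone <br>, otherwise cap the run at two.
            return (count >= 2 ? "<br><br>" : "<br>") + "\n";
        });
    }
}
```

Calling `BrCollapser.Collapse(html)` on the string from the question should leave each paragraph followed by `<br><br>` and a single line break. Note that this works on the raw text, so it would also touch br tags inside comments or script blocks; for arbitrary HTML a parser-based approach may be safer.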