2012-11-16 116 views
0

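Remove Html tags while preserving inner text and <br> tags

I am trying to remove span, font, b, s, strike (and other inline element) tags from HTML content while keeping the text inside them and the <br> tags. I am using HTML Agility Pack for this. I have managed to keep the text, but the <br> tags are still a problem. Any ideas?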

Here is the code:

private void removeTagsButPreserveText2(HtmlNode nodeToRemove)
{
    var parent = nodeToRemove.ParentNode;
    var prev = nodeToRemove.PreviousSibling;

    if (prev != null)
    {
        var child = nodeToRemove.SelectNodes("./br");

        if (child == null)
        {
            parent.InsertAfter(documentToSearch.CreateTextNode(nodeToRemove.InnerText + " "), prev);

            nodeToRemove.Remove();
        }
        else
        {
            foreach (var item in child)
            {
                var parent2 = item.PreviousSibling;

                if (parent2 != null)
                {
                    if (parent2.InnerText.HasDate())
                    {
                        var newNode = parent.InsertAfter(documentToSearch.CreateTextNode(parent2.InnerText), prev);
                        parent.InsertAfter(documentToSearch.CreateElement("br"), newNode);
                        nodeToRemove.Remove();
                    }
                }
            }
        }
    }
}

For example, the input would be:

<p><font face="Arial" size="2"><strike> 
     <span style="font-weight: 400"><font color="#000000">Paper 
     Submission (Full 
     Paper) Before 
     <span lang="en-us">September</span> 20, 201<span lang="en-us">2</span></font></span></strike><font color="#FF0000"><br> 
     Notification of 
     Acceptance On <span lang="en-us">October 5</span>, 201<span lang="en-us">2</span><br> 
     Authors' 
     Registration Before 
     <span lang="en-us">October 20</span>, 201<span lang="en-us">2</span><br> 
     ICNIT 2012 Conference 
     Dates November 
     17 - 18, 2012</font></font></p> 

And the output should look like this:

<p>Paper Submission (Full Paper) Before September 20, 2012<br> 
     Notification of Acceptance On October 5, 2012<br> 
     Authors' Registration Before October 20, 2012<br> 
     ICNIT 2012 Conference 
     Dates November 
     17 - 18, 2012</p> 
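
One possible way to get from the input to the output above is to "unwrap" every element that is not on a keep list: move its children into its place, then delete the now-empty element. The sketch below is an illustration, not the asker's code. The class name `TagUnwrapper`, the method `Unwrap`, and the keep list of `p` and `br` are assumptions made for the example; it requires the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: names and the keep list are assumptions for this sketch.
public static class TagUnwrapper
{
    // Elements that stay in the output as elements.
    private static readonly HashSet<string> Keep =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "p", "br" };

    public static string Unwrap(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the elements to remove before touching the tree.
        // Reversing document order handles the innermost elements first,
        // so each node still sits under its original parent when processed.
        var toUnwrap = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element && !Keep.Contains(n.Name))
            .Reverse()
            .ToList();

        foreach (var node in toUnwrap)
        {
            var parent = node.ParentNode;

            // Move every child (text nodes and <br> included) in front of the node...
            foreach (var child in node.ChildNodes.ToList())
            {
                parent.InsertBefore(child, node);
            }

            // ...then drop the emptied wrapper itself.
            parent.RemoveChild(node);
        }

        return doc.DocumentNode.InnerHtml;
    }
}
```

Because text nodes and `<br>` elements are moved rather than rebuilt from `InnerText`, the line breaks keep their position relative to the text. Whitespace inside the original markup is left untouched, so collapsing runs of spaces and newlines would be a separate step.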
+2

No code = no suggestions... –

+0

Out of curiosity, what is the goal behind doing this? – MikeSmithDev

+0

Could you replace <br> with \t (let's just say) before the 'cleaning', and then convert it back to <br>? –
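
The placeholder idea in this comment could look roughly like the sketch below. It is only an illustration: `BrPlaceholder`, `Clean`, and the `stripTags` delegate are hypothetical names, and the tab marker is taken from the comment's suggestion. The delegate stands in for whatever cleaning step is already in use, for example reading `InnerText` from an HTML Agility Pack node.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the placeholder approach.
public static class BrPlaceholder
{
    // Assumption: the marker never occurs in the real text.
    // Pick a rarer token if the input may already contain tabs.
    private const string Marker = "\t";

    public static string Clean(string html, Func<string, string> stripTags)
    {
        // 1. Hide every <br>, <br/> or <BR> behind the marker.
        string protectedHtml = Regex.Replace(
            html, @"<br\s*/?>", Marker, RegexOptions.IgnoreCase);

        // 2. Run the existing tag-stripping step; the marker is plain text,
        //    so it passes through unchanged.
        string text = stripTags(protectedHtml);

        // 3. Turn the markers back into <br> tags.
        return text.Replace(Marker, "<br>");
    }
}
```

For instance, `BrPlaceholder.Clean(html, s => Regex.Replace(s, "<[^>]+>", ""))` pairs it with a crude regex stripper. The weak point is the marker: if it can appear in the source text, spurious `<br>` tags will be produced.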

Answers

0

Why don't you try using a regular expression? I mean, match everything of the form "<xxxx asdasd>" or similar and replace it with "", keeping only <BR>.
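
A minimal sketch of what this suggestion might look like, assuming that `<p>` should survive alongside `<br>` (as in the question's expected output). The class name `RegexTagStripper` and the pattern are illustrative assumptions. The pattern is naive: it does not handle comments, script blocks, or attribute values that contain `>`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the regex suggestion.
public static class RegexTagStripper
{
    // Matches any opening or closing tag whose name is not "br" or "p".
    // The negative lookahead rejects those two names; \b stops "pre" or
    // "param" from being mistaken for "p".
    private static readonly Regex OtherTags = new Regex(
        @"</?(?!(?:br|p)\b)[a-z][^>]*>",
        RegexOptions.IgnoreCase);

    public static string Strip(string html)
    {
        // Delete the matched tags; text between them is left in place.
        return OtherTags.Replace(html, string.Empty);
    }
}
```

Calling `RegexTagStripper.Strip("<p><font color=\"#FF0000\">a<br>b</font> <span lang=\"en-us\">c</span></p>")` returns `<p>a<br>b c</p>`.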

+2

This is a comment, not an answer – MikeSmithDev

+0

I parse a lot of HTML. Regular expressions would make it slower. –

+0

Medeiros, welcome to Stack Overflow. See this classic reference on parsing HTML with regular expressions: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags –
