2010-01-24

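Stripping MS Word tags using Html Agility Pack

I have a database with some text fields pasted from MS Word, and I'm having a hard time stripping the unwanted tags while, obviously, keeping their innerText.

I tried using HAP, but I'm not heading in the right direction: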

Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String 
    Dim htmlDoc As New HtmlDocument() 
    htmlDoc.LoadHtml(html) 
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span") 
    For Each node In invalidNodes 
        node.ParentNode.RemoveChild(node, False) 
    Next 
    Return htmlDoc.DocumentNode.WriteTo() 
End Function 

This code simply selects the targeted elements and removes them, but it does not keep their inner text.

Thanks in advance.

Answer

Well... I think I found a solution:

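Public Function StripHtml(ByVal html As String) As String 
    Dim htmlDoc As New HtmlDocument() 
    htmlDoc.LoadHtml(html) 
    Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span|//p") 
    ' SelectNodes returns Nothing when no node matches, so guard before looping. 
    If invalidNodes IsNot Nothing Then 
        For Each node In invalidNodes 
            ' Passing True for keepGrandChildren drops the tag but keeps its children. 
            node.ParentNode.RemoveChild(node, True) 
        Next 
    End If 
    Return htmlDoc.DocumentNode.WriteContentTo() 
End Function 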

I'm almost there... :P
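
The only difference between the two versions is the second argument of RemoveChild, so a side-by-side run may make the effect easier to see. The following is a minimal sketch, not the author's code: the module name, the RemoveSpans helper, and the sample string are hypothetical, and it assumes the HtmlAgilityPack assembly is referenced.

```vb
Imports System
Imports HtmlAgilityPack

Module KeepGrandChildrenDemo
    ' Hypothetical helper: removes every <span>, optionally keeping its children.
    Function RemoveSpans(ByVal html As String, ByVal keepGrandChildren As Boolean) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//span")
        ' SelectNodes returns Nothing when there is no match.
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                node.ParentNode.RemoveChild(node, keepGrandChildren)
            Next
        End If
        Return doc.DocumentNode.WriteContentTo()
    End Function

    Sub Main()
        ' Assumed sample input, chosen only for illustration.
        Dim sample As String = "<b>Hello <span>Word</span></b>"
        ' False: the span is removed together with everything inside it.
        Console.WriteLine(RemoveSpans(sample, False))
        ' True: the span tag is removed, but its children are kept under <b>.
        Console.WriteLine(RemoveSpans(sample, True))
    End Sub
End Module
```

Comparing the two printed lines should show whether the text inside the span survives, which is the behavior the question asks for.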