Strip Word Html Tags

I need to strip Word HTML tags in specific places. Right now I strip `<p>` tags across the whole HTML with sc.Add(@"<p> </p>"); this is what I currently do:

public string CleanWordStyle(string html) 
{ 
    StringCollection sc = new StringCollection(); 
    sc.Add(@"<table\b[^>]*>(.*?)</table>"); 
    sc.Add(@"(<o:|</o:)[^>]+>"); 
    sc.Add(@"(<v:|</v:)[^>]+>"); 
    sc.Add(@"(<st1:|</st1:)[^>]+>"); 
    sc.Add(@"(mso-bidi-|mso-fareast|mso-spacerun:|mso-list: ign|mso-ascii|mso-hansi|mso-ansi|mso-element|mso-special|mso-highlight|mso-border|mso-yfti|mso-padding|mso-background|mso-tab|mso-width|mso-height|mso-pagination|mso-theme|mso-outline)[^;]+;"); 
    sc.Add(@"(font-size|font-family):[^;]+;"); 
    sc.Add(@"font:[^;]+;"); 
    sc.Add(@"line-height:[^;]+;"); 
    sc.Add(@"class=""mso[^""]+"""); 
    sc.Add(@"times new roman&quot;,&quot;serif&quot;;"); 
    sc.Add(@"verdana&quot;,&quot;sans-serif&quot;;"); 
    sc.Add(@"<p> </p>"); 
    sc.Add(@"<p>&nbsp;</p>"); 
    foreach (string s in sc) 
    { 
        html = Regex.Replace(html, s, "", RegexOptions.IgnoreCase); 
    } 
    html = Regex.Replace(html, @"&nbsp;", @"&#160;"); // otherwise the result cannot be loaded as an XmlDocument
    return html; 
}

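But what I want is this: when I hit an opening table tag, the replacing should stop immediately and only resume after the matching closing table tag. Is that possible?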

Source

2012-07-06 Timsen

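I posted a solution, but thinking about it again, it removes the Word formatting and leaves only the text... I don't know if that is what you are looking for, but using HTMLAgilityPack is the idea. – Aristos 2012-07-06 08:41:24

My customer wants everything inside table tags left untouched, while everything else should be stripped. That is not the solution I am looking for. – Timsen 2012-07-06 08:44:46

Take a look at HTMLAgilityPack, that is the idea: it gives you the DOM, and from there you can keep the parts you want. – Aristos 2012-07-06 08:45:33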

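Regular expressions are fine for a single line or a very simple HTML structure.

If you really want to get the job done with minimal code, get HTMLAgilityPack from http://htmlagilitypack.codeplex.com/ and take the text from the inner values of all tags.

It would be as simple as: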

public string CleanWordStyle(string htmlPage) 
{ 
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); 
    doc.LoadHtml(htmlPage); 

    return doc.DocumentNode.InnerText; 
}
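
For the narrower goal stated in the question (leave everything between table tags untouched), one possible approach is to split the input on table blocks and run the replacements only on the pieces in between. The following is a minimal sketch, not a tested production cleaner: the class name `WordHtmlCleaner`, the method name `CleanOutsideTables`, and the shortened pattern list are assumptions made for illustration, and it assumes tables are not nested (a non-greedy match would end at the first closing table tag).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class WordHtmlCleaner
{
    // Illustrative subset of the patterns from the question.
    private static readonly string[] Patterns =
    {
        @"(<o:|</o:)[^>]+>",
        @"(font-size|font-family):[^;]+;",
        @"class=""mso[^""]+""",
        @"<p> </p>",
        @"<p>&nbsp;</p>",
    };

    public static string CleanOutsideTables(string html)
    {
        // Because the pattern has one capture group, Regex.Split keeps the
        // matched table blocks in the result: even indexes hold the text
        // between tables, odd indexes hold the tables themselves.
        string[] parts = Regex.Split(
            html,
            @"(<table\b[^>]*>.*?</table>)",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var result = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            if (i % 2 == 1)
            {
                // A table block: copy it through unchanged.
                result.Append(parts[i]);
                continue;
            }

            // Text outside any table: apply every cleanup pattern.
            string chunk = parts[i];
            foreach (string pattern in Patterns)
            {
                chunk = Regex.Replace(chunk, pattern, "", RegexOptions.IgnoreCase);
            }
            result.Append(chunk);
        }
        return result.ToString();
    }
}
```

Called as `WordHtmlCleaner.CleanOutsideTables(html)`, this strips the Word markup before, between, and after tables while the table markup itself passes through byte for byte. If the output must still be loadable as an XmlDocument, the `&nbsp;` to `&#160;` replacement from the question would have to be applied to the whole string afterwards. For nested tables or otherwise irregular markup, a DOM parser such as HTMLAgilityPack is likely the more robust choice.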

Source

2012-07-06 08:35:55 Aristos

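Instead of iterating over all the child nodes and appending to a string builder, you can just return the root node's InnerText. – jnoreiga 2012-09-18 19:00:11

@jnoreiga Thank you for the correction. – Aristos 2012-09-18 20:32:04

No problem. Note that this won't strip only the Word styles, though. It will strip all HTML under the root. – jnoreiga 2012-09-19 17:42:05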
