Convert HTML entities to Unicode characters in C#

I found similar questions and answers for Python and JavaScript, but not for C# or any other WinRT-compatible language.

I think I need this because I'm displaying text fetched from websites in a Windows 8 Store app. For example, '&eacute;' should become 'é'.

Or is there a better way? I'm not displaying whole websites or RSS feeds, just a list of websites and their titles.


2012-11-21 Remy

Duplicate: http://stackoverflow.com/questions/5783817/convert-character-entities-to-their-unicode-equivalents

Actually, it isn't. He has a different problem. – Remy

I'd suggest using System.Net.WebUtility.HtmlDecode and NOT HttpUtility.HtmlDecode.

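This is because the System.Web reference isn't available in WinForms/WPF/Console applications, whereas System.Net.WebUtility (already referenced in all of those project types) gives you exactly the same result.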

Usage (works in Metro apps and WP8 apps, for both HTML entities and HTML numbers):

string s = System.Net.WebUtility.HtmlDecode("&eacute;"); // Returns é
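For the question's actual use case (a list of site titles), a minimal sketch could decode every title once before binding it to the UI. It assumes a platform where System.Net.WebUtility is available (desktop .NET or a Windows Store app); TitleDecoder, DecodeTitles and the sample titles are hypothetical. Note that another answer in this thread reports numeric references behaving differently on WP8.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper: decode entity-encoded titles before displaying them.
public static class TitleDecoder
{
    public static List<string> DecodeTitles(IEnumerable<string> rawTitles)
    {
        // WebUtility.HtmlDecode resolves named entities (&eacute;, &amp;) and,
        // on desktop .NET, decimal and hex numeric references (&#233;, &#xE9;).
        return rawTitles.Select(t => WebUtility.HtmlDecode(t)).ToList();
    }
}
```

Decoding once at load time keeps the display code free of any HTML handling.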


2012-11-21 11:57:55 Blachshma

Silly me, I thought that only worked for the simplest entities... – Remy

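"You can get exactly the same result using this class": wrong. Only the HttpUtility implementation correctly decodes the apostrophe entity into an apostrophe (') on WP8.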

In my case, 'HttpUtility.HtmlDecode' does the right thing.

Use HttpUtility.HtmlDecode(). Read about it on MSDN here.

string decodedString = HttpUtility.HtmlDecode(myEncodedString); // requires: using System.Web;


2012-11-21 11:43:59

Yes, and note that for WinForms or console applications you first have to add a reference to the System.Web assembly.

Hi, I tried this solution, but it doesn't decode characters like '&lbrace;' :(

@l19 Is that a recognized HTML entity? I couldn't find it in this [list](http://en.wikipedia. However, I did manage to find it in the W3C spec, which may be why it isn't decoded yet. – crush

Encoding/decoding behaves differently depending on the platform.

With Windows Runtime (Metro) apps:

{ 
    string inStr = "ó"; 
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr); 
    // auxStr == &#243; 
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr); 
    // outStr == ó 
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;"); 
    // outStr2 == ó 
}

With Windows Phone 8.0:

{ 
    string inStr = "ó"; 
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr); 
    // auxStr == &#243; 
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr); 
    // outStr == &#243; 
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;"); 
    // outStr2 == ó 
}

To work around this on WP8, I implemented the table from the HTML ISO-8859-1 Reference and apply it before calling System.Net.WebUtility.HtmlDecode().
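The answer's fix is a hand-written ISO-8859-1 lookup table. A different, more general technique (not the author's code) is to resolve numeric references yourself and only hand named entities to WebUtility.HtmlDecode, since the WP8 output above shows named entities such as &oacute; already decode correctly there. This sketch assumes that; NumericEntityDecoder is a hypothetical name. Everything happens in one regex pass, so input like '&#38;amp;' is not decoded twice.

```csharp
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: decode numeric references (&#243;, &#xF3;) manually and
// delegate named entities (&oacute;) to WebUtility.HtmlDecode, in a single pass.
public static class NumericEntityDecoder
{
    private static readonly Regex EntityRef =
        new Regex("&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z][a-zA-Z0-9]*));");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return EntityRef.Replace(html, m =>
        {
            // Named entity: decode only this match; unknown names come back unchanged.
            if (m.Groups[3].Success)
                return WebUtility.HtmlDecode(m.Value);

            int codePoint;
            bool parsed = m.Groups[1].Success
                ? int.TryParse(m.Groups[1].Value, NumberStyles.HexNumber,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(m.Groups[2].Value, NumberStyles.None,
                               CultureInfo.InvariantCulture, out codePoint);

            // Keep the original text if the number is not a valid Unicode scalar value.
            if (!parsed || codePoint < 0 || codePoint > 0x10FFFF ||
                (codePoint >= 0xD800 && codePoint <= 0xDFFF))
                return m.Value;

            // ConvertFromUtf32 also handles code points above U+FFFF (surrogate pairs).
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

Unlike a fixed ISO-8859-1 table, this covers any code point, at the cost of trusting WebUtility for the named entities.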


2013-02-05 09:15:45 user1954682

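This may be useful: it replaces all entities (at least all the ones my requirements needed) with their numeric Unicode equivalents, e.g. &eacute; becomes &#233;.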

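using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility; needs a reference to the System.Web assembly

// Replaces named entities such as &eacute; with numeric references such as &#233;.
public string EntityToUnicode(string html) {
    var replacements = new Dictionary<string, string>();
    var regex = new Regex("(&[a-z]{2,5};)");
    foreach (Match match in regex.Matches(html)) {
        if (!replacements.ContainsKey(match.Value)) {
            var unicode = HttpUtility.HtmlDecode(match.Value);
            // Only entities that decode to a single character are rewritten.
            if (unicode.Length == 1) {
                replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
            }
        }
    }
    foreach (var replacement in replacements) {
        html = html.Replace(replacement.Key, replacement.Value);
    }
    return html;
}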


2014-07-01 16:34:45 zumey

Worked for my case, but I edited the regex to 'var regex = new Regex("(&[a-z]{2,6};)");' because many HTML entities are longer than 5 characters (like &eacute;). – forumma

I'd also suggest changing the regex to 'var regex = new Regex("(&[a-zA-Z]{2,7};)");' so that characters like &Atilde; are included. – chrisofspades
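Folding both regex suggestions into the EntityToUnicode answer might look like this sketch. It is a variant under stated assumptions, not the original author's code: EntityConverter is a hypothetical name, it uses System.Net.WebUtility instead of HttpUtility (so no System.Web reference is needed), it uses Regex.Replace with an evaluator instead of the dictionary plus string.Replace loop, and the pattern also allows digits after the first letter because names such as &sup2; and &frac12; contain them. Like the original, it produces numeric references, not literal characters.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: turn named entities into numeric references (&eacute; -> &#233;).
public static class EntityConverter
{
    // A letter, then 1-7 letters or digits: matches &Atilde;, &sup2;, &thetasym;.
    private static readonly Regex NamedEntity =
        new Regex("&[a-zA-Z][a-zA-Z0-9]{1,7};");

    public static string EntityToNumeric(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return NamedEntity.Replace(html, m =>
        {
            string decoded = WebUtility.HtmlDecode(m.Value);
            // Unknown entities come back unchanged; as in the original answer,
            // only entities that decode to a single UTF-16 char are rewritten.
            return decoded.Length == 1
                ? "&#" + (int)decoded[0] + ";"
                : m.Value;
        });
    }
}
```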

This works for me, replacing both common named entities and numeric (Unicode) entities.

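using System.Globalization;
using System.Text.RegularExpressions;
using System.Web;      // HttpUtility (System.Web reference)
using NUnit.Framework; // [TestFixture], [Test], [TestCase], Assert

public static class HtmlExtensions
{
    private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]+);");

    public static string HtmlDecode(this string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        return HtmlEntityRegex.Replace(html, x =>
        {
            // Named entity such as &quot;: let HttpUtility decode it.
            if (x.Groups[1].Value != "#")
                return HttpUtility.HtmlDecode(x.Groups[0].Value);

            // Decimal reference such as &#39;; anything unparsable (e.g. hex &#x27;) is left as-is.
            int code;
            return int.TryParse(x.Groups[2].Value, NumberStyles.None, CultureInfo.InvariantCulture, out code)
                    && code <= 0xFFFF
                ? ((char)code).ToString()
                : x.Groups[0].Value;
        });
    }
}

[TestFixture]
public class HtmlExtensionsTests
{
    [Test]
    [TestCase(null, null)]
    [TestCase("", "")]
    [TestCase("&#39;fark&#39;", "'fark'")]
    [TestCase("&quot;fark&quot;", "\"fark\"")]
    public void should_remove_html_entities(string html, string expected)
    {
        Assert.AreEqual(expected, html.HtmlDecode());
    }
}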


2016-09-29 18:53:02 hcoverlambda
