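How to "HTML encode" an em dash in Visual Basic .NET

I am generating some text to be displayed on a website, and I am using HttpUtility.HtmlEncode to make sure it renders correctly. However, this method does not appear to encode the em dash (it should convert it to a character reference such as "&#8212;").

I have come up with a workaround, but I am sure there is a better way to do this, such as a library function.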

sWebsiteText = _ 
    "<![CDATA[" & _ 
    HttpUtility.HtmlEncode(sSomeText) & _ 
    "]]>" 

'This is the bit which seems "hacky"' 
sWebsiteText = _ 
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

So my question is: how would you implement the "hacky" part?

Many thanks,

RB.

Source

2009-01-08 RB.

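Bobince's answer gives a solution to what seems to be your main concern: replacing your use of HtmlDecode with a more direct declaration of the char to replace.
Rewrite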

sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

as

sWebsiteText.Replace("\u2013", "&#x2013;")

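('\u2014' (dec 8212) is the em dash; '\u2013' (dec 8211) is the en dash.)
For readability it may be considered better to use "&#x2013;" rather than "&#8211;", since the .NET declaration of the char ("\u2013") is also in hexadecimal. However, since decimal notation seems to be more common in HTML, I personally prefer "&#8211;".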
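Note that VB.NET string literals do not process backslash escapes, so "\u2013" written in VB is the literal six characters rather than an en dash. A minimal VB.NET sketch of the same replacement, using ChrW to build the characters, might look like this (the module name, function name, and sample text are illustrative assumptions, not from the question):

```vb
Imports System

Module DashReplaceSketch
    ' Hypothetical helper: replaces en and em dashes with decimal
    ' numeric character references.
    Function EncodeDashes(value As String) As String
        ' ChrW(&H2013) is the en dash, ChrW(&H2014) is the em dash.
        Dim enDash As String = ChrW(&H2013)
        Dim emDash As String = ChrW(&H2014)

        ' String.Replace returns a new string, so the result must be used.
        Return value.Replace(enDash, "&#8211;").Replace(emDash, "&#8212;")
    End Function

    Sub Main()
        ' Made-up sample text containing both kinds of dash.
        Dim sample As String = "1990" & ChrW(&H2013) & "2000 " & ChrW(&H2014) & " a decade"
        Console.WriteLine(EncodeDashes(sample))
        ' Prints: 1990&#8211;2000 &#8212; a decade
    End Sub
End Module
```

Because Replace does not modify the string in place, its result has to be assigned back (or returned, as here) for the replacement to take effect.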
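For reuse, you should probably write your own HtmlEncode function, declared in a custom HttpUtility class, so that you can call it from anywhere else in your site without duplicating it.
(Something like this (sorry, I wrote it in C#, forgetting that your example was in VB):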

using System.Text;
using System.Web;

/// <summary>
/// Supplies some custom processing to some HttpUtility functions.
/// </summary>
public static class CustomHttpUtility
{
    /// <summary>
    /// HTML-encodes a string.
    /// </summary>
    /// <param name="input">String to be encoded.</param>
    /// <returns>An HTML-encoded string.</returns>
    public static string HtmlEncode(string input)
    {
        if (input == null)
            return null;

        StringBuilder encodedString = new StringBuilder(
            HttpUtility.HtmlEncode(input));
        encodedString.Replace("\u2013", "&#x2013;"); // en dash
        // Add other missing replacements here, such as &#8212; (em dash).
        encodedString.Replace("\u2014", "&#x2014;");
        //...

        return encodedString.ToString();
    }
}

Then replace

sWebsiteText = _ 
    "<![CDATA[" & _ 
    HttpUtility.HtmlEncode(sSomeText) & _ 
    "]]>" 
'This is the bit which seems "hacky"' 
sWebsiteText = _ 
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

with:

sWebsiteText = _ 
    "<![CDATA[" & _ 
    CustomHttpUtility.HtmlEncode(sSomeText) & _ 
    "]]>"

)
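Since the question is in VB, here is a rough VB.NET sketch of the same CustomHttpUtility idea. It assumes a reference to System.Web, and it uses a Module so that the call CustomHttpUtility.HtmlEncode(sSomeText) works unchanged. The parameter name is an illustrative choice, not from the original answer.

```vb
Imports System.Text
Imports System.Web

''' <summary>
''' VB.NET sketch of a wrapper that adds custom processing
''' on top of HttpUtility.HtmlEncode.
''' </summary>
Public Module CustomHttpUtility
    ''' <summary>
    ''' HTML-encodes a string, then replaces en and em dashes
    ''' with hexadecimal numeric character references.
    ''' </summary>
    Public Function HtmlEncode(source As String) As String
        If source Is Nothing Then Return Nothing

        ' VB has no "\u2013" escape, so the characters are built with ChrW.
        Dim enDash As String = ChrW(&H2013)
        Dim emDash As String = ChrW(&H2014)

        Dim encoded As New StringBuilder(HttpUtility.HtmlEncode(source))
        encoded.Replace(enDash, "&#x2013;")
        encoded.Replace(emDash, "&#x2014;")
        ' Add other missing replacements here.

        Return encoded.ToString()
    End Function
End Module
```

The dash replacements run after HttpUtility.HtmlEncode on purpose: running them first would let HtmlEncode escape the ampersand in each inserted reference.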

Source

2009-06-09 09:53:24

Take a look at A List Apart, as I suggested in the HTML Apostrophe question.

The em dash (—) is represented by &#8212;.

Source

2009-01-08 10:56:55 mouviciel

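I should have been clearer: my problem is not finding out what to encode it to, it is finding something that will do the encoding. I will update the question to illustrate the problem. – 2009-01-08 11:15:30

As this character is not an ASCII character, how do I encode it?

It is not an ASCII character, but it is a Unicode character, U+2014. If your page output is going to be UTF-8, which in this day and age it really should be, you do not need to HTML-encode it; just output the character directly.

Are there any other characters that are likely to give me problems?

Give you what problems? If you cannot output '—', you probably cannot output any other non-ASCII Unicode character either, and there are thousands of those.

Replace "\u2014" with "&#x2014;" if you really must, but with today's Unicode-aware tools there should be no need to replace every non-ASCII Unicode character with markup.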
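If the output encoding genuinely cannot be UTF-8, one general fallback is to replace every non-ASCII character with a numeric character reference after the usual HTML encoding. A minimal VB.NET sketch under that assumption (the module and function names are hypothetical):

```vb
Imports System
Imports System.Text

Module NonAsciiEncodeSketch
    ' Replaces every character above U+007F with a decimal numeric
    ' character reference. Assumes markup characters (<, >, &) in the
    ' input have already been HTML-encoded.
    Function EncodeNonAscii(value As String) As String
        If value Is Nothing Then Return Nothing

        Dim result As New StringBuilder()
        Dim i As Integer = 0
        While i < value.Length
            Dim codePoint As Integer
            If Char.IsSurrogatePair(value, i) Then
                ' Combine the surrogate pair into a single code point.
                codePoint = Char.ConvertToUtf32(value, i)
                i += 2
            Else
                codePoint = AscW(value(i))
                i += 1
            End If

            If codePoint > 127 Then
                result.Append("&#").Append(codePoint).Append(";")
            Else
                result.Append(ChrW(codePoint))
            End If
        End While
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(EncodeNonAscii("a " & ChrW(&H2014) & " b"))
        ' Prints: a &#8212; b
    End Sub
End Module
```

This covers dashes, curly quotes, and everything else in one pass, at the cost of larger and less readable markup, which is why serving the page as UTF-8 remains the simpler option.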

Source

2009-01-08 11:29:12 bobince

I have updated my question with my current solution; I think that probably explains my problem better than I can. – 2009-01-08 11:44:39
