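How to replace spaces with a regular expression (Unicode to UTF-8) in C#

I am trying to do a regex replacement in C#. The method I am writing should replace certain Unicode whitespace characters with an ordinary space in UTF-8.

Let me explain with code. I'm not good with regular expressions or culture info.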

//This method replaces Unicode white space characters with a UTF-8 space
    public static string cleanUnicodeSpaces(string value)
    {
     //This first pattern works, but it also removes other special characters
     //For example: accent marks
     //string pattern = @"[^\u0000-\u007F]+";
     string cleaned = "";
     string pattern = @"[^\u0020\u0009\u000D]+"; //Unicode characters
     string replacement = ""; //Replace by UTF-8 space
     Regex regex = new Regex(pattern);
     cleaned = regex.Replace(value, replacement).Trim(); //Trim to remove leading and trailing spaces
     return cleaned;
    }

Unicode spaces

HT: U+0009 = character tabulation
LF: U+000A = line feed
CR: U+000D = carriage return

What am I doing wrong?

Sources

Unicode characters: https://unicode-table.com/en
Whitespace: https://en.wikipedia.org/wiki/Whitespace_character
Regex: https://msdn.microsoft.com/es-es/library/system.text.regularexpressions.regex(v=vs.110).aspx

SOLUTION: Thanks to @Wiktor Stribiżew and @Mathias R. Jessen, the solution is:

string pattern = @"[\u0020\u0009\u000D\u00A0]+"; 
//I include \u00A0 to replace &nbsp;
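
For reference, here is a minimal sketch of how the whole method might look with that pattern. The class name `SpaceCleaner` and the PascalCase method name are hypothetical, and the replacement string is assumed to be a single space (the posted code used an empty string, which deletes the matches instead of replacing them):

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only for illustration.
public static class SpaceCleaner
{
    // Positive character class: one or more of space (U+0020), tab (U+0009),
    // carriage return (U+000D) or no-break space (U+00A0).
    private static readonly Regex SpaceRun =
        new Regex(@"[\u0020\u0009\u000D\u00A0]+");

    public static string CleanUnicodeSpaces(string value)
    {
        // Assumption: each run collapses to ONE ordinary space;
        // Trim() then drops leading and trailing white space.
        return SpaceRun.Replace(value, " ").Trim();
    }
}
```

Note that this pattern does not include line feed (U+000A); add `\u000A` to the class if line feeds should be replaced as well.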

Source

2017-09-04 Diego Fernando Barrios Olmos

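Remove the `^` from the character class. –

string replacement = ""; I don't see a space here. – Steve

You return 'value'. That is the same thing you passed in. Make sure you return 'cleaned' instead. –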

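Your regex, [^\u0020\u0009\u000D]+, is a negated character class that matches one or more characters other than a regular space (\u0020), a tab (\u0009) and a carriage return (\u000D). What you actually need is a positive character class that matches any one of the three characters you listed (\x0A for line feed, \x0D for carriage return, \x09 for tab), so that each match can be replaced with a regular space (\x20).

You may just use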

var res = Regex.Replace(s, @"[\x0A\x0D\x09]", " ");

See the regex demo.
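
To make the difference concrete, here is a small sketch that runs both patterns on the same input. The class and method names are hypothetical; the two patterns are the ones quoted above:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class contrasting the two character classes.
public static class CharacterClassDemo
{
    // Negated class from the question: deletes every run of characters
    // that are NOT space, tab or carriage return.
    public static string ReplaceNegated(string s) =>
        Regex.Replace(s, @"[^\u0020\u0009\u000D]+", "");

    // Positive class from this answer: replaces each line feed,
    // carriage return or tab with one regular space.
    public static string ReplacePositive(string s) =>
        Regex.Replace(s, @"[\x0A\x0D\x09]", " ");
}
```

For the input "a\tb\nc", `ReplaceNegated` leaves only the tab, because the letters and the line feed are all outside the excluded set, while `ReplacePositive` returns "a b c".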

Source

2017-09-04 21:47:44

You wrote the code for me. I included \u00A0 in the string pattern for &nbsp;: string pattern = @"[\u0020\u0009\u000D\u00A0]+"; Thank you very much! –

@DiegoFernandoBarriosOlmos If it works, please consider accepting. –
