Strip Chinese characters with a regular expression?

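I have a string that holds a sentence written in Chinese.

It contains Chinese characters plus filler characters such as spaces, commas, and exclamation marks, all encoded in UTF-8.

With a latin1 string, I can use preg_replace with [a-zA-Z] to clean it up and remove the filler.

How can I keep only the Chinese "letter" characters in a Chinese string while removing all the filler?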

2012-01-24 David19801

According to this document, these are the Unicode ranges for Chinese characters:

Table 12-2. Blocks Containing Han Ideographs

Block                                    Range        Comment
CJK Unified Ideographs                   4E00–9FFF    Common
CJK Unified Ideographs Extension A       3400–4DBF    Rare
CJK Unified Ideographs Extension B       20000–2A6DF  Rare, historic
CJK Unified Ideographs Extension C       2A700–2B73F  Rare, historic
CJK Unified Ideographs Extension D       2B740–2B81F  Uncommon, some in current use
CJK Compatibility Ideographs             F900–FAFF    Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement  2F800–2FA1F  Unifiable variants

You could use it like this:

preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $string);

or

preg_replace('/\P{Han}+/u', '', $string);

where \P is the negation of \p. See here for all Unicode scripts.
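As a rough sketch of how the two patterns might be wrapped up and tried out, assuming PHP 7 or later and UTF-8 input: the helper names keep_han() and keep_cjk_unified() are made up for illustration and are not part of the answer above. PCRE in PHP writes code points as \x{...} rather than \u..., and the u modifier is needed so that the pattern and the subject are treated as UTF-8.

```php
<?php
// Hypothetical helper: keeps only characters whose Unicode script is Han.
function keep_han(string $text): string
{
    // \P{Han} matches any character that is NOT in the Han script.
    // The u modifier makes PCRE read pattern and subject as UTF-8.
    $result = preg_replace('/\P{Han}+/u', '', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8).
    return $result ?? '';
}

// Hypothetical helper: keeps only the main CJK Unified Ideographs block.
function keep_cjk_unified(string $text): string
{
    // The negated class removes everything outside U+4E00..U+9FFF.
    $result = preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $text);

    return $result ?? '';
}

echo keep_han('你好，世界！ Hello 123'), "\n";         // 你好世界
echo keep_cjk_unified('你好，世界！ Hello 123'), "\n"; // 你好世界
```

The script-based version is probably the easier one to maintain, since it also covers the extension blocks listed in the table without spelling out each range by hand.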


2012-01-24 15:35:03 Toto

I just tried [4E00-9FFF] and it doesn't seem to work... – David19801

@David19801: It looks to me like you are not using it correctly. See my edit. – Toto
