2017-02-05
string = "Special $#! characters spaces 888323 Kek ཌི ༜ 郭 ༜ དྀ " 

The result should be: "Specialcharactersspaces888323Kek郭". In other words, in Python 2.7, remove special characters and spaces, but not Chinese characters.

I have:
print ''.join(c for c in string.decode('utf-8') if u'\u4e00' <= c <= u'\u9fff')

But it returns an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u90ed' in position 49: ordinal not in range(128)
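The traceback is a UnicodeEncodeError raised from inside a decode call. In Python 2, calling .decode('utf-8') on an object that is already unicode hands it to the UTF-8 decoder, which first has to convert it back to bytes with the default ASCII codec, and that implicit step fails on 郭 (u'\u90ed'). A minimal Python 3 sketch of the usual guard, decoding only when the value really is bytes; the helper name to_text is hypothetical:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str; decode only if it is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: calling .decode() on text is what failed in the
    # Python 2 traceback, so return it unchanged
    return value


raw = "郭".encode("utf-8")   # b'\xe9\x83\xad'
print(to_text(raw))          # 郭
print(to_text("郭"))         # 郭, returned unchanged
```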

My question is the same as the title:
remove special characters and spaces, but not Chinese characters.
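Apart from the decode error, the attempt above keeps only the U+4E00 to U+9FFF range, so it would also drop the Latin letters and digits that the expected result keeps. A minimal Python 3 sketch (assuming Python 3.7+, where str is already Unicode and no decode step is needed) that widens the same generator-expression filter; the function name is hypothetical, and U+4E00 to U+9FFF covers only the main CJK Unified Ideographs block, not the extension or compatibility blocks listed in the answer's pattern:

```python
def keep_alnum_and_cjk(text):
    """Keep ASCII letters/digits and CJK Unified Ideographs (U+4E00..U+9FFF);
    drop everything else, including spaces and punctuation."""
    return "".join(
        c for c in text
        # str.isascii() needs Python 3.7+
        if (c.isascii() and c.isalnum()) or "\u4e00" <= c <= "\u9fff"
    )


s = "Special $#! characters spaces 888323 Kek ཌི ༜ 郭 ༜ དྀ "
print(keep_alnum_and_cjk(s))  # Specialcharactersspaces888323Kek郭
```

Checking c.isascii() before c.isalnum() matters: isalnum() alone would also keep the Tibetan letters.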

Answer


This solution uses the re.compile and re.sub functions:

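# -*- coding: utf-8 -*-
import re 

# u"..." makes this a unicode string; a plain "..." byte string would be
# matched byte by byte, so the UTF-8 bytes of 郭 would be stripped as well
string = u"Special $#! characters spaces 888323 Kek ཌི ༜ 郭 ༜ དྀ " 

# Match every character except ASCII letters/digits (re.IGNORECASE makes
# a-z cover A-Z as well) and the listed CJK ranges
pattern = re.compile(u'[^a-z0-9⺀-⺙⺛-⻳⼀-⿕々〇〡-〩〸-〺〻㐀-䶵一-鿃豈-鶴侮-頻並-龎]', re.UNICODE | re.IGNORECASE) 
result = pattern.sub(u'', string) 

# print(result)  # Python 3 syntax
print result 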

Output:

Specialcharactersspaces888323Kek郭 

What if I don't want to remove special characters like '$#!?<>'? @RomanPerekhrest –


That is, special characters kept as exceptions, such as !@#$%^&*():"<> /". –


@ChinYe, please show the expected result based on your "exception list". – RomanPerekhrest
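The comments ask how to keep some special characters as exceptions. One way, sketched in Python 3 under the assumption that the exceptions arrive as a plain string of characters, is to add them to the negated character class after passing them through re.escape(). The function strip_except and its keep parameter are hypothetical names, and the CJK coverage is reduced to the U+4E00 to U+9FFF block for brevity:

```python
import re

# Simplification for this sketch: only the CJK Unified Ideographs block
# U+4E00..U+9FFF; the answer's pattern lists several more CJK ranges.
CJK_RANGE = "\u4e00-\u9fff"


def strip_except(text, keep=""):
    """Remove everything except ASCII letters/digits, CJK ideographs and
    the characters in `keep` (a hypothetical exception list)."""
    # re.escape() backslash-escapes ], ^, - and \ so they stay literal
    # inside the character class
    pattern = re.compile("[^A-Za-z0-9" + CJK_RANGE + re.escape(keep) + "]")
    return pattern.sub("", text)


s = "Special $#! characters spaces 888323 Kek ཌི ༜ 郭 ༜ དྀ "
print(strip_except(s))              # Specialcharactersspaces888323Kek郭
print(strip_except(s, keep="$#!"))  # Special$#!charactersspaces888323Kek郭
```

Because the exception characters are escaped, a - or ] in the list is matched literally instead of forming a range or closing the class early.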