python正則表達式替換unicode

在第一個測試字符串中，我試圖用空格替換文本中間的Unicode右箭頭字符，但它似乎沒有工作。python正則表達式替換unicode

一般情況下，我想刪除所有單個字符或多個Unicode「無字」，但保持的話，如果他們是-Z0-9和Unicode或混合物只\ W

# -*- coding: utf-8 -*- 
import re 
str = 'hi… » Test' 
str = 're of… » Pr' 
str = 're of… » Pr | removepipeaswell' 
print str 
str = re.sub(r' [^a-z0-9]+ ', ' ', str , re.UNICODE|re.MULTILINE) 
# str = re.sub(r' [^\p{Alpha}] ', ' ', str, re.UNICODE) 
print str 
're of… Pr removepipeaswell' #expected output 

str_nbsp = 'afds » asf'

編輯：增加了另一個測試字符串，我不想刪除「...」（unicode點），我想刪除多個unicode（非字）字符。

編輯：使用本作品爲測試用例，（但不是完整的HTML ??? - 它似乎只替換匹配到上半年的字符串，然後忽略其餘部分。）

str = re.sub(r' [^a-z0-9]+ ', ' ', str , re.UNICODE|re.MULTILINE)

編輯：臥槽，它必須像不正確讀取參數列表一些愚蠢的事：http://bytes.com/topic/python/answers/689341-sub-does-not-replace-all-occurences

[誰剛剛刪除他們的反應 - 感謝你的幫助。]

str = re.sub(r' [^a-z0-9]+ ', ' ', str)

最後的測試字符串「str_nbsp」與上面的正則表達式不匹配。其中一個空格字符實際上是一個非破壞性的空格字符。我使用了www.regexr.com，並在每個角色上盤踞，以解決這個問題。

來源

2014-04-17 Dave

只是讓你知道[Stack Overflow Regular Expressions FAQ]（http://stackoverflow.com/a/22944075/2736496）。 :) – aliteralmind

謝謝。我是一個perl中的正則表達式，但我是python的新手。仍然習慣於不同的語法。 – Dave

如果您還不知道，Debuggex.com是一個同時具有Python和PCRE的在線測試工具。 – aliteralmind

str = re.sub(r' [^a-z0-9]+ ', ' ', str)

來源

2014-04-17 01:11:58 Dave

python正則表達式替換unicode

回答

相關問題