Python regex to split a string and get all words is not working

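I want to split a string with a regular expression in Python and get all of the matched words.

Regex: \w+(\.?\w+)*

It should capture only things like [a-zA-Z0-9_].

But when I try to match it against the string and collect everything, it does not return the correct result.

Code snippet: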

>>> import re 
>>> from pprint import pprint 
>>> pattern = r"\w+(\.?\w+)*" 
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same. 
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| \(`.`)/ 
... 
... I guess that's it.""" 
>>> pprint(re.findall(r"\w+(.?\w+)*", string)) 
[' etc', ' well', ' same', ' wait', ' like', ' it']

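It returns only some of the words, but it should return all of the words, digits, and underscore(s), as in the linked example.

Python version: Python 3.6.2 (default, Jul 17 2017, 16:44:45)

Thanks.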

Asked by Mubin, 2017-09-02

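Use 're.findall(r"\w+(?:\.?\w+)*", string)'. If you only need ASCII, pass the 're.A' flag so that '\w' matches only ASCII letters, digits, and the underscore. See the [demo](https://ideone.com/2sLrjV). If you only need to match letters, replace '\w' with '[^\W\d_]'. Note that the pattern you wrote at the start of the question is not the one you use in your code. –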

Great, thanks. I used the same regex ('\w+(.?\w+)*') in Java and it worked fine. It would be great if you could point out the difference. – Mubin

Well, you have to escape the dot and use a non-capturing group. You don't need any outer capturing parentheses. –
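To illustrate the letters-only suggestion from the comments above, here is a minimal sketch. The sample string is made up for illustration and does not come from the question.

```python
import re

# Hypothetical sample text, chosen only to show the effect of the character class.
text = "abc_123 def45 _x"

# [^\W\d_] reads as "a word character that is neither a digit nor an underscore",
# which leaves only letters.
letters_only = re.findall(r"[^\W\d_]+", text)

print(letters_only)  # ['abc', 'def', 'x']
```

Digits and underscores act as separators here, so 'abc_123' yields only 'abc'.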

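You need to use a non-capturing group, because re.findall returns only the captured text when the pattern contains a capturing group, and you need to escape the dot so that it matches a literal '.' instead of any character: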

>>> import re 
>>> from pprint import pprint 
>>> pattern = r"\w+(?:\.?\w+)*" 
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same. 
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| \(`.`)/ 
... 
... I guess that's it.""" 
>>> pprint(re.findall(pattern, string, re.A)) 
['this', 'is', 'some', 'test', 'string', 'and', 'there', 'are', 'some', 'digits', 'as', 'well', 'that', 'need', 'to', 'be', 'captured', 'as', 'well', 'like', '1234567890', 'and', '321', 'etc', 'But', 'it', 'should', 'also', 'select', '_', 'as', 'well', 'I', 'm', 'pretty', 'sure', 'that', 'that', 'RE', 'does', 'exactly', 'the', 'same', 'Oh', 'wait', 'it', 'also', 'need', 'to', 'filter', 'out', 'the', 'symbols', 'like', 'I', 'guess', 'that', 's', 'it']
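The effect of each fix is easier to see on a short input. The following is a small sketch, with a sample string invented for illustration, that compares the capturing group, the non-capturing group, and the unescaped dot.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "see v1.2 and foo_bar."

# Capturing group: findall returns the group's last captured text for each match
# (or '' when the group never took part in the match), not the whole match.
captured = re.findall(r"\w+(\.?\w+)*", text)

# Non-capturing group: findall returns each whole match.
whole = re.findall(r"\w+(?:\.?\w+)*", text)

# Unescaped dot: '.' matches any character except a newline, so the spaces
# between words are absorbed and several words merge into one match.
merged = re.findall(r"\w+(?:.?\w+)*", text)

print(captured)  # ['', '.2', '', '']
print(whole)     # ['see', 'v1.2', 'and', 'foo_bar']
print(merged)    # ['see v1.2 and foo_bar']
```

This is consistent with the output in the question, where every item is the last group capture of a long match and starts with a space.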

Also, to match only ASCII letters, digits, and _, you must pass the re.A flag.
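As a quick illustration of what the flag changes, the sketch below runs the same pattern with and without re.A. The sample string with accented letters is an assumption made for the example.

```python
import re

pattern = r"\w+(?:\.?\w+)*"

# Hypothetical sample text containing non-ASCII letters (e-acute and i-diaeresis).
text = "caf\u00e9 na\u00efve 123"

# In Python 3, \w is Unicode-aware by default for str patterns.
unicode_words = re.findall(pattern, text)

# With re.A (re.ASCII), \w matches only [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(pattern, text, re.A)

print(unicode_words)
print(ascii_words)  # ['caf', 'na', 've', '123']
```

Without the flag the accented words come back whole. With re.A the accented letters are treated as separators.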

See the Python demo.

Answered 2017-09-02 18:12:01

Thanks, you are a real hero. – Mubin
