I'm trying to do this: how can I split on word boundaries using a regular expression?
import re
sentence = "How are you?"
print(re.split(r'\b', sentence))
The result is
[u'How are you?']
I want something like [u'How', u'are', u'you', u'?']. How can this be achieved?
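Unfortunately, Python's `re.split` cannot split on an empty (zero-width) match; this holds for versions before 3.7, which is why the string comes back whole.
To work around this, you need to use `findall` instead of `split`.
`\b` is just a word boundary, a zero-width position. It is roughly equivalent to `(?<=\w)(?=\W)|(?<=\W)(?=\w)`, leaving aside the start and end of the string.
This means the following code will work: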
import re
sentence = "How are you?"
print(re.findall(r'\w+|\W+', sentence))
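For reference, here is a small sketch of what the `findall` call returns, next to what `re.split` does with `\b`. The second half assumes Python 3.7 or newer, where `re.split` started accepting patterns that match the empty string; on older versions it simply returns the whole string, as in the question.

```python
import re

sentence = "How are you?"

# findall keeps every run of word or non-word characters, whitespace included.
tokens = re.findall(r'\w+|\W+', sentence)
print(tokens)  # ['How', ' ', 'are', ' ', 'you', '?']

# Assumption: Python 3.7+, where re.split also splits on zero-width matches.
# The leading '' comes from the boundary at the very start of the string.
pieces = re.split(r'\b', sentence)
print(pieces)  # ['', 'How', ' ', 'are', ' ', 'you', '?']
```

Either way the whitespace shows up as its own token, so it still has to be dropped afterwards if it is not wanted.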
Well, the OP doesn't want the whitespace tokens.
Splitting on `\b` would also produce whitespace, because `\b` is zero-length.
I mean `\w+|[^\w\s]+` might be more appropriate.
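A quick sketch of the pattern suggested in the comment above, in case it helps. The second sample sentence is made up for illustration and is not from the question.

```python
import re

# \w+ matches a word; [^\w\s]+ matches a run of characters that are
# neither word characters nor whitespace, so whitespace is never captured.
pattern = r'\w+|[^\w\s]+'

tokens = re.findall(pattern, "How are you?")
print(tokens)  # ['How', 'are', 'you', '?']

# Consecutive punctuation stays together as a single token.
more = re.findall(pattern, "Wait... what?!")
print(more)  # ['Wait', '...', 'what', '?!']
```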
import re
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)
Output:
['How', 'are', 'you', '?']
Explanation of the regex:
"[\w']+|[.,!?;]"
1st Alternative: [\w']+
[\w']+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
' the literal character '
2nd Alternative: [.,!?;]
[.,!?;] match a single character present in the list below
.,!?; a single character in the list .,!?; literally
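One consequence of the breakdown above is worth seeing in action: anything that is not in either character class is silently skipped, while apostrophes stay attached to their word. A minimal sketch, with a sample sentence invented for illustration:

```python
import re

pattern = r"[\w']+|[.,!?;]"

text = "Don't stop: it's 5 o'clock!"
tokens = re.findall(pattern, text)

# The apostrophes are kept inside the words; the colon is in neither
# alternative, so it is dropped along with the spaces.
print(tokens)  # ["Don't", 'stop', "it's", '5', "o'clock", '!']
```

If other punctuation should be kept, it would need to be added to the second character class.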
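[Python cannot split on an empty string](https://mail.python.org/pipermail/tutor/2003-August/024753.html).
Also, it should return `[u'How', u' ', u'are', u' ', u'you', u'?']`.
@KennyLau Yes, correct, but that's not very important: I can keep or ignore the whitespace, since filtering it out is trivial. – oarfish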
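For completeness, the filtering mentioned in the last comment might look like the sketch below. The helper name `split_on_boundaries` is hypothetical, chosen only for this example.

```python
import re

def split_on_boundaries(text):
    """Hypothetical helper: tokenize, then drop whitespace-only tokens."""
    tokens = re.findall(r'\w+|\W+', text)
    return [t for t in tokens if not t.isspace()]

print(split_on_boundaries("How are you?"))  # ['How', 'are', 'you', '?']
```

Note that a token mixing punctuation and spaces, such as `', '`, is not whitespace-only and would be kept as-is; the `\w+|[^\w\s]+` pattern avoids that case.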