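Splitting sentences at periods followed by footnote numbers

I'm using Python 3 and trying to split text into sentences where a footnote number comes directly after the period: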

text = "Reproduction now becomes posited as 「natural」 production.16 Fortunati joins Marx in a minute but crucial declension from usevalue to nonvalue. "

This is the closest sentence-splitting regular expression I've come up with so far that still works:

sentences = re.split(r' *[\.\?!][\'"\)\]]* +', text)

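I'm basically lost w/r/t using a regular expression to catch the cases where a number immediately follows the period. Any help on how to correctly incorporate [0-9] into the expression? Thanks.

Edit: this is how it should ideally be split: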

sentences[0]= "Reproduction now becomes posited as 「natural」 production.16" 
sentences[1]= " Fortunati joins Marx in a minute but crucial declension from usevalue to nonvalue."
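For reference, a minimal sketch of what the expression above does with the sample text. The period in "production.16" is followed by a digit rather than a space, so no split happens there; the only match is the final ". ", which leaves one long string plus an empty one. The variable name `parts` is only for illustration.

```python
import re

text = "Reproduction now becomes posited as 「natural」 production.16 Fortunati joins Marx in a minute but crucial declension from usevalue to nonvalue. "

# The original pattern requires at least one space right after the
# punctuation (and optional closing quotes/brackets), so ".16 " never matches.
parts = re.split(r' *[\.\?!][\'"\)\]]* +', text)

print(len(parts))   # number of pieces produced
print(parts)
```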

Source

2016-01-25 r_e_cur

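Could you show us clearly what output you expect for this sentence? –

Something like this could work: '\.\d+\b', but it's not clear whether that is what you're after. – npinti

Noted, updated with a bit more information –

Use re.findall: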

>>> import re 
>>> re.findall(r'.*?\.\d+|.+', text) 
['Reproduction now becomes posited as 「natural」 production.16', 
' Fortunati joins Marx in a minute but crucial declension ...']

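If you're okay with using a third-party module, you can use regex, which allows variable-width lookbehind assertions, and split on the empty string: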

>>> import regex 
>>> regex.split(r'(?<=\.\d+\b)', text, flags=regex.VERSION1) 
['Reproduction now becomes posited as 「natural」 production.16', 
' Fortunati joins Marx in a minute but crucial declension ...']
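If you would rather stay with the standard library and also keep handling ordinary sentence endings (?, !, closing quotes), one possible variation is to match sentences instead of splitting on them. This is only a sketch, not part of the answer above: the name `SENTENCE_RE` is made up for illustration, and the pattern assumes a footnote number is any run of digits directly after the closing punctuation. It will therefore also end a sentence after a decimal such as "3.14" when whitespace follows.

```python
import re

text = "Reproduction now becomes posited as 「natural」 production.16 Fortunati joins Marx in a minute but crucial declension from usevalue to nonvalue. "

# A sentence: starts at a non-space character, runs lazily up to . ? or !,
# then optional closing quotes/brackets, then an optional footnote number,
# and must be followed by whitespace or the end of the text.
SENTENCE_RE = re.compile(r'\S.*?[.?!][\'")\]]*\d*(?=\s|$)')

sentences = SENTENCE_RE.findall(text)
for s in sentences:
    print(repr(s))
```

Unlike the split-based versions, the sentences come back without the leading space, because each match has to begin on a non-whitespace character.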

Source

2016-01-25 07:27:21 falsetru
