2016-01-25

I'm using Python 3 and am trying to split text into sentences right after the footnote numbers that follow a period:

text = "Reproduction now becomes posited as 「natural」 production.16 Fortunati joins Marx in a minute but crucial declension from usevalue to nonvalue. " 

This is the closest sentence-splitting regex I've got so far that still works:

sentences = re.split(r' *[\.\?!][\'"\)\]]* +', text) 

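I'm basically lost w/r/t capturing, via regex, the instances of numbers immediately after a period. Any help incorporating [0-9] correctly into the expression? Thanks.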
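A quick check of why that pattern never splits at the footnote: after the punctuation and any closing quotes or brackets it requires ' +' (one or more spaces), so '.16 ' cannot match. The minimal sketch below runs the original pattern and a naive variant with '\d*' added before the spaces; the variant does split there, but the digits become part of the separator and are dropped from the result.

```python
import re

text = ("Reproduction now becomes posited as 「natural」 production.16 "
        "Fortunati joins Marx in a minute but crucial declension from "
        "usevalue to nonvalue. ")

# Original pattern: optional spaces, one of . ? !, optional closing quotes or
# brackets, then one or more spaces. Digits after the period are not allowed,
# so the only split point is the final '. '.
original = re.split(r' *[\.\?!][\'"\)\]]* +', text)
print(original)

# Naive fix: allow digits before the spaces. The split now happens after
# '.16', but '.16' is consumed by the separator and lost.
naive = re.split(r' *[\.\?!][\'"\)\]]*\d* +', text)
print(naive)
```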

EDIT: This is how it should ideally be split:

sentences[0]= "Reproduction now becomes posited as 「natural」 production.16" 
sentences[1]= " Fortunati joins Marx in a minute but crucial declension from usevalue to nonvalue." 
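One way to get exactly this output with only the standard re module, sketched here as an assumption rather than the canonical answer: put the footnote marker in a capturing group, which makes re.split keep the matched separators in its result, then glue each separator back onto the text in front of it. The helper name split_at_footnotes is hypothetical, and the pattern treats any period followed by digits as a marker, so a decimal such as 3.14 would also be split.

```python
import re

def split_at_footnotes(text):
    """Split text right after each period that is followed by digits.

    Hypothetical helper. Assumes a footnote marker is a '.' immediately
    followed by one or more digits (e.g. 'production.16'); decimals such
    as '3.14' would also match.
    """
    # The capturing group makes re.split keep the '.16'-style separators
    # in the result list, at every odd index.
    parts = re.split(r'(\.\d+)', text)
    sentences = []
    # Re-attach each separator to the text that precedes it.
    for i in range(0, len(parts) - 1, 2):
        sentences.append(parts[i] + parts[i + 1])
    # Whatever follows the last marker is the final sentence.
    tail = parts[-1].rstrip()
    if tail:
        sentences.append(tail)
    return sentences


text = ("Reproduction now becomes posited as 「natural」 production.16 "
        "Fortunati joins Marx in a minute but crucial declension from "
        "usevalue to nonvalue. ")
for s in split_at_footnotes(text):
    print(repr(s))
```

The leading space on the second sentence is kept on purpose, to match the desired output above; strip it as well if it is not wanted.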

Can you clearly show us the desired output for this sentence? –


Something like this could work: '\.\d+\b', but it's not clear whether it's what you're after. – npinti
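The pattern from this comment can also be used without re.split: a minimal sketch (the function name split_after_matches is hypothetical) that walks the matches with re.finditer and cuts the string at each match's end(). The trailing '\b' means a digit run glued to letters, such as '.16th', is not treated as a footnote marker.

```python
import re

def split_after_matches(text, pattern=r'\.\d+\b'):
    """Cut text immediately after every match of pattern (hypothetical helper).

    The default pattern is the comment's suggestion: a period, one or more
    digits, then a word boundary.
    """
    pieces = []
    start = 0
    for m in re.finditer(pattern, text):
        # Keep everything up to and including the footnote number.
        pieces.append(text[start:m.end()])
        start = m.end()
    # Remainder after the last marker (empty if text ends with a marker).
    pieces.append(text[start:])
    return pieces


text = ("Reproduction now becomes posited as 「natural」 production.16 "
        "Fortunati joins Marx in a minute but crucial declension from "
        "usevalue to nonvalue. ")
print(split_after_matches(text))
```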


Noted; updated with a bit more info. –

Answer


Use re.findall:

>>> import re 
>>> re.findall(r'.*?\.\d+|.+', text) 
['Reproduction now becomes posited as 「natural」 production.16', 
' Fortunati joins Marx in a minute but crucial declension ...'] 
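How that pattern works: the lazy '.*?' stops at the first period followed by digits, and once no marker is left the second alternative '.+' takes the rest, including any trailing space. One caveat worth checking on real input: '.' does not match a newline unless re.DOTALL is set, so on text with hard line breaks the plain pattern also cuts at every newline and drops the newline character itself. The small made-up string below is only there to show the difference.

```python
import re

pattern = r'.*?\.\d+|.+'

# A hypothetical sentence that wraps onto a second line before its footnote.
multiline = "First point\ncontinues here.1 Second."

# Default: '.' stops at '\n', so the first sentence is cut in two.
plain = re.findall(pattern, multiline)
print(plain)

# With re.DOTALL, '.' also matches '\n' and the sentence stays whole.
dotall = re.findall(pattern, multiline, flags=re.DOTALL)
print(dotall)
```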

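If you're OK with using a third-party module, you can use regex, which allows variable-width lookbehind assertions, and split on an empty (zero-width) match: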

>>> import regex 
>>> regex.split(r'(?<=\.\d+\b)', text, flags=regex.VERSION1) 
['Reproduction now becomes posited as 「natural」 production.16', 
' Fortunati joins Marx in a minute but crucial declension ...']
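The third-party module is needed because the standard re module only accepts fixed-width lookbehinds; compiling '(?<=\.\d+\b)' with re raises re.error. If installing regex is not an option, one workaround, sketched here under the assumption that footnote numbers are at most three digits long, is an alternation of fixed-width lookbehinds followed by '(?!\w)', which plays the role of '\b' after a digit. Splitting on such an empty match needs Python 3.7 or later.

```python
import re

# Variable-width lookbehind is rejected by the standard library.
try:
    re.compile(r'(?<=\.\d+\b)')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print("re accepts variable-width lookbehind:", variable_width_ok)

# Workaround (assumes footnotes of 1 to 3 digits): one fixed-width lookbehind
# per length, then '(?!\w)' so the split lands after the whole digit run.
FOOTNOTE_END = r'(?:(?<=\.\d)|(?<=\.\d\d)|(?<=\.\d\d\d))(?!\w)'

text = ("Reproduction now becomes posited as 「natural」 production.16 "
        "Fortunati joins Marx in a minute but crucial declension from "
        "usevalue to nonvalue. ")

# re.split on a zero-width pattern requires Python 3.7+.
sentences = re.split(FOOTNOTE_END, text)
print(sentences)
```

Longer footnote numbers would need another lookbehind branch, which is where the regex module's variable-width support is simpler.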