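Regex to match sentences with decimal points and names
I feel like I'm close with this, but it comes unstuck as soon as I move the punctuation capture to the end of the sentence.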
The scenarios are all covered in the text below:
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. This is a sentence with odd spacing. This is one with lots of exclamation marks at the end!!!!This is another with a decimal 10.00 in the middle. Why is it so hard to find sentence endings?Last sentence without a space at the start.
This should result in the following captures:
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it.
This is a sentence with odd spacing.
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle.
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
This is the expression I have:
.*?(?:[!?.;]+)((?<!(Mr|Mrs|Dr|Rev).?)(?=\D|\s+|$)(?:[^!?.;\d]|\d*\.?\d+)*)(?=(?:[!?.;]+))
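There are two problems:
- The punctuation is captured at the start of each match instead of the end.
- It correctly handles one name per sentence, but not two. (Bonus points: I would like it to correctly capture "Mr D. J. Smith", but I can't work out how to do that without mismatching sentences that end in a single letter.)
The data going into this will have had some normalisation, so we know it will end with a full stop and be on a single line, but any pointers are welcome.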
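For illustration, here is a minimal sketch of one possible direction, written in Python on the assumption that the flavour does not matter (the question does not say which engine is in use). Instead of lookarounds placed after the punctuation, it consumes titles, initials and decimals as whole tokens so their periods never count as terminators, and the terminators are captured at the end of each sentence. The names `SENTENCE` and `split_sentences` are made up for this example, and the title list is copied from the expression above. It does not solve the bonus case: a sentence ending in a single capital letter followed by a capitalised word would still be read as an initial.

```python
import re

# Hypothetical pattern: each alternative consumes one "token" of a sentence.
SENTENCE = re.compile(r"""
    \s*                             # skip any whitespace between sentences
    (                               # group 1: one full sentence
      (?:
          \b(?:Mr|Mrs|Dr|Rev)\.     # a title followed by a period
        | \b[A-Z]\.(?=\s+[A-Z])     # an initial followed by a capitalised word
        | \d+\.\d+                  # a decimal number such as 10.00
        | [^!?.;]                   # any single non-terminator character
      )+?
      [!?.;]+                       # one or more terminators close the sentence
    )
""", re.VERBOSE)


def split_sentences(text):
    """Return the sentences found in text, each with its trailing punctuation."""
    return [m.group(1) for m in SENTENCE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. "
        "This is one with lots of exclamation marks at the end!!!!"
        "This is another with a decimal 10.00 in the middle."
    )
    for sentence in split_sentences(sample):
        print(sentence)
```

Because the title, initial and decimal alternatives are tried before the single-character fallback, "Mr.", "D." and "10.00" are swallowed whole, which is why two names in one sentence do not end the match early.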
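A natural language parser in regex? I can safely say that you will never write a regex that encapsulates all the rules of punctuation. Think again. – spender 2014-09-21 12:02:36
We already have an NLP step before this, so this is another idea (handling by exception). Cheers. – Tim 2014-09-21 12:31:10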