2016-10-08 37 views

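Regex to replace some dots with commas in customer comments

I need to write a regular expression that replaces '.' with ',' in some patients' comments about drugs. The patients should have used commas after each side effect they mention, but some of them used dots instead. For example: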

text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it." 

I wrote regex code to detect one word (e.g. "headache") or two words (e.g. "bad dream") surrounded by two dots.

Detecting one word surrounded by two dots:

text = re.sub(r'(\.)(\s*\w+\s*\.)', r',\2 ', text)

Detecting two words surrounded by two dots:

text = re.sub(r'(\.)(\s*\w+\s\w+\s*\.)', r',\2 ', text)

This is the output:

the drug side-effects are: night mare, nausea, night sweat. bad dream, dizziness, severe headache. I suffered, she suffered. she told I should change it. 

But it should be as follows, with the dot after "night sweat" also changed to ',':

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it. 

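My code did not replace the dot after "night sweat". Also, if a sentence starts with a subject pronoun (such as "I" or "she"), I do not want to change the dot before it to a comma, even if the sentence has only two words (such as "I suffered"). I don't know how to add this condition to my code.

Any suggestions? Thanks!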


See https://regex101.com/r/awW1Hc/1. Is this what you want to achieve? You will have to hard-code the pronouns; there is no way around that.


@Sebastian Proske, thank you! It works perfectly! – Mary

Answer


You can use the following pattern:

\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$) 

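This matches a dot and then captures one or two words, the first of which must not be one of the pronouns you mentioned (you will most likely need to extend that list). The captured words must be followed by a character that is neither a word character nor whitespace (e.g. . ! : ,) or by the end of the string. That trailing character is only checked by a lookahead, not consumed, so it can serve as the starting dot of the next match.

You can then replace each match with ,\1

In Python: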

import re 
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it." 
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I) 
print(text) 

Output:

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it. 
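
To illustrate why the lookahead at the end of the pattern matters, here is a minimal sketch on a shortened sample. It compares a variant that consumes the closing dot with one that only peeks at it. The variable names `consuming` and `lookahead` are made up for this illustration, and the pronoun check is left out to keep the comparison small.

```python
import re

text = "night mare. nausea. night sweat. bad dream."

# Consuming variant: the closing dot is part of the match, so it cannot
# also act as the opening dot of the next item. Every second dot is skipped.
consuming = re.sub(r'\.(\s*\w+(?:\s+\w+)?\s*\.)', r',\1', text)

# Lookahead variant: the closing dot is only checked, not consumed,
# so the next match is free to start on it.
lookahead = re.sub(r'\.(\s*\w+(?:\s+\w+)?\s*)(?=\.)', r',\1', text)

print(consuming)  # night mare, nausea. night sweat, bad dream.
print(lookahead)  # night mare, nausea, night sweat, bad dream.
```

The consuming variant shows the same kind of gap as the code in the question: once a dot has been matched as the end of one item, `re.sub` resumes scanning after it.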

This is most likely not completely fail-safe; you may need to extend the pattern to cover some edge cases.
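
One way to make the pronoun list easier to extend is to build the pattern from a Python list. The following is only a sketch: the names `PRONOUNS`, `build_pattern` and `dots_to_commas` are hypothetical, and the list itself is an assumption that you would adapt to your data.

```python
import re

# Hypothetical pronoun list; extend it to fit your data.
PRONOUNS = ["i", "you", "he", "she", "it", "we", "they"]

def build_pattern(pronouns):
    # re.escape guards against regex metacharacters in the list entries.
    alternation = "|".join(re.escape(p) for p in pronouns)
    # Same structure as the pattern above, with the pronoun alternation
    # spliced into the negative lookahead.
    return re.compile(
        r'\.(\s*(?!(?:' + alternation + r')\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)',
        flags=re.I,
    )

def dots_to_commas(text, pronouns=PRONOUNS):
    # Replace each matched dot with a comma and keep the captured words.
    return build_pattern(pronouns).sub(r',\1', text)

sample = "side effects: nausea. bad dream. dizziness. We stopped. it helped."
print(dots_to_commas(sample))
# side effects: nausea, bad dream, dizziness. We stopped. it helped.
```

Because of the `\b` after the alternation, a word such as "headache" is not blocked by the entry "he"; only whole-word pronouns stop the replacement.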
