2017-09-16 104 views
0

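Split a string by commas, but with a condition (ignore single words separated by commas)

With the code below (a bit messy, I admit) I split a string on commas, but with the condition that it does not split where the string contains single words separated by commas. For example, it does not split "Yup, there's a reason why you want to hit the sack just minutes after climax", but it does split "The increase in heart rate, which you get from masturbating, is directly beneficial to the circulation, and can reduce the likelihood of a heart attack" into ['The increase in heart rate', 'which you get from masturbating', 'is directly beneficial to the circulation', 'and can reduce the likelihood of a heart attack'].

The problem is that the code fails at its purpose when it encounters a string like this: "When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow." I don't want to split after oxytocin, but after prolactin. I need a regular expression to do that.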

import os 
import textwrap 
import re 
import io 
from textblob import TextBlob 


string = str(input_string)  # input_string: the text to split, defined elsewhere

listy = [x.strip() for x in string.split(',')]
listy = [x.replace('\n', '') for x in listy]
# Drop periods that are not decimal points
listy = [re.sub(r'(?<!\d)\.(?!\d)', '', x) for x in listy]
listy = list(filter(None, listy))  # Remove any empty strings (list() so it can be indexed in Python 3)

newstring = []

for segment in listy:
    wc = TextBlob(segment).word_counts

    if listy[-1] != segment:
        if len(wc) > 3:  # len(segment.split(' ')) > 7:
            newstring.append(segment + "&&")  # long segment: mark a split point
        else:
            newstring.append(segment + ",")  # short segment: keep the comma
    else:
        newstring.append(segment)

sep = [x.strip() for x in ' '.join(newstring).split('&&')]

Answers

1

Consider the following:

import re

mystr="When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow." 

rExp=r",(?!\s+(?:and\s+)?\w+,)" 
mylst=re.compile(rExp).split(mystr) 
print(mylst) 

This should give the following output:

['When men ejaculate', ' it releases a slew of chemicals including oxytocin, vasopressin, and prolactin', ' all of which naturally help you hit the pillow.'] 

Let's look at how the string is split:

,(?!\s+\w+,) 

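Split on every comma except those followed by ((?! -> negative lookahead) \s+\w+, that is, whitespace, a single word, and another comma.
The above fails in the case of vasopressin, and because the and is not followed by a comma. So an optional and\s+ is introduced inside the lookahead: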

,(?!\s+(?:and\s+)?\w+,) 
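
To see what the optional and\s+ group changes, the two patterns can be compared side by side. This is a small sketch; the sample text is shortened from the sentence above and the variable names are only illustrative.

```python
import re

# Shortened version of the sentence used in the answer (illustrative only)
text = "a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which help"

simple = re.compile(r",(?!\s+\w+,)")                # first attempt
with_and = re.compile(r",(?!\s+(?:and\s+)?\w+,)")   # allows an optional "and" before the word

simple_parts = simple.split(text)
with_and_parts = with_and.split(text)

print(simple_parts)    # the list is broken apart before "and prolactin"
print(with_and_parts)  # the whole list stays in one piece
```

With the first pattern the comma before "and prolactin" is still treated as a split point, because the word right after it ("and") is not itself followed by a comma.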

Although I would probably use the following, which also allows or:

,(?!\s+(?:(?:and|or)\s+)?\w+,) 
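
A quick way to check the and|or variant is to try it on a list that ends with "or". The sentence below is made up for illustration and does not come from the question.

```python
import re

# Pattern from the answer, extended to accept "and" or "or" before the last list item
pattern = re.compile(r",(?!\s+(?:(?:and|or)\s+)?\w+,)")

# Hypothetical sample sentence
text = "You can pay by cash, card, or cheque, whichever you prefer."

parts = [p.strip() for p in pattern.split(text)]
print(parts)
```

Here the only split happens after "cheque", since every earlier comma is followed by a single word (optionally preceded by "or") and then another comma.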

Test the regex here
Test the code here

In essence, consider replacing your line

listy= [x.strip() for x in string.split(',')] 

with

listy= [x.strip() for x in re.split(r",(?!\s+(?:and\s+)?\w+,)",string)] 
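
The replacement line can also be wrapped in a small helper so it is easy to try on different inputs. This is only a sketch: the name split_clauses is hypothetical, and it keeps the strip() step and the empty-string filter from the question's code.

```python
import re

# Comma that is NOT followed by: whitespace, optional "and", one word, another comma
CLAUSE_COMMA = re.compile(r",(?!\s+(?:and\s+)?\w+,)")

def split_clauses(text):
    """Split on commas, skipping commas that precede a single word
    (optionally after "and") which is itself followed by a comma."""
    return [part.strip() for part in CLAUSE_COMMA.split(text) if part.strip()]

# Second example sentence from the question
sample = ("The increase in heart rate, which you get from masturbating, "
          "is directly beneficial to the circulation, "
          "and can reduce the likelihood of a heart attack")
clauses = split_clauses(sample)
print(clauses)
```

Note that the pattern only looks at what follows each comma, so a short lead-in such as "Yup," at the start of a sentence is still split off. Keeping it attached would presumably still need something like the word-count check from the question.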
+0

Although I believe the correct English usage is 'a, b and c' rather than 'a, b, and c'. So with proper English, just ',(?!\s+\w+,)' would work. – kaza

+0

Sure, thank you for the detailed answer. Upvoting you. –

+0

Excellent answer. –