With the code below (a bit messy, I admit) I split a string on commas, with the condition that it is not split where a comma only sets off a single word. For example, it does not split:

"Yup, there's a reason why you want to hit the sack just minutes after climax"

but it does split:

"The increase in heart rate, which you get from masturbating, is directly beneficial to the circulation, and can reduce the likelihood of a heart attack"

into:

['The increase in heart rate', 'which you get from masturbating', 'is directly beneficial to the circulation', 'and can reduce the likelihood of a heart attack']

The problem is that the code fails at its purpose when it meets a string like this one:

"When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow."

import os
import textwrap
import re
import io
from textblob import TextBlob

string = str(input_string)  # input_string holds the sentence to split
listy = [x.strip() for x in string.split(',')]
listy = [x.replace('\n', '') for x in listy]
# Drop full stops, but keep decimal points inside numbers
listy = [re.sub(r'(?<!\d)\.(?!\d)', '', x) for x in listy]
listy = list(filter(None, listy))  # Remove any empty strings
newstring = []
for segment in listy:
    wc = TextBlob(segment).word_counts
    if listy[-1] != segment:
        if len(wc) > 3:  # len(segment.split(' ')) > 7:
            pass  # the branch bodies that fill newstring are not shown in the post
# Segments are re-joined and then cut at the '&&' markers
sep = [x.strip() for x in (' '.join(newstring)).split('&&')]

Although I believe the proper English usage is 'a, b and c' rather than 'a, b, and c'. So if the English is proper, then just ',(?!\s+\w+,)' would work. – kaza
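For anyone who wants to try the pattern from the comment above, here is a minimal sketch of how it could be used with `re.split`. The function name `split_on_clause_commas` and the constant `LIST_AWARE_COMMA` are made up for illustration; only the regular expression itself comes from the comment.

```python
import re

# Pattern from the comment: a comma that is NOT followed by
# "whitespace, exactly one word, another comma".
LIST_AWARE_COMMA = re.compile(r',(?!\s+\w+,)')

def split_on_clause_commas(text):
    """Split on commas, skipping any comma that is directly followed
    by a single word and another comma (hypothetical helper)."""
    parts = LIST_AWARE_COMMA.split(text)
    return [part.strip() for part in parts if part.strip()]

sentence = ("When men ejaculate, it releases a slew of chemicals including "
            "oxytocin, vasopressin, and prolactin, all of which naturally "
            "help you hit the pillow.")

for part in split_on_clause_commas(sentence):
    print(part)
```

Here the comma after "oxytocin" is left alone, because it is followed by the single word "vasopressin" and another comma. The comma before "and prolactin" is still a split point, since "and" is followed by a space rather than a comma, so the pattern on its own does not keep a list with ", and" together.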
Sure, thank you for the detailed answer. Upvoting you. –
Excellent answer. –