How to split a string at a delimiter but exclude other strings

I have this string, which I want to split at the periods:
j = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.'
This is the result I want:
['you can get it cheaper than $20.99. ', 'shop at amazon.com.', ' hurry before prices go up.']
I split at every period that is preceded by a lowercase letter, and at every digit followed by a period and a space:
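import re

x = []
# The capturing group makes re.split keep each delimiter in the result.
sentences = re.split(r'([a-z]\.|\d\.\s)', j)
sentence_endings = sentences[1::2]  # delimiters sit at the odd indices
for position in range(len(sentences)):
    if sentences[position] in sentence_endings:
        # Re-attach each delimiter to the text that precedes it.
        x.append(sentences[position - 1] + sentences[position])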
Printing x gives me:
['you can get it cheaper than $20.99. ', 'shop at amazon.', 'com.', ' hurry before prices go up.']
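I want "amazon.com" to stay as one string, so I tried telling the regular expression to ignore ".com" with re.split(r'([a-z]\.|\d\.\s)[^.com]', j)
But that does not give me the result I want. What is the best way to do this?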
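A note on why the `[^.com]` attempt misbehaves: square brackets build a character class, so `[^.com]` matches exactly one character that is not `.`, `c`, `o`, or `m`, and it consumes that character as part of the delimiter. The sketch below shows the effect and contrasts it with a negative lookahead, `(?!com)`, which is only an assumption about what was intended. It hardcodes a single suffix, so treat it as an illustration and not a general sentence splitter.

```python
import re

j = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.'

# [^.com] consumes one extra character after the delimiter, so the 's' of
# 'shop' and the space before 'hurry' are swallowed by the split.
attempt = re.split(r'([a-z]\.|\d\.\s)[^.com]', j)
print(attempt)
# ['you can get it cheaper than $20.9', '9. ', 'hop at amazon.co', 'm.', 'hurry before prices go up.']

# (?!com) is a zero-width check: a letter plus period only counts as a
# delimiter when the text right after it is not 'com'. Nothing extra is consumed.
pieces = re.split(r'([a-z]\.(?!com)|\d\.\s)', j)

# pieces alternates text, delimiter, text, ...; glue each delimiter back on.
x = [text + end for text, end in zip(pieces[0::2], pieces[1::2])]
print(x)
# ['you can get it cheaper than $20.99. ', 'shop at amazon.com.', ' hurry before prices go up.']
```

Pairing the even and odd slices with `zip` also avoids the `in sentence_endings` membership test, which could misfire if a piece of ordinary text happened to equal one of the delimiters.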
re.split(r'(?<=\.)\s', s)
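For reference, here is a minimal sketch of that suggestion applied to the string from the question, assuming `s` holds the same text as `j`. The lookbehind `(?<=\.)` makes the split happen only at whitespace that directly follows a period, so the periods inside `20.99` and `amazon.com` are left alone.

```python
import re

s = 'you can get it cheaper than $20.99. shop at amazon.com. hurry before prices go up.'

# Split on a whitespace character only when the character before it is a period.
# The whitespace itself is the delimiter, so it is dropped from the result.
sentences = re.split(r'(?<=\.)\s', s)
print(sentences)
# ['you can get it cheaper than $20.99.', 'shop at amazon.com.', 'hurry before prices go up.']
```

The result differs slightly from the desired output in the question: the separating spaces are removed instead of being kept on the neighbouring pieces.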