2013-12-09
1
Pulling sentences with keyword combinations, Python

Suppose I have this string:

'apples are red. this apple is green. pears are sometimes red, but not usually. pears are green. apples are yummy. lizards are green.' 

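I want to use a regular expression to pull out the sentences in that string that first mention a fruit (apples or pears) and then mention a color (red or green). So I basically want to end up with this list: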

["apples are red.", "this apple is green.", "pears are sometimes red, but not usually.", "pears are green."]

I can already pull out the sentences that mention just apples and pears, or just green and red, with a regex like:

re.findall(r'([^.]*?apple[^.]*|[^.]*?pear[^.]*)', string) 

re.findall(r'([^.]*?red[^.]*|[^.]*?green[^.]*)', string) 

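But how do I put the two together if I want the fruit (apple/pear) to come first in the sentence, with the color mentioned at some later point in the same sentence?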

+0

Find all sentences returned by both calls (the intersection). Those are the sentences that meet both criteria.

+0

@HunterMcMillen: That doesn't ensure that one element is matched before the other, though.

+0

@TimPietzcker Very true.
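To make the exchange above concrete, here is a minimal sketch of the intersection idea using the two patterns from the question. The helper name `mentions_both` is made up for illustration. It shows the limitation raised in the comments: a sentence where the color comes before the fruit is still returned.

```python
import re

# The two patterns from the question, unchanged.
FRUIT = r'([^.]*?apple[^.]*|[^.]*?pear[^.]*)'
COLOR = r'([^.]*?red[^.]*|[^.]*?green[^.]*)'

def mentions_both(text):
    """Return sentences found by both patterns (hypothetical helper).

    Order within the sentence is NOT checked, which is the drawback
    pointed out in the comments.
    """
    fruit_hits = re.findall(FRUIT, text)
    color_hits = set(re.findall(COLOR, text))
    # Keep the order of appearance; strip the leading space left by the split.
    return [s.strip() for s in fruit_hits if s in color_hits]

# "red apples are tasty" has the color first, yet it is still returned.
print(mentions_both("red apples are tasty. pears are green."))
```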

Answers

0

You can use parentheses to group subexpressions:

re.findall(r"[^.]*\b(?:apple|pear)[^.]*\b(?:red|green)\b[^.]*\.", string) 

For example:

>>> import re 
>>> a = 'apples are red. this apple is green. pears are sometimes red, but not usually. pears are green. apples are yummy. lizards are green.' 
>>> re.findall(r"[^.]*\b(?:apple|pear)[^.]*\b(?:red|green)\b[^.]*\.", a) 
['apples are red.', ' this apple is green.', 
' pears are sometimes red, but not usually.', ' pears are green.'] 
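Note that every match after the first keeps the space that follows the previous period. If that matters, one option is to wrap the same pattern in a small function and strip each result. This is only a sketch; the names `FRUIT_THEN_COLOR` and `fruit_then_color_sentences` are made up here.

```python
import re

# Same pattern as above, compiled once: fruit first, color later, same sentence.
FRUIT_THEN_COLOR = re.compile(
    r"[^.]*\b(?:apple|pear)[^.]*\b(?:red|green)\b[^.]*\."
)

def fruit_then_color_sentences(text):
    """Return matching sentences with surrounding whitespace removed."""
    return [m.strip() for m in FRUIT_THEN_COLOR.findall(text)]

a = ('apples are red. this apple is green. pears are sometimes red, '
     'but not usually. pears are green. apples are yummy. lizards are green.')
print(fruit_then_color_sentences(a))

# A sentence with the color before the fruit is skipped.
print(fruit_then_color_sentences("red apples are tasty. pears are green."))
```

Because `[^.]*` can never cross a period, the fruit and the color have to sit in the same sentence, and the fixed left-to-right order of the two groups is what enforces "fruit first".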
0

Use this pattern:

(?:^|\b)(?=[^.]*(?:apple|pear)[^.]*(?:red|green))([^.]+\.)
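A quick way to try this pattern on the string from the question is sketched below. The variable names are arbitrary. Since the pattern has exactly one capturing group, `re.findall` returns the contents of that group, which here is the whole sentence.

```python
import re

# Lookahead version: the lookahead checks "fruit, then color, before the next
# period"; the capturing group then grabs the sentence itself.
LOOKAHEAD = re.compile(
    r"(?:^|\b)(?=[^.]*(?:apple|pear)[^.]*(?:red|green))([^.]+\.)"
)

a = ('apples are red. this apple is green. pears are sometimes red, '
     'but not usually. pears are green. apples are yummy. lizards are green.')

sentences = LOOKAHEAD.findall(a)
print(sentences)
```

On this input each match starts at a word boundary, so the results come back without the leading space. Unlike the previous answer, the keywords are not wrapped in `\b`, so a word that merely contains "red" or "pear" would also count.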

0

I suggest you read up on NLTK (the Natural Language Toolkit). It is a Python package for text processing.