2017-08-01 102 views
2

I have the following string:

(some text) or ((other text) and (some more text)) and (still more text)

I would like a Python regular expression that splits it into

['(some text)', '((other text) and (some more text))', '(still more text)'] 

I have tried this, but it does not work:

import re

haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
re.split('(or|and)(?![^(]*.\))', haystack)  # does not work

Any help is appreciated.

+5

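Regular expressions do not handle arbitrarily nested content well. Your real data may contain more levels of nested parentheses than the example you showed us. In that case a parser will probably get you further than a regular expression.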

+2

This may help: https://stackoverflow.com/questions/26633452/how-to-split-by-commas-that-are-not-within-parentheses

+0

This may also be useful: https://stackoverflow.com/questions/4284991/parsing-nested-parentheses-in-python-grab-content-by-level – perigon

Answers

1

I would use re.findall instead of re.split. Note that this only works for parentheses nested up to a depth of 2.

>>> import re 
>>> s = '(some text) or ((other text) and (some more text)) and (still more text)' 
>>> re.findall(r'\((?:\((?:\([^()]*\)|[^()]*)*\)|[^()])*\)', s) 
['(some text)', '((other text) and (some more text))', '(still more text)'] 
>>> 
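
The pattern above hard-codes its nesting levels, so supporting a different depth means rewriting it by hand. One way around that is to build the pattern programmatically. The following is only a sketch: `nested_paren_pattern` and its `levels` parameter are hypothetical names introduced here, not part of the `re` module, and `levels` counts parenthesis levels (1 means a pair with nothing nested inside).

```python
import re

def nested_paren_pattern(levels):
    """Build a regex matching balanced parentheses up to `levels` levels deep."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    # Innermost level: a pair of parentheses with no parentheses inside.
    pattern = r'\([^()]*\)'
    # Each pass wraps the previous pattern in one more level of parentheses.
    for _ in range(levels - 1):
        pattern = r'\((?:[^()]|' + pattern + r')*\)'
    return pattern

s = '(some text) or ((other text) and (some more text)) and (still more text)'
print(re.findall(nested_paren_pattern(2), s))
```

With `levels=2` this returns the same three groups as the session above. A group nested more deeply than `levels` is not rejected; the search simply falls through to its inner groups, so the depth still has to be chosen up front.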
+0

Yes. I added a note.

+0

I tried to simplify my string, and it backfired. Your solution does not work for my real string... (substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')

+0

@user1571934 Please provide the exact string.

0

You can try this: re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
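
To see what that call actually does, here is a small check. It splits on runs of the letters a to f, ignoring case, and pays no attention to parentheses, so applied to the string from the question it is not expected to produce the desired groups. The variable names are illustrative only.

```python
import re

# Split on runs of the letters a-f, case-insensitively.
parts = re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
print(parts)  # ['0', '3', '9']

# The same pattern applied to the question's string cuts inside words
# such as "some", so it does not yield the three parenthesized groups.
haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
wanted = ['(some text)', '((other text) and (some more text))', '(still more text)']
print(re.split('[a-f]+', haystack, flags=re.IGNORECASE) == wanted)  # False
```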

2

This solution works for arbitrarily nested parentheses, which a single regular expression cannot handle (s is the original string):

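from pyparsing import nestedExpr

def lst_to_parens(elt):
    # Rebuild a parenthesized string from a nested list of tokens.
    if isinstance(elt, list):
        return '(' + ' '.join(lst_to_parens(e) for e in elt) + ')'
    else:
        return elt

# Wrap s in an outer pair of parentheses so the whole string parses as one expression.
split = nestedExpr('(', ')').parseString('(' + s + ')').asList()
# Keep only the top-level groups; bare tokens such as 'or' and 'and' are dropped.
split_lists = [elt for elt in split[0] if isinstance(elt, list)]
print([lst_to_parens(elt) for elt in split_lists])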

Output:

['(some text)', '((other text) and (some more text))', '(still more text)'] 

For the OP's real test case:

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')" 

Output:

["(substringof ('needle' ,name))", "((role eq 'needle') and (substringof ('needle' ,email)))", "(job eq 'needle')", "(office eq 'needle')"] 
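
The output above is rebuilt by joining tokens with spaces, so it is not character-for-character the original text (note `substringof ('needle' ,name)`). If the original text must be preserved and an extra dependency is not wanted, a plain depth counter is one alternative. This is a minimal sketch, not the approach used in this answer: `split_top_level` is a hypothetical helper name, and it assumes parentheses never occur inside the quoted literals.

```python
def split_top_level(text):
    """Return each top-level parenthesized group of text, in order."""
    groups = []
    depth = 0      # how many parentheses are currently open
    start = None   # index where the current top-level group began
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError("unbalanced ')' at index %d" % i)
            depth -= 1
            if depth == 0:
                # Slice the original text, so spacing is kept exactly.
                groups.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return groups

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"
for group in split_top_level(s):
    print(group)
```

Text outside any parentheses, such as `or` and `and`, is skipped, and the nesting depth is not limited.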
1

You can also check this:

import re 
s = '(some text) or ((other text) and (some more text)) and (still more text)' 
find_string = re.findall(r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]', s) 
print(find_string) 

Output:

['(some text)', '((other text) and (some more text))', '(still more text)'] 

Edit:

find_string = re.findall(r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]', s) 
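
Both patterns allow only lowercase letters, whitespace and parentheses inside a group, so they cannot match text containing quotes or commas. A quick check against the real string from the question's comments illustrates this. It is a sketch with illustrative variable names, using the string as quoted in the pyparsing answer.

```python
import re

patterns = [
    r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]',        # original pattern
    r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]',    # pattern from the edit
]
real = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"
wanted = [
    "(substringof('needle',name))",
    "((role eq 'needle') and (substringof('needle',email)))",
    "(job eq 'needle')",
    "(office eq 'needle')",
]

# Record, for each pattern, whether it recovers the four top-level groups.
results = [re.findall(p, real) == wanted for p in patterns]
print(results)
```

Neither pattern recovers the four groups, because every group in the real string contains a quote character.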
+0

This is not the right way to match parentheses. What if there is any text between the two opening parentheses?

+0

@AvinashRaj, could you give me a sample string? Thanks.

+0

Check your regex with this string: '(some text) or ((other text) and (some more text)) and (more text)'.