Setting up a regex for a complex string

I have a string of product ingredients like this:

text = 'Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings'

I want to extract all the text items (ingredients) from it, so that the result looks like this:

ingredientsList= ['Pork and beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']

The regex I am currently using is:

ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)

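But it does not return the text inside the parentheses. I only want to exclude the codes and percentages, and I do want all the ingredients that appear inside parentheses. What should I do here? Thanks in advance.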

Source

2016-10-26 muazfaiz

You can restrict the first branch so that it only matches codes, that is, an E followed by digits:

\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)

See the regex demo.

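Now \(E\d+\) only matches substrings of the form (Exxx), and everything else, including the other parenthesized text, is handled by the second branch. You can also add percentages to the first branch to skip them explicitly: \((?:E\d+|\d+(?:[.,]\d+)?%)\).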

Python demo:

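import re

# First branch consumes "(Exxx)" codes without capturing; second branch captures ingredient names.
rx = r"\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings"
# findall returns an empty string wherever the first branch matched, so drop those.
res = [x for x in re.findall(rx, s) if x]
print(res)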
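
To make the difference concrete, here is a minimal sketch that runs the pattern from the question next to the variant that also skips percentages explicitly. The names WORDS, original_rx, extended_rx and extract are only illustrative and do not come from the original posts.

```python
import re

text = ("Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, "
        "coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), "
        "a preservative (E250), flavorings")

# Runs of letters separated by whitespace; captured so findall returns them.
WORDS = r"([^\W\d]+(?:\s+[^\W\d]+)*)"

# Pattern from the question: the first branch swallows every parenthesized group.
original_rx = r"\([^()]*\)|" + WORDS
# Variant from the answer: the first branch only swallows (Exxx) codes and percentages.
extended_rx = r"\((?:E\d+|\d+(?:[.,]\d+)?%)\)|" + WORDS


def extract(pattern, s):
    """Return the non-empty captures (hypothetical helper for this sketch)."""
    return [m for m in re.findall(pattern, s) if m]


original = extract(original_rx, text)
extended = extract(extended_rx, text)
print(original)  # lacks the ingredients listed inside "spices (...)"
print(extended)  # includes 'white pepper', 'nutmeg', 'coriander', 'cardamom'
```

Note that both patterns keep 'water' and return 'a preservative' with its leading article, so a small post-processing step would still be needed if the list has to match the one in the question exactly.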

Source

2016-10-26 10:04:37
