How to extract substrings delimited by square brackets and generate substrings

I want to extract and identify the patterns contained in square brackets, and then build some strings from them.

For example, if my text is '2 cups [9 oz] [10 g] flour'

I want to generate 4 strings from this input:

"2 cups" -> us

"9 oz" -> uk imperial

"10 g" -> metric

"flour" -> ingredient name

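As a start, I began by identifying any square brackets that contain the oz keyword and wrote the code below, but no match occurs. Any ideas and best practices for achieving this?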

import re

p_oz = re.compile(r'\[(.+) oz\]', re.IGNORECASE)  # to match uk imperial
text = '2 cups [9 oz] flour'

m = p_oz.match(text)

if m:
    found = m.group(1)
    print(found)


2012-06-26 tiguero

You need to use search instead of match.

m = p_oz.search(text)

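re.match only tries to match the regular expression at the beginning of the input string, and your text starts with 2, not with [. That's not what you want. You want to find a substring that matches your regex, which is what re.search is for.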
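To make the difference concrete, here is a minimal sketch that runs the question's own pattern and text through both calls. Nothing here is new apart from printing both results side by side.

```python
import re

p_oz = re.compile(r'\[(.+) oz\]', re.IGNORECASE)
text = '2 cups [9 oz] flour'

# match() only tries the pattern at position 0, where the text has "2", not "[".
print(p_oz.match(text))   # None

# search() scans forward until the pattern fits somewhere in the string.
m = p_oz.search(text)
print(m.group(1))         # 9
```

Note that `.+` is greedy, so with several bracketed groups on one line this pattern may capture more than a single bracket's contents; something like `[^\]]+` in place of `.+` is one way to keep the capture inside one pair of brackets.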


2012-06-26 16:42:26 BrenBarn

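I had used search instead of match but forgot to put it in the post, sorry. – tiguero

Always copy and paste your code to make sure you are posting the code you actually use. – BrenBarn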

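I'm just expanding on BrenBarn's accepted answer. I like a good problem to solve over lunch. Below is my full implementation for your problem:

Given the string 2 cups [9 oz] [10 g] flour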

import re

text = '2 cups [9 oz] [10 g] flour'

units = {'oz': 'uk imperial',
         'cups': 'us',
         'g': 'metric'}

# strip out brackets & trim white space
text = text.replace('[', '').replace(']', '').strip()

# prefix each number with an opening quote, e.g. 9 becomes "9
text = re.sub(r'(\d+)', r'"\1', text)

# expand units like `cups` to `cups" -> us~` (the ~ marks where to split later)
for unit in units:
    text = text.replace(unit, unit + '" -> ' + units[unit] + "~")

# matches the last word in the string
text = re.sub(r'(\w+$)', r'"\1" -> ingredient name', text)

print("raw text: \n" + text + "\n")
print("Array:")
print(text.split('~ '))

This prints the raw text followed by a list of strings:

raw text: 
"2 cups" -> us~ "9 oz" -> uk imperial~ "10 g" -> metric~ "flour" -> ingredient name 

Array: [ 
'"2 cups" -> us', 
'"9 oz" -> uk imperial', 
'"10 g" -> metric', 
'"flour" -> ingredient name' 
]
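The string-replacement approach above depends on unit names never appearing inside other words. As a hedged alternative, the sketch below tokenises the text with a single regex instead. The names `UNIT_SYSTEMS`, `QUANTITY` and `parse_ingredient` are hypothetical, and it assumes every quantity is a number followed by a unit word, optionally wrapped in square brackets, with the ingredient name coming last.

```python
import re

# Hypothetical lookup table; the unit names and labels are assumptions.
UNIT_SYSTEMS = {'cups': 'us', 'oz': 'uk imperial', 'g': 'metric'}

# A quantity: a number, then a unit word, optionally wrapped in [ ].
QUANTITY = re.compile(r'\[?\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*\]?')


def parse_ingredient(text):
    """Return (substring, label) pairs: each quantity, then the ingredient name."""
    results = []
    last_end = 0
    for m in QUANTITY.finditer(text):
        amount, unit = m.group(1), m.group(2)
        label = UNIT_SYSTEMS.get(unit.lower())
        if label is None:
            break  # unknown unit: treat the rest of the text as the name
        results.append(('%s %s' % (amount, unit), label))
        last_end = m.end()
    # Whatever follows the last recognised quantity is the ingredient name.
    name = text[last_end:].strip()
    if name:
        results.append((name, 'ingredient name'))
    return results


print(parse_ingredient('2 cups [9 oz] [10 g] flour'))
```

For the example string this yields the four pairs from the question: 2 cups / us, 9 oz / uk imperial, 10 g / metric, and flour / ingredient name. Returning tuples rather than a formatted string leaves the "->" presentation to the caller.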


2012-06-26 18:48:36

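Thanks. For now I'm trying to get an answer with my first approach. – tiguero

Yes, I'd recommend you keep working at it with this little snippet I threw together over lunch. Good luck! –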
