Compiling patterns into a list

I'm currently porting some code from R to Python. The file I load for the grep-style matching is formatted as follows:

match                                                   id
(chief\s+marketing\s+officer)|(\bc\.?m\.?o\.?\b)        3
(chief\s+technology\s+officer)|(\bc\.?t\.?o\.?\b)       4
(chief\s+information\s+officer)|(\bc\.?i\.?o\.?\b)      5
(\bdirector\b)                                          11

I load this file into a pandas DataFrame (levels) and precompile the patterns, and that is where I run into a problem.

import re

def compilePatterns():
    # Compile every regex in the 'match' column once, preserving row order.
    matches = levels['match']
    patterns = []
    for match in matches:
        pat = re.compile(match)
        patterns.append(pat)
    return patterns

Then, with my extract function:

def extract(title):
    title = title.lower()
    print(title)
    for index, pattern in enumerate(patterns):
        match = pattern.match(title)
        if match:
            return levels.iloc[index]['id']
    return None

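It works fine if I call extract("director"), which gives me 10, but extract("Pet director") returns None. So "director" is never picked up when it is not the first word.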

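I'm not sure whether the problem lies in how I compile the patterns, since they have parentheses everywhere, or whether this is even the right approach.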

2014-01-22 redrubia

pattern.match only finds a match that begins at the start of the string. Since \bdirector\b does not occur at the beginning of the string 'Pet director', pattern.match('Pet director') returns None.
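To make the start-of-string behaviour concrete, here is a minimal sketch that uses only the \bdirector\b pattern from the question. The variable names at_start and not_at_start are illustrative and do not come from the original code.

```python
import re

pattern = re.compile(r"(\bdirector\b)")

# match() only tries the pattern at position 0 of the string.
at_start = pattern.match("director")          # pattern begins at index 0
not_at_start = pattern.match("pet director")  # "director" begins at index 4

print(at_start is not None)   # True
print(not_at_start is None)   # True
```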

What you want is pattern.search (or re.search(pattern, ...)), which returns a match found anywhere in the string.
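One way the extract function might look after switching to search is sketched below. To keep the example self-contained, the pandas DataFrame is replaced by a plain list of (regex, id) pairs; LEVELS and COMPILED are hypothetical names introduced here, not part of the original code.

```python
import re

# Hypothetical stand-in for the `levels` DataFrame: (regex, id) rows from the question.
LEVELS = [
    (r"(chief\s+marketing\s+officer)|(\bc\.?m\.?o\.?\b)", 3),
    (r"(chief\s+technology\s+officer)|(\bc\.?t\.?o\.?\b)", 4),
    (r"(chief\s+information\s+officer)|(\bc\.?i\.?o\.?\b)", 5),
    (r"(\bdirector\b)", 11),
]

# Compile each pattern once and keep it next to its id.
COMPILED = [(re.compile(regex), level_id) for regex, level_id in LEVELS]

def extract(title):
    """Return the id of the first pattern found anywhere in title, else None."""
    title = title.lower()
    for pattern, level_id in COMPILED:
        # search() scans the whole string instead of anchoring at index 0.
        if pattern.search(title):
            return level_id
    return None

print(extract("Pet director"))              # 11
print(extract("Chief  Marketing Officer"))  # 3
print(extract("intern"))                    # None
```

Keeping each compiled pattern paired with its id also avoids the index lookup through levels.iloc inside the loop, though the original enumerate approach works equally well once match is replaced by search.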

2014-01-22 18:39:05 senshin

Thank you very much! Yes, I realize now that match only matches at the beginning. Using search seems to have solved my problem. Thanks again – redrubia
