Reverse regular expression in Python

This is a strange question, I know... I have a regular expression like:

rex = r"at (?P<hour>[0-2][0-9]) send email to (?P<name>\w*):? (?P<message>.+)"

so if I match something like this:

match = re.match(rex, "at 10 send email to bob: hi bob!")

match.groupdict() gives me this dictionary:

{"hour": "10", "name": "bob", "message": "hi bob!"}

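My question is: given the dictionary above and rex, can I write a function that returns the original text? I know that many texts can match the same dictionary (in this case the ':' after the name is optional), but I just want one of the infinitely many texts that would produce the dictionary given as input.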

2014-04-23 Matteo

Is it really infinite? Apart from the optional ':', isn't everything else fixed?

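Short of using `match.group()` (a.k.a. `match.group(0)`), no. You have thrown away information (specifically, whether the original string contained the colon), so the original string cannot be reconstructed unambiguously from the contents of the captured groups. The only way is to add a capturing group for the colon, which you can then use to determine whether the input text contained it. – Mac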

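I gave an incorrect answer... The point is that the regex loses some data, so if you want to recover the input you need to capture all of it (in separate tokens). – Emilien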
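The information loss described in these comments can be checked directly. The following is a minimal sketch: the `colon` group name, the `rex_colon` pattern and the `rebuild` helper are illustrative assumptions and do not appear in the original question.

```python
import re

rex = r"at (?P<hour>[0-2][0-9]) send email to (?P<name>\w*):? (?P<message>.+)"

# Two different inputs produce the same dictionary, so the dictionary alone
# cannot tell them apart.
with_colon = re.match(rex, "at 10 send email to bob: hi bob!")
without_colon = re.match(rex, "at 10 send email to bob hi bob!")
same_dict = with_colon.groupdict() == without_colon.groupdict()
print(same_dict)

# Hypothetical variant of rex: the optional colon gets its own named group.
rex_colon = (r"at (?P<hour>[0-2][0-9]) send email to "
             r"(?P<name>\w*)(?P<colon>:?) (?P<message>.+)")

def rebuild(groups):
    """Rebuild the input text from the groups captured by rex_colon."""
    return "at {hour} send email to {name}{colon} {message}".format(**groups)

for text in ("at 10 send email to bob: hi bob!",
             "at 10 send email to bob hi bob!"):
    groups = re.match(rex_colon, text).groupdict()
    # With the colon captured, the rebuilt text equals the input.
    print(rebuild(groups) == text)
```

Once every variable part of the pattern is captured, the template in `rebuild` is just the fixed text of the regex with the groups filled back in.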

This is one of the texts that will match the regex:

'at {hour} send email to {name}: {message}'.format(**match.groupdict())

2014-04-23 09:33:55

Or, more idiomatically, `match.expand(r'at \g<hour> send email to \g<name>: \g<message>')` – Mac

`\g` without the `\\`, but yes, that is more idiomatic.

But the OP said "using groupdict". If the match object is still available, you can just call `match.group()`.
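The three approaches mentioned in this thread (`str.format` on the dictionary, `match.expand`, and `match.group()`) can be compared side by side. This is a small sketch using the regex from the question; the variable names are chosen here for illustration.

```python
import re

rex = r"at (?P<hour>[0-2][0-9]) send email to (?P<name>\w*):? (?P<message>.+)"
match = re.match(rex, "at 10 send email to bob hi bob!")

# str.format only needs the dictionary, so it still works after the match
# object has been discarded.
from_dict = "at {hour} send email to {name}: {message}".format(**match.groupdict())

# match.expand fills a template containing \g<name> references, but it needs
# the match object itself.
from_expand = match.expand(r"at \g<hour> send email to \g<name>: \g<message>")

# match.group() returns the text that was actually matched, colon or not.
original = match.group()

print(from_dict)
print(from_expand)
print(original)
```

The first two always insert the colon because it is hard-coded in the template; only `match.group()` reflects that the input here had none.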

Using inverse_regex:

"""
http://www.mail-archive.com/[email protected]/msg125198.html
"""
import itertools as IT
# sre_constants and sre_parse are undocumented internals of the re module
# (deprecated since Python 3.11), so this code may break between versions.
import sre_constants as sc
import sre_parse
import string

# Generate strings that match a given regex

category_chars = {
    sc.CATEGORY_DIGIT: string.digits,
    sc.CATEGORY_SPACE: string.whitespace,
    sc.CATEGORY_WORD: string.digits + string.ascii_letters + '_',
}

def unique_extend(res_list, items):
    """Append to res_list every item that is not already in it."""
    for item in items:
        if item not in res_list:
            res_list.append(item)

def handle_any(val):
    """
    This is different from normal regexp matching. It only matches
    printable ASCII characters.
    """
    return string.printable

def handle_branch(val):
    # val is (None, [alternative, ...]); Python 3 has no tuple parameters.
    tok, val = val
    all_opts = []
    for toks in val:
        opts = permute_toks(toks)
        unique_extend(all_opts, opts)
    return all_opts

def handle_category(val):
    return list(category_chars[val])

def handle_in(val):
    out = []
    for tok, sub in val:
        out += handle_tok(tok, sub)
    return out

def handle_literal(val):
    return [chr(val)]

def handle_max_repeat(val):
    """
    Handle a repeat token such as {x,y} or ?.
    """
    lo, hi, val = val
    subtok, subval = val[0]

    if hi > 5000:
        # hi is the number of cartesian join operations needed to be
        # carried out. More than 5000 consumes way too much memory.
        # raise ValueError("Too many repetitions requested (%d)" % hi)
        hi = 5000

    optlist = handle_tok(subtok, subval)

    # Build the products lazily, one repetition count at a time.
    products = (IT.product(optlist, repeat=x) for x in range(lo, hi + 1))
    return (''.join(it) for it in IT.chain.from_iterable(products))

def handle_range(val):
    lo, hi = val
    return (chr(x) for x in range(lo, hi + 1))

def handle_subpattern(val):
    # The parsed sub-pattern is the last element of val:
    # (group, p) in Python 2, (group, add_flags, del_flags, p) in Python 3.6+.
    return list(permute_toks(val[-1]))

def handle_tok(tok, val):
    """
    Returns a list of strings of possible permutations for this regexp
    token.
    """
    handlers = {
        sc.ANY: handle_any,
        sc.BRANCH: handle_branch,
        sc.CATEGORY: handle_category,
        sc.LITERAL: handle_literal,
        sc.IN: handle_in,
        sc.MAX_REPEAT: handle_max_repeat,
        sc.RANGE: handle_range,
        sc.SUBPATTERN: handle_subpattern,
    }
    try:
        return handlers[tok](val)
    except KeyError:
        fmt = "Unsupported regular expression construct: %s"
        raise ValueError(fmt % tok)

def permute_toks(toks):
    """
    Returns a generator of strings of possible permutations for this
    regexp token list.
    """
    lists = [handle_tok(tok, val) for tok, val in toks]
    return (''.join(it) for it in IT.product(*lists))


########## PUBLIC API ####################

def ipermute(p):
    return permute_toks(sre_parse.parse(p))

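You can use re.sub to replace the named groups in rex with the values from data, then use inverse_regex.ipermute on the result to generate strings that match the original regex: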

import re
import itertools as IT
import inverse_regex as ire

rex = r"(?:at (?P<hour>[0-2][0-9])|today) send email to (?P<name>\w*):? (?P<message>.+)"
match = re.match(rex, "at 10 send email to bob: hi bob!")
data = match.groupdict()
del match

# Replace each named group with its captured text; re.escape keeps any regex
# metacharacters in the data literal.
new_regex = re.sub(r'[(][?]P<([^>]+)>[^)]*[)]', lambda m: re.escape(data[m.group(1)]), rex)
for s in IT.islice(ire.ipermute(new_regex), 10):
    print(s)

which yields

today send email to bob hi bob! 
today send email to bob: hi bob! 
at 10 send email to bob hi bob! 
at 10 send email to bob: hi bob!

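Note: I modified the original inverse_regex so that it no longer raises a ValueError when the regex contains `*`. Instead, `*` is treated as if it were `{,5000}`, so you will at least get some permutations.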
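Generated strings can be checked against the original regex without inverse_regex. The sketch below is an illustrative sanity check, not part of the answer: the `check` helper and the `results` dictionary are names introduced here, and the candidate list simply repeats the four strings shown above.

```python
import re

rex = r"(?:at (?P<hour>[0-2][0-9])|today) send email to (?P<name>\w*):? (?P<message>.+)"
data = {"hour": "10", "name": "bob", "message": "hi bob!"}

candidates = [
    "today send email to bob hi bob!",
    "today send email to bob: hi bob!",
    "at 10 send email to bob hi bob!",
    "at 10 send email to bob: hi bob!",
]

def check(text):
    """Return (matches rex, groupdict equals data) for one candidate string."""
    m = re.fullmatch(rex, text)
    if m is None:
        return (False, False)
    return (True, m.groupdict() == data)

results = {text: check(text) for text in candidates}
for text, (matches, same_dict) in results.items():
    print(matches, same_dict, text)
```

All four strings match rex, but the two `today` variants leave `hour` as `None`, so only the `at 10` variants reproduce the dictionary exactly. Filtering on the second flag keeps just those.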

2014-04-23 09:37:40 unutbu

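Thanks a lot, this is what I was looking for! But I have one problem: it does not handle nested parentheses, for example `(?:at (?P<hour>[0-2][0-9])|today)`... is there a solution? – Matteo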

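That is because the regex `[(][?]P<([^>]+)>[^)]*[)]` stops at the first `)` it finds, but there may be another `(?...` group in the middle. Is there any way to skip the `)` characters in the middle? – Matteo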

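This is a very interesting question. You might be able to handle nested parentheses [using (or modifying) this](http://stackoverflow.com/a/23185606/190597), but I don't have a ready answer. As a form of play I love questions like this, but from a practical point of view I think you may be pursuing an [XY problem](http://meta.stackexchange.com/q/66377/137631): if you asked a question about your larger goal, someone here might be able to suggest a strategy that avoids this complication. – unutbu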
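One possible way around the nested-parentheses limitation is to balance the parentheses by hand instead of using `[^)]*`. The following is only a sketch under stated assumptions: `replace_named_groups` is a hypothetical helper written for this illustration (it is neither the linked answer nor part of inverse_regex), it assumes every named group has a string value in `data`, and it handles escapes and character classes only approximately.

```python
import re

_OPENER = re.compile(r"\(\?P<([^>]+)>")

def replace_named_groups(rex, data):
    """Replace each (?P<name>...) group in rex with the escaped text data[name].

    Parentheses are counted so that groups nested inside a named group are
    skipped together with it.
    """
    out = []
    i = 0
    while i < len(rex):
        m = _OPENER.match(rex, i)
        if m is None:
            # Copy an escape sequence as a unit so that "\(" is never
            # mistaken for the start of a group.
            step = 2 if rex[i] == "\\" else 1
            out.append(rex[i:i + step])
            i += step
            continue
        depth = 1          # we are inside the named group just opened
        j = m.end()
        in_class = False   # True while inside [...]
        while j < len(rex) and depth:
            ch = rex[j]
            if ch == "\\":
                j += 2     # skip the escaped character
                continue
            if in_class:
                in_class = ch != "]"
            elif ch == "[":
                in_class = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced parentheses in %r" % rex)
        out.append(re.escape(data[m.group(1)]))
        i = j
    return "".join(out)

# Example: the hour group itself contains nested groups.
nested_rex = (r"at (?P<hour>(?:[01][0-9])|(?:2[0-3])) send email to "
              r"(?P<name>\w*):? (?P<message>.+)")
data = {"hour": "10", "name": "bob", "message": "hi bob!"}
new_regex = replace_named_groups(nested_rex, data)
print(new_regex)
```

The resulting `new_regex` contains no named groups, so it could be passed to `ipermute` in place of the `re.sub` result used in the answer.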
