在任何列表上正則表達式的推廣

我正在處理一個包含整數的大列表，並且我想對它們進行一些模式匹配（如查找某些序列）。正則表達式將是最合適的，除了它們似乎總是隻處理字符列表，例如字符串。是否有任何可以處理任意類型列表的庫（使用任何語言）？在任何列表上正則表達式的推廣

我知道我可以將我的整數列表轉換爲一個字符串，然後做一個正常的正則表達式搜索，但這似乎有點浪費和不雅。

編輯：

我的要求是相當簡單的。不需要嵌套列表，不需要花哨的字符類。基本上我只是感興趣的序列可能會變得非常複雜。（例如類似"[abc]{3,}.([ab]?[^a]{4,7})"等，其中a,b,c是整數）。這應該可以推廣到任何可以檢查平等的類型。對於枚舉類型，您還可以使用"[a-z]"等工作。

來源

2010-11-29 pafcu

你能發表一些例子嗎？ – 2010-11-29 11:32:07

對於有趣的問題+1。我看了一下，很失望地看到Scala'a正則表達式只能用於Strings--標準庫通常是相當普遍的。我期待着一些答案 – 2010-11-29 11:37:20

整數有多大？如果是21位或更低，則可以將其轉換爲Unicode代碼點。 – leppie 2010-11-29 12:05:15

正則表達式僅匹配字符串，by definition。

當然，理論上你可以構造一個等價的語法，比如數字列表。隨着新的令牌一樣\e爲偶數，\o爲奇數，\s的平方數，\r實數等，這樣

[1, 2, 3, 4, 5, 6]

將由

^(\o\e)*$

或

匹配

[ln(3), math.pi, sqrt(-1)]

將匹配

^\R*$

等等聽起來像一個有趣的項目，但也像一個非常困難的。而且，如何擴展它來處理任意列表，嵌套和全部，都超出了我的想象。

來源

2010-11-29 11:59:51

如果你真的需要一個像正則表達式那樣的免費語法，那麼你必須按照Tim的回答中所述的方式進行操作。如果您只有固定數量的模式可供搜索，那麼最簡單快捷的方法就是編寫您自己的搜索/過濾功能。

來源

2010-11-29 12:09:12 splash

確實有趣的問題。

橫向思維：下載.Net Framework Source code，解除正則表達式源代碼並使其適應整型而不是字符。

只是一個想法。

來源

2010-11-29 12:54:57 smirkingman

一些正則表達式語法泛化爲泛型序列。另外，爲了能夠指定任何對象，字符串不是表達式本身的最佳媒介。

在蟒「小」的例子：

def choice(*items): 
    return ('choice',[value(item) for item in items]) 

def seq(*items): 
    return ('seq',[value(item) for item in items]) 

def repeat(min,max,lazy,item): 
    return ('repeat',min,max,lazy,value(item)) 

def value(item): 
    if not isinstance(item,tuple): 
    return ('value',item) 
    return item 

def compile(pattern): 
    ret = [] 
    key = pattern[0] 
    if key == 'value': 
    ret.append(('literal',pattern[1])) 
    elif key == 'seq': 
    for subpattern in pattern[1]: 
     ret.extend(compile(subpattern)) 
    elif key == 'choice': 
    jumps = [] 
    n = len(pattern[1]) 
    for i,subpattern in enumerate(pattern[1]): 
     if i < n-1: 
     pos = len(ret) 
     ret.append('placeholder for choice') 
     ret.extend(compile(subpattern)) 
     jumps.append(len(ret)) 
     ret.append('placeholder for jump') 
     ret[pos] = ('choice',len(ret)-pos) 
     else: 
     ret.extend(compile(subpattern)) 
    for pos in jumps: 
     ret[pos] = ('jump', len(ret)-pos) 
    elif key == 'repeat': 
    min,max,lazy,subpattern = pattern[1:] 
    for _ in xrange(min): 
     ret.extend(compile(subpattern)) 
    if max == -1: 
     if lazy: 
     pos = len(ret) 
     ret.append('placeholder for jump') 
     ret.extend(compile(subpattern)) 
     ret[pos] = ('jump',len(ret)-pos) 
     ret.append(('choice',pos+1-len(ret))) 
     else: 
     pos = len(ret) 
     ret.append('placeholder for choice') 
     ret.extend(compile(subpattern)) 
     ret.append(('jump',pos-len(ret))) 
     ret[pos] = ('choice',len(ret)-pos) 
    elif max > min: 
     if lazy: 
     jumps = [] 
     for _ in xrange(min,max): 
      ret.append(('choice',2)) 
      jumps.append(len(ret)) 
      ret.append('placeholder for jump') 
      ret.extend(compile(subpattern)) 
     for pos in jumps: 
      ret[pos] = ('jump', len(ret)-pos) 
     else: 
     choices = [] 
     for _ in xrange(min,max): 
      choices.append(len(ret)) 
      ret.append('placeholder for choice') 
      ret.extend(compile(subpattern)) 
      ret.append(('drop,')) 
     for pos in choices: 
      ret[pos] = ('choice',len(ret)-pos) 
    return ret 

def match(pattern,subject,start=0): 
    stack = [] 
    pos = start 
    i = 0 
    while i < len(pattern): 
    instruction = pattern[i] 
    key = instruction[0] 
    if key == 'literal': 
     if pos < len(subject) and subject[pos] == instruction[1]: 
     i += 1 
     pos += 1 
     continue 
    elif key == 'jump': 
     i += instruction[1] 
     continue 
    elif key == 'choice': 
     stack.append((i+instruction[1],pos)) 
     i += 1 
     continue 
    # fail 
    if not stack: 
     return None 
    i,pos = stack.pop() 
    return pos 

def find(pattern,subject,start=0): 
    for pos1 in xrange(start,len(subject)+1): 
    pos2 = match(pattern,subject,pos1) 
    if pos2 is not None: return pos1,pos2 
    return None,None 

def find_all(pattern,subject,start=0): 
    matches = [] 
    pos1,pos2 = find(pattern,subject,start) 
    while pos1 is not None: 
    matches.append((pos1,pos2)) 
    pos1,pos2 = find(pattern,subject,pos2) 
    return matches 

# Timestamps: ([01][0-9]|2[0-3])[0-5][0-9] 
pattern = compile(
    seq(
    choice(
     seq(choice(0,1),choice(0,1,2,3,4,5,6,7,8,9)), 
     seq(2,choice(0,1,2,3)), 
    ), 
    choice(0,1,2,3,4,5), 
    choice(0,1,2,3,4,5,6,7,8,9), 
) 
) 

print find_all(pattern,[1,3,2,5,6,3,4,2,4,3,2,2,3,6,6,5,3,5,3,3,2,5,4,5]) 
# matches: (0,4): [1,3,2,5]; (10,14): [2,2,3,6]

改進的幾點：

更多結構：類與否定，範圍
類，而不是元組

來源

2010-11-29 14:07:12

那麼，Erlang的內置模式匹配（你的類型）。我做了類似的onc在Ruby中 - 有點可能不是太好的表現hackery，請參閱http://radiospiel.org/0x16-its-a-bird

來源

2010-11-30 01:01:00 radiospiel

-2

您可以嘗試pamatcher，它是一個JavaScript庫，用於概括任何項目（任何類型）的正則表達式的概念。

爲一個例子。「[ABC] {3，}（？[AB] [^ A] {4,7-}）」圖案匹配，其中a，b，c是整數：

var pamatcher = require('pamatcher'); 

var a = 10; 
var b = 20; 
var c = 30; 

var matcher = pamatcher([ 
    { repeat: 
     { or: [a, b, c] }, 
    min: 3 
    }, 
() => true, 
    { optional: 
    { or: [a, b] } 
    }, 
    { repeat: (i) => i != a, 
    min: 4, 
    max: 7 
    } 
]); 

var input = [1, 4, 8, 44, 55]; 
var result = matcher.test(input); 
if(result) { 
    console.log("Pattern matches!"); 
} else { 
    console.log("Pattern doesn't match."); 
}

注意：我是這個圖書館的創建者。

來源

2015-12-03 13:07:28 pmros

在任何列表上正則表達式的推廣

回答

相關問題