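Regular expression where part of the pattern is optional

I'm working on a crossword puzzle compiler. As an example, say there are 8 blank spots, where the second spot is 'U', the fourth spot is 'E', and the sixth spot is 'E'.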

_U_E_E__

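I have a word list that I'm trying to match against this pattern. If I find an 8-letter word that fits it (TUBELESS), great! But if I can find a 4-letter word that fits just the first 4 slots (TUBE), I can use that too.

I could write one RE for each possible length and combine them with '|', but I'm looking for a more elegant solution. Help?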
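For reference, the '|' approach mentioned above might look roughly like the sketch below. The names `TEMPLATE`, `MIN_LEN` and `alternation_pattern` are made up for illustration, and using `[A-Z]` for a blank slot is an assumption about the word list (uppercase letters only).

```python
import re

# Hypothetical template: '_' marks an empty slot, letters are fixed.
TEMPLATE = "_U_E_E__"
MIN_LEN = 4  # assumed shortest usable word

def alternation_pattern(template, min_len):
    """Join one sub-pattern per allowed length with '|', anchored as a whole."""
    slots = ["[A-Z]" if ch == "_" else re.escape(ch) for ch in template]
    # One alternative for each prefix length from min_len up to the full template.
    prefixes = ["".join(slots[:n]) for n in range(min_len, len(template) + 1)]
    return "^(?:" + "|".join(prefixes) + ")$"

regex = re.compile(alternation_pattern(TEMPLATE, MIN_LEN))
for word in ("TUBE", "TUBELESS", "TUBEXXX"):
    print(word, bool(regex.match(word)))
```

This works, but the pattern repeats the leading slots once per length, which is the redundancy the question is trying to avoid.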

2013-10-24 Bharat B

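Use nested optional groups: .U.E(?:.(?:E(?:..?)?)?)?$

You can build the pattern with a simple recursive function (it produces almost the same pattern, except that even the last character gets wrapped in a group):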

import re

def nested_pattern(s):
    # Wrap each remaining character in its own optional group, nested
    # so that a later slot can only match if the earlier ones did.
    if s:
        return '(?:' + s[0] + nested_pattern(s[1:]) + ')?'
    else:
        return ''

regex = re.compile(r'.U.E' + nested_pattern(r'.E..') + '$')

for word in ('TUB', 'TUBE', 'TEBU', 'TUBES', 'PURETE', 'TUBELEX', 'TUBELESS', 'SURELY'):
    print(word, bool(regex.match(word)))

This prints:

TUB False 
TUBE True 
TEBU False 
TUBES True 
PURETE True 
TUBELEX True 
TUBELESS True 
SURELY False
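The same idea can be driven directly from a crossword template instead of a hand-written prefix. The following is only a sketch: `template_to_regex` and its parameters are hypothetical names, and it assumes '_' marks a blank and that blanks should accept `[A-Z]`.

```python
import re

def template_to_regex(template, min_len, blank="_", any_letter="[A-Z]"):
    """Build an anchored regex: the first min_len slots are required,
    the remaining slots become nested optional groups."""
    slots = [any_letter if ch == blank else re.escape(ch) for ch in template]
    tail = ""
    # Wrap from the last slot inward, so a slot can only be present
    # when every slot before it is present as well.
    for slot in reversed(slots[min_len:]):
        tail = "(?:" + slot + tail + ")?"
    return "^" + "".join(slots[:min_len]) + tail + "$"

pattern = template_to_regex("_U_E_E__", 4)
print(pattern)

regex = re.compile(pattern)
candidates = ("TUBE", "TUBES", "TUBELESS", "SURELY", "TUBELESSY")
print([w for w in candidates if regex.match(w)])
```

Unlike the '.'-based pattern above, this variant restricts blanks to uppercase letters; pass `any_letter="."` to get the looser behaviour.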

2013-10-24 06:05:22

A pattern that matches the 4- to 8-character strings you want:

>>> p = re.compile('^[A-Z]U[A-Z]E(?=[A-Z](?=E(?=[A-Z](?=[A-Z]$|$)|$)|$)|$)') 
>>> re.match(p, 'TUB') 
>>> re.match(p, 'TUBE') 
<_sre.SRE_Match object at 0x10fe55ac0> 
>>> re.match(p, 'TUBX') 
>>> re.match(p, 'TUBEL') 
<_sre.SRE_Match object at 0x10fe55b28> 
>>> re.match(p, 'TUBELE') 
<_sre.SRE_Match object at 0x10fe55ac0> 
>>> re.match(p, 'TUBELEX') 
<_sre.SRE_Match object at 0x10fe55b28> 
>>> re.match(p, 'TUBELES') 
<_sre.SRE_Match object at 0x10fe55ac0> 
>>> re.match(p, 'TUBELESS') 
<_sre.SRE_Match object at 0x10fe55b28> 
>>> re.match(p, 'TUBELESSY') 
>>> re.match(p, 'TUBELESS7') 
>>> re.match(p, 'TUBELEZZ') 
<_sre.SRE_Match object at 0x10fe55ac0> 
>>> re.match(p, 'TUBELE88')

I don't know whether this is "more elegant", but it is an interesting illustration of lookahead. Maybe it will give you some ideas?
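The nested lookahead above can also be generated rather than typed by hand. This is a minimal sketch; `nested_lookahead` is a hypothetical helper, and the slot list is written out by hand to mirror the pattern in this answer.

```python
import re

def nested_lookahead(slots):
    """Return nested lookaheads that accept any prefix of `slots`,
    provided the string ends right after that prefix."""
    if not slots:
        return "$"
    # Either this slot matches and the rest is checked further in,
    # or the string must end here.
    return "(?=" + slots[0] + nested_lookahead(slots[1:]) + "|$)"

pattern = "^[A-Z]U[A-Z]E" + nested_lookahead(["[A-Z]", "E", "[A-Z]", "[A-Z]"])
p = re.compile(pattern)

for word in ("TUB", "TUBE", "TUBEL", "TUBELESS", "TUBELESSY", "TUBELE88"):
    print(word, bool(p.match(word)))
```

One thing to keep in mind: lookaheads are zero-width, so the match object only spans the first four characters. For example, `p.match("TUBELESS").group(0)` is `'TUBE'`, not the whole word.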

2013-10-24 05:47:17

This really is an interesting use of lookahead, and it would do what I want. But I have to go with Janne Karila's more comprehensive answer. Thanks.

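text = "_U_E_E__"

def solve(text, word_list):
    """Return the first word of 4+ letters whose every letter fits the pattern."""
    for word in word_list:
        if not 4 <= len(word) <= len(text):
            continue
        # '_' is a blank slot; any other pattern character must match exactly.
        if all(c1 == '_' or c1 == c2 for c1, c2 in zip(text, word)):
            return word
    return None


print(solve(text, ['TXBELESS', 'TUBE']))
print(solve(text, ['TXBELESS', 'TUBx', 'TUBELESS', 'TUBEL']))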

Output:

TUBE 
TUBELESS

2013-10-24 05:50:14

While this code works, I'm looking for a regular expression.

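Here is a slightly more concise regular expression. I'm assuming the words in the dictionary won't contain digits, so the fact that \w matches alphanumeric characters won't be a problem. If that's not the case, just replace \w with [A-Z] in the expression.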

import re

# Regex edit: anchors added so that the expression can't match
# inside a longer string (^ = beginning, $ = end).

# To match words that are either 4 or 8 characters long, put the
# last 4 letters in a group and make it optional with "?".
regex = re.compile(r"^\wU\wE(\wE\w{2})?$")

x = 'TUBELESS'
y = 'TUBE'

# Both of these return a match object,
# meaning they fit the regular expression.
regex.match(x)
regex.match(y)

2013-10-24 06:00:02 xgord

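Ref: http://www.pythonregex.com/ Both 'TUBE' and 'TUBEXXX' return a match object with the given RE, which doesn't fit what I'm looking for. Thanks.

You're right. I forgot that match() only anchors the pattern at the start of the string, so it still succeeds when extra characters follow. I added the start and end characters to make it work. – xgord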
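To make the anchoring issue from these comments concrete, here is a small sketch comparing `re` matching methods on the unanchored version of the expression. The variable name `unanchored` is just for illustration. `fullmatch` (available since Python 3.4) is an alternative to adding `^` and `$` by hand.

```python
import re

# The expression from the answer, without ^ and $.
unanchored = re.compile(r"\wU\wE(\wE\w{2})?")

# match() only anchors at the start: trailing characters are ignored.
print(bool(unanchored.match("TUBEXXX")))

# fullmatch() requires the whole string to fit the pattern.
print(bool(unanchored.fullmatch("TUBEXXX")))
print(bool(unanchored.fullmatch("TUBE")))
print(bool(unanchored.fullmatch("TUBELESS")))

# search() is the method that finds the pattern anywhere in the string.
print(bool(unanchored.search("XXTUBE")))
print(bool(unanchored.match("XXTUBE")))
```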
