Checking partial matches in one list against partial matches in another list - possible with a list comprehension?

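Bit of a Python/programming newbie here. I'm checking partial matches in one list against partial matches in another list. Is this possible with a list comprehension?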

I've written code that does what I need it to:

import re 
syns = ['professionals|experts|specialists|pros', 'repayed|payed back', 'ridiculous|absurd|preposterous', 'salient|prominent|significant' ] 
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant', 'winter-time|winter|winter season', 'professionals|pros'] 

def pipe1(syn): 
    # Find first word/phrase in list element up to and including the 1st pipe 
    r = r'.*?\|' 
    m = re.match(r, syn) 
    m = m.group() 
    return m 

def find_non_match():
    # Compare 'new_syns' with 'syns' and build a new list from the non-matches in 'new_syns'
    p = '@#&'  # Placeholder
    joined = p.join(syns)
    joined = p + joined  # Also add the placeholder to the beginning of the string
    non_match = []
    for syn in new_syns:
        m = pipe1(syn)
        m = p + m
        if m not in joined:
            non_match.append(syn)
    return non_match

print(find_non_match())

Output:

['winter-time|winter|winter season']

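For each element in new_syns, the code checks whether the word/phrase up to and including the first pipe matches the same partial match (the start of an element, up to its first pipe) of some element in the syns list. The actual purpose of the code is to find the non-matches and append them to a new list called non_match.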
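To see why the '@#&' placeholder matters, here is a small check reusing the syns list and the pipe1 regex from above. The entry 'experts|authorities' is a made-up example, not part of the original data. Without the placeholder prefix, a word that sits in the middle of a syns element would be counted as a match.

```python
import re

syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']

def pipe1(syn):
    # First word/phrase up to and including the first pipe (same regex as above)
    return re.match(r'.*?\|', syn).group()

p = '@#&'
joined = p + p.join(syns)  # placeholder before every element, including the first

probe = pipe1('experts|authorities')  # hypothetical entry; 'experts' is not a first word in syns
print(probe in joined)        # True: 'experts|' occurs in the middle of the first element
print((p + probe) in joined)  # False: '@#&experts|' can only match at the start of an element
```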

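However, I'm wondering whether the same thing could be achieved in far fewer lines using a list comprehension. I tried, but I didn't get what I wanted. Here's what I've come up with so far: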

import re 
syns = ['professionals|experts|specialists|pros', 'repayed|payed back', 'ridiculous|absurd|preposterous', 'salient|prominent|significant' ] 
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant', 'winter-time|winter|winter season', 'professionals|pros'] 

def pipe1(syn): 
    # Find first word/phrase in list element up to and including the 1st pipe 
    r = r'.*?\|' 
    m = re.match(r, syn) 
    m = '@#&' + m.group()  # Prefix an unusual symbol combo so the match is anchored to the beginning of an element
    return m 

non_match = [i for i in new_syns if pipe1(i) not in '@#&'.join(syns)] 
print(non_match)

Output:

['winter-time|winter|winter season', 'professionals|pros'] # I don't want 'professionals|pros' in the list

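The catch with the list comprehension is that when syns is joined with @#&, there is no @#& at the beginning of the joined string, whereas in my original code above (without the list comprehension) I add @#& to the beginning of the joined string. As a result, 'professionals|pros' has slipped through the net. But I don't know how to counteract this inside a list comprehension.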
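One way to close that gap, sketched under the assumption that the rest of the attempt stays the same, is to build the joined string once with the placeholder prepended, exactly as the loop version does, and then use it in the comprehension:

```python
import re

syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

def pipe1(syn):
    # First word/phrase up to and including the first pipe, prefixed with the placeholder
    return '@#&' + re.match(r'.*?\|', syn).group()

# Prepend the placeholder so the first element of syns can also match at its start;
# building the string once avoids re-joining on every iteration.
joined = '@#&' + '@#&'.join(syns)
non_match = [i for i in new_syns if pipe1(i) not in joined]
print(non_match)  # ['winter-time|winter|winter season']
```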

So my question is: "Is this possible with a list comprehension?"
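As an alternative sketch (not something proposed in the thread), the joined string and placeholder can be dropped entirely: str.startswith already anchors the comparison at the beginning of each syns element, and pipe1 keeps the trailing pipe so 'pro|' could not match 'professionals|'.

```python
import re

syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

def pipe1(syn):
    # First word/phrase up to and including the first pipe
    return re.match(r'.*?\|', syn).group()

# Keep i only if no element of syns starts with i's first word plus its pipe
non_match = [i for i in new_syns if not any(s.startswith(pipe1(i)) for s in syns)]
print(non_match)  # ['winter-time|winter|winter season']
```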


2014-02-16 Darren Haynes

Why are you using '|' on only one side of the regex? What about the edge cases? – sln

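Well, if I used a lookahead assertion for the pipe and only matched the first word/phrase up to (but not including) the first pipe, it would work just fine. The match applies to everything before the first pipe, which is what matters here. I don't want to match anything after the first pipe in each element of the list –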
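A sketch of the lookahead variant described in this comment, with two assumptions added: the lookahead also accepts end of string, so an element with no pipe yields the whole element instead of re.match returning None, and whole first words are compared for equality. Once the pipe is no longer part of the match, a substring test like '@#&pro' in joined would also hit '@#&professionals', so this version uses a set of first words. The name first_word is hypothetical.

```python
import re

syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

def first_word(syn):
    # Lazily match up to, but not including, the first pipe; (?=\||$) also
    # accepts end of string, so an element with no pipe returns itself.
    return re.match(r'.*?(?=\||$)', syn).group()

# Compare whole first words rather than substrings
first_words = {first_word(s) for s in syns}
non_match = [i for i in new_syns if first_word(i) not in first_words]
print(non_match)  # ['winter-time|winter|winter season']
```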

I think you want something like:

non_match = [i for i in new_syns if not any(any(w == s.split("|")[0] 
               for w in i.split("|")) 
              for s in syns)]

This doesn't use regular expressions, but it does give the result

non_match == ['winter-time|winter|winter season']

The list includes any item from new_syns where none (not any) of its '|'-separated words w is equal to the first word (split("|")[0]) of any synonym group s from syns.
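For anyone unpicking the nested any() calls, here is an explicit-loop sketch that should be equivalent to the comprehension above; find_non_match_loops is a hypothetical name used only for illustration. The inner any() corresponds to the loop over the words w, the outer any() to the loop over the groups s, and not any(...) to keeping i only when nothing matched.

```python
syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

def find_non_match_loops(new_syns, syns):
    non_match = []
    for i in new_syns:
        found = False
        for s in syns:                 # outer any(): each synonym group in syns
            head = s.split("|")[0]     # only the first word of s counts
            for w in i.split("|"):     # inner any(): each word of the new group
                if w == head:
                    found = True
                    break
            if found:
                break
        if not found:                  # not any(...): keep i only if nothing matched
            non_match.append(i)
    return non_match

print(find_non_match_loops(new_syns, syns))  # ['winter-time|winter|winter season']
```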


2014-02-16 23:29:54 jonrsharpe

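I need the match to apply only to the first word/phrase before the pipe in each element of 'syns'. So for the first index of syns, only 'professionals' in that index should count as a match, not 'experts', 'specialists' and 'pros'. That being the case, if I add 'experts|specialists' to the end of the 'new_syns' list, it should show up in my 'non_match' list, because it doesn't match the first word of any of the elements in 'syns' –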
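A quick check of this requirement against the comprehension from the answer, with 'experts|specialists' appended to new_syns as the comment suggests (that extra entry is the only change to the data):

```python
syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros',
            'experts|specialists']  # extra entry from the comment

# Only s.split("|")[0] is compared, so 'experts' inside the first syns group is ignored
non_match = [i for i in new_syns if not any(any(w == s.split("|")[0]
             for w in i.split("|"))
            for s in syns)]
print(non_match)  # ['winter-time|winter|winter season', 'experts|specialists']
```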

Ah, I see; I've made an appropriate edit. – jonrsharpe

Perfect, thanks. Now I'm going to study the nested double "any" approach until it sinks in :-) –
