You could look at turning your list of names into a single regular expression. Take this small list of names as an example:
names = ['AARON',
         'ABDUL',
         'ABE',
         'ABEL',
         'ABRAHAM',
         'ABRAM',
         'ADALBERTO',
         'ADAM',
         'ADAN',
         'ADOLFO',
         'ADOLPH',
         'ADRIAN',
         ]
This list could be represented by the following regular expression:
\b(?:AARON|ABDUL|ABE|ABEL|ABRAHAM|ABRAM|ADALBERTO|ADAM|ADAN|ADOLFO|ADOLPH|ADRIAN)\b
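But that would not be very efficient, because the engine tries each name in turn and re-tests the same leading letters again and again. A regular expression built like a tree, where shared prefixes are tested only once, will work better: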
\b(?:A(?:B(?:E(?:|L)|RA(?:M|HAM)|DUL)|D(?:A(?:M|N|LBERTO)|OL(?:FO|PH)|RIAN)|ARON))\b
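As a quick sanity check, the two patterns above can be compared directly. The sketch below builds the flat alternation from the list and checks that the hand-written tree pattern accepts and rejects the same strings. The variable names `flat` and `nested` and the near-miss strings are illustrative assumptions, not part of the original answer.

```python
import re

names = ['AARON', 'ABDUL', 'ABE', 'ABEL', 'ABRAHAM', 'ABRAM',
         'ADALBERTO', 'ADAM', 'ADAN', 'ADOLFO', 'ADOLPH', 'ADRIAN']

# Flat alternation: one branch per name, in list order.
flat = re.compile(r'\b(?:' + '|'.join(re.escape(n) for n in names) + r')\b')

# Tree-shaped pattern, copied from the text above.
nested = re.compile(
    r'\b(?:A(?:B(?:E(?:|L)|RA(?:M|HAM)|DUL)'
    r'|D(?:A(?:M|N|LBERTO)|OL(?:FO|PH)|RIAN)|ARON))\b')

# Every listed name must be matched in full by both patterns.
for name in names:
    assert flat.fullmatch(name) and nested.fullmatch(name)

# Strings that only share a prefix with a name must be rejected by both.
for near_miss in ['ABEX', 'ADA', 'AARONS']:
    assert not flat.search(near_miss) and not nested.search(near_miss)

print('both patterns agree on all test strings')
```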
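You can then automate the production of this regular expression: first create a dict-based tree structure from the list of names, then translate that tree into a regular expression. For the example above, the intermediate tree would look like this: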
{
    'A': {
        'A': {
            'R': {
                'O': {
                    'N': {
                        '': {}
                    }
                }
            }
        },
        'B': {
            'D': {
                'U': {
                    'L': {
                        '': {}
                    }
                }
            },
            'E': {
                '': {},
                'L': {
                    '': {}
                }
            },
            ... etc
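In this structure each key is one letter, and the empty-string key marks the point where a complete name ends. A minimal sketch of how such a tree can be read, using only the ABE/ABEL branch written out by hand; the helper name `contains` is hypothetical and not part of the original answer:

```python
# Hand-written fragment of the per-letter tree (ABE and ABEL only).
tree = {'A': {'B': {'E': {'': {}, 'L': {'': {}}}}}}


def contains(tree, name):
    """Return True if `name` is stored in the tree as a complete name."""
    node = tree
    for letter in name:
        if letter not in node:
            return False
        node = node[letter]
    # The '' key is the end-of-name marker.
    return '' in node


print(contains(tree, 'ABE'))   # True: 'E' has an end marker
print(contains(tree, 'ABEL'))  # True
print(contains(tree, 'AB'))    # False: only a prefix, no end marker
```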
This can optionally be simplified to:
{
    'A': {
        'ARON': {
            '': {}
        },
        'B': {
            'DUL': {
                '': {}
            },
            'E': {
                '': {},
                'L': {
                    '': {}
                }
            },
            'RA': {
                'HAM': {
                    '': {}
                },
                'M': {
                    '': {}
                }
            }
        },
        ... etc
Here is suggested code to do this:
import re


def addToTree(tree, name):
    if len(name) == 0:
        # Mark the end of a complete name here, so that a name which is a
        # prefix of an earlier one (ABE added after ABEL) is still recorded
        tree[''] = {}
        return
    if name[0] in tree:
        addToTree(tree[name[0]], name[1:])
    else:
        # Nothing shared any more: create a fresh chain for the remaining letters
        for letter in name:
            tree[letter] = {}
            tree = tree[letter]
        tree[''] = {}


# Optional improvement of the tree: it combines several consecutive letters into
# one key if there are no alternatives possible
def simplifyTree(tree):
    repeat = True
    while repeat:
        repeat = False
        for key, subtree in list(tree.items()):
            if key != '' and len(subtree) == 1 and '' not in subtree:
                for letter, subsubtree in subtree.items():
                    tree[key + letter] = subsubtree
                del tree[key]
                repeat = True
    for key, subtree in tree.items():
        if key != '':
            simplifyTree(subtree)


def treeToRegExp(tree):
    # Each key becomes one alternative, followed by the pattern for its subtree
    regexp = [re.escape(key) + treeToRegExp(subtree) for key, subtree in tree.items()]
    regexp = '|'.join(regexp)
    return '' if regexp == '' else '(?:' + regexp + ')'


def listToRegExp(names):
    tree = {}
    for name in names:
        if name:  # skip empty strings, which would otherwise match empty text
            addToTree(tree, name)
    simplifyTree(tree)
    return re.compile(r'\b' + treeToRegExp(tree) + r'\b', re.I)

# Demo
names = ['AARON',
         'ABDUL',
         'ABE',
         'ABEL',
         'ABRAHAM',
         'ABRAM',
         'ADALBERTO',
         'ADAM',
         'ADAN',
         'ADOLFO',
         'ADOLPH',
         'ADRIAN',
         ]
fields = [
    'This is Aaron speaking',
    'Is Abex a name?',
    'Where did Abraham get the mustard from?'
]
regexp = listToRegExp(names)
# get the search result for each field, and link it with the index of the field
results = [[i, regexp.search(field)] for i, field in enumerate(fields)]
# remove non-matches from the results
results = [[i, match.group(0)] for i, match in results if match]
# print results
print(results)  # [[0, 'Aaron'], [2, 'Abraham']]
See it run on repl.it.
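Whether the tree-shaped pattern is actually faster depends on the Python version, the size of the name list, and the text being searched, so it is worth measuring on your own data. The following is a rough sketch using `timeit`; the sample text, the repeat counts, and the variable names are assumptions chosen only for illustration, and with just twelve names the difference may be small.

```python
import re
import timeit

names = ['AARON', 'ABDUL', 'ABE', 'ABEL', 'ABRAHAM', 'ABRAM',
         'ADALBERTO', 'ADAM', 'ADAN', 'ADOLFO', 'ADOLPH', 'ADRIAN']

# Flat alternation built directly from the list.
flat = re.compile(r'\b(?:' + '|'.join(map(re.escape, names)) + r')\b', re.I)

# Tree-shaped pattern, copied from the text above.
nested = re.compile(
    r'\b(?:A(?:B(?:E(?:|L)|RA(?:M|HAM)|DUL)'
    r'|D(?:A(?:M|N|LBERTO)|OL(?:FO|PH)|RIAN)|ARON))\b', re.I)

# Illustrative input: one of the demo sentences repeated many times.
text = 'Where did Abraham get the mustard from? ' * 200

# Both patterns should find exactly the same names before timing them.
assert flat.findall(text) == nested.findall(text)

for label, pattern in (('flat', flat), ('tree', nested)):
    seconds = timeit.timeit(lambda: pattern.findall(text), number=5)
    print(f'{label}: {seconds:.4f} s for 5 runs')
```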
Possible duplicate of [Python - Fastest way to check if a string contains specific characters in any of the items in a list](https://stackoverflow.com/questions/14411633/python-fastest-way-to-check-if-a-string-contains-specific-characters-in-any) – Shubham