You can combine the blacklist into a single regular expression:
import re
blacklist = re.compile('|'.join([re.escape(word) for word in B]))
Then filter out any words that match it:
C = [word for word in A if not blacklist.search(word)]
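The words in the pattern are escaped (so that '.' and other metacharacters are treated as literal characters rather than as regex syntax), and joined into a series of '|' alternatives: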
>>> '|'.join([re.escape(word) for word in B])
'XXX|BBB'
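To see why the escaping matters, here is a minimal sketch. The list risky_words is a hypothetical blacklist (not from the question) whose entries happen to contain regex metacharacters:

```python
import re

# Hypothetical blacklist entries containing regex metacharacters.
risky_words = ['a.c', 'x+y']

unescaped = re.compile('|'.join(risky_words))
escaped = re.compile('|'.join(re.escape(word) for word in risky_words))

# Without escaping, '.' matches any character, so 'abc' is wrongly flagged.
unescaped_hit = bool(unescaped.search('abc'))
# With escaping, only the literal text 'a.c' matches.
escaped_hit = bool(escaped.search('abc'))
literal_hit = bool(escaped.search('xa.cx'))

print(unescaped_hit, escaped_hit, literal_hit)  # True False True
```

With XXX and BBB the escaping changes nothing, but it keeps the filter correct if the blacklist ever contains punctuation.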
Demo:
>>> import re
>>> A = [ 'cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
>>> B = ['XXX', 'BBB']
>>> blacklist = re.compile('|'.join([re.escape(word) for word in B]))
>>> [word for word in A if not blacklist.search(word)]
['cat', 'monkey', 'fish', 'snake']
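If you want this as a reusable helper, something like the sketch below should work. The function name filter_blacklisted is my own, and the guard for an empty blacklist is an addition: joining an empty list produces the empty pattern, which matches every string and would filter out everything.

```python
import re

def filter_blacklisted(words, blacklist):
    """Return the words that contain none of the blacklisted substrings."""
    if not blacklist:
        # An empty alternation compiles to the empty pattern, which matches
        # every string, so return the words unfiltered instead.
        return list(words)
    pattern = re.compile('|'.join(re.escape(bad) for bad in blacklist))
    return [word for word in words if not pattern.search(word)]

A = ['cat', 'doXXXg', 'monkey', 'hoBBBrse', 'fish', 'snake']
B = ['XXX', 'BBB']
print(filter_blacklisted(A, B))   # ['cat', 'monkey', 'fish', 'snake']
print(filter_blacklisted(A, []))  # all six words, unchanged
```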
This should outperform any explicit membership test, especially as the number of words in your blacklist grows:
>>> import string, random, timeit
>>> def regex_filter(words, blacklist):
...     return [word for word in words if not blacklist.search(word)]
...
>>> def any_filter(words, blacklist):
...     return [word for word in words if not any(bad in word for bad in blacklist)]
...
>>> words = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(3, 20))])
...          for _ in range(1000)]
>>> blacklist = [''.join([random.choice(string.ascii_letters) for _ in range(random.randint(2, 5))])
...              for _ in range(10)]
>>> timeit.timeit('any_filter(words, blacklist)', 'from __main__ import any_filter, words, blacklist', number=100000)
0.36232495307922363
>>> timeit.timeit('regex_filter(words, blacklist)', "from __main__ import re, regex_filter, words, blacklist; blacklist = re.compile('|'.join([re.escape(word) for word in blacklist]))", number=100000)
0.2499098777770996
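The test above uses 10 random short blacklisted words (2-5 characters) against a list of 1000 random words (3-20 characters long); the regular expression was about 50% faster.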
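If you want to repeat the comparison yourself, here is a self-contained sketch as a script rather than an interpreter session. The helper random_word and the small runs count are my own choices for illustration; the numbers you get will depend on your machine and Python version, so treat them as indicative only.

```python
import random
import re
import string
import timeit

def random_word(low, high):
    """Build a random ASCII word whose length is between low and high."""
    return ''.join(random.choice(string.ascii_letters)
                   for _ in range(random.randint(low, high)))

def regex_filter(words, pattern):
    return [word for word in words if not pattern.search(word)]

def any_filter(words, blacklist):
    return [word for word in words
            if not any(bad in word for bad in blacklist)]

words = [random_word(3, 20) for _ in range(1000)]
blacklist = [random_word(2, 5) for _ in range(10)]
pattern = re.compile('|'.join(re.escape(bad) for bad in blacklist))

# Both approaches must agree before their timings are worth comparing.
assert regex_filter(words, pattern) == any_filter(words, blacklist)

runs = 100  # assumption: kept small so the script finishes quickly
t_any = timeit.timeit(lambda: any_filter(words, blacklist), number=runs)
t_regex = timeit.timeit(lambda: regex_filter(words, pattern), number=runs)
print('any():  %.4f s' % t_any)
print('regex:  %.4f s' % t_regex)
```

Compiling the pattern once outside the timed call mirrors the setup string used in the timeit run above, so only the filtering itself is measured.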
Why use a regular expression? See [this](http://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-method). – ThaMe90 2014-11-14 14:53:27
I'm very curious about the people who downvoted this question! +1 – vks 2014-11-14 15:50:52