I think I'd use a regular expression here:
import re
a=["Britney spears", "red dog", "\xa2xe3"]
regex = re.compile('|'.join(re.escape(x) for x in a))
b=["cat","dog","red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]
b = [regex.sub("",x) for x in b ]
print (b) #['cat', 'dog', ' is stupid', 'good stuff ', 'awesome ']
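This way, the regex engine can optimize testing the whole list of alternatives, instead of each phrase being searched for separately.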
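One detail of the combined pattern is worth knowing: when one phrase is a prefix of another, the order of the alternatives matters. Python's `re` tries alternatives left to right and keeps the first one that matches, not the longest. Below is a minimal sketch, assuming a hypothetical `build_pattern` helper that sorts the phrases longest-first before joining them. The phrases `"red"` and `"red dog"` are made up for illustration.

```python
import re

def build_pattern(phrases):
    """Hypothetical helper: one alternation, longest phrases first."""
    # re tries alternatives left to right and keeps the first that matches,
    # so a shorter prefix like "red" would otherwise shadow "red dog".
    ordered = sorted(phrases, key=len, reverse=True)
    return re.compile('|'.join(re.escape(p) for p in ordered))

phrases = ["red", "red dog"]
text = "red dog is stupid"

unsorted = re.compile('|'.join(re.escape(p) for p in phrases))
print(repr(unsorted.sub("", text)))                # ' dog is stupid'
print(repr(build_pattern(phrases).sub("", text)))  # ' is stupid'
```

With the phrases from the question, none is a prefix of another, so the sorting changes nothing there. It only matters once the list grows.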
Here are some alternatives that show how different regexes behave:
import re
a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat","dog",
"red dog is stupid",
"good stuff \xa2xe3",
"awesome Britney spears",
"transferred dogcatcher"]
#This version leaves the surrounding whitespace and will also match inside words.
regex = re.compile('|'.join(re.escape(x) for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', ' is stupid', 'good stuff ', 'awesome ', 'transfercatcher']
#This version strips whitespace from either end
# of the returned string
regex = re.compile('|'.join(r'\s*{}\s*'.format(re.escape(x)) for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', 'is stupid', 'good stuff', 'awesome', 'transfercatcher']
#This version will only match at word boundaries,
# but you lose the match with \xa2xe3 since it doesn't start with a word character
regex = re.compile('|'.join(r'\s*\b{}\b\s*'.format(re.escape(x)) for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', 'is stupid', 'good stuff \xa2xe3', 'awesome', 'transferred dogcatcher']
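The reason the `\xa2xe3` match is lost: `\b` only matches where a word character (`\w`) meets a non-word character or the start/end of the string. `'¢'` (U+00A2) is not a word character, so the space before it is not a boundary. A small check follows. The lookaround pair `(?<!\S)`/`(?!\S)` in the last line is my own substitution for `\b`, not part of the answer's code.

```python
import re

phrase = re.escape("\xa2xe3")  # '¢xe3': starts with a non-word character
s = "good stuff \xa2xe3"

# Neither ' ' nor '¢' is \w, so there is no word boundary between them.
print(bool(re.search(r'\b' + phrase, s)))  # False
# '3' is \w and is followed by the end of the string: that is a boundary.
print(bool(re.search(phrase + r'\b', s)))  # True
# "No non-whitespace neighbour" works whatever character the phrase starts with.
print(bool(re.search(r'(?<!\S)' + phrase + r'(?!\S)', s)))  # True
```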
#This version finally seems to get it right. It matches whitespace (or the start
# of the string) and then the "word" and then more whitespace (or the end of the
# string). It then replaces that match with nothing -- i.e. it removes the match
# from the string.
regex = re.compile('|'.join(r'(?:\s+|^)'+re.escape(x)+r'(?:\s+|$)' for x in a))
c = [regex.sub("",x) for x in b ]
print (c) #['cat', 'dog', 'is stupid', 'good stuff', 'awesome', 'transferred dogcatcher']
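One caveat with that last version: it consumes the whitespace on *both* sides of a phrase, so a phrase in the middle of a string glues its neighbours together. Below is a sketch of one way around that, assuming lookarounds are acceptable. It only *asserts* whitespace (or a string edge) around each phrase, then collapses the leftover whitespace. `remove_phrases` and the input `"my red dog barks"` are hypothetical, for illustration only.

```python
import re

a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat", "dog", "red dog is stupid", "good stuff \xa2xe3",
     "awesome Britney spears", "transferred dogcatcher"]

# The pattern above: eats the whitespace on both sides of the phrase.
both_sides = re.compile('|'.join(r'(?:\s+|^)' + re.escape(x) + r'(?:\s+|$)' for x in a))
print(repr(both_sides.sub("", "my red dog barks")))  # 'mybarks'

# Lookarounds check the neighbours without consuming them.
edges = re.compile('|'.join(r'(?<!\S)' + re.escape(x) + r'(?!\S)' for x in a))

def remove_phrases(s):
    """Hypothetical helper: drop whole-phrase matches, then tidy the spacing."""
    return ' '.join(edges.sub("", s).split())

print(repr(remove_phrases("my red dog barks")))  # 'my barks'
print([remove_phrases(x) for x in b])
# ['cat', 'dog', 'is stupid', 'good stuff', 'awesome', 'transferred dogcatcher']
```

`' '.join(s.split())` also collapses whitespace runs (tabs, double spaces) that were already in the input. Whether that is acceptable depends on the data.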
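Watch out for the leading and trailing whitespace. He wants it trimmed in some (perhaps all) cases. If a substring is cut out of the middle of an element of `b`, he probably doesn't want the extra space either. –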
@sr2222 - Maybe. That's as simple as adding `.strip()` to the end of `regex.sub`, or letting the regex match the whitespace around each phrase: `'|'.join(r'\s*{}\s*'.format(re.escape(x)) for x in a)` – mgilson
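Here is mgilson's first option, `.strip()` after `regex.sub`, applied to the original list. The input `"my red dog barks"` is an invented example showing what `.strip()` does not cover.

```python
import re

a = ["Britney spears", "red dog", "\xa2xe3"]
b = ["cat", "dog", "red dog is stupid", "good stuff \xa2xe3", "awesome Britney spears"]

regex = re.compile('|'.join(re.escape(x) for x in a))
c = [regex.sub("", x).strip() for x in b]
print(c)  # ['cat', 'dog', 'is stupid', 'good stuff', 'awesome']

# .strip() only trims the ends; a phrase removed from the middle still
# leaves two spaces behind, which is the case sr2222 raised.
print(repr(regex.sub("", "my red dog barks").strip()))  # 'my  barks'
```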
Maybe add some word-boundary protection? Otherwise "red dog" will also match "transferred dogcatcher". – DSM