2013-03-27 98 views
6

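Replace all words from a word list with another string in Python

I have a user-entered string, and I want to search it and replace every occurrence of any word from a list with a replacement string.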

import re 

prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"] 


# word[1] contains the user entered message 
themessage = str(word[1])  
# would like to implement a foreach loop here but not sure how to do it in python 
for themessage in prohibitedwords: 
    themessage = re.sub(prohibitedWords, "(I'm an idiot)", themessage) 

print themessage 

The code above doesn't work, and I'm fairly sure I don't understand how Python for loops work.

+0

You should check out the Python spambayes implementation; it may be more scalable. – dusual 2013-03-27 12:18:01

Answers

11

You can do this with a single call to sub:

big_regex = re.compile('|'.join(map(re.escape, prohibitedWords))) 
the_message = big_regex.sub("repl-string", str(word[1])) 

Example:

>>> import re 
>>> prohibitedWords = ['Some', 'Random', 'Words'] 
>>> big_regex = re.compile('|'.join(map(re.escape, prohibitedWords))) 
>>> the_message = big_regex.sub("<replaced>", 'this message contains Some really Random Words') 
>>> the_message 
'this message contains <replaced> really <replaced> <replaced>' 
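The same idea can be wrapped in a small helper. This is only a sketch: the function name `censor` and its parameters are made up for illustration and are not part of the original answer.

```python
import re

def censor(message, prohibited_words, replacement):
    """Replace every occurrence of any prohibited word in a single pass."""
    if not prohibited_words:
        # An empty alternation would match the empty string everywhere.
        return message
    # re.escape keeps characters such as '.' or '+' from acting as regex syntax.
    pattern = re.compile('|'.join(map(re.escape, prohibited_words)))
    return pattern.sub(replacement, message)

result = censor('Kappa that was TooSpicy', ['Kappa', 'TooSpicy'], "(I'm an idiot)")
```

Here `result` holds the message with both words swapped for the replacement string. If the list is reused for many messages, compiling the pattern once outside the function would avoid rebuilding it on every call.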

Note that using str.replace can lead to subtle bugs:

>>> words = ['random', 'words'] 
>>> text = 'a sample message with random words' 
>>> for word in words: 
...  text = text.replace(word, 'swords') 
... 
>>> text 
'a sample message with sswords swords' 

whereas using re.sub gives the correct result:

>>> big_regex = re.compile('|'.join(map(re.escape, words))) 
>>> big_regex.sub("swords", 'a sample message with random words') 
'a sample message with swords swords' 
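One caveat with the single-regex approach, shown as a sketch rather than as part of the original answer: alternatives in a pattern are tried left to right, so when one word is a prefix of another, the order of the list matters. The word list below is hypothetical.

```python
import re

words = ['Kappa', 'KappaPride']  # hypothetical list: one word is a prefix of the other

# Alternatives are tried left to right, so 'Kappa' wins and 'Pride' is left behind.
naive = re.compile('|'.join(map(re.escape, words)))
naive_result = naive.sub('<replaced>', 'KappaPride Kappa')

# Sorting longest-first lets the longer word match before its prefix does.
ordered = sorted(words, key=len, reverse=True)
careful = re.compile('|'.join(map(re.escape, ordered)))
careful_result = careful.sub('<replaced>', 'KappaPride Kappa')
```

With the unsorted list, `naive_result` is `'<replaced>Pride <replaced>'`; with the sorted list, `careful_result` is `'<replaced> <replaced>'`.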

As thg435 pointed out, if you want to replace whole words rather than every matching substring, you can add word boundaries to the regex:

big_regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words))) 

This replaces 'random' in 'random words' but not in 'pseudorandom words'.
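A quick check of the word-boundary pattern, using the same `words` list as above. The variable names `whole` and `embedded` are only for illustration.

```python
import re

words = ['random', 'words']
big_regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words)))

# Both words stand alone here, so both are replaced.
whole = big_regex.sub('<replaced>', 'random words')

# 'random' is embedded in 'pseudorandom', so \b does not match before it.
embedded = big_regex.sub('<replaced>', 'pseudorandom words')
```

`whole` comes out as `'<replaced> <replaced>'` and `embedded` as `'pseudorandom <replaced>'`. Note that `\b` relies on each word starting and ending with a word character (letters, digits, underscore); words that begin or end with punctuation would need a different boundary test.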

+0

Could you show an example run? – 2013-03-27 12:03:51

+0

However, if you have a lot of words to replace, you will have to break it up. – DSM 2013-03-27 12:15:18

+0

You may want to wrap your expression in `\b` to avoid replacing "tail" in "retailer". – georg 2013-03-27 12:31:30

4

Try this:

prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"] 

themessage = str(word[1])  
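# Names are case-sensitive, so the list must be spelled prohibitedWords here.
# A separate loop variable avoids overwriting the outer `word`.
for bad_word in prohibitedWords: 
    themessage = themessage.replace(bad_word, "(I'm an idiot)") 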

print themessage 
+0

This is fragile: as Bakuriu explains, it breaks easily when one prohibited word is a substring of another. – Adam 2013-03-27 12:19:51

+0

@codesparkle That doesn't mean it's wrong; which option you choose always depends on the circumstances. – 2013-03-27 12:25:48

0

Code:

prohibitedWords =["MVGame","Kappa","DatSheffy","DansGame", 
        "BrainSlug","SwiftRage","Kreygasm", 
        "ArsonNoSexy","GingerPower","Poooound","TooSpicy"] 
themessage = 'Brain' 
self_criticism = '(I`m an idiot)' 
final_message = [i.replace(themessage, self_criticism) for i in prohibitedWords] 
print final_message 

Result:

['MVGame', 'Kappa', 'DatSheffy', 'DansGame', '(I`m an idiot)Slug', 'SwiftRage', 
'Kreygasm', 'ArsonNoSexy', 'GingerPower', 'Poooound','TooSpicy'] 