Find words from a list and replace them in Python 2.7

I have the following two lists:

list_a = ['one','two','three','four','five','six','seven',...] 

list_content = ['This is 1st sentence with one.', 
'This is 2nd sentence with seven.', 
'This is 3rd sentence with one and two.', 
'This is 4th sentence with three, five, and six.',...]

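The idea is to find, in each sentence of list_content, every word from list_a and replace it with '__' on an exact (whole-word) match.

The output should look like this: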

list_output = ['This is 1st sentence with ___.', 
'This is 2nd sentence with ___.', 
'This is 3rd sentence with ___ and ___.', 
'This is 4th sentence with ___, ___, and ___.',...]

Using re.sub, I tried:

for each_sent in list_content:
    for word in list_a:
        result = re.sub(r'\b' + word + r'\b', '__', each)
        print result

It does not seem to produce the replaced output.

Source

2017-02-28 htetmyet

Change `result = re.sub(r'\b' + word + r'\b', '__', each)` to `each_sent = re.sub(r'\b' + word + r'\b', '__', each_sent)` and `print result` to `print each_sent` –

In `result = re.sub(r'\b' + word + r'\b', '__', each)`, change `each` to `each_sent` –

You are a Kung Fu Panda! You saved my life! :) – htetmyet

This should work:

import re 

list_a = ['one','two','three','four','five','six','seven',] 

list_content = ['This is 1st sentence with one.', 
'This is 2nd sentence with seven.', 
'This is 3rd sentence with one and two.', 
'This is 4th sentence with three, five, and six.',] 
list_output = []
for each_sent in list_content:
    for word in list_a:
        # Reassign each_sent so every replacement builds on the previous one
        each_sent = re.sub(r'\b' + word + r'\b', '__', each_sent)
    list_output.append(each_sent)
print list_output

Output:

['This is 1st sentence with __.', 'This is 2nd sentence with __.', 'This is 3rd sentence with __ and __.', 'This is 4th sentence with __, __, and __.']
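For reuse, the same nested loop can be wrapped in a function. This is only a sketch: the name `mask_words` and the `mask` parameter are my own, not from the answer, and `re.escape` is added on the assumption that some words might contain regex metacharacters. `print(...)` with a single argument runs on both Python 2.7 and 3.

```python
import re

def mask_words(sentences, words, mask='__'):
    """Replace whole-word occurrences of each word in every sentence."""
    masked = []
    for sentence in sentences:
        for word in words:
            # re.escape guards against words that contain regex metacharacters
            sentence = re.sub(r'\b' + re.escape(word) + r'\b', mask, sentence)
        masked.append(sentence)
    return masked

list_a = ['one', 'two', 'three']
list_content = ['This is 1st sentence with one.',
                'A phone call with two people.']
print(mask_words(list_content, list_a))
```

The second sentence keeps "phone" intact because `\b` only matches at a word boundary.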

Source

2017-02-28 06:19:26

It works like a charm! – htetmyet

@htetmyet Glad I could help! Be sure to accept my answer! –

Avoid a loop inside a loop. I wrote this with performance in mind:

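import re

# Raw strings are required: in a plain string '\b' is a backspace, not a word boundary
re_str_a = re.compile(r'\b' + r'\b|\b'.join(list_a) + r'\b')
for each in list_content:
    print re_str_a.sub('___', each)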
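A variant of the single-pattern idea, as a rough sketch rather than the answer's exact code: wrapping the alternation in a non-capturing group lets one pair of `\b` anchors cover every word, and raw strings matter because `'\b'` in a plain Python string literal is a backspace character. The `samples` list and the use of `re.escape` are my additions for illustration.

```python
import re

list_a = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']

# Non-capturing group so the \b anchors apply to every alternative
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(w) for w in list_a) + r')\b')

samples = ['This is 3rd sentence with one and two.',
           'My phone is on the network.']
masked = [pattern.sub('__', s) for s in samples]
print(masked)
```

The second sample is left unchanged: "one" inside "phone" and "two" inside "network" are not whole words, so they are not replaced.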

Source

2017-02-28 06:26:59 Rami

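You need to add '\b' before and after each word before compiling, to avoid matching the 'one' hidden in 'phone'. I would use `re.compile(r'\b' + r'\b|\b'.join(list_a) + r'\b')` –

Thanks for your effort! – htetmyet

Thanks a lot, Alex. Editing my answer now. – Rami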

How about doing it without any loop (https://regex101.com/r/pvwuUw/1):

In [4]: sep = "||||" 

In [5]: re.sub(r'\b(?:' + '|'.join(list_a) + r')\b', '__', sep.join(list_content)).split(sep)
Out[5]: 
['This is 1st sentence with __.', 
'This is 2nd sentence with __.', 
'This is 3rd sentence with __ and __.', 
'This is 4th sentence with __, __, and __.']

The idea is to join list_content with a separator and, after the substitution, split the string again on the same separator.

Source
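The join-and-split idea can be sketched as a small function. The name `mask_joined` and the guard clauses are hypothetical additions of mine; the approach relies on the assumption that the separator never appears in any sentence, so the sketch checks for that explicitly.

```python
import re

def mask_joined(sentences, words, mask='__', sep='||||'):
    """Mask whole words with one re.sub call over the joined sentences."""
    if not sentences:
        return []
    # Assumption: sep must not occur in the input, or the split would be wrong
    if any(sep in s for s in sentences):
        raise ValueError('separator occurs in the input')
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    joined = sep.join(sentences)
    return re.sub(pattern, mask, joined).split(sep)

list_a = ['one', 'two', 'three', 'five', 'six']
list_content = ['This is 3rd sentence with one and two.',
                'This is 4th sentence with three, five, and six.']
print(mask_joined(list_content, list_a))
```

This trades the per-sentence loop for a single substitution, at the cost of building one large intermediate string.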

2017-02-28 06:35:42 AKS

Thanks for your solution. – htetmyet

Using the python-textops package:

from textops import * 
print list_content >> sed('|'.join(list_a),'__')

Source

2017-02-28 08:36:37 Eric
