2017-02-28 21 views
0

Find words from a lookup list and replace them in Python 2.7

I have the following two lists:

list_a = ['one','two','three','four','five','six','seven',...] 

list_content = ['This is 1st sentence with one.', 
'This is 2nd sentence with seven.', 
'This is 3rd sentence with one and two.', 
'This is 4th sentence with three, five, and six.',...] 

The idea is to look for every word from list_a in each sentence of list_content and replace it with '__', matching whole words only.

The output should look like this:

list_output = ['This is 1st sentence with ___.', 
'This is 2nd sentence with ___.', 
'This is 3rd sentence with ___ and ___.', 
'This is 4th sentence with ___, ___, and ___.',...] 

Using re.sub, I tried:

for each_sent in list_content: 
    for word in list_a: 
        result = re.sub(r'\b' + word + r'\b', '__', each) 
        print result 

Nothing seems to get replaced in the output.

+1

Change 'result = re.sub(r'\b' + word + r'\b', '__', each)' to 'each_sent = re.sub(r'\b' + word + r'\b', '__', each_sent)' and 'print result' to 'print each_sent' –

+0

In 'result = re.sub(r'\b' + word + r'\b', '__', each)', change 'each' to 'each_sent' –

+0

You are a Kung Fu Panda! You saved my life! :) – htetmyet

Answers

3

This should work:

import re 

list_a = ['one','two','three','four','five','six','seven',] 

list_content = ['This is 1st sentence with one.', 
'This is 2nd sentence with seven.', 
'This is 3rd sentence with one and two.', 
'This is 4th sentence with three, five, and six.',] 
list_output = [] 
for each_sent in list_content: 
    for word in list_a: 
        each_sent = re.sub(r'\b' + word + r'\b', '__', each_sent) 
    list_output.append(each_sent) 
print list_output 

Output:

['This is 1st sentence with __.', 'This is 2nd sentence with __.', 'This is 3rd sentence with __ and __.', 'This is 4th sentence with __, __, and __.'] 
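
For anyone wondering why the version in the question printed the sentences unchanged: re.sub never modifies its input, it returns a new string, so the result has to be assigned back before the next word is tried. Here is a minimal sketch of the same nested loop wrapped in a function. The name mask_words and the re.escape call are my own additions, not part of the answer above; re.escape only matters if a lookup word contains regex metacharacters.

```python
import re

def mask_words(sentences, words, mask='__'):
    """Return new sentences with whole-word matches of `words` replaced by `mask`."""
    masked = []
    for sentence in sentences:
        for word in words:
            # re.sub returns a new string, so rebind the name to keep
            # the replacements made for earlier words.
            sentence = re.sub(r'\b' + re.escape(word) + r'\b', mask, sentence)
        masked.append(sentence)
    return masked

list_a = ['one', 'two', 'three']
list_content = ['This is 3rd sentence with one and two.']
list_output = mask_words(list_content, list_a)
```

The input list is left untouched; only list_output holds the masked sentences.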
+0

It works like a charm! – htetmyet

+1

@htetmyet Glad I could help! Be sure to accept my answer! –

3

Avoid a loop inside a loop. I wrote this with performance in mind:

import re
# Raw strings are required: a plain '\b' is a backspace, not a word boundary.
re_str_a = re.compile(r'\b' + r'\b|\b'.join(list_a) + r'\b')
for each in list_content:
    print re_str_a.sub('___', each)
+0

You need to add '\b' before and after each word before compiling, so that 'one' is not matched inside 'phone'. I would use 're.compile(r'\b' + r'\b|\b'.join(list_a) + r'\b')' –

+0

Thanks for your effort! – htetmyet

+0

Thanks a lot, Alex. Editing my answer now. – Rami
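
To illustrate the point raised in this thread, here is a small sketch of what the word boundaries change. The sample sentence and the variable names are made up for the demonstration, and wrapping the alternation in a non-capturing group with re.escape is one possible way to write it, not necessarily what the answer's author intended. Note that \b only works as a word boundary when it is written in a raw string.

```python
import re

list_a = ['one', 'two', 'three']
sample = 'My phone has one or two apps.'  # hypothetical test sentence

# In a normal string literal '\b' is the backspace character (0x08);
# only the raw string r'\b' reaches the regex engine as a word boundary.
backspace_is_not_boundary = ('\b' == '\x08') and (r'\b' == '\\b')

# Without boundaries, 'one' is also replaced inside 'phone'.
unbounded = re.sub('|'.join(list_a), '___', sample)

# One compiled pattern: the group makes \b apply to every alternative.
bounded_re = re.compile(r'\b(?:' + '|'.join(re.escape(w) for w in list_a) + r')\b')
bounded = bounded_re.sub('___', sample)
```

Here unbounded comes out as 'My ph___ has ___ or ___ apps.' while bounded leaves 'phone' alone.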

2

How about doing it without any loop (https://regex101.com/r/pvwuUw/1):

In [4]: sep = "||||" 

In [5]: re.sub(r'\b(?:' + '|'.join(list_a) + r')\b', '__', sep.join(list_content)).split(sep) 
Out[5]: 
['This is 1st sentence with __.', 
'This is 2nd sentence with __.', 
'This is 3rd sentence with __ and __.', 
'This is 4th sentence with __, __, and __.'] 

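The idea is to join list_content with a separator, do the substitution once on the joined string, and then split the result again on the same separator.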
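
One caveat with the join-and-split idea: it only round-trips if the separator never appears in the sentences themselves. Below is a rough sketch of the same approach with that assumption checked explicitly. The function name mask_joined, the error handling, and the re.escape call are my own illustration and are not from the answer.

```python
import re

def mask_joined(sentences, words, mask='__', sep='||||'):
    """Mask whole words in all sentences with a single re.sub call."""
    if not sentences:
        return []
    # Assumption: the separator must not occur in any sentence,
    # otherwise split() would cut a sentence in two.
    if any(sep in s for s in sentences):
        raise ValueError('separator occurs in the input')
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    result = re.sub(pattern, mask, sep.join(sentences)).split(sep)
    # Safety net in case the mask itself happened to create a separator.
    if len(result) != len(sentences):
        raise ValueError('substitution produced the separator')
    return result

list_a = ['one', 'two', 'three']
list_content = ['This is 1st sentence with one.',
                'This is 3rd sentence with one and two.']
list_output = mask_joined(list_content, list_a)
```

If the data could contain the default separator, pass a different sep that cannot appear in it.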

+0

Thanks for your solution. – htetmyet

1

Using the python-textops package:

from textops import * 
print list_content >> sed('|'.join(list_a),'__')