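Python: remove all characters inside a tag, except words in an exception list

So, I want to remove all characters (mostly letters) from inside a tag, but keep the words that appear in an exception list.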

For example, I want to change

<html>VERY RARE CAR WITH NEW TIRES WHITE</html>

to:

<html>CAR WHITE</html>

That is, the two words CAR and WHITE are in the exception list.

Source

2014-01-14 Rimla

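Sooo, what have you tried? – Matt

Have you tried anything? Did it work? – 2014-01-14 08:27:13

'all characters (mostly letters)'???!? – thefourtheye

I'm not sure this is what you are looking for. I'll show how to strip out any text you want using two lists, one of exception words and one of HTML tags: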

# Tags to keep unmodified
html_tags = ['<a>', '</a>', '<html>', '</html>']

# Exception words list (the words to keep)
word_list = ['WORD1', 'CAR', 'WORD2', 'WHITE', 'WORD3', 'WORD4']

# String you want to filter
string = '<html>VERY RARE CAR WITH NEW TIRES WHITE</html>'

# The result string where we concatenate the desired words and tags
final_string = ''

# Add '#' before '<' and after '>' so we can split the text by tags
# (this only works if the text itself contains no '#')
string = string.replace('<', '#<')
string = string.replace('>', '>#')

string_list = string.split('#')  # The tags stay unmodified (<html>, <a>, ...)

# Now we have:
# string_list = ['', '<html>', 'VERY RARE CAR WITH NEW TIRES WHITE', '</html>', '']

for part in string_list:  # Go over every piece of string_list
    if part in html_tags:  # A known tag: add it to final_string as is
        final_string += part
    else:  # Not a known tag, so treat it as text, here 'VERY RARE CAR WITH NEW TIRES WHITE'
        # Split by whitespace and keep only the words that are in word_list
        kept_words = [word for word in part.split() if word in word_list]
        final_string += ' '.join(kept_words)

# The result of this code is final_string == '<html>CAR WHITE</html>'
# Tags that are not listed in html_tags are treated as text and dropped.
# The code is a little more complex than this one example needs,
# so that it also works with different tags and bigger HTML code.
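
The same idea can also be sketched with `re.split`, which avoids the '#' marker and the fixed tag list. This is only a rough variant under some assumptions: the function name `keep_exception_words` is made up for illustration, anything of the form `<...>` is treated as a tag, and the markup is assumed to be simple (no comments, scripts, or attribute values containing '>'). For real-world HTML a proper parser would be the safer choice.

```python
import re


def keep_exception_words(html, keep_words):
    """Drop every word outside tags unless it is in keep_words.

    Hypothetical helper for illustration; assumes simple markup where
    every tag matches the pattern <...>.
    """
    keep = set(keep_words)
    # The capturing group makes re.split return the tags as well:
    # text pieces land at even indices, tags at odd indices.
    parts = re.split(r'(<[^>]*>)', html)
    result = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # A tag: keep it unchanged
            result.append(part)
        else:
            # Text: keep only the exception words, separated by single spaces
            result.append(' '.join(w for w in part.split() if w in keep))
    return ''.join(result)


print(keep_exception_words(
    '<html>VERY RARE CAR WITH NEW TIRES WHITE</html>',
    ['CAR', 'WHITE'],
))
```

With the example from the question this prints `<html>CAR WHITE</html>`. Unlike the list-based version, tags with attributes such as `<a href="x">` are kept as they are.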

Hope it helps!

Source

2014-01-14 09:48:10 AlvaroAV
