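Python: remove all characters in a tag, with an exception list
So, I want to remove all characters (mostly letters) from inside a tag, but keep the words that are in an exception list.
For example, I want to change
<html>VERY RARE CAR WITH NEW TIRES WHITE</html>
into:
<html>CAR WHITE</html>
That is, the two words CAR and WHITE are in the exception list.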
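Ignoring the tags for a moment, the requested transformation is a word filter over the tag's text. A minimal sketch, assuming the exception list contains just CAR and WHITE (the name `exceptions` is hypothetical) and the text inside the tag has already been extracted:

```python
# Hypothetical exception list, taken from the example above
exceptions = {'CAR', 'WHITE'}

text = 'VERY RARE CAR WITH NEW TIRES WHITE'
# Split on whitespace and keep only the words in the exception set, preserving order
filtered = ' '.join(word for word in text.split() if word in exceptions)
print('<html>' + filtered + '</html>')  # prints <html>CAR WHITE</html>
```

Using a set makes each membership check constant-time, which matters only for long exception lists.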
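I'm not sure this is what you're looking for, but here is how to strip any text you want using two lists, one of exception words and one of HTML tags: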
# HTML tags to keep unmodified
html_tags = ['<a>', '</a>', '<html>', '</html>']
# Exception words list: only these words survive
word_list = ['WORD1', 'CAR', 'WORD2', 'WHITE', 'WORD3', 'WORD4']
# String you want to filter
string = '<html>VERY RARE CAR WITH NEW TIRES WHITE</html>'
# The result string where we concatenate the desired words and tags
final_string = ''
# Add '#' before '<' and after '>' so we can split the text on the tags
# (this assumes '#' never appears in the text itself)
string = string.replace('<', '#<')
string = string.replace('>', '>#')
string_list = string.split('#')  # tags (<html>, <a>, ...) are now separate items
# Now we have:
# string_list = ['', '<html>', 'VERY RARE CAR WITH NEW TIRES WHITE', '</html>', '']
for chunk in string_list:  # go over every item in string_list
    if chunk in html_tags:  # a tag: add it to final_string unchanged
        final_string += chunk
    else:  # not a tag, so it is text, here 'VERY RARE CAR WITH NEW TIRES WHITE'
        # split on whitespace and keep only the words that are in word_list
        kept = [w for w in chunk.split() if w in word_list]
        final_string += ' '.join(kept)
# final_string is now '<html>CAR WHITE</html>'
# The code is a little more general than needed so that it also works with
# other tags and larger HTML, as long as each tag appears in html_tags exactly.
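A variant of the same split-on-tags idea, sketched as a reusable function. It swaps the '#' marker trick for `re.split` with a capturing group, which returns the matched tags as separate list items, so tags need not be listed in advance and tags with attributes are kept too. The function name `keep_exception_words` is hypothetical, and a regex is not a real HTML parser: comments, scripts, or a '>' inside attribute values would need something like Python's `html.parser` instead.

```python
import re


def keep_exception_words(html, exceptions):
    """Keep every tag as-is; in the text between tags keep only exception words."""
    keep = set(exceptions)
    # The capturing group makes re.split include the tags themselves in the result
    parts = re.split(r'(<[^>]*>)', html)
    out = []
    for part in parts:
        if part.startswith('<') and part.endswith('>'):
            out.append(part)  # a tag: copy it unchanged
        else:
            # text: keep only the whitelisted words, joined by single spaces
            out.append(' '.join(w for w in part.split() if w in keep))
    return ''.join(out)


print(keep_exception_words('<html>VERY RARE CAR WITH NEW TIRES WHITE</html>',
                           ['CAR', 'WHITE']))
# prints <html>CAR WHITE</html>
```

Note that whitespace-only text between two tags is also dropped, since it contains no exception words.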
Hope it helps!
Sooo, what have you tried? – Matt
Have you tried anything? Does it work? – 2014-01-14 08:27:13
'all characters (mostly letters)'???!? – thefourtheye