Removing common words from a string?

This is what I have:

import re 
ask = "What's the weather like in Lexington, SC?" 
REMOVE_LIST = ["like", "in", "how's", "hows", "weather", "the", "whats", "what's", "?"] 
remove = '|'.join(REMOVE_LIST) 
regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE) 
out = regex.sub("", ask)

It fails with this error:

nothing to repeat

Source

2014-01-20 addm3plz

[x for x in ask.split() if x.lower() not in REMOVE_LIST]
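
As a runnable sketch of the one-liner above, using the question's own `ask` and `REMOVE_LIST` (the names `remove_set` and `kept` are only illustrative): because `split()` cuts on whitespace alone, punctuation stays attached to the tokens, so the `"?"` entry never matches anything here, and a token such as `"in,"` would not be removed.

```python
ask = "What's the weather like in Lexington, SC?"
REMOVE_LIST = ["like", "in", "how's", "hows", "weather", "the", "whats", "what's", "?"]

# A set gives constant-time membership checks (illustrative name).
remove_set = set(REMOVE_LIST)

# Keep every whitespace-separated token whose lowercase form is not in the set.
kept = [x for x in ask.split() if x.lower() not in remove_set]

print(kept)            # ['Lexington,', 'SC?']
print(' '.join(kept))  # Lexington, SC?
```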

Source

2014-01-20 06:38:31

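You should escape any string that is meant to be matched literally, because some characters have a special meaning in regular expressions (for example, the ? in REMOVE_LIST). An unescaped ? is a quantifier with nothing before it to repeat, which is what causes the "nothing to repeat" error.

Use re.escape to escape such characters: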

>>> import re 
>>> re.escape('?') 
'\\?' 

>>> re.search('?', 'Lexington?') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "C:\Python27\lib\re.py", line 142, in search 
    return _compile(pattern, flags).search(string) 
    File "C:\Python27\lib\re.py", line 242, in _compile 
    raise error, v # invalid expression 
sre_constants.error: nothing to repeat 
>>> re.search(r'\?', 'Lexington?') 
<_sre.SRE_Match object at 0x0000000002C68100> 
>>>

>>> import re 
>>> ask = "What's the weather like in Lexington, SC?" 
>>> REMOVE_LIST = ["like", "in", "how's", "hows", "weather", "the", "whats", "what's", "?"] 
>>> remove = '|'.join(map(re.escape, REMOVE_LIST)) 
>>> regex = re.compile(r'\b(' + remove + r')\b', flags=re.IGNORECASE) 
>>> out = regex.sub("", ask) 
>>> print out 
    Lexington, SC?
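
Two details in that output are worth noting. The removed words leave their surrounding spaces behind, and the trailing `?` survives because `\b` cannot match between `?` and the end of the string (neither side is a word character). A minimal sketch that wraps the same approach in a helper and collapses the leftover whitespace; the function name `remove_words` is hypothetical, not part of the original answer:

```python
import re

def remove_words(text, remove_list):
    """Delete whole-word matches of remove_list, then tidy the whitespace."""
    # re.escape makes entries such as "?" safe to embed in the pattern.
    pattern = r'\b(?:' + '|'.join(map(re.escape, remove_list)) + r')\b'
    stripped = re.sub(pattern, '', text, flags=re.IGNORECASE)
    # split() + join() collapses runs of spaces and trims both ends.
    return ' '.join(stripped.split())

ask = "What's the weather like in Lexington, SC?"
REMOVE_LIST = ["like", "in", "how's", "hows", "weather", "the", "whats", "what's", "?"]

result = remove_words(ask, REMOVE_LIST)
print(result)  # Lexington, SC?
```

If punctuation such as `?` should also be removed, it likely needs its own pattern without the `\b` anchors.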

Source

2014-01-20 06:40:05 falsetru

Use a regular expression to find the words:

import re 

sentence = "What's the weather like in Lexington, SC?" 
words = re.findall(r"[\w']+", sentence.lower()) 
remove = {"like", "in", "how's", "hows", "weather", "the", "whats", "what's", "?"} 

print(set(words) - remove)

Sets are unordered, so if order matters, you can filter the list of words with a list comprehension instead:

[word for word in words if word not in remove]
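
Because the words are extracted from `sentence.lower()`, the filtered list comes back lowercased. As a hedged variation (not part of the original answer), the sketch below tokenizes the original sentence and lowercases only for the membership test, so both order and capitalization are preserved; `kept` is an illustrative name.

```python
import re

sentence = "What's the weather like in Lexington, SC?"
remove = {"like", "in", "how's", "hows", "weather", "the", "whats", "what's"}

# Tokenize the original text so the capitalization is kept.
words = re.findall(r"[\w']+", sentence)

# Compare in lowercase, but keep each word as it was written.
kept = [word for word in words if word.lower() not in remove]

print(kept)  # ['Lexington', 'SC']
```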

Source

2014-01-20 06:41:25 Blender
