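Here are a couple of options I would consider if I were in your situation.
You can use the built-in any and all functions with a list comprehension to filter the unwanted URLs out of the list: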
urls = ['http://somewebsite.tld/word',
        'http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word2',
        'http://somewebsite.tld/word2/stop2',
        'http://somewebsite.tld/word3',
        'http://somewebsite.tld/stop3/word1',
        'http://somewebsite.tld/stop4/word1']
includes = ['word1', 'word2']
excludes = ['stop1', 'stop2', 'stop3']
filtered_url_list = [url for url in urls
                     if any(include in url for include in includes)
                     if all(exclude not in url for exclude in excludes)]
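As a quick check of that comprehension, here is a minimal self-contained sketch that wraps the same any/all test in a helper and runs it on a shortened list. The name filter_urls and the sample_urls list are made up for illustration and are not part of any library.

```python
def filter_urls(urls, includes, excludes):
    """Keep URLs that contain at least one include and none of the excludes."""
    return [
        url for url in urls
        if any(include in url for include in includes)
        and all(exclude not in url for exclude in excludes)
    ]


# Hypothetical sample data, a subset of the URLs used in the answer.
sample_urls = [
    'http://somewebsite.tld/word1',
    'http://somewebsite.tld/word1/stop3',
    'http://somewebsite.tld/word3',
    'http://somewebsite.tld/stop4/word1',
]

kept = filter_urls(sample_urls, ['word1', 'word2'], ['stop1', 'stop2', 'stop3'])
print(kept)
# ['http://somewebsite.tld/word1', 'http://somewebsite.tld/stop4/word1']
```

Note that the test is a plain substring match, so 'stop4/word1' is kept because 'stop4' is not one of the excludes.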
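Or you could write a function that takes a URL as its argument and returns True for the URLs you want to keep and False for those you don't, then pass that function, together with the unfiltered list of URLs, to the built-in filter function: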
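def urlfilter(url):
    includes = ['word1', 'word2']
    excludes = ['stop1', 'stop2', 'stop3']
    for include in includes:
        if include in url:
            # An include matched; reject the URL if any exclude is present.
            for exclude in excludes:
                if exclude in url:
                    return False
            else:
                # for/else: the loop finished without finding an exclude.
                return True
    # No include matched.
    return False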
urls = ['http://somewebsite.tld/word',
        'http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word2',
        'http://somewebsite.tld/word2/stop2',
        'http://somewebsite.tld/word3',
        'http://somewebsite.tld/stop3/word1',
        'http://somewebsite.tld/stop4/word1']
# On Python 3, filter() returns a lazy iterator, so wrap it in list().
filtered_url_list = list(filter(urlfilter, urls))
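If the include and exclude lists need to vary, one possible variation is to build the predicate from them instead of hard-coding them in the function. The sketch below assumes a hypothetical factory named make_url_filter; it combines the any-based test with the built-in filter and is only an illustration, not a definitive implementation.

```python
def make_url_filter(includes, excludes):
    """Build a predicate suitable for the built-in filter()."""
    def predicate(url):
        has_include = any(include in url for include in includes)
        has_exclude = any(exclude in url for exclude in excludes)
        return has_include and not has_exclude
    return predicate


# Hypothetical names and sample data, for illustration only.
keep = make_url_filter(['word1', 'word2'], ['stop1', 'stop2', 'stop3'])
candidates = [
    'http://somewebsite.tld/word2',
    'http://somewebsite.tld/word2/stop2',
    'http://somewebsite.tld/word',
]

# filter() is lazy on Python 3; list() materializes the result.
result = list(filter(keep, candidates))
print(result)
# ['http://somewebsite.tld/word2']
```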
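Search for list comprehensions – Abend
Would something like 'stop_words = [stop1, stop2, stop3]' and 'key_words = [word1, word2]', then 'for word in key_words:' 'if word in stop_words:' '#filter code', work for you? –
I want to keep the URLs that contain word1 or word2 but do not contain any of the stop words. I tried brute force, which is polynomial time. Something like: 'for each in urls:' 'if word1 or word2 in each:' 'if any(x for x in stop_words) not in each:' 'print each.' – Joe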