How to remove a list of words from a list of strings

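Sorry if the question is a bit confusing. This is similar to this question.

I think that question is close to what I want, but it is in Clojure.

I need something like this, but instead of "[BR]" as in that question, there is a list of strings that needs to be searched for and removed.

I hope I've made myself clear.

I think this is because strings in Python are immutable.

I have a list of noise words that need to be removed from a list of strings.

If I use a list comprehension, I end up searching the same string again and again. So only "of" gets removed, and not "the". My modified list looks like this: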

places = ['New York', 'the New York City', 'at Moscow']  # and many more

noise_words_list = ['of', 'the', 'in', 'for', 'at'] 

for place in places: 
    stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]

I'd like to know what I'm doing wrong.

2010-08-18 prabhu

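What is 'place'? – katrielalex 2010-08-18 09:58:27

You haven't made yourself clear; please state your problem *here*, and then, if you think it necessary, add links to similar questions whose answers come close. – 2010-08-18 10:36:15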

Here is my stab at it. This uses regular expressions.

import re
# Raw string so that \W reaches the regex engine unchanged
pattern = re.compile(r"(of|the|in|for|at)\W", re.I)
phrases = ['of New York', 'of the New York']
list(map(lambda phrase: pattern.sub("", phrase), phrases)) # ['New York', 'New York']

Without the lambda:

[pattern.sub("", phrase) for phrase in phrases]

Update

Fixed the bug pointed out by gnibbler (thanks!):

pattern = re.compile("\\b(of|the|in|for|at)\\W", re.I) 
phrases = ['of New York', 'of the New York', 'Spain has rain'] 
[pattern.sub("", phrase) for phrase in phrases] # ['New York', 'New York', 'Spain has rain']

@prabhu: The change above avoids removing the trailing "in " from "Spain ". To verify, run both versions of the regex against the phrase "Spain has rain".
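The comparison suggested above can be run directly. This is a minimal sketch; the names `loose` and `bounded` are made up here for illustration and are not part of the answer.

```python
import re

# First version: nothing requires the noise word to start at a word boundary.
loose = re.compile(r"(of|the|in|for|at)\W", re.I)
# Updated version: \b only allows a match at the start of a word.
bounded = re.compile(r"\b(of|the|in|for|at)\W", re.I)

phrase = "Spain has rain"
loose_result = loose.sub("", phrase)
bounded_result = bounded.sub("", phrase)

print(loose_result)    # Spahas rain
print(bounded_result)  # Spain has rain
```

The first pattern eats the "in " at the end of "Spain ", while the second leaves the phrase alone because "in" there does not start a word.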

2010-08-18 09:58:58

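Thanks. It works this way. Now that I've had a chance to implement this, I understand the concept of lambda much more clearly. – prabhu 2010-08-18 10:17:29

This doesn't work for the phrase "Spain has rain". It's easy to fix, though – 2010-08-18 10:29:23

@Gnibbler: Thanks for pointing that out. I've changed my answer accordingly. – 2010-08-18 10:47:18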

>>> import re 
>>> noise_words_list = ['of', 'the', 'in', 'for', 'at'] 
>>> phrases = ['of New York', 'of the New York'] 
>>> noise_re = re.compile('\\b(%s)\\W'%('|'.join(map(re.escape,noise_words_list))),re.I) 
>>> [noise_re.sub('',p) for p in phrases] 
['New York', 'New York']

2010-08-18 10:04:41

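Wow! It strains my brain a bit, but this is a really cool way of doing it. :-) – prabhu 2010-08-18 10:21:30

This doesn't seem to catch every instance of a word. For example, "New York of" stays "New York of". – Namey 2014-05-05 00:38:41

@Namey, you could use something like '\\W?\\b(%s)\\W?'. It's an awkward question to answer without the OP providing a comprehensive set of test cases – 2014-05-05 01:12:32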
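For anyone trying the trailing-word case: one thing to watch with an optional `\W?` after the word is that nothing then stops a match at the start of a longer word such as "office". The sketch below is a variation, not the pattern from the comment: it assumes whole-word matching with `\b` on both sides and then tidies the leftover spaces. The helper name `strip_noise` and the test phrases are made up for illustration.

```python
import re

noise_words_list = ['of', 'the', 'in', 'for', 'at']
alternation = '|'.join(map(re.escape, noise_words_list))
# \b on both sides: only whole words match, wherever they sit in the phrase.
noise_re = re.compile(r'\b(?:%s)\b' % alternation, re.I)

def strip_noise(phrase):
    # Replace each noise word with a space, then collapse runs of whitespace.
    without_noise = noise_re.sub(' ', phrase)
    return ' '.join(without_noise.split())

phrases = ['New York of', 'of the New York', 'Spain has rain', 'office in Paris']
cleaned = [strip_noise(p) for p in phrases]
print(cleaned)  # ['New York', 'New York', 'Spain has rain', 'office Paris']
```

Whether a noise word next to punctuation (for example a hyphen) should be removed is a judgment call that depends on the data, so it is worth testing against real place names.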

Since you'd like to know what you are doing wrong, this line:

stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]

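is evaluated, and it then starts looping over the words. First it checks for "of". Your place (e.g. "of the New York") is checked to see whether it starts with "of". It is transformed (the calls to replace and strip) and added to the result list. The crucial point is that the result is never examined again. For every word you iterate over in the comprehension, a new result is added to the result list. So the next word is "the", and your place (still "of the New York") doesn't start with "the", so no new result is added.

I assume the final result you want is the collection of your cleaned place values. A version that is simpler to read and understand would be (untested):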
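To see this behaviour concretely, here is a small sketch that runs the line from the question once per place and records what it produces. The `places` values and the `per_place` dictionary are made up for illustration.

```python
places = ['of the New York', 'the New York City', 'New York']
noise_words_list = ['of', 'the', 'in', 'for', 'at']

per_place = {}
for place in places:
    # Same comprehension as in the question: every noise word is tested
    # against the ORIGINAL place, never against an earlier result.
    stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]
    per_place[place] = stuff
    print(repr(place), '->', stuff)

# 'of the New York' -> ['the New York']
# 'the New York City' -> ['New York City']
# 'New York' -> []
```

Note that a place with no leading noise word yields an empty list, so it would be lost entirely if only `stuff` were kept.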

results = []
for place in places:
    # Reassign place so each noise word is checked against the latest result
    for word in noise_words_list:
        if place.startswith(word):
            place = place.replace(word, "").strip()
    results.append(place)

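Bear in mind that replace() removes the word anywhere in the string, even where it occurs as a plain substring of another word. You can avoid this by using a regular expression with a pattern like ^the\b.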
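A minimal sketch of that anchored-pattern idea, assuming only leading noise words should be stripped. The name `leading_noise_re` and the sample places are made up for illustration.

```python
import re

noise_words_list = ['of', 'the', 'in', 'for', 'at']
alternation = '|'.join(map(re.escape, noise_words_list))
# ^ anchors at the start of the string, \b insists on a whole word, and the
# outer (...)+ lets several leading noise words be removed in one pass.
leading_noise_re = re.compile(r'^(?:(?:%s)\b\s*)+' % alternation, re.I)

places = ['of the New York', 'the New York City', 'at Moscow', 'Athens', 'Isle of Man']
cleaned = [leading_noise_re.sub('', place) for place in places]
print(cleaned)
# ['New York', 'New York City', 'Moscow', 'Athens', 'Isle of Man']
```

"Athens" survives because "At" is not followed by a word boundary, and the "of" in "Isle of Man" survives because the pattern only matches at the start of the string.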

2010-08-18 10:13:00 wds

Thanks. This was very helpful. – prabhu 2010-08-18 10:16:18

Without regular expressions, you could do it like this:

places = ['of New York', 'of the New York']

# A set gives fast membership tests
noise_words_set = {'of', 'the', 'at', 'for', 'in'}
stuff = [' '.join(w for w in place.split() if w.lower() not in noise_words_set)
         for place in places]
print(stuff)

2010-08-18 11:25:18

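Excellent! Thanks! – prabhu 2010-08-19 11:47:51

I came across this and had no idea what was going on here. If anyone else stumbles on this and wonders what magic is happening, it's called a list comprehension, and this is a good article explaining it: http://carlgroner.me/Python/2011/11/09/An-Introduction-to-List-Comprehensions-in-Python.html – 2017-07-26 10:53:33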
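For readers new to list comprehensions, the sketch below puts the comprehension from the answer above next to the same logic written as explicit loops. The names `stuff_loop` and `kept_words` are made up for illustration.

```python
places = ['of New York', 'of the New York']
noise_words_set = {'of', 'the', 'at', 'for', 'in'}

# Comprehension form, as in the answer above.
stuff = [' '.join(w for w in place.split() if w.lower() not in noise_words_set)
         for place in places]

# The same logic written as explicit loops.
stuff_loop = []
for place in places:
    kept_words = []
    for w in place.split():
        # Keep only the words that are not noise words (case-insensitive).
        if w.lower() not in noise_words_set:
            kept_words.append(w)
    stuff_loop.append(' '.join(kept_words))

print(stuff_loop)  # ['New York', 'New York']
```

Both forms build the same list; the comprehension just folds the loop, the filter, and the append into one expression.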
