>>> import re
>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
>>> words = ["happy", "sad", "angry", "jumpy"]
>>> words_in_phrases = [re.findall(r"\b[\w']+\b", phrase.lower()) for phrase in phrases]
>>> words_in_phrases
[["i'm", 'so', 'happy', 'with', 'myself', 'lately'], ['johnny', 'im', 'so', 'sad', 'so', 'very', 'sad', 'call', 'me'], ['i', 'feel', 'like', 'crap', 'so', 'angry']]
>>> word_counts = [{word: phrase.count(word) for word in words} for phrase in words_in_phrases]
>>> word_counts
[{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1}, {'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0}, {'jumpy': 0, 'angry': 1, 'sad': 0, 'happy': 0}]
>>>
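The line word_counts = [{word: phrase.count(word) for word in words} for... requires Python 2.7+, because it uses a dict comprehension. If for some reason you are using a Python version older than 2.7, replace that line with: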
>>> word_counts = [dict((word, phrase.count(word)) for word in words) for phrase in words_in_phrases]
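Collected into a standalone script, the steps above might look like the sketch below. The function name `count_words` is made up here for illustration and is not part of the original answer; the regex and the counting logic are the same as in the session above.

```python
import re

def count_words(phrases, words):
    """Return one dict per phrase, mapping each target word to its count."""
    counts = []
    for phrase in phrases:
        # Lowercase the phrase, then split it into word tokens
        # (apostrophes are kept inside words, e.g. "i'm").
        tokens = re.findall(r"\b[\w']+\b", phrase.lower())
        # list.count matches whole tokens only, never substrings.
        counts.append({word: tokens.count(word) for word in words})
    return counts

phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

result = count_words(phrases, words)
print(result)
```

Each dict in the result lines up with the phrase at the same index, so result[1]["sad"] is the number of times "sad" occurs in the second phrase.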
This is somewhat clearer than your other question. – 2012-07-08 16:38:04
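You don't need re or filter. The built-in operator 'in' and str.'count' will do the job efficiently (in that order). The solutions by katrielalex and poke below show the two approaches. Of course, you could also use re to do the job for you, but that would be using a cannon where a knife would do :-) – GeneralBecos 2012-07-08 16:41:55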
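@GeneralBecos: He probably does need a regex to split each phrase into words. Otherwise, 'an' in 'I might be American' would return true even though the word "an" is not in that phrase. – 2012-07-08 16:44:07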
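The substring pitfall raised in the last comment can be checked with a short sketch. The phrase is an English rendering of the commenter's example, and the variable names are assumptions made for illustration.

```python
import re

phrase = "I might be American"

# Substring test on the raw string: matches the "an" at the end of "american".
substring_hit = "an" in phrase.lower()

# Token test: split into words first, then look for an exact word match.
tokens = re.findall(r"\b[\w']+\b", phrase.lower())
token_hit = "an" in tokens

print(substring_hit, token_hit)
```

The same difference applies to counting: str.count on the raw phrase counts substring occurrences, while list.count on the token list counts whole words only.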