Two lists, one of words, one of phrases

OK, so I have two lists. One is a list of words, like so:

["happy", "sad", "angry", "jumpy"]

etc.

And then a list of phrases, like so:

["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]

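I want to use the first list of words to find the occurrences of those words in the list of phrases. I don't care whether I pull out the actual words, separated by spaces, or just the number of occurrences.

From what I've seen, it looks like the re module and filter are the way to go?

Also, if my explanation of what I need is unclear, please let me know.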

Source

2012-07-08 Andrew Alexander

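This is a bit clearer than your other question. – 2012-07-08 16:38:04

You don't need re or filter. The built-in operator 'in' and str.'count' will do the job efficiently (in that order). The katrielalex and poke solutions below demonstrate the two approaches. Of course, you could also have re do the work for you, but that would be using a cannon where a knife would do :-) – GeneralBecos 2012-07-08 16:41:55

@GeneralBecos: He may need a regular expression to split each phrase into words. Otherwise, 'an' in "I might be American" returns True even though the word "an" is not in that phrase. – 2012-07-08 16:44:07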
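
To make the point in the comment above concrete, here is a minimal sketch comparing a plain substring test with a word-boundary regex. The variable names `substring_hit` and `word_hit` are only illustrative, and the snippet assumes Python 3.

```python
import re

phrase = "I might be American"

# Plain substring test: matches the "an" at the end of "American".
substring_hit = "an" in phrase

# Whole-word test: \b requires a word boundary on both sides of "an".
word_hit = re.search(r"\ban\b", phrase) is not None

print(substring_hit)  # True
print(word_hit)       # False
```

Whether the substring behaviour matters depends on the word list; for words like "happy" or "angry" it is much less likely to cause false positives than for short words like "an".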

>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"] 
>>> words = ["happy", "sad", "angry", "jumpy"] 
>>> 
>>> for phrase in phrases: 
...  print phrase 
...  print {word: phrase.count(word) for word in words} 
... 
I'm so happy with myself lately! 
{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1} 
Johnny, im so sad, so very sad, call me 
{'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0} 
i feel like crap. SO ANGRY!!!! 
{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 0}
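
Note that the last phrase above reports 'angry': 0 because str.count is case-sensitive and the phrase contains "ANGRY". One possible adjustment, sketched here for Python 3 with a hypothetical helper named `count_words`, is to lowercase each phrase before counting. It still counts substrings, not whole words.

```python
phrases = [
    "I'm so happy with myself lately!",
    "Johnny, im so sad, so very sad, call me",
    "i feel like crap. SO ANGRY!!!!",
]
words = ["happy", "sad", "angry", "jumpy"]


def count_words(phrase, words):
    """Count case-insensitive substring occurrences of each word in phrase."""
    lowered = phrase.lower()
    return {word: lowered.count(word) for word in words}


counts = [count_words(phrase, words) for phrase in phrases]
for phrase, count in zip(phrases, counts):
    print(phrase)
    print(count)
```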

Source

2012-07-08 16:33:34 katrielalex

A very simple, straightforward solution:

>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"] 
>>> words = ["happy", "sad", "angry", "jumpy"] 
>>> for phrase in phrases:
...     for word in words:
...         if word in phrase:
...             print('"{0}" is in the phrase "{1}".'.format(word, phrase))
...

"happy" is in the phrase "I'm so happy with myself lately!". 
"sad" is in the phrase "Johnny, im so sad, so very sad, call me".

Source

2012-07-08 16:25:02 poke

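Yes, but I want to count the instances. So in the Johnny phrase, I need to record more than one. Also, I could plug a regular expression into this too, right? – 2012-07-08 16:30:47

You can easily change the print to whatever you want to do with the matches. According to your question, you "don't care" how they are handled, so next time you should be more specific. – poke 2012-07-08 16:53:06

Why the downvote? – poke 2012-07-08 17:19:42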

>>> import re
>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
>>> words = ["happy", "sad", "angry", "jumpy"] 
>>> words_in_phrases = [re.findall(r"\b[\w']+\b", phrase.lower()) for phrase in phrases] 
>>> words_in_phrases 
[["i'm", 'so', 'happy', 'with', 'myself', 'lately'], ['johnny', 'im', 'so', 'sad', 'so', 'very', 'sad', 'call', 'me'], ['i', 'feel', 'like', 'crap', 'so', 'angry']] 
>>> word_counts = [{word: phrase.count(word) for word in words} for phrase in words_in_phrases] 
>>> word_counts 
[{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1}, {'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0}, {'jumpy': 0, 'angry': 1, 'sad': 0, 'happy': 0}] 
>>>

For the line word_counts = [{word: phrase.count(word) for word in words} for..., you need Python 2.7+. If, for some reason, you are using a version older than Python 2.7, replace that line with:

>>> word_counts = [dict((word, phrase.count(word)) for word in words) for phrase in words_in_phrases]
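
On a current Python 3, the same whole-word counting could also be sketched with collections.Counter, which tallies every token in one pass and returns 0 for words that never appear. The helper name `whole_word_counts` is hypothetical, and the tokenizing regex is a simplified variant of the one above.

```python
import re
from collections import Counter

phrases = [
    "I'm so happy with myself lately!",
    "Johnny, im so sad, so very sad, call me",
    "i feel like crap. SO ANGRY!!!!",
]
words = ["happy", "sad", "angry", "jumpy"]


def whole_word_counts(phrase, words):
    """Count whole-word, case-insensitive occurrences of each word in phrase."""
    # Split the lowercased phrase into tokens of word characters and apostrophes.
    tokens = re.findall(r"[\w']+", phrase.lower())
    tally = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent words need no special case.
    return {word: tally[word] for word in words}


word_counts = [whole_word_counts(phrase, words) for phrase in phrases]
print(word_counts)
```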

Source

2012-07-08 16:50:28
