Extending a dictionary to include word frequency

I have a Python dictionary that I use for NLTK sentiment analysis.

Note: the input is plain-text email content.

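from nltk.corpus import stopwords

def word_feats(words):
    # Build the English stop-word set once for fast membership tests
    stopset = set(stopwords.words('english'))

    words_split = words.split()

    # Map each non-stop word to True (presence only, no counts)
    result = {word: True for word in words_split if word not in stopset}

    return result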

I want to extend it so that the dictionary holds the frequency of each word as well as the unique words themselves.

This is what I currently get:

'To' (4666843744) = {bool} True 
'ensure' (4636385096) = {bool} True 
'email' (4636383752) = {bool} True 
'updates' (4636381960) = {bool} True 
'delivered' (4667509936) = {bool} True 
'inbox,' (4659135800) = {bool} True 
'please' (4659137368) = {bool} True 
'add' (4659135016) = {bool} True

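I would like something like the following, where the number at the end is the frequency. It does not have to look exactly like this, but I want to be able to access the frequency of each word.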

'To' (4666843744) = {bool} True, 100 
'ensure' (4636385096) = {bool} True, 3 
'email' (4636383752) = {bool} True, 40 
'updates' (4636381960) = {bool} True, 3 
'delivered' (4667509936) = {bool} True, 4 
'inbox,' (4659135800) = {bool} True, 20 
'please' (4659137368) = {bool} True, 150 
'add' (4659135016) = {bool} True, 10

Source

2017-10-20 Prime By Design

Please provide your 'input' and the desired 'output'.

@KaushikNP Thank you for helping me improve my question.

Python's Counter should do the trick:

from collections import Counter 
result = dict(Counter(word for word in words_split if word not in stopset))
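For context, here is a minimal sketch of how that line might slot into the asker's word_feats function. The small STOPSET below is a hypothetical stand-in for NLTK's English stop-word list so the example runs without downloading NLTK data, and the sample sentence is invented for illustration.

```python
from collections import Counter

# Hypothetical stand-in for set(stopwords.words('english')) from NLTK,
# used here only so the sketch runs without the NLTK corpus installed.
STOPSET = {"to", "the", "a", "your", "is", "and"}

def word_feats(words, stopset=STOPSET):
    """Map each non-stop word in the text to the number of times it occurs."""
    words_split = words.split()
    # Counter tallies occurrences; dict() converts it to a plain dictionary.
    return dict(Counter(word for word in words_split if word not in stopset))

feats = word_feats("please add email to your inbox please")
print(feats)  # {'please': 2, 'add': 1, 'email': 1, 'inbox': 1}
```

The keys are still the unique words, and feats['please'] gives that word's frequency. As in the original function, the comparison is case-sensitive and punctuation is not stripped, so tokens such as 'To' or 'inbox,' would be counted as they appear.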

Source

2017-10-20 17:30:05 Mureinik

Close, but now I am missing the True boolean part. I think this may need NLTK.

Note that any non-zero integer evaluates to True in a boolean context in Python, so this should work.
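A short sketch of what that comment means in practice, using an invented sample string. It only demonstrates Python's own truthiness rules; whether a downstream library treats a count the same as True is a separate question that this example does not settle.

```python
from collections import Counter

counts = Counter("please add email please".split())

# Every word that appears has a count of at least 1, so each count is truthy.
all_truthy = all(bool(n) for n in counts.values())
print(all_truthy)                 # True

# Caveat: truthy is not the same as equal to True.
print(counts["add"] == True)      # True, because 1 == True in Python
print(counts["please"] == True)   # False, because 2 != True
```

So a check such as `if feats[word]:` behaves as before, while code that compares feature values for equality would see counts above 1 as different from True.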

I will try it now.
