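I have a Python dictionary that I use for NLTK sentiment analysis. How can I extend the dictionary to include word frequencies?
Note: the input is plain-text email content.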
from nltk.corpus import stopwords

def word_feats(words):
    # Drop English stop words and mark each remaining word as present.
    stopset = set(stopwords.words('english'))
    words_split = words.split()
    result = dict([(word, True) for word in words_split if word not in stopset])
    return result
I want to extend it so that the dictionary holds each unique word together with its frequency.
This is what I currently get:
'To' (4666843744) = {bool} True
'ensure' (4636385096) = {bool} True
'email' (4636383752) = {bool} True
'updates' (4636381960) = {bool} True
'delivered' (4667509936) = {bool} True
'inbox,' (4659135800) = {bool} True
'please' (4659137368) = {bool} True
'add' (4659135016) = {bool} True
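I would like something like the following, where the number at the end is the frequency. It does not have to look exactly like this, but I want to be able to access the frequency of each word.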
'To' (4666843744) = {bool} True, 100
'ensure' (4636385096) = {bool} True, 3
'email' (4636383752) = {bool} True, 40
'updates' (4636381960) = {bool} True, 3
'delivered' (4667509936) = {bool} True, 4
'inbox,' (4659135800) = {bool} True, 20
'please' (4659137368) = {bool} True, 150
'add' (4659135016) = {bool} True, 10
Please provide your 'input' and the desired 'output' –
@KaushikNP Thank you for helping me improve my question. –
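One possible way to get the per-word frequencies asked for above is to count the words with `collections.Counter` instead of mapping each word to `True`. This is only a minimal sketch, not a definitive answer: the name `word_freqs`, the `stopset` parameter, and the tiny `demo_stopset` are hypothetical and are there so the example runs without the NLTK corpus data. In the original function the stop words would come from `stopwords.words('english')`.

```python
from collections import Counter

def word_freqs(text, stopset=frozenset()):
    """Map each non-stop word in text to the number of times it occurs."""
    # Counter is a dict subclass: result[word] is the frequency, and
    # `word in result` still tells you whether the word was seen at all.
    return Counter(word for word in text.split() if word not in stopset)

# Hypothetical stand-in for set(stopwords.words('english')).
demo_stopset = {"to", "the", "your"}

freqs = word_freqs("please add email to your inbox, please add email", demo_stopset)
print(freqs["please"])   # 2
print(freqs["inbox,"])   # 1
print("to" in freqs)     # False
```

Because the text is split on whitespace only, punctuation stays attached to the word (`'inbox,'`), exactly as in the output shown in the question. If a `(True, count)` pair per word is preferred, something like `{w: (True, n) for w, n in freqs.items()}` would produce it.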