Using SentiWordNet with NLTK in Python

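I am using Python's NLTK to do sentiment analysis on Twitter data. I need a lexicon that contains words with positive and negative polarity. I have read a lot about SentiWordNet, but when I use it in my project it does not give efficient or fast results. I think I am not using it correctly. Can anyone tell me the right way to use it? Here are the steps I have taken so far: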

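Tokenize the tweets
POS-tag the tokens
Pass each tagged token to SentiWordNet

I use the NLTK package for tokenization and POS tagging. Here is part of my code: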

import nltk
from nltk.corpus import sentiwordnet as swn

# Penn Treebank tag fragments mapped to the WordNet POS codes SentiWordNet expects.
TAG_TO_POS = (('NN', 'n'), ('VB', 'v'), ('JJ', 'a'), ('RB', 'r'))

pscore = 0.0  # running positive score for this tweet
nscore = 0.0  # running negative score for this tweet

tokens = nltk.word_tokenize(row)  # tokenization; row is one line of the file in which the tweets are saved
tagged = nltk.pos_tag(tokens)     # POS tagging

for word, tag in tagged:
    for fragment, pos in TAG_TO_POS:
        if fragment in tag:
            # senti_synsets can return a lazy iterable, so len() on it may fail;
            # build the list once and reuse it instead of querying three times.
            synsets = list(swn.senti_synsets(word, pos))
            if synsets:
                pscore += synsets[0].pos_score()  # positive score of the first sense
                nscore += synsets[0].neg_score()  # negative score of the first sense
            break

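Finally, I count how many tweets are positive and how many are negative. Where am I going wrong? How should I use it? Are there other similar lexicons that are easy to use?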

Source

2015-11-27 jeny

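I don't fully understand what your question is. Is it speed? – b3000

No. I have about 4000 tweets. Using SentiWordNet I only get 10 positive and 18 negative tweets, which is certainly not the right result. Speed is of course also an issue, but the main problem is the quality of the results. Is there a mistake in the code? – jeny

SentiWordNet's coverage is small compared with the noisy input you get from tweets. You have to normalize the words in real tweets so that they match SentiWordNet entries, for example mapping a nonstandard spelling of "you" to "you", and so on. – alvas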
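The normalization suggested in the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's method: the function name `normalize_tweet`, the regular expressions, and the tiny `SLANG` table are all assumptions made for this example, and a real slang table would need to be far larger.

```python
import re

# Hypothetical slang table for illustration; not taken from any real resource.
SLANG = {"u": "you", "ur": "your", "gr8": "great", "luv": "love"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"(.)\1{2,}")  # three or more repeats of one character
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize_tweet(text):
    """Return a list of lowercased, lightly normalized tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)             # drop links
    text = MENTION_RE.sub(" ", text)         # drop @user mentions
    text = text.replace("#", " ")            # keep the hashtag word, drop the symbol
    text = ELONGATED_RE.sub(r"\1\1", text)   # "soooo" becomes "soo"
    tokens = TOKEN_RE.findall(text)
    return [SLANG.get(tok, tok) for tok in tokens]


print(normalize_tweet("@bob I luv u soooo much!!! http://t.co/x #happy"))
```

Collapsing repeats to two characters rather than one keeps words such as "good" intact; the normalized tokens can then be POS-tagged and looked up as before.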

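Yes, there are other lexicons you can use. You can find a short list of them here: http://sentiment.christopherpotts.net/lexicons.html#resources Bing Liu's Opinion Lexicon looks quite easy to use.

Besides linking to those lexicons, the site is also a very good tutorial on sentiment analysis.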
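As a rough illustration of how a plain word-list lexicon such as the Opinion Lexicon might be applied, here is a minimal sketch that counts positive and negative hits per tweet. The function name `lexicon_polarity` and the small `POSITIVE` and `NEGATIVE` sets are made up for this example. To my knowledge NLTK also ships this lexicon as `nltk.corpus.opinion_lexicon`, so the real word sets could be loaded from there once that corpus is downloaded.

```python
def lexicon_polarity(tokens, positive_words, negative_words):
    """Label a token list 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    pos_hits = sum(1 for tok in tokens if tok.lower() in positive_words)
    neg_hits = sum(1 for tok in tokens if tok.lower() in negative_words)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "neutral"


# Tiny hypothetical word sets, for illustration only. With NLTK's opinion_lexicon
# corpus installed they could instead be built as:
#   from nltk.corpus import opinion_lexicon
#   POSITIVE = set(opinion_lexicon.positive())
#   NEGATIVE = set(opinion_lexicon.negative())
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}

print(lexicon_polarity(["I", "love", "this", "great", "phone"], POSITIVE, NEGATIVE))
```

Because this is a simple set lookup with no POS tagging or synset queries, it is also much cheaper per tweet than the SentiWordNet loop in the question, though it ignores negation and word sense.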

Source

2015-12-23 11:28:45 nestoralvaro
