R sentiment analysis with phrases in dictionaries

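I am performing sentiment analysis on a set of Tweets, and I now want to know how to add phrases to the positive and negative dictionaries.

I have read in the files of phrases I want to test, but when I run the sentiment analysis it does not give me a result.

Reading through the sentiment algorithm, I can see that it matches single words against the dictionaries. Is there a way to scan for phrases as well as words?

Here is the code: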

score.sentiment = function(sentences, pos.words, neg.words, .progress='none') 
{ 
    require(plyr) 
    require(stringr) 
    # we got a vector of sentences. plyr will handle a list 
    # or a vector as an "l" for us 
    # we want a simple array ("a") of scores back, so we use 
    # "l" + "a" + "ply" = "laply": 
    scores = laply(sentences, function(sentence, pos.words, neg.words) {
        # clean up sentences with R's regex-driven global substitute, gsub():
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)
        # and convert to lower case:
        sentence = tolower(sentence)
        # split into words. str_split is in the stringr package
        word.list = str_split(sentence, '\\s+')
        # sometimes a list() is one level of hierarchy too much
        words = unlist(word.list)
        # compare our words to the dictionaries of positive & negative terms
        # (use the function arguments, not the global pos/neg vectors)
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)
        # match() returns the position of the matched term or NA
        # we just want a TRUE/FALSE:
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)
        # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
        score = sum(pos.matches) - sum(neg.matches)
        return(score)
    }, pos.words, neg.words, .progress=.progress)
    scores.df = data.frame(score=scores, text=sentences) 
    return(scores.df) 
} 
analysis=score.sentiment(Tweets, pos, neg) 
table(analysis$score)

This is the result I get:

0 
20

whereas I am after the standard table that this function normally provides, for example:

-2 -1  0  1  2
 1  2  3  4  5

Does anyone have any ideas on how to run this on phrases? Note: the Tweets file is a file of sentences.

Source

2015-09-04 L. Natalka

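Not sure, but I think you might mean lapply instead of laply? – dd3

@dd3 It is laply from the plyr package, not lapply from base. – WhiteViking

I am a beginner in R. What is ".progress" doing here? It seems you are not using it in your function? – alwaysaskingquestions

The function score.sentiment seems to work. If I try a very simple setup,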

Tweets = c("this is good", "how bad it is") 
neg = c("bad") 
pos = c("good") 
analysis=score.sentiment(Tweets, pos, neg) 
table(analysis$score)

I get the expected result,

> table(analysis$score) 

-1 1 
1 1

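How are you feeding the 20 tweets to the method? From the result you posted, that 0 20, I would say the problem is that none of your 20 tweets contains any positive or negative word, although of course that is something you would have noticed. If you post more details on your list of tweets and on your positive and negative words, it will be easier to help you.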
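One way to check this is to list which dictionary terms actually occur in each tweet. The following is only a minimal sketch in base R: matched_terms is a hypothetical helper (it is not part of plyr or stringr), and it assumes the dictionary entries are already lower case.

```r
# Hypothetical diagnostic helper (not from the original code): for each
# sentence, return the dictionary terms that occur as whole words.
matched_terms <- function(sentences, dictionary) {
  lapply(sentences, function(sentence) {
    # same cleaning steps as score.sentiment
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    # split on whitespace, so every token is a single word
    words <- unlist(strsplit(tolower(sentence), "\\s+"))
    intersect(words, dictionary)
  })
}

Tweets <- c("this is good", "how bad it is")
matched_terms(Tweets, c("good"))
matched_terms(Tweets, c("bad", "how bad"))
```

In the second call only "bad" can be found. The phrase "how bad" never matches, because the sentence is split into single words before the lookup.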

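Anyhow, your function seems to be working just fine.

Hope it helps.

EDIT after clarifications via comments:

Actually, to solve your problem you need to tokenize your sentences into n-grams, where n corresponds to the maximum number of words in any entry of your positive and negative lists. You can see how to do this e.g. in this SO question. For completeness, and since I have tested it myself, here is an example of what you could do. I simplified it to bigrams (n = 2) and used the following inputs: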

Tweets = c("rewarding hard work with raising taxes and VAT. #LabourManifesto", 
       "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto") 
pos = c("rewarding hard work") 
neg = c("wrong choice")

You can create a bigram tokenizer like this,

library(tm) 
library(RWeka) 
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min=2,max=2))

and test it,

> BigramTokenizer("rewarding hard work with raising taxes and VAT. #LabourManifesto") 
[1] "rewarding hard"  "hard work"   "work with"   
[4] "with raising"   "raising taxes"  "taxes and"   
[7] "and VAT"    "VAT #LabourManifesto"
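If installing RWeka (which needs Java) is a problem, the same kind of tokens can be built in base R. This is only a sketch: ngram_tokens and its min_n and max_n arguments are hypothetical names chosen for illustration, and unlike the Weka tokenizer it does not strip punctuation, so it assumes the sentence has already been cleaned.

```r
# Hypothetical base-R tokenizer: returns every n-gram of a sentence for
# n between min_n and max_n, shortest n-grams first.
ngram_tokens <- function(sentence, min_n = 1, max_n = 2) {
  stopifnot(min_n >= 1, max_n >= min_n)
  words <- unlist(strsplit(trimws(sentence), "\\s+"))
  words <- words[nzchar(words)]          # drop empty tokens
  out <- character(0)
  for (n in min_n:max_n) {
    if (length(words) < n) next          # sentence too short for this n
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(starts,
                    function(i) paste(words[i:(i + n - 1)], collapse = " "),
                    character(1))
    out <- c(out, grams)
  }
  out
}

ngram_tokens("rewarding hard work with raising taxes", min_n = 2, max_n = 2)
```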

Then, in your method, you simply replace this line,

word.list = str_split(sentence, '\\s+')

by

word.list = BigramTokenizer(sentence)

Although of course it would be better if you renamed word.list to ngram.list or something like that.

The result is, as expected,

> table(analysis$score) 

-1 0 
1 1

Just decide on your n-gram size, add it to Weka_control, and you should be fine.
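Note that with bigrams only, a three-word entry such as "rewarding hard work" can never match, which is why the first tweet scores 0 above. With RWeka that would mean something like Weka_control(min=1, max=3). As an alternative, here is a sketch in base R that takes the largest n from the longest dictionary entry and scores single words and phrases together. score_phrases is a hypothetical function, not part of any package, and it assumes the dictionary entries are lower case and free of punctuation.

```r
# Hypothetical phrase-aware variant of score.sentiment, base R only.
score_phrases <- function(sentences, pos.words, neg.words) {
  # the longest dictionary entry (in words) sets the largest n-gram needed
  n_words <- lengths(strsplit(c(pos.words, neg.words), "\\s+"))
  max_n <- max(1L, n_words)
  scores <- vapply(sentences, function(sentence) {
    # same cleaning steps as score.sentiment
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    words <- unlist(strsplit(trimws(tolower(sentence)), "\\s+"))
    words <- words[nzchar(words)]
    # build every n-gram for n = 1 .. max_n
    grams <- unlist(lapply(seq_len(max_n), function(n) {
      if (length(words) < n) return(character(0))
      vapply(seq_len(length(words) - n + 1),
             function(i) paste(words[i:(i + n - 1)], collapse = " "),
             character(1))
    }))
    # each matching n-gram counts once per occurrence
    sum(grams %in% pos.words) - sum(grams %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

Tweets <- c("rewarding hard work with raising taxes and VAT. #LabourManifesto",
            "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto")
analysis <- score_phrases(Tweets, c("rewarding hard work"), c("wrong choice"))
table(analysis$score)
```

With these inputs the first tweet should score 1 and the second -1, since both the trigram and the bigram are now found.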

Hope it helps.

Source

2015-09-06 17:37:54 lrnzcig

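@Irnczig. I managed to get score.sentiment working with my positive and negative dictionaries, but if I wanted to add phrases to the dictionaries, for example "is good" and "how bad" rather than just "bad" and "good", would you know how to make that work? –

For example, take the following tweets: ["rewarding hard work with raising taxes and VAT. #LabourManifesto", "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto"]. In the dictionaries I want "rewarding hard work" as positive, and "raising taxes" and "more cuts" as negative. When I run the sentiment analysis, it splits these phrases up. –

OK, understood. Let me take a look. – lrnzcig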
