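Twitter Sentiment Analysis w R using German language set SentiWS

I am referring to a previously asked question: I want to run a sentiment analysis on German tweets, and I am using the code below, taken from the Stack Overflow thread I mentioned. However, I would like the analysis to return actual sentiment scores rather than just the sum of TRUE/FALSE flags for whether a word is positive or negative. Any ideas for a simple way to do this?
You can also find the word lists in the previous thread.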
library(plyr)
library(stringr)
readAndflattenSentiWS <- function(filename) {
  words = readLines(filename, encoding="UTF-8")
  words <- sub("\\|[A-Z]+\t[0-9.-]+\t?", ",", words)
  words <- unlist(strsplit(words, ","))
  words <- tolower(words)
  return(words)
}
pos.words <- c(scan("Post3/positive-words.txt",what='character', comment.char=';', quiet=T),
               readAndflattenSentiWS("Post3/SentiWS_v1.8c_Positive.txt"))
neg.words <- c(scan("Post3/negative-words.txt",what='character', comment.char=';', quiet=T),
               readAndflattenSentiWS("Post3/SentiWS_v1.8c_Negative.txt"))
score.sentiment = function(sentences, pos.words, neg.words, .progress='none') {
  require(plyr)
  require(stringr)
  scores = laply(sentences, function(sentence, pos.words, neg.words)
  {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA
    # I don't just want a TRUE/FALSE! How can I do this?
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  },
  pos.words, neg.words, .progress=.progress)
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}
sample <- c("ich liebe dich. du bist wunderbar",
            "Ich hasse dich, geh sterben!",
            "i love you. you are wonderful.",
            "i hate you, die.")
(test.sample <- score.sentiment(sample,
                                pos.words,
                                neg.words))
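One possible way to get weighted scores instead of TRUE/FALSE counts is to keep the numeric weight that each SentiWS line carries and sum those weights per sentence. The sketch below is only an illustration under assumptions: it takes the line layout to be `word|POS<TAB>weight<TAB>inflection1,inflection2,...`, which is inferred from the regular expression in `readAndflattenSentiWS`; the helper names `read_sentiws_weights` and `score_weighted` are hypothetical; and the two toy lines use invented weights, not real SentiWS values. It uses base R only, so it does not reproduce the `laply` call above.

```r
# Hypothetical helper: turn SentiWS-style lines into a named numeric vector
# (names = lower-cased word forms, values = weight of the base word).
# Assumed layout per line: word|POS<TAB>weight<TAB>form1,form2,...
read_sentiws_weights <- function(lines) {
  parts <- strsplit(lines, "\t", fixed = TRUE)
  weights <- unlist(lapply(parts, function(p) {
    base <- sub("\\|.*$", "", p[1])          # drop the "|POS" tag
    w <- as.numeric(p[2])                    # sentiment weight
    forms <- base
    if (length(p) >= 3 && nzchar(p[3])) {    # optional inflected forms
      forms <- c(forms, strsplit(p[3], ",", fixed = TRUE)[[1]])
    }
    setNames(rep(w, length(forms)), tolower(forms))
  }))
  weights[!duplicated(names(weights))]       # keep the first weight per form
}

# Hypothetical scorer: same clean-up steps as score.sentiment, but sums the
# weights of the matched words instead of counting matches.
score_weighted <- function(sentences, weights) {
  vapply(sentences, function(sentence) {
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    words <- unlist(strsplit(tolower(sentence), "\\s+"))
    hits <- weights[words]                   # NA for words not in the lexicon
    sum(hits, na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
}

# Toy lexicon lines with INVENTED weights, for illustration only.
toy_lines <- c("gut|ADJX\t0.5\tgute,guter",
               "schlecht|ADJX\t-0.75\tschlechte")
toy_weights <- read_sentiws_weights(toy_lines)
toy_scores <- score_weighted(c("Das ist gut",
                               "Das ist schlecht, nicht gute"),
                             toy_weights)
```

With real files, the lines would presumably come from `readLines(filename, encoding="UTF-8")` on both the positive and the negative SentiWS file. The plain word lists (`positive-words.txt`, `negative-words.txt`) carry no weights, so they would need an assumed value such as +1 and -1 if they are to be combined with the weighted entries.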
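Does your code run as it stands? I would guess 'laply' should be 'lapply', but the post you quote writes it that way too... –
Yes, it runs and works. I actually tried changing it to lapply, and then it stopped working. I'm still fairly new to these functions, so I don't know why... – juliasb
Ah, 'laply' is part of plyr! Glad I didn't edit in a "fix", now :-) –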
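A likely reason the swap to `lapply` fails, sketched in base R only: `lapply` returns a list rather than a vector, and it forwards every extra argument to the applied function, so a plyr-specific argument such as `.progress` ends up as an unused argument. The scorer `count_chars` is a hypothetical stand-in for the real per-sentence function, and `vapply` is used here as a base-R analogue that returns a plain vector; it is not what plyr does internally.

```r
# Hypothetical stand-in for the per-sentence scoring function.
count_chars <- function(sentence) nchar(sentence)
sentences <- c("ich liebe dich", "geh weg")

# lapply() returns a list with one element per sentence.
as_list <- lapply(sentences, count_chars)

# vapply() returns an integer vector, which fits a data.frame column directly.
as_vector <- vapply(sentences, count_chars, integer(1), USE.NAMES = FALSE)

# lapply() passes .progress on to count_chars(), which does not accept it,
# so the call raises an error; capture the message instead of stopping.
err <- tryCatch(lapply(sentences, count_chars, .progress = "none"),
                error = function(e) conditionMessage(e))
```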