2015-12-26 21 views
1

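Error inserting/retrieving tweets to MongoDB

I want to run sentiment analysis on tweets that I have already fetched and stored in MongoDB. After reading the tweets back, which arrive as a data frame, I get an error. The full code is as follows: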

iphone.tweets <- searchTwitter('#iphone', n=15, lang="en") 
iphone.text=laply(iphone.tweets,function(t) t$getText()) 
df_ip <- as.data.frame(iphone.text) 

m <- mongo("iphonecollection",db="project") 
m$insert(df_ip) 
df_ip<-m$find() 
ip.lst<-as.list(t(df_ip)) 
ip.txt=laply(ip.lst,function(t) t$getText()) 

The last line fails with this error:

ip.txt=laply(ip.lst,function(t) t$getText()) 
Error in t$getText : $ operator is invalid for atomic vectors 

What I want to do is calculate the sentiment scores:

iphone.scores <- score.sentiment(ip.txt, pos.words,neg.words, .progress='text') 

The score.sentiment routine is as follows:

score.sentiment = function(sentences, pos.words, neg.words, .progress='none') 
{ 
    require(plyr) 
    require(stringr) 
    # we got a vector of sentences. plyr will handle a list or a vector as an "l" for us 
    # we want a simple array of scores back, so we use "l" + "a" + "ply" = laply: 
    scores = laply(sentences, function(sentence, pos.words, neg.words) { 
    # clean up sentences with R's regex-driven global substitute, gsub(): 
    sentence = gsub('[[:punct:]]', '', sentence) 
    sentence = gsub('[[:cntrl:]]', '', sentence) 
    sentence = gsub('\\d+', '', sentence) 
    # and convert to lower case: 
    sentence = tolower(sentence) 
    # split into words. str_split is in the stringr package 
    word.list = str_split(sentence, '\\s+') 
    # sometimes a list() is one level of hierarchy too much 
    words = unlist(word.list) 
    # compare our words to the dictionaries of positive & negative terms 
    pos.matches = match(words, pos.words) 
    neg.matches = match(words, neg.words) 
    # match() returns the position of the matched term or NA 
    # we just want a TRUE/FALSE: 
    pos.matches = !is.na(pos.matches) 
    neg.matches = !is.na(neg.matches) 
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum(): 
    score = sum(pos.matches) - sum(neg.matches) 
    return(score) 
    }, pos.words, neg.words, .progress=.progress) 
    scores.df = data.frame(score=scores, text=sentences) 
    return(scores.df) 
} 
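
The word-matching logic in the routine above can be checked by hand on a few sentences, without Twitter or MongoDB. The following is a minimal base-R sketch of the same idea, not the original routine: `score_one`, `sample_text`, the sentences, and the short word lists are all made up for illustration, and `%in%` stands in for the `match()` plus `!is.na()` steps.

```r
# Hypothetical sample sentences and word lists, for illustration only.
sample_text <- c("This phone is great, really superior!",
                 "Bad battery and poor screen",
                 "Just bought an iphone")
pos <- c("good", "great", "superior", "excellent")
neg <- c("bad", "inferior", "evil", "poor")

# Score one sentence: number of positive words minus number of negative words.
score_one <- function(sentence, pos.words, neg.words) {
  sentence <- gsub("[[:punct:]]", "", sentence)   # drop punctuation
  sentence <- gsub("[[:cntrl:]]", "", sentence)   # drop control characters
  sentence <- gsub("\\d+", "", sentence)          # drop digits
  words <- unlist(strsplit(tolower(sentence), "\\s+"))
  sum(words %in% pos.words) - sum(words %in% neg.words)
}

scores <- vapply(sample_text, score_one, integer(1),
                 pos.words = pos, neg.words = neg, USE.NAMES = FALSE)
print(scores)
```

The first sentence contains two positive words, the second two negative words, and the third none, so the scores should come out as 2, -2 and 0.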
+0

A few things. Where does your 'score.sentiment' routine come from? What is the point of MongoDB here? Why can't you just pass 'ip.lst' directly into the 'score.sentiment' routine? –

+0

Instead of fetching the tweets every time, I plan to store them in MongoDB once, then read them back from there and process them. – VBB

Answer

1

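I think you want to use sapply, which flattens the list of status objects returned by searchTwitter. In any case, this works. Note that for this to work you need to have MongoDB installed and running: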

library(twitteR) 
library(plyr) 
library(stringr) 
library(mongolite) 

# you have to set up a Twitter Application at https://dev.twitter.com/ to get these 
# 
ntoget <- 600 # get 600 tweets 

iphone.tweets <- searchTwitter('#iphone', n=ntoget, lang="en") 
iphone.text <- sapply(iphone.tweets,function(t) t$getText()) 
df_ip <- as.data.frame(iphone.text) 

# MongoDB must be installed and the service started (mongod.exe in Windows) 
# 
m <- mongo("iphonecollection",db="project") 
m$insert(df_ip) 
df_ip_out<-m$find() 

# Following routine (score.sentiment) was copied from: 
# http://stackoverflow.com/questions/32395098/r-sentiment-analysis-with-phrases-in-dictionaries 
# 
score.sentiment = function(sentences, pos.words, neg.words, .progress='none') 
{ 
    require(plyr) 
    require(stringr) 
    # we got a vector of sentences. plyr will handle a list 
    # or a vector as an "l" for us 
    # we want a simple array ("a") of scores back, so we use 
    # "l" + "a" + "ply" = "laply": 
    scores = laply(sentences, function(sentence, pos.words, neg.words) { 
    # clean up sentences with R's regex-driven global substitute, gsub(): 
    sentence = gsub('[[:punct:]]', '', sentence) 
    sentence = gsub('[[:cntrl:]]', '', sentence) 
    sentence = gsub('\\d+', '', sentence)  
    # and convert to lower case:  
    sentence = tolower(sentence)  
    # split into words. str_split is in the stringr package  
    word.list = str_split(sentence, '\\s+')  
    # sometimes a list() is one level of hierarchy too much  
    words = unlist(word.list)  
    # compare our words to the dictionaries of positive & negative terms 
    pos.matches = match(words, pos.words) 
    neg.matches = match(words, neg.words) 
    # match() returns the position of the matched term or NA  
    # we just want a TRUE/FALSE:  
    pos.matches = !is.na(pos.matches) 
    neg.matches = !is.na(neg.matches) 
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum(): 
    score = sum(pos.matches) - sum(neg.matches)  
    return(score)  
    }, pos.words, neg.words, .progress=.progress) 
    scores.df = data.frame(score=scores, text=sentences) 
    return(scores.df) 
} 

tweets <- as.character(df_ip_out$iphone.text) 
neg = c("bad","prank","inferior","evil","poor","minor") 
pos = c("good","great","superior","excellent","positive","super","better") 
analysis <- score.sentiment(tweets,pos,neg) 
table(analysis$score) 

This produces the following (4 tweets scored negative, 592 neutral, and 4 positive):

 -1   0   1 
  4 592   4 
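
As for the error in the question: what comes back from the database is plain text, not twitteR status objects, so there is no getText() method left to call. A small sketch that reproduces the message without Twitter or MongoDB is shown here; `txt` and `df_demo` are made-up stand-ins for the data returned by m$find(), so treat this as an illustration under that assumption.

```r
# Made-up stand-in for one tweet's text as read back from the database.
txt <- "Loving my new #iphone"

# `$` does not work on a plain (atomic) character vector,
# so the method call fails before getText() is ever looked up.
msg <- tryCatch(txt$getText(), error = function(e) conditionMessage(e))
print(msg)

# Hypothetical stand-in for the data frame returned by m$find().
df_demo <- data.frame(iphone.text = c(txt, "Battery is bad"),
                      stringsAsFactors = FALSE)

# The text is already in the column, so it can be used directly.
ip.txt <- as.character(df_demo$iphone.text)
print(ip.txt)
```

In other words, getText() is only needed once, on the objects returned by searchTwitter; after the round trip the text column can go straight into score.sentiment.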
+0

Thank you. Could you also tell me what this line in the code actually does: tweets <- as.character(df_ip_out$iphone.text) – VBB

+0

It converts the 'df_ip_out$iphone.text' vector from a factor vector to a character vector. You can use the 'class()' function to check the type of a vector. –
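
The conversion described in this comment can be seen on a small made-up factor (`f` and its values are hypothetical, not taken from the tweets). Whether the column read back from MongoDB is really a factor may depend on the R and package versions in use, so checking with class() first is the safer habit.

```r
# A made-up factor standing in for a text column stored as a factor.
f <- factor(c("tweet one", "tweet two", "tweet one"))
print(class(f))

# as.character() returns the labels as ordinary strings.
chr <- as.character(f)
print(class(chr))

# Calling as.character() on something that is already character is harmless.
same <- as.character(chr)
print(identical(same, chr))
```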

+0

And please mark this as the accepted answer, assuming you think it is correct. –