
How to chain multiple qdap transformations for text mining/sentiment (polarity) analysis in R

I have a data.frame with a week number column, week, and a review text column, text. I want to use the week variable as my grouping variable and run some basic text analysis against it (e.g. qdap::polarity). Some of the review texts contain multiple sentences; however, I only care about the "overall" polarity for the week.

How can I chain multiple text transformations together before running qdap::polarity while still adhering to its warning message? I was able to chain transformations with tm::tm_map and tm::tm_reduce. Is there anything comparable in qdap? What is the correct way to pre-treat/transform this text before running qdap::polarity and/or qdap::sentSplit?

Code / reproducible example below, with more details:

library(qdap) 
library(tm) 

df <- data.frame(week = c(1, 1, 1, 2, 2, 3, 4), 
                 text = c("This is some text. It was bad. Not good.", 
                          "Another review that was bad!", 
                          "Great job, very helpful; more stuff here, but can't quite get it.", 
                          "Short, poor, not good Dr. Jay, but just so-so. And some more text here.", 
                          "Awesome job! This was a great review. Very helpful and thorough.", 
                          "Not so great.", 
                          "The 1st time Mr. Smith helped me was not good."), 
                 stringsAsFactors = FALSE) 

docs <- as.Corpus(df$text, df$week) 

funs <- list(stripWhitespace, 
             tolower, 
             replace_ordinal, 
             replace_number, 
             replace_abbreviation) 

# Is there a qdap function that does something similar to the next line? 
# Or is there a way to pass this VCorpus/Corpus directly to qdap::polarity? 
docs <- tm_map(docs, FUN = tm_reduce, tmFuns = funs) 


# At the end of the day, I would like to get this type of output, but adhere to 
# the warning message about running sentSplit. How should I pre-treat/cleanse 
# these sentences, but keep the "week" grouping? 
pol <- polarity(df$text, df$week) 

## Not run: 
# check_text(df$text) 

Answer


Per the warning, you can run sentSplit as follows:

df_split <- sentSplit(df, "text") 
with(df_split, polarity(text, week)) 

##   week total.sentences total.words ave.polarity sd.polarity stan.mean.polarity 
## 1    1               5          26       -0.138       0.710             -0.195 
## 2    2               6          26        0.342       0.402              0.852 
## 3    3               1           3       -0.577          NA                 NA 
## 4    4               2          10        0.000       0.000                NaN 

Note that I have a breakout sentiment package, sentimentr, available on GitHub, which improves on the qdap version in speed, functionality, and documentation. It does the sentence splitting internally inside the sentiment_by function. The script below lets you install and use the package:

if (!require("pacman")) install.packages("pacman") 
pacman::p_load_gh("trinker/sentimentr") 

with(df, sentiment_by(text, week)) 

##    week word_count        sd ave_sentiment 
## 1:    2         25 0.7562542    0.21086408 
## 2:    1         26 1.1291541    0.05781106 
## 3:    4         10        NA    0.00000000 
## 4:    3          3        NA   -0.57735027 

Thanks Tyler, I was hoping this would catch your eye. Real quick (and hopefully helpful to others): do 'sentSplit' and/or 'sentiment_by' do any internal transformations? I would still like to apply some cleansing transformations before the sentence split. How should I apply transformations before (or after) calling 'sentSplit' and before computing polarity/sentiment? See the list of functions, 'funs', in the question. I haven't had time to look at sentimentr yet, so if its documentation already covers this, feel free to ignore this or just point me in the right direction. – JasonAizkalns


Yes, the trouble is splitting the text at the sentence level. **sentimentr** is far more accurate at this task, so you get better results without manually parsing the text. Do the transformations beforehand, operating on the column as a vector. I would (most likely) do the cleaning before. –
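
For readers who want to see that suggestion spelled out, below is a minimal sketch of cleaning the text column as a plain character vector before splitting and scoring. The particular functions and their order are my own assumptions, not part of the answer or comments, and pre-treating the text this way will naturally shift the polarity numbers relative to the output shown above.

library(qdap) 

# Clean the text column directly as a character vector; the week column is 
# left untouched, so the grouping survives the sentence split. 
df$text <- replace_abbreviation(df$text)  # e.g. "Mr." -> "Mister" (qdap abbreviation dictionary) 
df$text <- replace_ordinal(df$text)       # e.g. "1st" -> "first" 
df$text <- replace_number(df$text)        # digits -> words 
df$text <- tolower(df$text) 

# Then split into sentences and score polarity by week, as in the answer above. 
df_split <- sentSplit(df, "text") 
with(df_split, polarity(text, week)) 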