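How to chain multiple qdap transformations for text mining/sentiment (polarity) analysis in R

I have a `data.frame` with a week number, `week`, and a text review, `text`. I would like to use the `week` variable as my grouping variable and run some basic text analysis on it (e.g. `qdap::polarity`). Some of the reviews contain multiple sentences; however, I only care about the "overall" polarity for each week.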
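How can I chain several text transformations together before running `qdap::polarity`, while also heeding its warning message? I can chain the transformations with `tm::tm_map` and `tm::tm_reduce`; is there anything comparable in `qdap`? What is the correct way to preprocess/transform this text before running `qdap::polarity` and/or `qdap::sentSplit`?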
More details in the code/reproducible example below:
```r
library(qdap)
library(tm)

df <- data.frame(week = c(1, 1, 1, 2, 2, 3, 4),
                 text = c("This is some text. It was bad. Not good.",
                          "Another review that was bad!",
                          "Great job, very helpful; more stuff here, but can't quite get it.",
                          "Short, poor, not good Dr. Jay, but just so-so. And some more text here.",
                          "Awesome job! This was a great review. Very helpful and thorough.",
                          "Not so great.",
                          "The 1st time Mr. Smith helped me was not good."),
                 stringsAsFactors = FALSE)

docs <- as.Corpus(df$text, df$week)

funs <- list(stripWhitespace,
             tolower,
             replace_ordinal,
             replace_number,
             replace_abbreviation)

# Is there a qdap function that does something similar to the next line?
# Or is there a way to pass this VCorpus/Corpus directly to qdap::polarity?
docs <- tm_map(docs, FUN = tm_reduce, tmFuns = funs)

# At the end of the day, I would like to get this type of output, but adhere to
# the warning message about running sentSplit. How should I pre-treat/cleanse
# these sentences, but keep the "week" grouping?
pol <- polarity(df$text, df$week)

## Not run:
# check_text(df$text)
```
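Outside of tm, the left-to-right chaining that `tm_reduce` performs on a corpus can be sketched in base R with `Reduce()` applied to a plain character vector. Because the `text` column is transformed in place, every cleaned string stays on the same row as its `week`. This is a minimal sketch, assuming every transformation takes and returns a character vector; `compose_transforms` is a hypothetical helper name, not a qdap or tm function, and the base-R functions below are only stand-ins for the question's `funs`.

```r
# Hypothetical helper: turn a list of character -> character functions into
# one function that applies them left to right (similar in spirit to tm_reduce).
compose_transforms <- function(funs) {
  function(x) Reduce(function(acc, f) f(acc), funs, x)
}

# Base-R stand-ins; qdap's replace_ordinal, replace_number and
# replace_abbreviation could be appended here if qdap is installed.
funs <- list(
  function(x) gsub("\\s+", " ", x),  # collapse runs of whitespace
  trimws,                            # drop leading/trailing spaces
  tolower                            # lower-case everything
)
clean <- compose_transforms(funs)

df <- data.frame(week = c(1, 2),
                 text = c("  This is   SOME text. ", "Not so great."),
                 stringsAsFactors = FALSE)

# Transform the column as a vector; the week column is left untouched.
df$text <- clean(df$text)
df
```

The cleaned vector can then be passed straight to `polarity()` or `sentSplit()` alongside the unchanged `week` column, so no Corpus object is needed.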
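Thanks Tyler, I was hoping this would catch your eye. A quick follow-up (and hopefully helpful to others): do `sentSplit` and/or `sentiment_by` perform any transformations internally? I would still like to apply some cleaning transformations (the `funs` list in the question) before splitting sentences or computing polarity/sentiment. How would I apply them before (or after) calling `sentSplit`? I haven't had time to look at sentimentr yet, so if its documentation covers this, feel free to ignore this or just point me in the right direction. – JasonAizkalns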
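Yes, the trouble is splitting the text at the sentence level. **sentimentr** is much more accurate at this task, so you get better results without having to parse the text manually. Do the transformations on the column as a vector before running the analysis. I would (most likely) do the cleaning first. –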
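Following the advice to clean the column as a vector first, a qdap-only pipeline might look like the sketch below: clean `df$text`, then split into sentences with `sentSplit()` (which repeats the other columns, here `week`, on every sentence row), then score with `polarity()` grouped by `week`. This is an unverified sketch, not the commenter's code. Two assumptions: `gsub()` stands in for `tm::stripWhitespace` to avoid the tm dependency, and `tolower` is moved to the end and `replace_abbreviation` to the front so the abbreviation dictionary sees the original "Dr." and "Mr." casing before any splitting happens.

```r
library(qdap)

df <- data.frame(week = c(1, 1, 1, 2, 2, 3, 4),
                 text = c("This is some text. It was bad. Not good.",
                          "Another review that was bad!",
                          "Great job, very helpful; more stuff here, but can't quite get it.",
                          "Short, poor, not good Dr. Jay, but just so-so. And some more text here.",
                          "Awesome job! This was a great review. Very helpful and thorough.",
                          "Not so great.",
                          "The 1st time Mr. Smith helped me was not good."),
                 stringsAsFactors = FALSE)

# 1. Clean the text column as a plain character vector, before any splitting.
#    Abbreviations are expanded first so the periods in "Dr." and "Mr." are
#    less likely to be mistaken for sentence ends by sentSplit().
funs <- list(function(x) gsub("[[:space:]]+", " ", x),  # collapse whitespace
             replace_abbreviation,
             replace_ordinal,
             replace_number,
             tolower)
df$text <- Reduce(function(txt, f) f(txt), funs, df$text)

# 2. One row per sentence; sentSplit() carries the week column along.
sent_df <- sentSplit(df, "text")

# 3. Score each sentence, then aggregate to one polarity per week.
pol <- polarity(sent_df$text, sent_df$week)
pol$group
```

If sentence splitting still misbehaves on text like this, the comment above suggests **sentimentr**'s `sentiment_by` as the more accurate alternative.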