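Assigning different numeric weights to different terms in a quanteda dfm
I am new to text analysis and am currently trying to use the quanteda package in R for my needs. I want to assign different numeric weights to a few specific terms and then test the accuracy of the model. I tried the approach mentioned in another thread, "Assigning weights to different features in R", which preserves the dfm class, but I cannot get the correct output. Any help would be appreciated.
Here is what I tried: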
##install.packages("quanteda")
require(quanteda)
str <- c("apple is better than banana", "banana banana apple much better",
         "much much better new banana")
weights <- c(apple = 5, banana = 3, much = 0.5)
myDfm <- dfm(str, remove = stopwords("english"), verbose = FALSE)
#output
##Document-feature matrix of: 3 documents, 5 features.
##3 x 5 sparse Matrix of class "dfmSparse"
##       features
##docs    apple better banana much new
##  text1     1      1      1    0   0
##  text2     1      1      2    1   0
##  text3     0      1      1    2   1
newweights <- weights[featnames(myDfm)]
# reassign 1 to non-matched NAs
newweights[is.na(newweights)] <- 1
# this does not work for me - see the output
myDfm * newweights
##output
##Document-feature matrix of: 3 documents, 5 features.
##3 x 5 sparse Matrix of class "dfmSparse"
##       features
##docs    apple better banana much new
##  text1     5    0.5    1.0    0   0
##  text2     1    1.0    6.0    5   0
##  text3     0    5.0    0.5    2   1
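The scrambled numbers above look consistent with R's vector recycling: `matrix * vector` walks down each column in turn (column-major order), so the five weights end up spread across rows instead of being lined up with the five feature columns. The sketch below reproduces this on a plain base-R matrix. It assumes the dfm behaves like an ordinary matrix under `*`, and the names `counts`, `recycled`, and `weighted` are made up for illustration. It is not a quanteda-specific fix.

```r
# Plain base-R matrix standing in for the dfm counts shown above
# (assumption: the dfm behaves like an ordinary matrix under `*`).
counts <- matrix(
  c(1, 1, 1, 0, 0,
    1, 1, 2, 1, 0,
    0, 1, 1, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(docs = c("text1", "text2", "text3"),
                  features = c("apple", "better", "banana", "much", "new"))
)

weights <- c(apple = 5, banana = 3, much = 0.5)

# Look up each feature's weight by name; unmatched features get weight 1
newweights <- weights[colnames(counts)]
newweights[is.na(newweights)] <- 1
names(newweights) <- colnames(counts)

# Element-wise `*` recycles the vector down the columns,
# which reproduces the unexpected output (e.g. text2/much becomes 5)
recycled <- counts * newweights

# sweep() over margin 2 multiplies column j by newweights[j]
weighted <- sweep(counts, 2, newweights, `*`)
# weighted["text2", ] is 5, 1, 6, 0.5, 0
```

An equivalent base-R form is `t(t(counts) * newweights)`. Whether either form keeps the dfm class on a real quanteda object is not verified here. Later quanteda releases also document a `weights` argument to `dfm_weight()` that takes a named numeric vector, which may be the more direct route in those versions.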
Environment details
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          2.2
year           2015
month          08
day            14
svn rev        69053
language       R
version.string R version 3.2.2 (2015-08-14)
nickname       Fire Safety
Please file this as an issue at https://github.com/kbenoit/quanteda/issues. Thanks!