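Something like this might work: use functions from the slam package to index the DTM as a simple triplet matrix, which saves you from having to convert it to a dense matrix.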
library(slam)
library(tm)
data(crude)
dtm1 <- DocumentTermMatrix(crude)
# Find the total occurrences of each term across all docs
colTotals <- col_sums(dtm1)
# Keep only the terms that occur more than 20 times across all docs
dtm2 <- dtm1[, which(colTotals > 20)]
> dtm1
A document-term matrix (20 documents, 1266 terms)
Non-/sparse entries: 2255/23065
Sparsity : 91%
Maximal term length: 17
Weighting : term frequency (tf)
> dtm2
A document-term matrix (20 documents, 12 terms)
Non-/sparse entries: 174/66
Sparsity : 28%
Maximal term length: 6
Weighting : term frequency (tf)
Does this work on your data and answer your question?
Source
2013-06-02 09:28:46
Ben
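Hey, it does work. I'm curious about the mechanism: why does the ordinary matrix computation fail and run out of memory, while the same thing works on the triplet matrix? Thanks again, I've accepted your answer. – YangJ
Sparse matrix formats (such as simple triplet) don't store zeros. They only store the non-zero values and their indices (row num, col num), so if you want to "re-inflate" the matrix... – Ben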
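To make the comment above concrete, here is a minimal sketch of how a simple triplet matrix holds its data. The 3 x 4 matrix and its values are made up for illustration and do not come from the crude corpus; the sketch assumes only that the slam package is installed. A DocumentTermMatrix from tm carries the same i, j and v components.

```r
library(slam)

# A toy 3 x 4 matrix with only three non-zero cells (illustrative values).
m <- simple_triplet_matrix(
  i = c(1L, 2L, 3L),  # row indices of the non-zero cells
  j = c(1L, 3L, 4L),  # column indices of the non-zero cells
  v = c(5, 2, 7),     # the non-zero values themselves
  nrow = 3, ncol = 4
)

# Only the non-zero values are stored, not all 12 cells.
length(m$v)

# Column sums and column subsetting work directly on the sparse form.
col_totals <- col_sums(m)
kept <- m[, which(col_totals > 4)]
dim(kept)

# "Re-inflating" to a dense matrix allocates every cell, zeros included.
# On a large DTM this is the step that can exhaust memory.
dense <- as.matrix(m)
length(dense)
```

The gap between length(m$v) and length(dense) is small here, but it grows with the sparsity of the matrix, which is why staying in the triplet form avoids the memory problem.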