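Unable to find the R functions row_sums and col_sums
I was trying to solve my own question about changing the tf-idf weighting function in the tm package: https://stackoverflow.com/questions/15045313/changing-tf-idf-weight-function-weight-not-by-occurrences-of-term-but-by-numbe
While doing so, I looked at the weightTfIdf function, which contains the following code operating on m (a TermDocumentMatrix):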
cs <- col_sums(m)
and
rs <- row_sums(m)
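But I cannot find any documentation for the functions row_sums or col_sums, and when I try to use them in my own weighting function I get an error: Error in weighting(x) : could not find function "col_sums"
Where are these functions defined?
The full function, as printed by R, is pasted below: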
function (m, normalize = TRUE)
{
    isDTM <- inherits(m, "DocumentTermMatrix")
    if (isDTM)
        m <- t(m)
    if (normalize) {
        cs <- col_sums(m)
        if (any(cs == 0))
            warning("empty document(s): ", paste(Docs(m)[cs ==
                0], collapse = " "))
        names(cs) <- seq_len(nDocs(m))
        m$v <- m$v/cs[m$j]
    }
    rs <- row_sums(m > 0)
    if (any(rs == 0))
        warning("unreferenced term(s): ", paste(Terms(m)[rs ==
            0], collapse = " "))
    lnrs <- log2(nDocs(m)/rs)
    lnrs[!is.finite(lnrs)] <- 0
    m <- m * lnrs
    attr(m, "Weighting") <- c(sprintf("%s%s", "term frequency - inverse document frequency",
        if (normalize) " (normalized)" else ""), "tf-idf")
    if (isDTM)
        t(m)
    else m
}
<environment: namespace:tm>
attr(,"class")
[1] "WeightFunction" "function"
attr(,"Name")
[1] "term frequency - inverse document frequency"
attr(,"Acronym")
[1] "tf-idf"
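The `<environment: namespace:tm>` line in the printout is a hint: the function body is evaluated inside tm's namespace, so it can see helpers that tm imports but does not re-export. `col_sums` and `row_sums` come from the slam package, which tm imports, which is why they are not visible from user code unless slam is attached or the calls are qualified. Below is a minimal sketch of how one might track such a function down using only base R. `find_in_namespaces` is a hypothetical helper written for this illustration (it is not part of tm or slam), and the last part assumes slam is installed. `getAnywhere("col_sums")` from utils is another way to do the same search.

```r
# Hypothetical helper (not part of tm or slam): return the names of all
# currently loaded namespaces that themselves define an object `name`.
find_in_namespaces <- function(name) {
  defines <- function(ns) {
    # inherits = FALSE: look only in the namespace itself, not its imports
    exists(name, envir = asNamespace(ns), inherits = FALSE)
  }
  Filter(defines, loadedNamespaces())
}

# Sanity check with a function that ships with base R.
print(find_in_namespaces("colSums"))

# After library(tm), find_in_namespaces("col_sums") can be used the same way.

# Assumption: slam is installed. Qualifying the calls with slam:: avoids
# the 'could not find function "col_sums"' error in a custom weighting.
if (requireNamespace("slam", quietly = TRUE)) {
  # Tiny 2 x 2 sparse matrix in the triplet format tm uses internally.
  m <- slam::simple_triplet_matrix(i = c(1L, 1L, 2L),
                                   j = c(1L, 2L, 2L),
                                   v = c(2, 1, 3))
  cs <- slam::col_sums(m)      # per-document totals
  rs <- slam::row_sums(m > 0)  # number of documents containing each term
  print(cs)
  print(rs)
}
```

In a custom weighting function, either write `slam::col_sums(m)` and `slam::row_sums(m)` explicitly or call `library(slam)` beforehand.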
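Thank you for your answer. It not only answered my question about col_sums, it also showed me how to become a better R user in the future. A really great answer. Thanks. – cforster 2013-02-24 19:46:15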
+11 for going further and showing how to find the source code! Nicely done. – 2013-02-24 21:41:42
@RicardoSaporta - +11 - that's outrageous! :-) – thelatemail 2013-02-24 22:00:10