Can't find the R functions row_sums and col_sums

2013-02-24

I'm trying to solve my own question about changing the TF-IDF weighting function in the tm package: https://stackoverflow.com/questions/15045313/changing-tf-idf-weight-function-weight-not-by-occurrences-of-term-but-by-numbe

While doing so, I was looking at the weightTfIdf function, which includes the following code acting on m (a TermDocumentMatrix):

cs <- col_sums(m)
rs <- row_sums(m)
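In a TermDocumentMatrix the terms are rows and the documents are columns, so col_sums(m) gives the total number of term occurrences in each document, while row_sums(m > 0) (the form used in the full function) counts how many documents contain each term. The following base-R sketch shows the same quantities with colSums and rowSums on a small dense matrix; the terms, documents and counts are made up purely for illustration and stand in for a real TermDocumentMatrix.

```r
# Toy term-document matrix (made-up counts): rows are terms, columns are documents
m <- matrix(c(2, 0, 0,
              0, 3, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("apple", "banana", "cherry"),
                            Docs  = c("d1", "d2", "d3")))

# Dense base-R counterparts of col_sums(m) and row_sums(m > 0)
cs <- colSums(m)       # total term occurrences per document (document length)
rs <- rowSums(m > 0)   # number of documents containing each term (document frequency)

print(cs)
print(rs)
```

On a real TermDocumentMatrix the sparse slam versions avoid building the dense matrix, which is why tm uses them.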

But I can't find any documentation for the functions row_sums or col_sums, and when I try to use them in my own weighting function I get an error: Error in weighting(x) : could not find function "col_sums"

Where are these functions defined?

Here is the full function as printed in R:

function (m, normalize = TRUE) 
{ 
    isDTM <- inherits(m, "DocumentTermMatrix") 
    if (isDTM) 
     m <- t(m) 
    if (normalize) { 
     cs <- col_sums(m) 
     if (any(cs == 0)) 
      warning("empty document(s): ", paste(Docs(m)[cs == 
       0], collapse = " ")) 
     names(cs) <- seq_len(nDocs(m)) 
     m$v <- m$v/cs[m$j] 
    } 
    rs <- row_sums(m > 0) 
    if (any(rs == 0)) 
     warning("unreferenced term(s): ", paste(Terms(m)[rs == 
      0], collapse = " ")) 
    lnrs <- log2(nDocs(m)/rs) 
    lnrs[!is.finite(lnrs)] <- 0 
    m <- m * lnrs 
    attr(m, "Weighting") <- c(sprintf("%s%s", "term frequency - inverse document frequency", 
     if (normalize) " (normalized)" else ""), "tf-idf") 
    if (isDTM) 
     t(m) 
    else m 
} 
<environment: namespace:tm> 
attr(,"class") 
[1] "WeightFunction" "function"  
attr(,"Name") 
[1] "term frequency - inverse document frequency" 
attr(,"Acronym") 
[1] "tf-idf" 
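To make the arithmetic in this function concrete, here is a base-R sketch of the same steps on an ordinary dense matrix with terms in rows and documents in columns. It is an illustration under that assumption, not tm's implementation: it uses base colSums/rowSums instead of slam's sparse methods and skips the DocumentTermMatrix transpose and the warnings. The helper name tfidf_dense is hypothetical.

```r
# Hypothetical helper: dense-matrix sketch of the tf-idf steps in weightTfIdf.
# Assumes m is a numeric matrix with terms in rows and documents in columns.
tfidf_dense <- function(m, normalize = TRUE) {
  if (normalize) {
    cs <- colSums(m)                 # document lengths
    # Divide each column by its document length. Unlike tm's sparse version,
    # an all-zero column here would produce NaN (0/0).
    m <- sweep(m, 2, cs, "/")
  }
  rs <- rowSums(m > 0)               # document frequency of each term
  lnrs <- log2(ncol(m) / rs)         # inverse document frequency, base 2
  lnrs[!is.finite(lnrs)] <- 0        # terms that occur in no document get weight 0
  m * lnrs                           # length(lnrs) == nrow(m), so row i is scaled by lnrs[i]
}

m <- matrix(c(2, 0, 0,
              0, 3, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("apple", "banana", "cherry"),
                            Docs  = c("d1", "d2", "d3")))
w <- tfidf_dense(m)
print(round(w, 3))
```

A term that appears in every document (cherry here) gets log2(3/3) = 0 and is zeroed out, which is the behaviour the original question wanted to change.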

Answer


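The functions you're looking for are in the slam package. Because tm only imports slam rather than depending on it, slam is not attached when you load tm, so it takes a little work to find the documentation. Below is an example session showing how to track this down and view the documentation.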

> # I'm assuming you loaded tm first 
> library(tm) 
> # See if we can view the code 
> col_sums 
Error: object 'col_sums' not found 
> # Use getAnywhere to grab the function even if the function is 
> # in a namespace that isn't exported 
> getAnywhere("col_sums") 
A single object matching ‘col_sums’ was found 
It was found in the following places 
    namespace:slam 
with value 

function (x, na.rm = FALSE, dims = 1, ...) 
UseMethod("col_sums") 
<environment: namespace:slam> 
> # So the function is in the slam package 
> slam::col_sums 
function (x, na.rm = FALSE, dims = 1, ...) 
UseMethod("col_sums") 
<environment: namespace:slam> 
> # We can tell help to look in the slam package now that we know 
> # where the function is from  
> help(col_sums, package = "slam") 
> # alternatively 
> library(slam) 
> ?col_sums 
> # If we want to view the actual code for col_sums we need to 
> # do a little work too 
> methods("col_sums") 
[1] col_sums.default*    col_sums.simple_triplet_matrix* 

    Non-visible functions are asterisked 
> # We probably want the default version? Otherwise change to the other one 
> getAnywhere("col_sums.default") 
A single object matching ‘col_sums.default’ was found 
It was found in the following places 
    registered S3 method for col_sums from namespace slam 
    namespace:slam 
with value 

function (x, na.rm = FALSE, dims = 1, ...) 
base:::colSums(x, na.rm, dims, ...) 
<environment: namespace:slam> 

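So the default method, col_sums.default, is just a wrapper around the base function colSums. A TermDocumentMatrix is also a simple_triplet_matrix, though, so for tm objects S3 dispatch picks col_sums.simple_triplet_matrix instead; you can inspect that one the same way with getAnywhere("col_sums.simple_triplet_matrix").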
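With that known, the error from the question can be avoided inside a custom weighting function by calling the functions through the slam namespace (slam::col_sums) instead of relying on them being visible. The sketch below assumes slam is installed; normalize_by_doc_length is a hypothetical helper that reproduces only the length-normalization step of weightTfIdf (m$v <- m$v/cs[m$j]) on a small simple_triplet_matrix built by hand.

```r
# Assumes slam is installed; it does not have to be attached with library(slam).

# Hypothetical helper: the length-normalization step from weightTfIdf,
# written with an explicit slam:: prefix so it works without attaching slam.
normalize_by_doc_length <- function(m) {
  cs <- slam::col_sums(m)      # for a simple_triplet_matrix this dispatches to col_sums.simple_triplet_matrix
  m$v <- m$v / cs[m$j]         # divide each stored (non-zero) entry by its document's total
  m
}

# Small term-document matrix stored as triplets: row i, column j, value v
stm <- slam::simple_triplet_matrix(
  i = c(1L, 2L, 2L, 3L, 3L, 3L),
  j = c(1L, 2L, 3L, 1L, 2L, 3L),
  v = c(2, 3, 1, 1, 1, 1),
  nrow = 3L, ncol = 3L
)

w <- normalize_by_doc_length(stm)
print(as.matrix(w))
```

Working on m$v and m$j directly only touches the stored non-zero entries, so empty documents never produce 0/0.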


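Thank you for this answer. It not only answered my question about col_sums, it also showed me how to be a better R user in the future. A really great answer. Thanks. – cforster 2013-02-24 19:46:15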


+11 for going further and showing how to find the source code! Nicely done – 2013-02-24 21:41:42


@RicardoSaporta - +11 - that's going overboard! :-) – thelatemail 2013-02-24 22:00:10