2013-07-28

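Accessing elements from R's Rd files?

I wish to go through a package and find the authors mentioned in the help file of each function.

I looked for a function that extracts elements from R's help files, but could not find one. The closest I could find is this post from Noam Ross.

Does such a function exist? (If not, I suppose I will hack Noam's code so that it parses the Rd file and finds the specific element I am interested in.)

Thanks, Tal.

Potential code example: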

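get_field_from_r_help(topic = "lm", field = "Description")
# output:

'lm' is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although 'aov' may provide a more convenient interface for these).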

+0

Possible duplicate of [How to write contents of help to a file from within R?](http://stackoverflow.com/questions/7493843/how-to-write-contents-of-help-to-a-file-from-within-r)

+0

Example input and output? – Spacedman

+1

Joshua - this is not a duplicate, because that question only covers the step of extracting the whole text, not how to parse it. Spacedman - in a minute.

Answers

5

This document by Duncan Murdoch on parsing Rd files will help, as will this SO post.

From these, you could try something like the following:

getauthors <- function(package) {
    # Rd_db() returns the parsed Rd object of every help page in the package
    db <- tools::Rd_db(package)
    authors <- lapply(db, function(x) {
        # RdTags() is not exported from tools, hence the triple colon
        tags <- tools:::RdTags(x)
        if ("\\author" %in% tags) {
            # x[tags == "\\author"] alone would return a messy nested list;
            # flatten it into a single string for something a little cleaner
            paste(unlist(x[tags == "\\author"]), collapse = "")
        } else {
            NULL
        }
    })
    # unlist() drops the NULL entries; then strip the newlines
    gsub("\n", "", unlist(authors))
}

We can then run this on a package or two:

> getauthors("knitr") 
                        d:/RCompile/CRANpkg/local/3.0/knitr/man/eclipse_theme.Rd 
                                " Ramnath Vaidyanathan" 
                         d:/RCompile/CRANpkg/local/3.0/knitr/man/image_uri.Rd 
                                " Wush Wu and Yihui Xie" 
                         d:/RCompile/CRANpkg/local/3.0/knitr/man/imgur_upload.Rd 
                       " Yihui Xie, adapted from the imguR package by Aaron Statham" 
                          d:/RCompile/CRANpkg/local/3.0/knitr/man/knit2pdf.Rd 
                         " Ramnath Vaidyanathan, Alex Zvoleff and Yihui Xie" 
                          d:/RCompile/CRANpkg/local/3.0/knitr/man/knit2wp.Rd 
                              " William K. Morris and Yihui Xie" 
                         d:/RCompile/CRANpkg/local/3.0/knitr/man/knit_theme.Rd 
                             " Ramnath Vaidyanathan and Yihui Xie" 
                        d:/RCompile/CRANpkg/local/3.0/knitr/man/knitr-package.Rd 
                              " Yihui Xie <http://yihui.name>" 
                         d:/RCompile/CRANpkg/local/3.0/knitr/man/read_chunk.Rd 
         " Yihui Xie; the idea of the second approach came from Peter Ruckdeschel (author of the SweaveListingUtils package)" 
                         d:/RCompile/CRANpkg/local/3.0/knitr/man/read_rforge.Rd 
                              " Yihui Xie and Peter Ruckdeschel" 
                          d:/RCompile/CRANpkg/local/3.0/knitr/man/rst2pdf.Rd 
                               " Alex Zvoleff and Yihui Xie" 
                           d:/RCompile/CRANpkg/local/3.0/knitr/man/spin.Rd 
" Yihui Xie, with the original idea from Richard FitzJohn (who named it as sowsear() which meant to make a silk purse out of a sow's ear)" 

And perhaps on tools:

> getauthors("tools") 
         D:/murdoch/recent/R64-3.0/src/library/tools/man/bibstyle.Rd 
                   " Duncan Murdoch" 
        D:/murdoch/recent/R64-3.0/src/library/tools/man/checkPoFiles.Rd 
                   " Duncan Murdoch" 
         D:/murdoch/recent/R64-3.0/src/library/tools/man/checkRd.Rd 
                " Duncan Murdoch, Brian Ripley" 
        D:/murdoch/recent/R64-3.0/src/library/tools/man/getDepList.Rd 
                    " Jeff Gentry " 
         D:/murdoch/recent/R64-3.0/src/library/tools/man/HTMLlinks.Rd 
                "Duncan Murdoch, Brian Ripley" 
      D:/murdoch/recent/R64-3.0/src/library/tools/man/installFoundDepends.Rd 
                    "Jeff Gentry" 
       D:/murdoch/recent/R64-3.0/src/library/tools/man/makeLazyLoading.Rd 
                "Luke Tierney and Brian Ripley" 
         D:/murdoch/recent/R64-3.0/src/library/tools/man/parse_Rd.Rd 
                   " Duncan Murdoch " 
        D:/murdoch/recent/R64-3.0/src/library/tools/man/parseLatex.Rd 
                    "Duncan Murdoch" 
         D:/murdoch/recent/R64-3.0/src/library/tools/man/Rd2HTML.Rd 
                " Duncan Murdoch, Brian Ripley" 
       D:/murdoch/recent/R64-3.0/src/library/tools/man/Rd2txt_options.Rd 
                    "Duncan Murdoch" 
        D:/murdoch/recent/R64-3.0/src/library/tools/man/RdTextFilter.Rd 
                   " Duncan Murdoch" 
       D:/murdoch/recent/R64-3.0/src/library/tools/man/SweaveTeXFilter.Rd 
                    "Duncan Murdoch" 
         D:/murdoch/recent/R64-3.0/src/library/tools/man/texi2dvi.Rd 
        " Originally Achim Zeileis but largely rewritten by R-core." 
        D:/murdoch/recent/R64-3.0/src/library/tools/man/tools-package.Rd 
" Kurt Hornik and Friedrich Leisch Maintainer: R Core Team [email protected]" 
       D:/murdoch/recent/R64-3.0/src/library/tools/man/vignetteDepends.Rd 
                    " Jeff Gentry " 
       D:/murdoch/recent/R64-3.0/src/library/tools/man/vignetteEngine.Rd 
              "Duncan Murdoch and Henrik Bengtsson." 
        D:/murdoch/recent/R64-3.0/src/library/tools/man/writePACKAGES.Rd 
                 " Uwe Ligges and R-core." 

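Some functions have no author field, so they are simply dropped when unlist is called at the end of getauthors, but the code could be modified slightly to return NULL values for them instead.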
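One possible way to keep those pages is sketched below. It returns NA rather than NULL so that the result stays a plain character vector. The name getauthors_keep_missing is made up for this illustration, and the sketch relies on the same unexported tools:::RdTags helper as the code above.

```r
# Hypothetical variant of getauthors(): help pages without an \author
# section are kept and reported as NA instead of being dropped.
getauthors_keep_missing <- function(package) {
    db <- tools::Rd_db(package)
    vapply(db, function(x) {
        tags <- tools:::RdTags(x)   # unexported helper, as in getauthors()
        if (!"\\author" %in% tags) {
            return(NA_character_)
        }
        # flatten the \author section into one string and strip newlines
        gsub("\n", "", paste(unlist(x[tags == "\\author"]), collapse = ""))
    }, character(1))
}

res <- getauthors_keep_missing("tools")
# number of help pages with and without an author field
table(has_author = !is.na(res))
```

The result has one entry per help page, so it can be lined up against names(tools::Rd_db(package)) directly.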

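Further parsing will be somewhat difficult, because package authors seem to use this field in very different ways. devtools has just one author field. car has a bunch of them, each of which contains an email address. And so on. But this gets you to the available information, which you should be able to process further.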
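As a rough illustration of such further processing, the sketch below trims the strings and splits them on commas and on the word "and". The function name split_authors is made up here, and the rule is only a heuristic: free-form entries such as "Originally Achim Zeileis but largely rewritten by R-core." will not be split into meaningful names.

```r
# Hypothetical helper: split raw author strings into individual names.
# Splits on commas and on the standalone word "and"; a heuristic only.
split_authors <- function(x) {
    parts <- strsplit(x, ",|\\band\\b", perl = TRUE)
    lapply(parts, function(p) {
        p <- trimws(p)
        p[nzchar(p)]   # drop empty pieces, e.g. from ", and"
    })
}

# a few strings taken from the getauthors("tools") output above
raw <- c(" Duncan Murdoch, Brian Ripley",
         "Luke Tierney and Brian Ripley",
         " Jeff Gentry ")
split_authors(raw)
```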

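Note: a previous version of this answer provided a solution that works if you have the full path to an Rd file, but it did not work if you tried to do this for an installed package. Following Tyler's suggestion, I have worked out a more complete solution.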
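The same tag-based idea can probably be extended to other fields, such as the Description asked about in the question. The following is only a sketch: get_rd_field is a made-up name, the help page is located through its \alias entries, and nested markup such as \code{} is simply flattened to its text, so the result is an approximation of the rendered help.

```r
# Hypothetical helper: return the text of one Rd section (e.g. "description")
# for a topic in an installed package.
get_rd_field <- function(package, topic, field) {
    db <- tools::Rd_db(package)
    # find the help page that lists `topic` among its \alias entries
    has_alias <- vapply(db, function(rd) {
        tags <- tools:::RdTags(rd)   # unexported helper, as in getauthors()
        topic %in% trimws(unlist(rd[tags == "\\alias"]))
    }, logical(1))
    if (!any(has_alias)) {
        stop("no help page with alias '", topic, "' in package '", package, "'")
    }
    rd <- db[[which(has_alias)[1]]]
    tags <- tools:::RdTags(rd)
    # Rd section tags are lower case, e.g. \description, \author, \title
    txt <- paste(unlist(rd[tags == paste0("\\", field)]), collapse = "")
    # collapse runs of whitespace and newlines into single spaces
    trimws(gsub("\\s+", " ", txt))
}

get_rd_field("stats", "lm", "description")
```

If the page has no such section, the function returns an empty string.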

+0

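Could you show us an example on a package, where you run through all the .Rd files and get the authors? I tried this approach but could not get it to work, and would like to see a cleaner approach that does.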

+1

@TylerRinker See the update. I tried it on a few packages and it seems to work in general. – Thomas

+0

Very nice approach, thanks for sharing. +1

1

Here is my approach, using some of the suggestions made by others:

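package <- "qdap"
# names of all functions defined in the package namespace
funs <- unclass(lsf.str(envir = asNamespace(package)))

out <- sapply(funs, function(x) {
    # render the help page of x as plain text, one element per line
    x <- try(capture.output(tools::Rd2txt(utils:::.getHelpFile(as.character(help(x, help_type = "text"))))))
    # the heading is rendered with backspace overstrikes: _\bA_\bu_\bt...
    Auth_lines <- grep("_\bA_\bu_\bt_\bh_\bo_\br(_\bs):", x, fixed = TRUE)
    if (length(Auth_lines) == 0) {
        return(NA)
    }
    # the author text sits two lines below the heading; trim whitespace
    gsub("^\\s+|\\s+$", "", x[Auth_lines + 2])
})

## To look at just the ones with author fields:
out[!sapply(out, is.na)]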

## > out[!sapply(out, is.na)] 
##               beg2char 
##     "Josh O'Brien, Justin Haynes and Tyler Rinker" 
##               bracketX 
##  "Martin Morgan and Tyler Rinker <[email protected]>." 
##             bracketXtract 
##  "Martin Morgan and Tyler Rinker <[email protected]>." 
##               char2end 
##     "Josh O'Brien, Justin Haynes and Tyler Rinker" 
##             cm_df.transcript 
## "DWin, Gavin Simpson and Tyler Rinker <[email protected]>." 
##               gantt 
##   "DigEmAll (<URL: stackoverflow.com>) and Tyler Rinker" 
##              gantt_wrap 
##  "Andrie de Vries and Tyler Rinker <[email protected]>." 
##                genX 
##  "Martin Morgan and Tyler Rinker <[email protected]>." 
##              genXtract 
##  "Martin Morgan and Tyler Rinker <[email protected]>." 
##                hash 
##  "Bryan Goodrich and Tyler Rinker <[email protected]>." 
##               name2sex 
## "Dason Kurkiewicz and Tyler Rinker <[email protected]>." 
##             read.transcript 
##  "Bryan Goodrich and Tyler Rinker <[email protected]>." 
##              sentCombine 
## "Dason Kurkiewicz and Tyler Rinker <[email protected]>." 
##              sentSplit 
## "Dason Kurkiewicz and Tyler Rinker <[email protected]>." 
##                TOT 
## "Dason Kurkiewicz and Tyler Rinker <[email protected]>." 
##               v.outer 
## "Vincent Zoonekynd and Tyler Rinker <[email protected]>." 
+0

Hi Tyler, nice answer (you get a +1, since Thomas seems to have found some better functions to rely on). Thanks :)