2014-10-16 70 views

Using RefManageR to get PubMed information from PMIDs, in a loop

I am trying to retrieve citation information from PubMed using RefManageR and PubMed IDs (pmids).

I chose RefManageR because it makes it easy to get the output into data.frame format. I still find the PubMed API hard to understand and to use on my own.

I was able to write code that fetches the data using a single string of PMIDs as input:

require(RCurl) 
urli <- getURL("https://gist.githubusercontent.com/aurora-mareviv/3840512f6777d5293218/raw/dfd6b76ceb22c52aa073fc05211dcea986406914/pmids.csv", ssl.verifypeer = FALSE) 
pmids <- read.csv(textConnection(urli)) 
head(pmids) 
index10 <- pmids$pmId[1:10] 
indice10 <- paste(pmids$pmId[1:10], collapse=" ") 

# install.packages("RefManageR") 
library(RefManageR) 
auth.pm10 <- ReadPubMed(indice10, database = "PubMed", mindate = 1950) 
auth.pm10d <- data.frame(auth.pm10) 
View(auth.pm10) 

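However, if I want to get citations for 500 pmids, I think I should avoid sending one long query to the PubMed server. My idea is to loop over all the elements of the vector index10, with a function like this: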

extract.pub <- 
    function(id=indice, dbase=d.base, mindat=1950){ 
    require(RefManageR) 
    indice <- id # Author 
    d.base <- dbase # like PubMed, etc 
    min.dat <- mindat # Date from... 
    auth.pm <- NULL 
    for(i in indice){ 
     auth.pm <- ReadPubMed(indice, database = d.base, mindate = min.dat) 
     } 
    auth.pm <- data.frame(auth.pm) 
    auth.pm 
    } 

cites <- extract.pub(index10, dbase="PubMed") 
View(cites) 

It gives the following error: Error : Internal server error

However, if I pass indice10 (a string) instead of index10 (a vector), it works:

cites <- extract.pub(indice10, dbase="PubMed") 
View(cites) 

How can I get this loop to work? Or is this approach not the best one for my purpose?

Answer

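ReadPubMed accepts only one PMID or one query per function call. Try:

# pmids is the data.frame read in the question, so take its pmId column
lapply(pmids$pmId[1:3], ReadPubMed, database = "PubMed", mindate = 1950) 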

[[1]] 
[1] P. M. Zeltzer, B. Bodey, A. Marlin, et al. "Immunophenotype profile of childhood 
medulloblastomas and supratentorial primitive neuroectodermal tumors using 16 monoclonal 
antibodies". Eng. In: _Cancer_ 66.2 (1990), pp. 273-83. PMID: 2196109. 

[[2]] 
[1] L. C. Rome, R. P. Funke and R. M. Alexander. "The influence of temperature on muscle 
velocity and sustained performance in swimming carp". Eng. In: _The Journal of 
experimental biology_ 154 (1990), pp. 163-78. PMID: 2277258. 

[[3]] 
[1] P. Henry. "[Headache, facial neuralgia. Diagnostic orientation and management]". Fre. 
In: _La Revue du praticien_ 40.7 (1990), pp. 677-81. PMID: 2326596. 
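
For a larger set such as the 500 PMIDs mentioned in the question, a middle ground between one long query and one request per ID is to send the IDs in small batches. The question reports that a space-separated string of IDs works as a single query, so the sketch below builds such strings. The names `batch_queries`, `fetch_in_batches` and `fake_fetch`, and the `batch_size` and `pause` parameters, are hypothetical and not part of RefManageR. The fetching function is passed in as an argument, so the sketch runs offline with a stand-in.

```r
# Hypothetical helper: turn a vector of IDs into space-separated query strings,
# each holding at most `batch_size` IDs.
batch_queries <- function(ids, batch_size = 10) {
  stopifnot(batch_size >= 1)
  if (length(ids) == 0) return(character(0))
  # Avoid scientific notation when numeric IDs are converted to text
  if (is.numeric(ids)) ids <- format(ids, scientific = FALSE, trim = TRUE)
  groups <- ceiling(seq_along(ids) / batch_size)
  unname(vapply(split(ids, groups), paste, character(1), collapse = " "))
}

# Hypothetical wrapper: `fetch` is any function that takes one query string.
fetch_in_batches <- function(ids, fetch, batch_size = 10, pause = 0) {
  queries <- batch_queries(ids, batch_size)
  lapply(queries, function(q) {
    Sys.sleep(pause)  # optional delay between requests
    fetch(q)
  })
}

# Stand-in fetcher (assumption): returns the IDs found in the query string
fake_fetch <- function(query) strsplit(query, " ", fixed = TRUE)[[1]]

res <- fetch_in_batches(c(2196109, 2277258, 2326596), fake_fetch, batch_size = 2)
```

With the real package, `fetch` could be something like `function(q) ReadPubMed(q, database = "PubMed", mindate = 1950)`. That call has not been checked against the server here, and a suitable batch size and pause are a matter of trial.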

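You can put the elements of the BibEntry object into a data.frame and format it nicely:

lapply(pmids$pmId[1:3], function(x){ 
  # unlist() turns the BibEntry into a plain list of fields
  tmp <- unlist(ReadPubMed(x, database = "PubMed", mindate = 1950)) 
  # collapse person objects (the authors) into one comma-separated string
  tmp <- lapply(tmp, function(z) if(is(z, "person")) paste0(z, collapse = ",") else z) 
  data.frame(tmp, stringsAsFactors = FALSE) 
}) 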

  title
1 Immunophenotype profile of childhood medulloblastomas and supratentorial primitive neuroectodermal tumors using 16 monoclonal antibodies
2 The influence of temperature on muscle velocity and sustained performance in swimming carp
3 [Headache, facial neuralgia. Diagnostic orientation and management]
  author                                   year  journal                              volume  number  pages   eprint   language  eprinttype  bibtype
1 P M Zeltzer,B Bodey,A Marlin,J Kemshead  1990  Cancer                               66      2       273-83  2196109  eng       pubmed      Article
2 L C Rome,R P Funke,R M Alexander         1990  The Journal of experimental biology  154     <NA>    163-78  2277258  eng       pubmed      Article
3 P Henry                                  1990  La Revue du praticien                40      7       677-81  2326596  fre       pubmed      Article
  dateobj     key
1 1990-01-01  zeltzer1990immunophenotype
2 1990-01-01  rome1990influence
3 1990-01-01  henry1990headache
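
The lapply call returns a list with one data.frame per PMID, and records do not always carry the same fields (the second record above has no number). A minimal sketch of stacking such a list into a single table, filling absent fields with NA, could look like this. `stack_records` is a hypothetical helper, and the toy records stand in for real ReadPubMed results.

```r
# Hypothetical helper: stack data.frames whose columns may differ,
# filling fields that a record lacks with NA.
stack_records <- function(dfs) {
  stopifnot(length(dfs) > 0)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # Add every column this record is missing
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA_character_
    df[all_cols]  # same column order for every record
  })
  out <- do.call(rbind, padded)
  rownames(out) <- NULL
  out
}

# Toy records (assumption) standing in for the per-PMID data.frames
recs <- list(
  data.frame(author = "P Henry", volume = "40", number = "7",
             stringsAsFactors = FALSE),
  data.frame(author = "L C Rome,R P Funke,R M Alexander", volume = "154",
             stringsAsFactors = FALSE)
)

cites <- stack_records(recs)
```

The column order follows the first appearance of each field across the records, so the result depends on the order of the input list.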

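Thanks, I hadn't thought of apply! However, I have been trying to modify your code to get something that can be coerced to a data.frame. For example: 'adata <- ddply(pmids[1:3], "pmId", ReadPubMed, database = "PubMed", mindate = 1950)' and the same error appears again. I think I am missing an important concept about apply here. – Mareviv 2014-10-16 22:08:49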

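Well, what 'ReadPubMed' returns is an object of class 'BibEntry', which is really a list of about 13 elements. So, if you like, you can put each of those elements in a column of a data.frame. As far as I know, you can't put a BibEntry object itself into a data.frame. I'll edit the answer above. – sckott 2014-10-16 22:48:21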
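
The idea of one column per element can be tried offline on a plain list. The sketch below assumes the unlisted entry behaves like a named list whose author field is a `person` vector, as the answer's code suggests. `entry` and `flatten_entry` are hypothetical names, and the field values are copied from the output shown in the answer.

```r
# Hypothetical stand-in for one unlisted reference: a plain named list
# whose `author` field is a utils::person vector.
entry <- list(
  title  = "The influence of temperature on muscle velocity and sustained performance in swimming carp",
  author = c(person("L C", "Rome"), person("R P", "Funke"),
             person("R M", "Alexander")),
  year   = "1990"
)

# Collapse person vectors to one string so that every field has length one,
# then build a one-row data.frame with one column per field.
flatten_entry <- function(x) {
  fields <- lapply(x, function(z) {
    if (inherits(z, "person")) {
      paste(format(z, include = c("given", "family")), collapse = ",")
    } else {
      z
    }
  })
  data.frame(fields, stringsAsFactors = FALSE)
}

one_row <- flatten_entry(entry)
```

Any other field with more than one value would need the same kind of collapsing, otherwise data.frame would recycle the shorter fields across several rows.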