2017-04-05 56 views

R xmlParse/xmlTreeParse: Unknown IO error

I am trying to reproduce the XML package commands from a Stack Overflow question.

> library(XML) 
> library(RCurl) 

> nct_url <- "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true" 
> xml_doc <- xmlParse(nct_url, useInternalNodes=TRUE) 
Unknown IO errorfailed to load external entity "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true" 
Error: 1: Unknown IO error2: failed to load external entity "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true" 

> doc <- xmlTreeParse(getURL(nct_url), useInternalNodes=TRUE) 
Error: XML content does not seem to be XML: '' 
> getURL(nct_url) 
[1] "" 

The link in nct_url is valid and points to an XML file. Any idea what is going wrong here?

> sessionInfo() 
R version 3.3.3 (2017-03-06) 
Platform: x86_64-suse-linux-gnu (64-bit) 
Running under: openSUSE 13.2 (Harlequin) (x86_64) 

locale: 
[1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C    
[3] LC_TIME=en_US.UTF-8  LC_COLLATE=en_US.UTF-8  
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 
[7] LC_PAPER=en_US.UTF-8  LC_NAME=C     
[9] LC_ADDRESS=C    LC_TELEPHONE=C    
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C  

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] RCurl_1.95-4.8 bitops_1.0-6 XML_3.98-1.4 

Answer

It seems to work fine for me (using xml2):

library(xml2) 
library(tidyverse) 

doc <- read_xml("https://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true") 

doc 
## {xml_document} 
## <clinical_study> 
## [1] <required_header>\n <download_date>ClinicalTrials.gov processed th ... 
## [2] <id_info>\n <org_study_id>ARG-CS3-001</org_study_id>\n <nct_id>NC ... 
## [3] <brief_title>A Study of the Safety and Efficacy of Nitric Oxide Red ... 
## [4] <official_title>A Phase III International, Multi-Center, Prospectiv ... 
## [5] <sponsors>\n <lead_sponsor>\n <agency>Arginox Pharmaceuticals</ ... 
## [6] <source>Arginox Pharmaceuticals</source> 
## [7] <brief_summary>\n <textblock>\n  Tilarginine Acetate Injection ... 
## [8] <detailed_description>\n <textblock>\n  An estimated 120,000 t ... 
## [9] <overall_status>Terminated</overall_status> 
## [10] <start_date>May 2005</start_date> 
## [11] <completion_date>January 2007</completion_date> 
## [12] <phase>Phase 3</phase> 
## [13] <study_type>Interventional</study_type> 
## [14] <study_design_info>\n <allocation>Randomized</allocation>\n <inte ... 
## [15] <primary_outcome>\n <measure>All cause mortality at 30 days post r ... 
## [16] <secondary_outcome>\n <measure>Number of patients demonstrating re ... 
## [17] <secondary_outcome>\n <measure>The duration of cardiogenic shock c ... 
## [18] <enrollment>658</enrollment> 
## [19] <condition>Shock, Cardiogenic</condition> 
## [20] <intervention>\n <intervention_type>Drug</intervention_type>\n <i ... 
## ... 

xml_find_all(doc, ".//location") %>% 
    map(xml_children) %>% 
    map(xml_find_all, ".//*") %>% 
    map_df(~as.list(set_names(xml_text(.), xml_name(.)))) %>% 
    select(-address) %>% 
    glimpse() 
## Observations: 102 
## Variables: 5 
## $ name <chr> "The Heart Group, PC", "Sparks Regional Medical Center... 
## $ city <chr> "Mobile", "Fort Smith", "Mesa", "Phoenix", "Little Roc... 
## $ state <chr> "Alabama", "Arizona", "Arizona", "Arizona", "Arkansas"... 
## $ zip  <chr> "36608", "72901", "85206", "85043", "72205", "90017", ... 
## $ country <chr> "United States", "United States", "United States", "Un... 
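
The flattening step above can also be tried offline. The following is a minimal sketch, not the pipeline used above: it swaps the `map_df` approach for `xml_find_first`, and it runs on a small made-up document that only imitates the nested `<location>` layout. The sample values and the helper name `field` are assumptions for illustration.

```r
library(xml2)

# Hypothetical sample imitating the nested <location> layout; not real trial data.
sample_xml <- "<clinical_study>
  <location><facility>
    <name>Site A</name>
    <address><city>City A</city><state>State A</state><zip>00001</zip><country>Country A</country></address>
  </facility></location>
  <location><facility>
    <name>Site B</name>
    <address><city>City B</city><zip>00002</zip><country>Country B</country></address>
  </facility></location>
</clinical_study>"

doc <- read_xml(sample_xml)
locations <- xml_find_all(doc, ".//location")

# Hypothetical helper: text of the first matching descendant of each node,
# NA where a location has no such element (Site B has no <state>).
field <- function(nodes, tag) {
  xml_text(xml_find_first(nodes, paste0(".//", tag)))
}

# One row per <location>, one column per field of interest.
sites <- data.frame(
  name    = field(locations, "name"),
  city    = field(locations, "city"),
  state   = field(locations, "state"),
  zip     = field(locations, "zip"),
  country = field(locations, "country"),
  stringsAsFactors = FALSE
)

print(sites)
```

Because `xml_find_first` returns one result per input node, every column has the same length even when a location lacks an element, so there is no `address` column to drop afterwards.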

That worked. Is there something wrong with the XML package? – maxie
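
A possible explanation, offered as a guess rather than a confirmed diagnosis: the question uses an `http://` URL while the answer uses `https://`. If the server redirects plain HTTP to HTTPS (an assumption, not verified here), `getURL()` would return the empty redirect body because it does not follow redirects by default, and `xmlParse()` could not fetch the page itself because the HTTP client built into libxml2 has no HTTPS support. The sketch below downloads the text with RCurl and hands it to the XML package as a string; `parse_xml_string` and `fetch_and_parse` are hypothetical helper names.

```r
library(XML)

# Hypothetical helper: parse XML that is already in memory.
# asText = TRUE tells xmlParse the argument is content, not a file name or URL.
parse_xml_string <- function(txt) {
  if (!nzchar(txt)) {
    stop("Empty response; nothing to parse (was the request redirected?)")
  }
  xmlParse(txt, asText = TRUE)
}

# Hypothetical helper: download with RCurl, following redirects, then parse.
# followlocation = TRUE is passed through to libcurl.
fetch_and_parse <- function(url) {
  txt <- RCurl::getURL(url, followlocation = TRUE)
  parse_xml_string(txt)
}

# The parser itself can be checked without a network connection:
doc <- parse_xml_string("<clinical_study><phase>Phase 3</phase></clinical_study>")
print(xmlName(xmlRoot(doc)))
```

With a network connection, `fetch_and_parse("https://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true")` would be the call to try. If that works, the XML parser is fine and only the download step was failing.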