2014-05-02 47 views
5

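I am trying to load some publicly available NHS data using R and the XML package, but I keep getting the following error message; it seems htmlParse cannot load the external entity: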

Error: failed to load external entity "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"

I cannot work out what might be causing this, despite looking at several related questions.

Here is my code, which is very simple:

library("XML") 
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/" 
doc <- htmlParse(url) 

Edit: session info

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1

+0

It is not a valid XML document: [W3 Validator](http://validator.w3.org/check?uri=http%3A%2F%2Fwww.england.nhs.uk%2Fstatistics%2Fstatistical-work-areas%2Fbed-availability-and-occupancy%2F&charset=%28detect+automatically%29&doctype=Inline&group=0&verbose=1). It should at least be XHTML, not HTML5. – CoDEmanX

+0

That code runs successfully for me on Ubuntu, and it also runs on r-fiddle. Can you add your sessionInfo()? http://www.r-fiddle.org/#/fiddle?id=AfoyOSGm –

+0

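sessionInfo() added! I suspect I already have the answer: this is almost certainly caused by the proxy at my workplace. I have run into this problem before (with QGIS) and never found a satisfactory solution. – Tumbledown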

Answers

5

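The XML package has some problems here. The issue is intermittent and is not tied to the URL. I solved it by using the GET function from the httr package to fetch the HTML and then passing the result to htmlParse, as shown below: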

library("XML") 
library(httr) 
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/" 
doc <- htmlParse(rawToChar(GET(url)$content)) 
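
If the failure really comes from a workplace proxy, as the asker suspects in the comments, httr can be told about the proxy explicitly. The following is only a rough sketch of that idea: the proxy host and port are made-up placeholders, and fetch_html and parse_html_text are hypothetical helper names, not part of either package.

```r
library(XML)
library(httr)

url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"

# Hypothetical proxy settings (assumption): replace with your organisation's values.
proxy_host <- "proxy.example.com"
proxy_port <- 8080

# Parse HTML that is already held in memory as a character string.
parse_html_text <- function(txt) {
  htmlParse(txt, asText = TRUE)
}

# Download the page with httr, optionally through a proxy, then parse it.
fetch_html <- function(url, host = NULL, port = NULL) {
  resp <- if (is.null(host)) GET(url) else GET(url, use_proxy(host, port))
  stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx responses
  parse_html_text(content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (needs network access and real proxy details):
# doc <- fetch_html(url, proxy_host, proxy_port)
```

Keeping the download and the parse in separate functions makes it easier to see which step fails: an error from GET points to the network or proxy, while an error from htmlParse points to the document itself.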

3

You can also use the rvest and xml2 packages:

library(rvest) # github version 
library(xml2) # github version 

url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/" 
doc <- read_html(url) 

doc %>% 
    html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>% 
    html_attr("href") 

## [1] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/" 
## [2] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/" 
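
For anyone staying with the XML package, roughly the same link extraction can be written with XPath once the document has been parsed. This is a sketch only: the inline HTML is a made-up sample so that it runs without network access, and the example.com link is invented for contrast.

```r
library(XML)

# Small inline HTML sample (made up for illustration) so the sketch runs offline.
html <- '<html><body>
  <a href="http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/">Overnight</a>
  <a href="http://example.com/other">Other</a>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Keep only the anchors whose href starts with the given prefix,
# mirroring the CSS selector a[href^='...'] used with rvest.
prefix <- "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/"
links <- xpathSApply(doc,
                     sprintf("//a[starts-with(@href, '%s')]", prefix),
                     xmlGetAttr, "href")
print(links)
```

On the real page, doc would come from whichever download method works in your environment instead of the inline string.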