XML content does not seem to be XML

I am trying to retrieve some information from r-users.com. Using the code below, I get this warning message:

XML content does not seem to be XML

Any help would be greatly appreciated.

library(data.table) 
library(XML) 

pages <- c(1:10) 

urls <- rbindlist (lapply(pages, function(x) { 
    url <- paste("https://www.r-users.com/jobs/page/",x,"/", sep="") 
    data.frame(url) 
}), fill=TRUE) 

jobLocations <- rbindlist (apply(urls, 1, function(url) { 
    doc1 <- htmlParse (url) 
    locations <- getNodeSet(doc1, '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span') 
    data.frame(sapply(locations, function(x) { xmlValue(x) })) 
    }), fill = TRUE)

Source

2016-09-16 flaflaflunky

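If I visit one of the URLs and view the source, for example https://www.r-users.com/jobs/page/1/, there is no XML on the page (although it may load XML in the background to get the results). I suspect the message is correct: you are parsing HTML, not XML.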
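A possible way to stay with the XML package, offered as a sketch rather than a confirmed fix: as far as I know, `htmlParse()` does not download `https://` URLs itself, so the URL string is likely being treated as the document content, which would explain the warning. The helpers below (`fetch_html` and `extract_locations` are hypothetical names, and using httr for the download is my assumption) fetch the page first and then parse the returned text with `asText = TRUE`.

```r
library(XML)
library(httr)

# XPath taken from the question
job_xpath <- '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span'

# Hypothetical helper: download a page over HTTPS and return its HTML as text
fetch_html <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # fail loudly on HTTP errors
  content(resp, as = "text", encoding = "UTF-8")
}

# Hypothetical helper: parse HTML text and return the text of the matching nodes
extract_locations <- function(html) {
  doc <- htmlParse(html, asText = TRUE)
  xpathSApply(doc, job_xpath, xmlValue)
}

# Intended usage (needs network access, so left commented out):
# urls <- paste0("https://www.r-users.com/jobs/page/", 1:10, "/")
# jobLocations <- unlist(lapply(urls, function(u) extract_locations(fetch_html(u))))
```

Separating the download from the parsing also makes it easy to check the XPath against a saved copy of one page before requesting all ten.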

rvest and purrr are a powerful combination for web scraping:

library(rvest) 
library(purrr) 

# make URLs 
locations <- 1:10 %>% paste0("https://www.r-users.com/jobs/page/", .) %>% 
    # pull and parse HTML for each URL 
    map(read_html) %>% 
    # select nodes for each page's HTML 
    map(html_nodes, xpath = '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span') %>% 
    # return text inside of each node 
    map(html_text) %>% 
    # simplify list to vector 
    simplify() 

head(locations) 
## [1] "Massachusetts, United States" "New York, United States"  "England, United Kingdom"  
## [4] "California, United States" "Ontario, Canada"    "Indiana, United States"
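If one of the ten pages fails to load, the whole pipeline above stops with an error. A minimal sketch of a more forgiving variant, assuming you would rather skip a failed page than abort: wrap the reader in `purrr::possibly()` and drop the pages that came back as `NULL`. `scrape_locations` and its `reader` argument are hypothetical names added for illustration, not part of rvest or purrr.

```r
library(rvest)
library(purrr)

# XPath taken from the question
job_xpath <- '//*[@id="mainContent"]/div[2]/ol/li/dl/dd[3]/span'

# Hypothetical helper: scrape locations from several pages, skipping pages that fail
scrape_locations <- function(urls, reader = read_html) {
  # possibly() returns NULL instead of raising an error when a page cannot be read
  safe_reader <- possibly(reader, otherwise = NULL)
  urls %>%
    map(safe_reader) %>%
    # drop pages that could not be read
    discard(is.null) %>%
    # select nodes for each page's HTML
    map(html_nodes, xpath = job_xpath) %>%
    # return text inside of each node
    map(html_text) %>%
    # flatten the per-page results into one character vector
    unlist()
}

# Intended usage (needs network access, so left commented out):
# locations <- scrape_locations(paste0("https://www.r-users.com/jobs/page/", 1:10))
```

The `reader` argument is only there so the function can be tried with a stand-in reader on saved HTML; by default it behaves like the pipeline above.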

Source

2016-09-16 01:50:04 alistaire
