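R - Web scraping - trouble getting attribute values using rvest
I'm trying to use rvest to pull the ISO country profiles from Wikipedia, including following the links to other pages. I can't find a way to get the links (the href attribute) that go with the country names. My attempt with the XPath string() function results in an error. The code below is fairly easy to run and self-explanatory.
Any help is appreciated!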
library(rvest)
library(dplyr)
searchPage <- read_html("https://en.wikipedia.org/wiki/ISO_3166-2")
nodes <- html_node(searchPage, xpath = '(//h2[(span/@id = "Current_codes")]/following-sibling::table)[1]')
codes <- html_nodes(nodes, xpath = 'tr/td[1]/a/text()')
names <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/text()')
# The following returns the data, but as attribute nodes (href="...") rather than plain strings
links <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/@href')
# The following returns nothing
links2 <- html_nodes(nodes, xpath = 'tr/td[2]//a[@title]/@href/text()')
# The following throws an error
links3 <- html_nodes(nodes, xpath = 'string(tr/td[2]//a[@title]/@href)')
# The following also throws an error
links4 <- sapply(nodes, function(x) { x %>% read_html() %>% html_nodes("tr/td[2]//a[@title]") %>% html_attr("href") })
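A likely explanation, hedged: the XPath `@href` selects attribute nodes, which is why the attribute name is printed along with the value; attributes have no `text()` children, so `@href/text()` matches nothing; and `string(...)` evaluates to a string rather than a node-set, which `html_nodes()` cannot return. The usual rvest approach is to select the `<a>` elements themselves and read the attribute with `html_attr()`. The sketch below runs offline against a small hypothetical two-row table shaped like the "Current codes" table; the HTML is made up for illustration, not scraped data.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia "Current codes" table (illustration only)
page <- read_html('<table>
<tr><td><a href="/wiki/ISO_3166-2:AD">ISO 3166-2:AD</a></td><td><a href="/wiki/Andorra" title="Andorra">Andorra</a></td></tr>
<tr><td><a href="/wiki/ISO_3166-2:AE">ISO 3166-2:AE</a></td><td><a href="/wiki/United_Arab_Emirates" title="United Arab Emirates">United Arab Emirates</a></td></tr>
</table>')

# Select the <a> elements, not their @href attribute nodes
name_links <- html_nodes(page, xpath = "//tr/td[2]//a[@title]")

# html_attr() returns the attribute values as a plain character vector
links <- html_attr(name_links, "href")

# html_text() gives the visible link text (the country names)
country_names <- html_text(name_links)
```

On the real page, the same `html_attr(..., "href")` call should work on the node set selected with `'tr/td[2]//a[@title]'` from the question, without the `/@href` step.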
Thanks! Sorry, I thought the comments would be enough; I'll try to provide more information in the future!
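Extracting codes, names and links as separate vectors can leave them misaligned if a row lacks a matching link. A minimal sketch of a row-by-row alternative, assuming the same hypothetical two-row table: `html_node()` (singular) returns exactly one match per row, or `NA` when there is none, so the columns stay in step. The base URL passed to `xml2::url_absolute()` is an assumption used to turn the relative `/wiki/...` paths into full URLs for following the links.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia "Current codes" table (illustration only)
page <- read_html('<table>
<tr><td><a href="/wiki/ISO_3166-2:AD">ISO 3166-2:AD</a></td><td><a href="/wiki/Andorra" title="Andorra">Andorra</a></td></tr>
<tr><td><a href="/wiki/ISO_3166-2:AE">ISO 3166-2:AE</a></td><td><a href="/wiki/United_Arab_Emirates" title="United Arab Emirates">United Arab Emirates</a></td></tr>
</table>')

# Only rows that contain data cells (skips header rows made of <th>)
rows <- html_nodes(page, xpath = "//tr[td]")

# One name link per row, found relative to each row
name_links <- html_node(rows, xpath = "./td[2]//a[@title]")

countries <- data.frame(
  code = html_text(html_node(rows, xpath = "./td[1]/a")),
  name = html_text(name_links),
  # Resolve relative hrefs against an assumed base URL
  link = xml2::url_absolute(html_attr(name_links, "href"), "https://en.wikipedia.org/"),
  stringsAsFactors = FALSE
)
```

Each `countries$link` value could then be passed to `read_html()` to fetch the linked country page.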