2017-04-24 85 views

R - How to extract items from an XML nodeset?

I have a list of 438 pitcher names (in an XML nodeset) that looks like this:

> pitcherlinks[[1]] 
<td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01"> 
    <a href="/players/a/abadfe01.shtml">Fernando Abad</a>* 
</td> 

> pitcherlinks[[2]] 
<td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01"> 
    <a href="/players/a/adlemti01.shtml">Tim Adleman</a> 
</td> 

How can I extract the name, such as Fernando Abad, together with the corresponding link, /players/a/abadfe01.shtml?

Answer


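Since you have a list, use an apply-family function (sapply here) to iterate over it. Each call parses one HTML fragment from the list with read_html and uses the CSS selector a to find the anchor (link) node. The name comes from html_text, and the link is in the href attribute.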

library(rvest)
# Rebuild two entries from the question as HTML strings so the example is reproducible.
pitcherlinks <- list()
pitcherlinks[[1]] <- 
'<td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01"> 
    <a href="/players/a/abadfe01.shtml">Fernando Abad</a>* 
    </td>' 

pitcherlinks[[2]] <- 
    '<td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01"> 
    <a href="/players/a/adlemti01.shtml">Tim Adleman</a> 
     </td>' 

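# For each fragment: parse it, select the <a> node, and take its text (the name).
# Named player_names rather than names to avoid masking base::names().
player_names <- sapply(pitcherlinks, function(x) {x %>% read_html() %>% html_nodes("a") %>% html_text()})
# Same traversal, but read the href attribute (the link) instead of the text.
links <- sapply(pitcherlinks, function(x) {x %>% read_html() %>% html_nodes("a") %>% html_attr("href")})

player_names
# [1] "Fernando Abad" "Tim Adleman"
links
# [1] "/players/a/abadfe01.shtml" "/players/a/adlemti01.shtml"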
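
If pitcherlinks is already an xml_nodeset, as in the question, and not a list of strings, re-parsing each element with read_html is probably unnecessary, because rvest's selector functions accept a nodeset directly. The following is a minimal sketch under that assumption. The two-cell table, the td[data-stat='player'] selector, and the names page, cells, anchors, and pitchers are made up for illustration; the real page structure may differ.

```r
library(rvest)

# Hypothetical stand-in for the scraped page: a table with two player cells.
page <- read_html('<table><tr>
  <td class="left " data-append-csv="abadfe01" data-stat="player" csk="Abad,Fernando0.01"><a href="/players/a/abadfe01.shtml">Fernando Abad</a>*</td>
  <td class="left " data-append-csv="adlemti01" data-stat="player" csk="Adleman,Tim0.01"><a href="/players/a/adlemti01.shtml">Tim Adleman</a></td>
</tr></table>')

# An xml_nodeset of <td> cells, analogous to pitcherlinks in the question
# (the selector is an assumption about how that nodeset was obtained).
cells <- html_nodes(page, "td[data-stat='player']")

# html_node() (singular) returns one result per cell, and a missing node
# (NA after extraction) when a cell has no link, so names and links stay aligned.
anchors <- html_node(cells, "a")

pitchers <- data.frame(
  name = html_text(anchors),
  link = html_attr(anchors, "href"),
  stringsAsFactors = FALSE
)
pitchers
```

Collecting both columns in one data frame keeps each name next to its link, which is convenient if the links are later used to fetch the individual player pages.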