Scrape the "string" code out of a URL and put it into a vector in R using rvest

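I'm new to R and rvest. Two days ago I got help with this code, which scrapes all the player names and works well. Now I'm trying to add code to the "fetch_current_players" function so that it also creates a vector of the site's player codes (taken from the URLs). Any help would be greatly appreciated, as I've spent a day googling, reading, and watching YouTube videos trying to teach myself. Thanks!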

library(rvest) 
library(purrr) # flatten/map/safely 
library(dplyr) # progress bar 

fetch_current_players <- function(letter){ 

    URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter) 
    pg <- read_html(URL) 

    if (is.null(pg)) return(NULL) 
    player_data <- html_nodes(pg, "b a") 
    player_code<-html_attr(html_nodes(pg, "b a"), "href") #I'm trying to scrape the URL as well as the player name 
    substring(player_code, 12, 20) #Strips the code out of the URL 
    html_text(player_data) 
    player_code #Not sure how to create vector of all codes from all 27 webpages 
} 

pb <- progress_estimated(length(letters)) 
player_list <- flatten_chr(map(letters, function(x) { 
    pb$tick()$print() 
    fetch_current_players(x) 
}))


2016-07-05 Nitreg

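I like to keep this sort of thing simple and readable; there is nothing wrong with a for loop. This code returns the names and codes in a simple data frame.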

library(rvest) 
library(purrr) # flatten/map/safely 
library(dplyr) # progress bar 

fetch_current_players <- function(letter){ 
    URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter) 
    pg <- read_html(URL) 

    if (is.null(pg)) return(NULL) 
    player_data <- html_nodes(pg, "b a")           # the player links selected by "b a"
    player_code <- html_attr(player_data, "href")  # each link's URL
    player_code <- substring(player_code, 12, 20)  # keep characters 12-20, which hold the player code
    player_names <- html_text(player_data)         # the link text is the player name
    return(data.frame(code = player_code, name = player_names, stringsAsFactors = FALSE))
} 

pb <- progress_estimated(length(letters)) 

player_list <- NULL  # start empty so a player_list left over from an earlier run is not reused
for (x in letters) {
    pb$tick()$print()
    player_list <- rbind(player_list, fetch_current_players(x))  # append this letter's rows
}
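
As a follow-up sketch: substring(player_code, 12, 20) assumes every code is exactly nine characters long, so a shorter code would pick up the dot of the file extension. The helper below, extract_player_code, is a hypothetical name and not part of rvest. It assumes each href looks like "/players/<letter>/<code>.shtml", and the hrefs and names in the example are made up so that it runs without touching the website. It also shows that the code column of a data frame like the one returned above is already the vector the question asks for.

```r
# Hypothetical helper: take the file name at the end of the href and drop its
# extension, instead of relying on fixed character positions.
# Assumption: hrefs look like "/players/<letter>/<code>.shtml".
extract_player_code <- function(href) {
    tools::file_path_sans_ext(basename(href))
}

# Made-up hrefs for illustration; the second code has only eight characters.
hrefs <- c("/players/a/exampab01.shtml", "/players/a/shortx01.shtml")

codes_fixed  <- substring(hrefs, 12, 20)    # fixed positions: second value keeps a trailing "."
codes_robust <- extract_player_code(hrefs)  # file name without extension

# A data frame shaped like the one the answer builds (names are made up).
player_list <- data.frame(code = codes_robust,
                          name = c("Example Player", "Short Name"),
                          stringsAsFactors = FALSE)

# The column itself is a plain character vector of codes.
player_codes <- player_list$code
print(player_codes)
```

If the real hrefs turn out to have a different shape, the helper would need adjusting; the fixed-position version is fine as long as every code is nine characters long.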


2016-07-05 22:14:32 PeterK

Thanks, works perfectly! – Nitreg
