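Scrape the "string" code out of a URL and put it into a vector in R with rvest
2016-07-05

I'm new to R and rvest. Two days ago I got help with this code, which scrapes all of the player names and works well. Now I'm trying to extend the `fetch_current_players` function so that it also builds a vector of the site's player codes (taken from the player URLs). Any help would be greatly appreciated. I've spent a day googling, reading, and watching YouTube videos trying to teach myself. Thanks!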

library(rvest) 
library(purrr) # flatten/map/safely 
library(dplyr) # progress bar 

fetch_current_players <- function(letter){ 

    URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter) 
    pg <- read_html(URL) 

    if (is.null(pg)) return(NULL) 
    player_data <- html_nodes(pg, "b a") 
    player_code<-html_attr(html_nodes(pg, "b a"), "href") #I'm trying to scrape the URL as well as the player name 
    substring(player_code, 12, 20) #Strips the code out of the URL 
    html_text(player_data) 
    player_code #Not sure how to create vector of all codes from all 26 letter pages 
} 

pb <- progress_estimated(length(letters)) 
player_list <- flatten_chr(map(letters, function(x) { 
    pb$tick()$print() 
    fetch_current_players(x) 
})) 
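In the function above, `substring(player_code, 12, 20)` and `html_text(player_data)` each compute a value and immediately discard it, because an R function returns only its last expression. The result has to be assigned. Here is a minimal base-R sketch of extracting the code from each href. It assumes links shaped like `/players/a/aaronha01.shtml`, which is the pattern `substring(..., 12, 20)` relies on. The `extract_player_code` helper and the sample hrefs are hypothetical and only serve as illustration.

```r
# Hypothetical helper: turn hrefs such as "/players/a/aaronha01.shtml"
# into just the player code ("aaronha01").
extract_player_code <- function(href) {
  # Capture the last path segment and drop the ".shtml" extension.
  # Unlike substring(href, 12, 20), this does not assume a fixed
  # prefix length or a 9-character code. Non-matching hrefs are
  # returned unchanged by sub().
  sub("^.*/([^/]+)\\.shtml$", "\\1", href)
}

# Assumed sample hrefs following the /players/<letter>/<code>.shtml pattern.
hrefs <- c("/players/a/aaronha01.shtml", "/players/a/aardsda01.shtml")

codes <- extract_player_code(hrefs)  # assign the result instead of discarding it
print(codes)
```

For hrefs that follow this pattern, the regex and the fixed-position `substring()` call return the same codes. The regex simply tolerates codes of other lengths.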

Answer

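I like to keep this kind of thing simple and readable, and there's nothing wrong with a for loop. This code returns the names and codes together in a simple data frame.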

library(rvest) 
library(purrr) # flatten/map/safely 
library(dplyr) # progress bar 

fetch_current_players <- function(letter){ 
    URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter) 
    pg <- read_html(URL) 

    if (is.null(pg)) return(NULL) 
    player_data <- html_nodes(pg, "b a") 
    player_code <- html_attr(player_data, "href") # link targets, e.g. "/players/a/<code>.shtml"
    player_code <- substring(player_code, 12, 20) # keep only the code part of the URL
    player_names <- html_text(player_data) 
    return(data.frame(code = player_code, name = player_names, stringsAsFactors = FALSE)) 
} 

pb <- progress_estimated(length(letters)) 

player_list <- NULL  # start empty so a rerun never appends to a stale player_list

for (x in letters) { 
    pb$tick()$print() 
    player_list <- rbind(player_list, fetch_current_players(x)) # rbind(NULL, df) is just df
} 
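Growing `player_list` with `rbind()` inside the loop works, but it copies the whole data frame on every iteration. A common alternative is to collect the per-letter pieces in a list and bind them once at the end. The sketch below assumes the fetcher returns a data frame with `code` and `name` columns, as `fetch_current_players()` does. `fake_fetch` is a hypothetical offline stand-in that lets the sketch run without network access. For real use, swap in `fetch_current_players` and `letters`.

```r
# Hypothetical stand-in for fetch_current_players(): returns a small
# data frame with the same two columns, so this runs offline.
fake_fetch <- function(letter) {
  data.frame(code = paste0(letter, c("xxxxx01", "yyyyy01")),
             name = paste("Player", toupper(letter), 1:2),
             stringsAsFactors = FALSE)
}

# Fetch each letter into a list element, then bind everything once.
pieces <- lapply(c("a", "b", "c"), fake_fetch)
player_list <- do.call(rbind, pieces)

# The question asked for a vector of codes: that is just one column.
player_codes <- player_list$code
print(player_codes)
```

`dplyr::bind_rows(pieces)` would do the same job as `do.call(rbind, pieces)`. Either way, `player_list$code` gives the character vector of codes the question was after.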

Thanks, works perfectly! – Nitreg