Scraping with a loop in R - football stats

I want R to loop through the player profiles on transfermarkt.com. First I get the roster URLs with the following:
library(rvest)  # provides read_html(), html_nodes(), html_text(), html_attr() and %>%
#// Add the Team's URL to scrape
TeamScrape <- read_html("http://www.transfermarkt.com/jumplist/startseite/verein/2778")
#// Get Club Name
ClubName <- TeamScrape %>%
html_nodes(".spielername-profil") %>%
html_text()
#// Get All Player URLs
PlayerURLs <- TeamScrape %>%
html_nodes(".spielprofil_tooltip") %>%
html_attr("href")
PlayerURLs <- unique(PlayerURLs)
PlayerURLs <- na.omit(PlayerURLs)
PlayerURLs <- paste0("http://www.transfermarkt.com", PlayerURLs)
PlayerLinks = data.frame(ClubName, PlayerURLs)
This gives me a data.frame that includes the URLs I want to loop through with my next scraper, the "Player Profile Scraper".
#// Add the Player's URL that you want to scrape (index 13 is just one player picked for testing)
URLLink <- PlayerURLs[13]
PlayerTest <- read_html(URLLink)
#// Squad No
SquadNo <- PlayerTest %>%
html_nodes(".rueckennummer-profil") %>%
html_text()
#// Name
Name <- PlayerTest %>%
html_nodes(".spielername-profil") %>%
html_text()
#// Nationality
Nationality <- PlayerTest %>%
html_nodes(".flaggenrahmen+ span") %>%
html_text()
#// Club
Club <- PlayerTest %>%
html_nodes(".vereinprofil_tooltip+ .vereinprofil_tooltip") %>%
html_text()
#// Position
Position <- PlayerTest %>%
html_nodes(".list+ .list tr:nth-child(3) td") %>%
html_text()
#// DOB
DOB <- PlayerTest %>%
html_nodes(".wsnw") %>%
html_text()
#// Age
Age <- PlayerTest %>%
html_nodes(".profilheader .hide-for-small td") %>%
html_text() %>%
as.numeric()
#// Value
Value <- PlayerTest %>%
html_nodes(".marktwert a") %>%
html_text()
#// Matches Played this Season
Matches <- PlayerTest %>%
html_nodes(".hide.hide-for-small+ .zentriert") %>%
html_text() %>%
as.numeric()
#// Goals Scored this Season
Goals <- PlayerTest %>%
html_nodes("#yw1 tfoot .zentriert:nth-child(4)") %>%
html_text() %>%
as.numeric()
#// Assists Made this Season
Assists <- PlayerTest %>%
html_nodes("tfoot .zentriert:nth-child(5)") %>%
html_text() %>%
as.numeric()
#// Mins Played this Season
Minutes <- PlayerTest %>%
html_nodes("tfoot .zentriert:nth-child(7)") %>%
html_text() %>%
as.numeric()
#// Some Cleaning Up of the Data
# to_remove_SquadNo <- paste(c("#"))
# SquadNo <- gsub(to_remove_SquadNo, "", SquadNo)
# Minutes <- regmatches(Minutes, gregexpr("[[:digit:]]+", Minutes))
# as.numeric(unlist(Minutes))
#// Create the Data Frame
output = data.frame(SquadNo, Name, Nationality, Club, Position, DOB, Age, Value, Matches, Goals, Assists, Minutes)
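My goal is to run the Player Profile Scraper in a loop over the URLs from the Team Scraper. I have tried many different loops and I am lost! I would really appreciate some advice!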
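One possible way to set up such a loop, as a rough sketch rather than a tested solution: wrap the profile scraper in a function that takes a parsed page and returns a single row, then apply it to each URL. The names `text_or_na`, `scrape_player` and `scrape_all` are hypothetical helpers made up for this illustration, only three of the fields are shown, and the CSS selectors are copied from the code above on the assumption that they still match the site's markup.

```r
library(rvest)

# Hypothetical helper: text of the first match, or NA when the selector
# matches nothing, so that every field always has length 1.
text_or_na <- function(page, css) {
  txt <- page %>% html_nodes(css) %>% html_text(trim = TRUE)
  if (length(txt) == 0) NA_character_ else txt[1]
}

# Hypothetical wrapper: one parsed profile page in, one data-frame row out.
# Only three fields are shown; the others would follow the same pattern.
scrape_player <- function(page) {
  data.frame(
    SquadNo = text_or_na(page, ".rueckennummer-profil"),
    Name    = text_or_na(page, ".spielername-profil"),
    Value   = text_or_na(page, ".marktwert a"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical loop: download each URL, scrape it, and stack the rows.
# A page that fails to download is skipped with a warning.
scrape_all <- function(urls, pause = 1) {
  rows <- lapply(urls, function(u) {
    Sys.sleep(pause)  # assumed courtesy delay between requests
    tryCatch(
      scrape_player(read_html(u)),
      error = function(e) {
        warning("Skipping ", u, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  do.call(rbind, rows)
}

# Offline check on a made-up HTML snippet (not real site markup)
demo_page <- read_html('<div><span class="spielername-profil">Test Player</span></div>')
demo_row  <- scrape_player(demo_page)
print(demo_row)
```

With the vector from the team scraper, `output <- scrape_all(PlayerURLs)` would then be expected to give one row per player. Returning `NA` for an empty match keeps every row the same width, which is what lets `rbind` stack them.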
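HubertL - thanks for the quick response. I did what you said and I got this: Error in data.frame(SquadNo, Name, Nationality, Club, ContractUntil, Position, : arguments imply differing number of rows: 0, 1 In addition: Warning messages: 1: In function_list[[k]](value) : NAs introduced by coercion 2: In function_list[[k]](value) : NAs introduced by coercion 3: In function_list[[k]](value) : NAs introduced by coercion Called from: data.frame(SquadNo, Name, Nationality, Club, ContractUntil, Position, DOB, Age) – user1593995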
That is because some of the data is missing – HubertL
Strangely, even when I cut the scrape down to 2 fields, the row counts still differ... – user1593995
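To see which fields cause the "differing number of rows" error, it may help to check the length of each scraped vector before calling `data.frame()`. The sketch below uses made-up stand-in values instead of real scraped output: a selector that matches nothing yields a zero-length vector, and one that matches several nodes yields more than one value, and either case can break the row count.

```r
# Stand-ins for scraped vectors (assumed values, not real output)
fields <- list(
  SquadNo = character(0),                     # selector matched nothing
  Name    = "Test Player",                    # exactly one match
  DOB     = c("Jan 1, 1990", "Jan 1, 1990")   # selector matched twice
)

# Length of every field; anything other than 1 is a candidate culprit
field_lengths <- lengths(fields)
print(field_lengths)

bad <- names(field_lengths)[field_lengths != 1]
print(bad)

# Reproduce the error from the comment above with a 0-length and a 1-length vector
err <- tryCatch(
  data.frame(SquadNo = fields$SquadNo, Name = fields$Name),
  error = function(e) conditionMessage(e)
)
print(err)
```

Running the same `lengths()` check on the real vectors for a player that fails should show which selectors return zero or several matches on that page.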