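How do I read a list of HTML pages with rvest's read_html?

I have a list of web pages that all share the same layout but contain different information. How can I use rvest's read_html to read the whole list?

For example: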

http://www.halfordsautocentres.com/autocentres/chesterfield 
http://www.halfordsautocentres.com/autocentres/derby-london-road 
http://www.halfordsautocentres.com/autocentres/derby-wyvern-way

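Each page has a different address under the CSS selector .store-details__address.

I wrote the following code, which returns the correct address for a single page: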

library(rvest)

# Read one page and extract the text of the address node
derby <- read_html("http://www.halfordsautocentres.com/autocentres/derby-wyvern-way")
derby %>%
    html_node(".store-details__address") %>%
    html_text()
[1] "Unit 7, Wyvern Way, Wyvern Retail Park, Derby, DE21 6NZ"

How can I get read_html to read a list of URLs rather than just a single one?

Thanks.

Source

2017-03-02 Conor Grant

You can use whichever looping strategy you prefer: for, lapply, or purrr::map.

library(rvest)

# The pages to scrape; all share the same layout
urls <- c("http://www.halfordsautocentres.com/autocentres/chesterfield",
          "http://www.halfordsautocentres.com/autocentres/derby-london-road",
          "http://www.halfordsautocentres.com/autocentres/derby-wyvern-way")

Base R, using a for loop:

# Preallocate the result, then fill it one page at a time
out <- vector("character", length = length(urls))
for (i in seq_along(urls)) {
    page <- read_html(urls[i])
    out[i] <- page %>%
        html_node(".store-details__address") %>%
        html_text()
}

Base R, using the *apply family:

urls %>% 
    lapply(read_html) %>% 
    lapply(html_node, ".store-details__address") %>% 
    vapply(html_text, character(1))

And here is a tidyverse/purrr version:

library(tidyverse)

urls %>% 
    map(read_html) %>% 
    map(html_node, ".store-details__address") %>% 
    map_chr(html_text)
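
All three variants above stop at the first page that fails to load. As a rough sketch of one way around that (the helper name get_address and its css argument are made up for illustration and are not part of the original answer), each read can be wrapped in tryCatch so that a failed page yields NA instead of aborting the whole run. The demo input is a literal HTML string, which read_html also accepts, so the snippet can be tried offline.

```r
library(rvest)

# Hypothetical helper (an assumption, not from the original answer):
# returns the address text for one page, or NA if the page cannot be read.
get_address <- function(url, css = ".store-details__address") {
    tryCatch(
        {
            page <- read_html(url)
            # html_text() gives NA when the selector matches nothing
            html_text(html_node(page, css), trim = TRUE)
        },
        error = function(e) NA_character_
    )
}

# Offline demo: a literal HTML string stands in for a real page
demo_page <- "<html><body><p class='store-details__address'> 1 Example Road </p></body></html>"
demo_address <- get_address(demo_page)

# A source that cannot be read falls back to NA instead of raising an error
missing_address <- get_address("this-file-does-not-exist.html")
```

With the urls vector defined earlier, vapply(urls, get_address, character(1)) should then return a character vector named by URL, with NA for any page that could not be fetched.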

Source

2017-03-02 12:54:57 Rentrop

Thanks, this is great. I went with the *apply functions and it works as expected!
