Scraping Amazon web pages with R

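I am using R to scrape the Amazon website for product prices. The products are spread over 5 pages, so I need to supply a different URL for each page. This is the code I am using: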

library(data.table)  # rbindlist
library(rvest)       # read_html, html_nodes, html_text
library(purrr)       # map, simplify (also provides %>%)

pages <- c(1, 2, 3, 4, 5)

## build the URLs of the 5 pages
urls <- rbindlist(lapply(pages, function(x) {
    url <- paste("https://www.amazon.co.uk/Best-Sellers-Health-Personal-Care-Weight-Loss-Supplements/zgbs/drugstore/2826476031#", x, sep = "")
    data.frame(url)
}), fill = TRUE)

## scrape the prices from each URL
product.price <- rbindlist(apply(urls, 1, function(url) {
    locations <- url %>%
        map(read_html) %>%
        map(html_nodes, xpath = '//*[@id="zg_centerListWrapper"]/div/div[2]/div/div[2]/span[1]/span') %>%
        map(html_text) %>%
        simplify()
    data.frame(locations)
}), fill = TRUE)

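There are 100 products, 20 per page, but what I get is the first 20 repeated 5 times. This means I am only ever requesting the first URL. How can I access all the pages?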
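A quick way to check this suspicion without touching the network: the five URLs differ only in the fragment (the part after `#`), and a fragment is handled by the client and not included in the HTTP request. The sketch below uses base R only. The `pg` query parameter in the last step is an assumption made for illustration, not something confirmed in this thread, so verify the real pagination links in a browser first.

```r
# The five URLs built in the question differ only in the part after "#".
base  <- "https://www.amazon.co.uk/Best-Sellers-Health-Personal-Care-Weight-Loss-Supplements/zgbs/drugstore/2826476031"
pages <- 1:5
urls  <- paste0(base, "#", pages)

# Everything from "#" onward is a URL fragment. It is not sent to the
# server, so this is the request target the server actually receives.
requested <- sub("#.*$", "", urls)

n_strings  <- length(unique(urls))       # distinct URL strings
n_requests <- length(unique(requested))  # distinct request targets

# ASSUMPTION: if the site paginates through a query parameter, each page
# becomes a distinct request. The parameter name "pg" is hypothetical here.
paged   <- paste0(base, "?pg=", pages)
n_paged <- length(unique(sub("#.*$", "", paged)))
```

Here `n_strings` is 5 while `n_requests` is 1, which matches the symptom of the same 20 products coming back five times.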

Thanks

Source

2017-08-07 Basel.D

Here is my take:

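library(rvest)

base_url <- 'https://www.amazon.co.uk/Best-Sellers-Health-Personal-Care-Weight-Loss-Supplements/zgbs/drugstore/2826476031#'

page <- read_html(base_url)

# number of pages = number of entries in the pagination list
numPages <- page %>%
    html_node('.zg_pagination') %>%
    html_nodes('li') %>%
    length()

items <- character()
for (i in seq_len(numPages)) {
    # build each page URL from the base; reassigning the base itself
    # would accumulate the suffixes (#1, #12, #123, ...)
    page_url <- paste0(base_url, i)
    page <- read_html(page_url)

    # item names on the current page
    item <- page %>%
        html_nodes(xpath = '//*[@id="zg_centerListWrapper"]/div/div[2]/div/a/div[2]') %>%
        html_text(trim = TRUE)

    items <- append(items, item)
}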

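Main differences:

I use a loop rather than a functional approach.
I changed the XPath argument to get the item names. You can easily extend it to get prices, star ratings, etc.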
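One possible way to extend the approach to prices is sketched below. It runs against a small inline HTML snippet rather than the live site, and the markup and class names (`item`, `name`, `price`) are invented for illustration; they are not Amazon's real structure. The idea is to select each item container once and then call `html_node()` (singular) inside it, so that an item without a price yields `NA` instead of shifting the remaining prices up by one.

```r
library(rvest)

# Hypothetical markup standing in for one results page; the class names
# are assumptions made for this sketch only.
html <- paste0(
  '<div id="list">',
  '<div class="item"><div class="name"> Product A </div><span class="price">9.99</span></div>',
  '<div class="item"><div class="name">Product B</div></div>',
  '<div class="item"><div class="name">Product C</div><span class="price">4.50</span></div>',
  '</div>'
)

page <- read_html(html)

# Select every item container once, then look inside each for its fields.
item_nodes <- html_nodes(page, xpath = '//div[@class="item"]')

# html_node() returns exactly one result per container, and a missing
# node where nothing matches, so names and prices stay aligned row by row.
name_nodes  <- html_node(item_nodes, xpath = './div[@class="name"]')
price_nodes <- html_node(item_nodes, xpath = './span[@class="price"]')

products <- data.frame(
  name  = html_text(name_nodes, trim = TRUE),
  price = as.numeric(html_text(price_nodes, trim = TRUE)),
  stringsAsFactors = FALSE
)
```

With the real page, the two relative XPath expressions would need to be replaced by ones that match the actual name and price elements inside each item.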

Source

2017-11-10 02:20:29 IanK
