How can I use R to scrape information that only appears after a click?

I'm trying to scrape the phone number from this page: http://olx.pl/oferta/pokoj-1-os-bielany-encyklopedyczna-CID3-IDdX6wf.html#c1c0e14c53. The phone number can be selected with the rvest package using the selector .\'id_raw\'\::nth-child(1) span+ div strong (suggested by selectorGadget).

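The problem is that the information only becomes available after the mask covering it is clicked. So somehow I need to open a session, perform a click, and then scrape the information.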

EDIT: By the way, it is not a link, IMHO. Have a look at the page source. I'm stuck because I'm a regular R user, not a JavaScript programmer.

Source

2016-02-13 Marcin Kosiński

Have you tried RSelenium? – Jota

You can grab the data embedded in the <li> tags, which tells the onclick handler what to do, and then fetch the hidden values directly:

library(httr) 
library(rvest) 
library(purrr) 
library(stringr) 

URL <- "http://olx.pl/oferta/pokoj-1-os-bielany-encyklopedyczna-CID3-IDdX6wf.html#c1c0e14c53" 

pg <- read_html(URL) 

html_nodes(pg, "li.rel") %>%  # get the 'special' <li> tags 
    html_attrs() %>%     # extract all the attrs (they're non-standard) 
    flatten_chr() %>%    # list to character vector 
    keep(~grepl("rel \\{", .x)) %>% # only want ones with 'hidden' secret data 
    str_extract("(\\{.*\\})") %>% # only get the data 
    unique() %>%      # there are duplicates 
    map_df(function(x) {

        path <- str_match(x, "'path':'([[:alnum:]]+)'")[,2]   # extract the path
        id <- str_match(x, "'id':'([[:alnum:]]+)'")[,2]       # extract the id

        ajax <- sprintf("http://olx.pl/ajax/misc/contact/%s/%s/", path, id) # make the AJAX/XHR URL
        value <- content(GET(ajax))$value                     # get the data

        data.frame(path=path, id=id, value=value, stringsAsFactors=FALSE) # make a data frame

    })

## Source: local data frame [3 x 3]
##
##           path    id       value
##          (chr) (chr)       (chr)
## 1        phone dX6wf 503 155 744
## 2        skype dX6wf    e.bobruk
## 3 communicator dX6wf     7686136

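Having done all that, I'm disappointed the site doesn't have better terms of service/use. It's pretty obvious they don't really want you scraping this data.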
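To make the parsing step easier to follow offline, here is a minimal base-R sketch of how the path and id end up in the AJAX URL. The `attr_value` string and the `extract_field()` helper are hypothetical, made up to match the regexes used above; the real attribute on the page may look different.

```r
# Hypothetical class attribute, shaped to match the regexes in the answer above.
# The real OLX markup may differ.
attr_value <- "link-phone rel {'path':'phone', 'id':'dX6wf'} atClickTracking"

# Hypothetical helper: pull one quoted field out of the embedded data.
extract_field <- function(x, field) {
  pattern <- sprintf("'%s':'([[:alnum:]]+)'", field)
  m <- regmatches(x, regexec(pattern, x))[[1]]
  # m holds the full match plus one capture group when the field is present
  if (length(m) == 2) m[2] else NA_character_
}

path <- extract_field(attr_value, "path")
id   <- extract_field(attr_value, "id")

# Same URL template as in the answer above
ajax <- sprintf("http://olx.pl/ajax/misc/contact/%s/%s/", path, id)
ajax
```

This only builds the URL; fetching it still needs `GET()` from httr as shown above.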

Source

2016-02-14 14:06:29 hrbrmstr

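Nice, a solution that doesn't need any external software/programs. Have you come across situations where you need to use something like "selenium", or can you usually do everything in "R"? – tospig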

I try not to use it, since the idioms in the RSelenium pkg cld use a "Hadleyverse" makeover IMO. But there are definitely times when it's necessary. – hrbrmstr

Here is a solution using RSelenium (RSelenium introduction) and phantomjs.

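However, I'm not sure how useful it is, because it ran very slowly on my machine. I'm not a phantomjs or selenium expert, so I don't know where the speed could be improved; that's something to look into...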

Edit

I tried this again and the speed seems to be OK.

library(RSelenium) 
library(rvest) 

## Terminal command to start selenium (on ubuntu) 
## cd ~/selenium && java -jar selenium-server-standalone-2.48.2.jar 
url <- "http://olx.pl/oferta/pokoj-1-os-bielany-encyklopedyczna-CID3-IDdX6wf.html#c1c0e14c53" 

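## Note: startServer() is defunct in later RSelenium releases; rsDriver() or a Docker-hosted server replaces it
RSelenium::startServer() 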
remDr <- remoteDriver(browserName = "phantomjs") 

remDr$open() 
remDr$navigate(url) 

# css <- ".cpointer:nth-child(1)" ## couldn't get this to work 
xp <- "//div[@class='contactbox-indent rel brkword']" 
webElem <- remDr$findElement(using = 'xpath', xp) 

# webElem <- remDr$findElement(using = 'css selector', css) 
webElem$clickElement() 

## the page source now includes the clicked element 
page_source <- remDr$getPageSource()[[1]] 
pos <- regexpr('class=\\"xx-large', page_source) 

## you could write a more intelligent regex, but this works for now 
phone_number <- substr(page_source, pos + 11, pos + 21) 
phone_number 
# "503 155 744" 

# remDr$close() 
# remDr$closeServer()
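As for the "more intelligent regex", here is one possible sketch that captures the element text instead of relying on fixed character offsets. The `page_source` fragment below is a hypothetical stand-in for `remDr$getPageSource()[[1]]`; I'm assuming the number sits directly inside an element whose class starts with `xx-large`, which may not match the real markup.

```r
# Hypothetical fragment standing in for remDr$getPageSource()[[1]].
# The real page markup may differ.
page_source <- '<div class="contactbox"><strong class="xx-large">503 155 744</strong></div>'

# Capture the text between the tag carrying class "xx-large..." and the next "<"
pattern <- 'class="xx-large[^"]*"[^>]*>([^<]+)<'
m <- regmatches(page_source, regexec(pattern, page_source))[[1]]

# m holds the full match plus one capture group when the pattern is found
phone_number <- if (length(m) == 2) trimws(m[2]) else NA_character_
phone_number
```

This way the result doesn't depend on the number having a fixed length, and you get NA rather than a random slice of HTML when the element isn't there.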

Source

2016-02-14 00:07:44 tospig
