2016-08-15 35 views
4

How can I run RSelenium in parallel?

Here is what I currently run in parallel (using rvest):

library(RSelenium) 
library(rvest) 
library(magrittr) 
library(foreach) 
library(doParallel) 

URLsPar <- c("http://www.example.com/", "http://s5.tinypic.com/n392s6_th.jpg", "http://s5.tinypic.com/jl1jex_th.jpg", 
     "http://s6.tinypic.com/16abj1s_th.jpg", "http://s6.tinypic.com/2ymvpqa_th.jpg") 

(detectCores() - 1) %>% makeCluster %>% registerDoParallel 

ws <- foreach(x = 1:length(URLsPar), .packages = c("rvest", "magrittr", "RSelenium")) %dopar% { 
     URLsPar[x] %>% read_html %>% as("character")} 

stopImplicitCluster() 

Open a separate browser for each instance using the `open` method of the `remoteDriver` class. For your workflow, `seleniumPipes` may be a good fit: https://github.com/johndharrison/seleniumPipes – jdharrison


I have a few thousand URLs. Say I register 3 cores with `registerDoParallel`: do I need to `open` 3 instances before the `foreach`? I didn't know about `seleniumPipes`! Thanks –

Answers

2

An example, in the style of the rvest code above, that starts a remoteDriver on each node in the cluster:

library(RSelenium) 
library(rvest) 
library(magrittr) 
library(foreach) 
library(doParallel) 

URLsPar <- c("http://www.bbc.com/", "http://www.cnn.com", "http://www.google.com", 
      "http://www.yahoo.com", "http://www.twitter.com") 
appHTML <- c() 
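# start a Selenium Server (startServer() is defunct in later RSelenium releases; 
# there, start the server separately or use rsDriver() instead) 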
selServ <- startServer() 

(cl <- (detectCores() - 1) %>% makeCluster) %>% registerDoParallel 
# open a remoteDriver for each node on the cluster 
clusterEvalQ(cl, { 
    library(RSelenium) 
    remDr <- remoteDriver() 
    remDr$open() 
}) 
myTitles <- c() 
ws <- foreach(x = 1:length(URLsPar), .packages = c("rvest", "magrittr", "RSelenium")) %dopar% { 
    remDr$navigate(URLsPar[x]) 
    remDr$getTitle()[[1]] 
} 

# close browser on each node 
clusterEvalQ(cl, { 
    remDr$close() 
}) 

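# cl was created explicitly with makeCluster(), so stop it explicitly; 
# stopImplicitCluster() only stops clusters that registerDoParallel() created itself 
stopCluster(cl) 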
# stop Selenium Server 
selServ$stop() 

> ws 
[[1]] 
[1] "BBC - Homepage" 

[[2]] 
[1] "CNN - Breaking News, U.S., World, Weather, Entertainment & Video News" 

[[3]] 
[1] "Google" 

[[4]] 
[1] "Yahoo" 

[[5]] 
[1] "Welcome to Twitter - Login or Sign up" 
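
The reason this works is that whatever `clusterEvalQ` creates stays in each worker's global environment, and the `foreach` body then finds it there. Below is a minimal sketch of that mechanism that runs without a Selenium server. Note that `worker_state` and the `tasks` vector are hypothetical stand-ins (for `remDr` and the URL list), and the two-worker cluster is just an assumption to keep the illustration small.

```r
library(foreach)
library(doParallel)

# Two workers, chosen only to keep the illustration small.
cl <- makeCluster(2)
registerDoParallel(cl)

# Create one object per worker. `worker_state` is a hypothetical stand-in
# for the `remDr` session opened in the answer above.
invisible(clusterEvalQ(cl, {
  worker_state <- list(pid = Sys.getpid())
  NULL  # avoid sending the object itself back to the master
}))

# Hypothetical task labels standing in for the URLs.
tasks <- paste0("task-", 1:6)

res <- foreach(task = tasks) %dopar% {
  # `worker_state` is not defined on the master, so it is looked up in the
  # worker's own global environment, in the same way as `remDr` above.
  list(task = task,
       worker_pid = worker_state$pid,
       same_process = identical(worker_state$pid, Sys.getpid()))
}

stopCluster(cl)

# Check that every task saw the object created in its own worker process.
all(vapply(res, function(r) r$same_process, logical(1)))
```

With a few thousand URLs the same idea applies: the number of browsers opened equals the number of workers, not the number of URLs, and each worker reuses its own session for every URL it is handed.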

Thank you very much, thanks again! –


Glad to help. – jdharrison


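@jdharrison: Is it possible to open multiple tabs in a single Firefox instance in parallel? I know each parallel instance has its own environment, but I would still like to know whether it is feasible. – Bharath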