2016-06-27

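I want to scrape a table like this one (click Search and you will get a table of partners). I want to scrape the partner names. The problem is that I don't know what kind of table it is, or how to scrape it. I am using the RSelenium package. If it can be done with rvest, that would be helpful too.

So what kind of table is this, can it be scraped with RSelenium or rvest, and if so, how?

Using RSelenium, I tried: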

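# Assumes remDr is an already-open RSelenium remoteDriver session
ul <- "http://partnerlocator.symantec.com"
remDr$navigate(ul)
# Click the first element with class "button" (the Search button)
webElem <- remDr$findElement(using = "class name", value = "button")
webElem$clickElement()
Sys.sleep(10)  # wait for the results to load
webElem <- remDr$findElement(using = "class name", value = "results")
unlist(webElem$getElementText())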

But I get one long, hard-to-use text output like this:

CDW\nCDW\n200 North Milwaukee Avenue\nVernon Hills ,Illinois ,60061\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCore Security - Platinum\nThreat Protection - Platinum\nCyber Security Services - Platinum\nInformation Protection - Platinum\nDLT Solutions\nDLT Solutions\n2411 Dulles Corner Park Suite 800\nHerndon ,Virginia ,20171\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nInformation Protection - Platinum\nThreat Protection - Platinum\nCore Security - Platinum\nCyber Security Services - Platinum\nInsight Direct USA\nInsight Direct USA\n3480 Lotus Drive\nPlano ,Texas ,75075\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCyber Security Services - Platinum\nCore Security - Platinum\nThreat Prot......... 

A similar question is discussed here: http://stackoverflow.com/questions/29953394/how-to-find-a-subset-of-cells-in-an-html-table-using-r-or-jquery – Mohammad

Answer


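This looks like a very basic HTML table whose text has been merged into a single string. You can split it back into separate lines like this: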

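library(RSelenium)

# Note: checkForServer() and startServer() are defunct in newer RSelenium
# releases, where rsDriver() is used to start a server instead.
checkForServer()
ul <- "http://partnerlocator.symantec.com"
startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(ul)
webElem <- remDr$findElement(using = "class name", value = "button")
webElem$clickElement()
Sys.sleep(10)  # give the results time to load
webElem <- remDr$findElement(using = "class name", value = "results")
results <- webElem$getElementText()
# Split the single newline-delimited string into one element per line
results_chr <- unlist(strsplit(results[[1]], "\n"))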

head(results_chr) 
[1] "CDW"       "CDW"       "200 North Milwaukee Avenue" 
[4] "Vernon Hills ,Illinois ,60061" "United States"     "Distance: 0 mi" 
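
Since the question only asks for the partner names, here is a rough sketch of one way to pull them out of results_chr. It relies on an assumption drawn from the sample output only: each partner name appears twice in a row, and no other line is immediately repeated. The helper extract_partner_names is a made-up name for illustration, and the sample vector is copied from the output shown in the question so the snippet runs without a browser.

```r
# Sample lines copied from the output in the question. In a live session,
# use the results_chr vector produced by strsplit() instead.
results_chr <- c(
  "CDW", "CDW", "200 North Milwaukee Avenue",
  "Vernon Hills ,Illinois ,60061", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner", "Core Security - Platinum",
  "DLT Solutions", "DLT Solutions", "2411 Dulles Corner Park Suite 800",
  "Herndon ,Virginia ,20171", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner"
)

# ASSUMPTION: a partner name is the only kind of line that is immediately
# followed by an identical line. This is a heuristic, not a property of the
# page that has been verified.
extract_partner_names <- function(lines) {
  if (length(lines) < 2) return(character(0))
  # Compare every line with the line that follows it
  is_repeat <- lines[-length(lines)] == lines[-1]
  lines[-length(lines)][is_repeat]
}

partner_names <- extract_partner_names(results_chr)
print(partner_names)
```

If a partner ever has an address line that repeats, this heuristic would pick it up as well, so it is worth checking the result against the page.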

You could probably use rvest to get a cleaner result from the HTML table on that results page, but I was not able to do so.
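
For anyone who wants to try the rvest route, a possible starting point is to hand the rendered page source from RSelenium to rvest and read the children of the "results" element one by one. This is only a sketch: the "results" class comes from the code above, but the assumption that each direct child of that element is one partner is a guess about the page structure, and partner_blocks is a hypothetical helper name. The HTML string below is invented purely so the snippet runs on its own.

```r
library(rvest)

# Return the text of each direct child of the element with class "results".
# ASSUMPTION: each direct child corresponds to one partner entry. The real
# page may be nested differently, so inspect it in the browser first.
partner_blocks <- function(page_html) {
  doc <- read_html(page_html)
  container <- html_node(doc, ".results")
  blocks <- html_children(container)
  html_text(blocks, trim = TRUE)
}

# Invented HTML for illustration only; it is NOT the real page markup.
demo_html <- paste0(
  '<html><body><div class="results">',
  '<div><h3>CDW</h3><p>200 North Milwaukee Avenue</p></div>',
  '<div><h3>DLT Solutions</h3><p>2411 Dulles Corner Park Suite 800</p></div>',
  '</div></body></html>'
)
blocks <- partner_blocks(demo_html)
print(blocks)

# With a live RSelenium session, after the click and the wait:
# blocks <- partner_blocks(remDr$getPageSource()[[1]])
```

Going through getPageSource() matters here because the results are only present after the click, so reading the URL directly with rvest would not see them.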