2016-03-24

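RSelenium: looping through all values in a drop-down box

I want to scrape, for every country, the table of all HIV/AIDS-related NGOs available through this link: https://www.unodc.org/ngo/showExtendedSearch.do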

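I am able to navigate to the URL and tick the 'HIV/AIDS' checkbox. Now I also need to extract all the values of the 'Region' and 'Country' drop-downs so that I can use them in a loop and scrape the table for each country in turn. How do I collect the values of these two drop-downs? My code so far is below: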

#load library 
library(RSelenium) 

#Specify remote driver 
remDr <- remoteDriver(browserName='firefox') 

#Initialise session 
remDr$open() 

#navigate to advanced search page 

url <- "https://www.unodc.org/ngo/showExtendedSearch.do" 
remDr$navigate(url) 

#Click 'HIV/AIDS' filter 
webElem <- remDr$findElement(using = 'css', 
         value = '#applicationArea > form > table > tbody > tr > td > table:nth-child(7) > tbody > tr:nth-child(2) > td > table > tbody > tr > td:nth-child(2) > table > tbody > tr:nth-child(3) > td:nth-child(4) > input[type="checkbox"]') 

webElem$clickElement() 

Answer

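Use Firebug or the browser's developer tools to find the XPath of each drop-down element, then call getElementText to retrieve its values. The text comes back as one string with one option per line, so split it on "\n":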

region_element <- remDr$findElement('xpath', '//*[@id="applicationArea"]/form/table/tbody/tr/td/table[2]/tbody/tr[2]/td/table/tbody/tr[1]/td[2]/select') 
regions <- strsplit(region_element$getElementText()[[1]], "\n") 

country_element <- remDr$findElement('xpath', '//*[@id="applicationArea"]/form/table/tbody/tr/td/table[2]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]/select') 
countries <- strsplit(country_element$getElementText()[[1]], "\n") 

R> print(regions[[1]])
 [1] "Middle East and Northern Africa"   "Eastern Africa"
 [3] "Western Africa"                    "Central and Southern Africa"
 [5] "Northern America"                  "Central America and the Caribbean"
 [7] "Latin America"                     "Central and Western Asia"
 [9] "Southern and Eastern Asia"         "Europe"
[11] "Oceania"
R> print(head(countries[[1]]))
[1] "Afghanistan"    "Albania"        "Algeria"        "American Samoa" "Andorra"
[6] "Angola"
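
To use these values in a loop, it may be easier to work with the individual option elements rather than the split text. The following is only a rough sketch, not tested against the live page: the helper names (nth_option_xpath, list_options, for_each_option) are hypothetical, the XPath is the one shown above and may break if the page layout changes, and it assumes every option carries a value attribute. Submitting the search and scraping the result table are page-specific and left to the caller.

```r
library(RSelenium)

# XPath of the 'Country' <select>, copied from the answer above.
country_xpath <- '//*[@id="applicationArea"]/form/table/tbody/tr/td/table[2]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]/select'

# Hypothetical helper: XPath of the i-th <option> under a given <select>.
nth_option_xpath <- function(select_xpath, i) {
  sprintf("(%s/option)[%d]", select_xpath, i)
}

# Hypothetical helper: visible label and value attribute of every <option>.
# `remDr` is assumed to be an open remoteDriver session on the search page.
list_options <- function(remDr, select_xpath) {
  select_element <- remDr$findElement(using = "xpath", value = select_xpath)
  option_elements <- select_element$findChildElements(using = "tag name", value = "option")
  data.frame(
    label = vapply(option_elements, function(o) o$getElementText()[[1]], character(1)),
    value = vapply(option_elements, function(o) o$getElementAttribute("value")[[1]], character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: select each option in turn and hand control to `action`.
# The option is looked up again on every pass because element handles go stale
# once the page reloads; `action` must leave the browser on the search page.
for_each_option <- function(remDr, select_xpath, n_options, action) {
  for (i in seq_len(n_options)) {
    option_element <- remDr$findElement(using = "xpath",
                                        value = nth_option_xpath(select_xpath, i))
    option_element$clickElement()
    action(i)
  }
}
```

With the session from the question open, country_options <- list_options(remDr, country_xpath) would give one row per country, and for_each_option(remDr, country_xpath, nrow(country_options), function(i) { ... }) would visit them in order, with the function body doing the actual search and scraping.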