2016-03-21 79 views

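Interacting with a JavaScript scrollable container from Python/Selenium

I want to use Selenium/Python to automatically download datasets from http://factfinder.census.gov. I'm new to JavaScript, so apologies if this is an easy problem to solve. I'm working on the first part of the code, which for now should: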

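  1. Go here
  2. Click the "Topics" button
  3. Once "Topics" has been clicked and the new page has loaded, click "Dataset"
  4. Select the datasets I need, ideally by indexing into the (sub)table.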

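I'm stuck on step 3. A screenshot is below; it seems I want to access the div with id "scrollable_container_topics" and then get its child nodes by iterating or indexing (in this case, I want the last child node). I've tried using execute_script and then locating the element by id and by class name, but nothing has worked so far. I'd appreciate any pointers.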

enter image description here

Here is my code:

import os 
import re 
import time 
from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support import expected_conditions 
from selenium.webdriver.support.wait import WebDriverWait 
from selenium.webdriver.support.select import Select 


# A list of all the variables we want to extract; corresponds to "Topics" field on site 
topics = ["B03003", "B05001"] 

# A list of all the states we want to extract data for (currently, strings; is there a numeric code?) 
states = ["New Jersey", "Georgia"] 

# A vector of all the years we want to extract data for [lower, upper) *Note* this != range of years covered by data 
years = range(2009, 2010) 

# Define the class 
class CensusSearch:

    # Initialize and set attributes of the query
    def __init__(self, topic, state, year):
        """
        :type topic: str
        :type state: str
        :type year: int
        """
        self.topic = topic
        self.state = state
        self.year = year

    def setUp(self):
        # self.driver = webdriver.Chrome("C:/Python34/Scripts/chromedriver.exe")
        self.driver = webdriver.Firefox()

    def extractData(self):
        driver = self.driver
        driver.set_page_load_timeout(1000000000000)
        driver.implicitly_wait(100)

        # Navigate to site; this url = after you have already chosen "Advanced Search"
        driver.get("http://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t")
        driver.implicitly_wait(10)

        # Filter by dataset (want the ACS 1, 3, and 5-year estimates)

        driver.execute_script("document.getElementsByClassName('leftnav_btn')[0].click()")  # click the "Topics" button
        driver.implicitly_wait(20)

        # This is where I am stuck; I've tried the following:
        getData = driver.find_element_by_id("ygtvlabelel172")
        getData.click()
        driver.implicitly_wait(10)

        # Filter geographically: select all counties in the United States and Puerto Rico
        # Click "Geographies" button
        driver.execute_script("document.getElementsByClassName('leftnav_btn')[1].click()")
        driver.implicitly_wait(10)

        drop_down = driver.find_element_by_class_name("popular_summarylevel")
        select_box = Select(drop_down)
        select_box.select_by_value("050")

        # Once "Geography" is clicked, select "County - 050" from the drop-down menu; then select "All US + Puerto Rico"
        drop_down_counties = driver.find_element_by_id("geoAssistList")
        select_box_counties = Select(drop_down_counties)
        select_box_counties.select_by_index(1)

        # Click the "ADD TO YOUR SELECTIONS" button
        driver.execute_script("document.getElementsByClassName('button-g')[0].click()")
        driver.implicitly_wait(10)

    def tearDown(self):
        self.driver.quit()

    def main(self):
        # print(getattr(self))
        print(self.state)
        print(self.topic)
        print(self.year)
        self.setUp()
        self.extractData()
        self.tearDown()


# Run one query per (topic, state, year) combination
for a in topics:
    for b in states:
        for c in years:
            query = CensusSearch(a, b, c)
            query.main()

print("done") 

Answer


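A few things to fix:

  • You don't have to use the document.getElement.. methods; Selenium has its own methods for locating elements on a page.
  • There is no need to manipulate implicit waits or page load timeouts. Note that calling implicitly_wait() does not give you an immediate time delay; it does not behave like time.sleep(). Just use Explicit Waits before performing actions on your page.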
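
To illustrate the second point, here is a minimal pure-Python sketch of the polling idea behind an explicit wait: the condition is checked repeatedly and the call returns as soon as it holds, whereas time.sleep() always blocks for the full duration. The names wait_until and ready_on_third_check are hypothetical, and this is not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it is truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return as soon as the condition holds; there is no fixed delay.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "element is clickable": true on the third check.
attempts = []


def ready_on_third_check():
    attempts.append(1)
    return len(attempts) >= 3


start = time.monotonic()
wait_until(ready_on_third_check, timeout=5.0, poll_interval=0.01)
elapsed = time.monotonic() - start  # far below the 5-second timeout
```

The timeout is only an upper bound here; the loop stops polling the moment the condition succeeds.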

Here is working code that clicks "Topics" and "Dataset":

from selenium import webdriver 
from selenium.webdriver import ActionChains 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.wait import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 


driver = webdriver.Firefox() 
driver.get("http://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t") 

wait = WebDriverWait(driver, 10) 
actions = ActionChains(driver) 

# click "Topics" 
topics = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#topic-overlay-btn"))) 
driver.execute_script("arguments[0].click();", topics) 

# click "Dataset" 
dataset = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span[title=Dataset]"))) 
dataset.click()
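
For the original goal of picking a child of the div with id "scrollable_container_topics" by index, something along these lines might work. It is only a sketch: click_child_by_index is a hypothetical helper, and it assumes that the items of interest are direct children of that div, which has not been checked against the page. It expects the driver and wait objects created in the code above.

```python
def click_child_by_index(driver, wait, container_id, index=-1):
    """Wait for the container's direct children, then click one by index.

    `driver` is a Selenium WebDriver and `wait` a WebDriverWait built on it.
    """
    # Assumption: the items to choose from are direct children of the container.
    selector = "#%s > *" % container_id

    # WebDriverWait.until passes the driver to the callable and returns its
    # first truthy result; an empty list is falsy, so polling continues.
    children = wait.until(lambda d: d.find_elements("css selector", selector))

    child = children[index]
    # Click via JavaScript, as the code above does for the "Topics" button.
    driver.execute_script("arguments[0].click();", child)
    return child


# Usage with the objects from the code above (not run here):
# click_child_by_index(driver, wait, "scrollable_container_topics")     # last child
# click_child_by_index(driver, wait, "scrollable_container_topics", 0)  # first child
```

The default index of -1 selects the last child, which is the case described in the question; any other Python index works the same way.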