2017-09-11 58 views
2

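Unable to download PDF files from links generated by a JavaScript function, using Python 3.6.0 + Selenium 3.4.3

The URL is the one passed to driver.get() in the code below.

Using Selenium with the Firefox 47.0.2 binary and Python 3.6.0, I click the "Pesquisar" button on this page. On the next page I fill in the date range in the form (format d/m/y) and click the new "Pesquisar" button. I then get a list of PDF files, which I want to download.

When I print page_source, I can see the generated links, but I don't understand why Selenium cannot find them.

The simplified code is as follows: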

from selenium import webdriver 
from selenium.webdriver.support.ui import Select 
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.common.by import By 
from datetime import datetime, date, timedelta 
from calendar import monthrange 
import time 


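# profile, binary and capabilities are assumed to be defined earlier (omitted from this simplified code)
driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary, capabilities=capabilities)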
driver.maximize_window() 
wait = WebDriverWait(driver, 10) 

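months = range(1, 13)
# monthrange() returns (weekday of the 1st, number of days in the month)
limits = monthrange(2017, 8)

# num_docs = limits[1] - limits[0]

# NOTE: limits[0] is a weekday index; it equals 1 for August 2017 only by coincidence
date_input_begin = '{num:0{width}}'.format(num=limits[0], width=2) + '08' + '2017'
date_input_end = '{num:0{width}}'.format(num=limits[1], width=2) + '08' + '2017'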

today = datetime.now().date() 
date = today 

date = date - timedelta(24) 

driver.get("http://dje.trf2.jus.br/DJE/Paginas/Externas/inicial.aspx") 

driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrInicial_btnPesquisar").click() 

wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar"]'))) 

select1 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlAreaJudicial")) 
select1.select_by_index(3) 

select2 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlRegistrosPaginas")) 
select2.select_by_index(6) 

element_date_begin = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataInicial') 
element_date_begin.clear() 
element_date_begin.send_keys(date_input_begin) 

element_date_end = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataFinal') 
element_date_end.clear() 
element_date_end.send_keys(date_input_end) 

driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').submit() 

wait.until(EC.presence_of_element_located((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar'))) 
wait.until(EC.element_to_be_clickable((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar'))) 

time.sleep(5) 
driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').click() 

wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_lblNomeCaderno"]'))) 

driver.find_element_by_xpath(
    '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData"]').click() 

However, when I look up the link by ID or XPath, I get the following error:

File "C:\Users\b2002032064079\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData\"]"}

I'm a newbie at scraping and would really appreciate any help! Thanks!

Answers
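
A side note on the date fields in the code above: `calendar.monthrange(year, month)` returns the weekday of the first day and the number of days in the month, so its first item is not a day number. For August 2017 it happens to be 1 (a Tuesday), but for September 2017 the same expression would produce 04. A minimal sketch of a safer helper follows; `month_bounds` and its `fmt` parameter are hypothetical names, and the default format without slashes only mirrors what the question's code sends to the form.

```python
from calendar import monthrange
from datetime import date


def month_bounds(year, month, fmt="%d%m%Y"):
    """Return the first and last day of a month as formatted strings."""
    # Index 1 is the number of days; index 0 is a weekday, not a day number.
    days_in_month = monthrange(year, month)[1]
    first = date(year, month, 1)
    last = date(year, month, days_in_month)
    return first.strftime(fmt), last.strftime(fmt)


print(monthrange(2017, 8))    # (1, 31)
print(month_bounds(2017, 8))  # ('01082017', '31082017')
print(month_bounds(2017, 9))  # ('01092017', '30092017')
```

Passing `fmt="%d/%m/%Y"` would give the d/m/y form with slashes, if the input field turns out to need them.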

1

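First: which browser are you using? Second: the site is slow, so perhaps try a longer wait time. Third: is the XPath correct? I think the XPath is the problem. Try checking it with the XPath Helper extension in Chrome.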
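
One way to check the locator without a browser extension is to list the matching ids straight from `driver.page_source`, which the question says does contain the links. The sketch below is only an illustration: the function names are hypothetical, and the sample HTML is made up, since the real page source is not shown. It is worth comparing the result character by character with the id in the failing XPath. ASP.NET pages usually name generated controls `ctlNN` with a lowercase letter L (as in the `ctl00_` prefix used throughout the question), whereas the failing locator contains `ct102` with the digit one. Whether that is the actual cause here is an assumption.

```python
import re


def ids_ending_with(page_source, suffix="_lnkData"):
    """Return every id attribute in the HTML text that ends with suffix."""
    pattern = r'id="([^"]*' + re.escape(suffix) + r')"'
    return re.findall(pattern, page_source)


def click_first_link(driver, suffix="_lnkData"):
    """Click the first anchor whose id ends with suffix (Selenium 3 style call)."""
    links = driver.find_elements_by_css_selector('a[id$="%s"]' % suffix)
    if not links:
        raise LookupError("no link with an id ending in %r" % suffix)
    link_id = links[0].get_attribute("id")
    links[0].click()
    return link_id


# Hypothetical fragment standing in for driver.page_source.
sample = (
    '<a id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj'
    '_grvCadernos_ctl02_lnkData" href="#">caderno</a>'
)
print(ids_ending_with(sample))
```

Matching on the id suffix also avoids depending on the row counter, so the same selector can be reused to iterate over all rows of the list.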

+0

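@biligung Regarding the xpath, I have changed the last line of the code to: 'driver.find_element_by_xpath('/html/body/form/div[7]/div/div/div[1]/div[2]/div/div[2]/div/table/tbody/tr[2]/td[1]/a').click()', where the xpath is now taken from Firebug. Thank you! As for the wait time, I am still trying to work it out for the iterative downloads. – viniciusdoss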
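
For the wait between iterative downloads, one possible approach is to poll the download directory instead of sleeping for a fixed time. This is a rough sketch under two assumptions: the browser saves files into a known directory, and Firefox keeps a temporary `.part` file next to a download while it is still in progress. `wait_for_new_files` is a hypothetical helper, not part of Selenium.

```python
import os
import time


def wait_for_new_files(directory, known, timeout=60.0, poll=0.5):
    """Wait until files not listed in `known` appear and no '.part' file remains.

    Returns the sorted names of the new files, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(directory))
        in_progress = any(name.endswith(".part") for name in names)
        new_files = names - set(known)
        if new_files and not in_progress:
            return sorted(new_files)
        time.sleep(poll)
    raise TimeoutError("no finished download within %s seconds" % timeout)


# Typical use inside the loop over links (driver calls omitted):
#   known = set(os.listdir(download_dir))
#   link.click()
#   new_pdfs = wait_for_new_files(download_dir, known)
```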

+0

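If it helped, give it a thumbs up :D If you need to wait -> there are ways to use JavaScript to check whether the page has been reloaded, but I don't know about the download. It is handled by an OS dialog, so just set that up manually. – biligunb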
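
Both points in the last comment can be sketched in code. The JavaScript check can read `document.readyState` through `execute_script`, and the download dialog can often be avoided by setting Firefox preferences on the profile before the driver starts. The preference names below are standard Firefox settings, but the helper names are hypothetical, and the MIME type `application/pdf` is an assumption about what this server sends. If it answers with another type, such as `application/octet-stream`, that value would need to be listed instead.

```python
import os


def pdf_download_prefs(download_dir):
    """Firefox preferences that save PDFs to download_dir without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": os.path.abspath(download_dir),
        "browser.download.manager.showWhenStarting": False,
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        "pdfjs.disabled": True,  # do not open PDFs in the built-in viewer
    }


def make_profile(download_dir):
    """Build a FirefoxProfile carrying the preferences above."""
    # Imported here so the preference dict can be inspected without Selenium.
    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    for key, value in pdf_download_prefs(download_dir).items():
        profile.set_preference(key, value)
    return profile


def page_is_loaded(driver):
    """True once the browser reports the document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


# Possible use with the question's setup:
#   profile = make_profile("downloads")
#   driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary)
#   wait.until(page_is_loaded)
```

`WebDriverWait.until` calls the given function with the driver, so `page_is_loaded` can be passed to it directly. Note that `readyState` says nothing about partial updates made by scripts after the load, so an explicit wait on the target element is still useful.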