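Can't download PDF files from links generated by a JavaScript function using Python 3.6.0 + Selenium 3.4.3
Using Selenium with the Firefox 47.0.2 binary and Python 3.6.0, I start from this page and click the "Pesquisar" button. On the next page I fill in the date range in the form (format d/m/y) and click the new "Pesquisar" button. I then get a list of PDF files, which I want to download.
When I print page_source I can see the generated links, but I don't understand why Selenium cannot find them.
The simplified code is: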
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from datetime import datetime, date, timedelta
from calendar import monthrange
import time
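# profile, binary and capabilities are created earlier; their setup is omitted from this simplified listing
driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary, capabilities=capabilities)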
driver.maximize_window()
wait = WebDriverWait(driver, 10)
months = range(1, 13)
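# monthrange returns (weekday of the first day, number of days in the month)
limits = monthrange(2017, 8)
#num_docs = limits[1]-limits[0]
# Dates are typed as ddmmyyyy; a month always starts on day 1, so limits[0] (a weekday) is not used here
date_input_begin = '01' + '08' + '2017'
date_input_end = '{num:0{width}}'.format(num=limits[1], width=2) + '08' + '2017'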
today = datetime.now().date()
date = today
date = date - timedelta(24)
# Open the start page and click the first "Pesquisar" button
driver.get("http://dje.trf2.jus.br/DJE/Paginas/Externas/inicial.aspx")
driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrInicial_btnPesquisar").click()
# Wait for the filter form on the next page
wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar"]')))
# Choose the judicial area and the number of records per page
select1 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlAreaJudicial"))
select1.select_by_index(3)
select2 = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_ddlRegistrosPaginas"))
select2.select_by_index(6)
# Fill in the date range
element_date_begin = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataInicial')
element_date_begin.clear()
element_date_begin.send_keys(date_input_begin)
element_date_end = driver.find_element_by_id(
    'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_tbxDataFinal')
element_date_end.clear()
element_date_end.send_keys(date_input_end)
# Submit the form, wait until the button is available again, then click it
driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').submit()
wait.until(EC.presence_of_element_located((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar')))
wait.until(EC.element_to_be_clickable((By.ID, 'ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar')))
time.sleep(5)
driver.find_element_by_id('ctl00_ContentPlaceHolder_ctrFiltraPesquisaDocumentos_btnFiltrar').click()
# Wait for the result list, then try to click the first PDF link (this is the call that fails)
wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_lblNomeCaderno"]')))
driver.find_element_by_xpath(
    '//*[@id="ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData"]').click()
However, when I look up the link by ID or by XPath, I get the following error:
File "C:\Users\b2002032064079\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"//*[@id=\"ctl00_ContentPlaceHolder_ctrListaDiarios_udtVisualizaAdmRj_grvCadernos_ct102_lnkData\"]"}
I'm new to scraping and would really appreciate any help. Thanks!
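Since the links are visible in page_source, one way to narrow this down is to list the ids that are actually present in the HTML and compare them with the id used in the locator. The following is a minimal sketch using only the standard library; the helper names (`LinkIdCollector`, `find_link_ids`), the `lnkData` suffix default, and the sample HTML fragment are assumptions made for illustration, not something taken from the site.

```python
from html.parser import HTMLParser


class LinkIdCollector(HTMLParser):
    """Collect the id of every <a> tag whose id ends with a given suffix."""

    def __init__(self, suffix):
        super().__init__()
        self.suffix = suffix
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "a":
            return
        element_id = dict(attrs).get("id")
        if element_id and element_id.endswith(self.suffix):
            self.ids.append(element_id)


def find_link_ids(page_source, suffix="lnkData"):
    """Return the ids of all matching links, in document order."""
    collector = LinkIdCollector(suffix)
    collector.feed(page_source)
    return collector.ids


# Hypothetical fragment standing in for driver.page_source
sample = '<table><tr><td><a id="grid_row02_lnkData" href="#">01/08/2017</a></td></tr></table>'
print(find_link_ids(sample))
```

In a live session, `find_link_ids(driver.page_source)` would show the exact ids to compare character by character with the locator. One thing worth checking, as a guess only, is whether the generated id contains `ctl02` (lowercase L) rather than `ct102` (digit one).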
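@biligunb Regarding the XPath, I changed the last line of the code to: `driver.find_element_by_xpath('/html/body/form/div[7]/div/div/div[1]/div[2]/div/div[2]/div/table/tbody/tr[2]/td[1]/a').click()`, where the XPath is now the one obtained from Firebug. Thank you! As for the waiting time, I'm still trying to work it out for the iterative downloads. – viniciusdoss
If it helped, give it a thumbs up :D If you need to wait -> there are ways to use JavaScript to check whether the page has been reloaded, but I don't know about the download. It is handled by an OS dialog, so just set it up manually. – biligunb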
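On the download dialog and the waiting time mentioned in the comments above: a common approach, sketched here under assumptions, is to set Firefox preferences so PDFs are saved without a dialog, and then poll the download folder until Firefox's temporary `.part` files are gone. The function names, the `expected` count parameter, and the timeout values are hypothetical. The preference keys are standard Firefox settings, but whether they are sufficient for this particular site and for Firefox 47 has not been verified.

```python
import os
import time


def pdf_download_preferences(download_dir):
    """Firefox preferences commonly used to save PDFs without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        "pdfjs.disabled": True,  # do not open PDFs in the built-in viewer
    }


def apply_preferences(profile, preferences):
    """Copy preferences onto any object exposing set_preference(name, value),
    for example selenium's webdriver.FirefoxProfile."""
    for name, value in preferences.items():
        profile.set_preference(name, value)
    return profile


def wait_for_downloads(download_dir, expected, timeout=60.0, poll=0.5):
    """Return True once at least `expected` PDF files exist and no '.part'
    files remain in download_dir; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        pending = [n for n in names if n.endswith(".part")]
        finished = [n for n in names if n.lower().endswith(".pdf")]
        if not pending and len(finished) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With Selenium 3 this would be used roughly as `profile = apply_preferences(webdriver.FirefoxProfile(), pdf_download_preferences(some_dir))` before creating the driver, followed by a call to `wait_for_downloads(some_dir, expected=n)` after clicking each batch of links. Here `some_dir` and `n` are placeholders to be chosen by the caller.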