2017-08-19 74 views

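Selenium is not getting the HTML of the PDF links

I am trying to download the PDF slides from this website with Python and Selenium, but I think the links to the slides only appear after a script has loaded. I tried waiting for the JavaScript to load, but it still finds nothing. Any ideas?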

import os, sys, time, random 
import requests 
from selenium import webdriver 
from bs4 import BeautifulSoup 

url = 'https://mila.umontreal.ca/en/cours/deep-learning-summer-school-2017/slides' 

browser = webdriver.Chrome() 
browser.get(url) 
browser.implicitly_wait(3) 
html = browser.page_source 
links = browser.find_elements_by_class_name('flip-entry') 
print(links) 
browser.quit() 

At first glance: why do you set `html = browser.page_source` and then never use `html`? – JacobIRR

Answers


The reason is that the links are not in the main page. The links you are after live inside an iframe, and that iframe points to https://drive.google.com/embeddedfolderview?hl=fr&id=0ByUKRdiCDK7-c0k1TWlLM1U1RXc#list


You can load that URL directly in your code instead of the main page URL. Alternatively, you can switch to the frame:

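from selenium.webdriver.common.by import By

# Switch into the iframe that holds the links (adjust the class name if it differs)
browser.switch_to.frame(browser.find_element(By.CLASS_NAME, "iframe-class"))
links = browser.find_elements(By.CSS_SELECTOR, '.flip-entry a')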

for link in links: 
    print(link.get_attribute("href")) 
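
For the first option (loading the iframe's own URL), the address does not have to be copied by hand: it can be read from the `src` attribute of the iframe in the main page's HTML. The following is only a minimal sketch using the standard library's `html.parser`. The names `IframeSrcCollector`, `find_iframe_sources` and `sample_html` are hypothetical, and the sample markup merely stands in for `browser.page_source`; the real page may be structured differently.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src URLs of all iframes found in the given HTML string."""
    collector = IframeSrcCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical markup standing in for browser.page_source (an assumption,
# not the actual HTML of the site in the question).
sample_html = """
<div class="content">
  <iframe class="iframe-class"
          src="https://drive.google.com/embeddedfolderview?hl=fr&amp;id=0ByUKRdiCDK7-c0k1TWlLM1U1RXc#list"></iframe>
</div>
"""

for src in find_iframe_sources(sample_html):
    print(src)
```

Passing a URL found this way to `browser.get(...)` loads the embedded folder view as the top-level document, so the `.flip-entry a` selector should match without any frame switching.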
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://mila.umontreal.ca/en/cours/deep-learning-summer-school-2017/slides'
browser = webdriver.Chrome()
browser.get(url)
# Switch into the iframe that holds the links
browser.switch_to.frame(browser.find_element(By.CLASS_NAME, 'iframe-class'))
# '.flip-entry a' is a CSS selector, so it must be looked up by CSS selector, not by class name
links = browser.find_elements(By.CSS_SELECTOR, '.flip-entry a')
for link in links:
    print(link.get_attribute("href"))
browser.quit()