Python Selenium web scraping - unable to get data

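I have come across a website that I would like to get some data from, but it seems to be beyond my limited Python knowledge. When I use driver.find_element_by_xpath, I usually run into a timeout exception.

With the code I provide below, I want to click the first result and go to a new page. On the new page, I want to scrape the product title and the pack sizes. But no matter what I try, I can't even get Python to click the right thing for me, let alone scrape the data. Can someone help?

My expected output is:

Tris(triphenylphosphine)rhodium(I) chloride, 98% 1 GR 87.60
5 GR 367.50

This is the code I have so far: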

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://www.acros.com/"
cas = "14694-95-2"  # need to select for the appropriate one

driver = webdriver.Firefox()
driver.get(url)

# Pick the country on the landing page, then submit the form
country = driver.find_element_by_name("ddlLand")
for option in country.find_elements_by_tag_name("option"):
    if option.text == "United States":
        option.click()
driver.find_element_by_css_selector("input[type = submit]").click()

# Set the search type to "CAS registry number"
choice = driver.find_element_by_name("_ctl1:DesktopThreePanes1:ThreePanes:_ctl4:ddlType")
for option in choice.find_elements_by_tag_name("option"):
    if option.text == "CAS registry number":
        option.click()

# Type the CAS number into the search box and run the search
inputElement = driver.find_element_by_id("_ctl1_DesktopThreePanes1_ThreePanes__ctl4_tbSearchString")
inputElement.send_keys(cas)
driver.find_element_by_id("_ctl1_DesktopThreePanes1_ThreePanes__ctl4_btnGo").click()

Source: user3788728, 2014-07-21

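In the long run I will write a for loop that takes a bunch of CAS numbers and outputs the results, so I would like my code to be general enough to allow automation. – user3788728

Once you navigate to a different page (usually after calling the 'click' method), all the elements you hold in memory may become invalid (a.k.a. "stale"). I suggest adding a 'break' after calling this method in every 'for' loop in your code.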
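The suggestion in the comment above can be sketched as a small helper. This is only an illustration, not code from the thread: `select_option_by_text` is a hypothetical name, and the helper assumes nothing more than that each option object has a `.text` attribute and a `.click()` method, as the elements in the question's loops do.

```python
def select_option_by_text(options, wanted_text):
    """Click the first option whose text matches, then stop iterating.

    `options` is any iterable of objects exposing `.text` and `.click()`
    (hypothetical helper; in the question these would be the <option>
    elements found under the <select>).
    Returns True if an option was clicked, False otherwise.
    """
    clicked = False
    for option in options:
        if option.text == wanted_text:
            option.click()
            clicked = True
            # Leave the loop right after the click: if the click triggers
            # navigation, the remaining elements may be stale.
            break
    return clicked
```

With the question's code, the call would look like `select_option_by_text(country.find_elements_by_tag_name("option"), "United States")`. For `<select>` elements, Selenium's `Select` class in `selenium.webdriver.support.ui`, with its `select_by_visible_text` method, is another way to avoid the manual loop.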

The code you provided works fine for me, in that it points the Firefox instance at http://www.acros.com/DesktopModules/Acros_Search_Results/Acros_Search_Results.aspx?search_type=CAS&SearchString=14694-95-2, which displays the search results.
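Since the asker wants to loop over several CAS numbers, the results URL above could be built per number. The sketch below rests on an unverified assumption: that the URL pattern seen for this one CAS number generalizes to others and that the site accepts direct navigation to it. `build_cas_search_url` and `SEARCH_RESULTS_URL` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

# Base address taken from the results URL quoted in the answer.
SEARCH_RESULTS_URL = (
    "http://www.acros.com/DesktopModules/Acros_Search_Results/"
    "Acros_Search_Results.aspx"
)


def build_cas_search_url(cas_number):
    """Return the results URL for one CAS number (assumed pattern)."""
    query = urlencode({"search_type": "CAS", "SearchString": cas_number})
    return SEARCH_RESULTS_URL + "?" + query


# Only the CAS number from the question is listed; more could be appended.
for cas in ["14694-95-2"]:
    print(build_cas_search_url(cas))
```

Each URL could then be passed to `driver.get(...)` instead of filling in the search form, provided the assumption holds.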

If you locate the iframe element on that page:

<iframe id="searchAllFrame" allowtransparency="" background-color="transparent" frameborder="0" width="1577" height="3000" scrolling="auto" src="http://newsearch.chemexper.com/misc/hosted/acrosPlugin/center.shtml?query=14694-95-2&searchType=cas&currency=&country=NULL&language=EN&forGroupNames=AcrosOrganics,FisherSci,MaybridgeBB,BioReagents,FisherLCMS&server=www.acros.com"></iframe>

and switch to that frame using driver.switch_to.frame, then I think the data you want should be scrapable from there, for example:

driver.switch_to.frame(driver.find_element_by_xpath("//iframe[@id='searchAllFrame']"))

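You can then carry on using the driver as usual to find elements inside that iframe. (I think switch_to_frame works in a similar way, but it is deprecated.)

(I can't seem to find a decent link to documentation for switch_to; this is not all that helpful.)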
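When looping over many searches, it helps to switch back to the top-level page after reading the iframe, so the next lookup does not run inside the old frame. A minimal sketch of that pattern follows. `inside_frame` is a hypothetical helper name; it relies only on the driver's `switch_to.frame(...)` and `switch_to.default_content()` calls.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Run a block of code inside a frame, then return to the main page.

    `frame_reference` is whatever `driver.switch_to.frame` accepts:
    a frame name or id string, an index, or the frame element itself.
    """
    driver.switch_to.frame(frame_reference)
    try:
        # Lookups made by the caller now happen inside the frame.
        yield driver
    finally:
        # Always go back to the top-level document, even after an error.
        driver.switch_to.default_content()
```

With the page from the answer this would be used as `with inside_frame(driver, "searchAllFrame"): ...`, with the element lookups placed inside the `with` block. If the iframe loads slowly, Selenium's `expected_conditions.frame_to_be_available_and_switch_to_it`, used together with `WebDriverWait`, waits for the frame before switching to it.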

Source: user3468054, 2014-07-21 09:07:06

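Hi, can you elaborate on how I would do this? I have never used switch_to and don't really understand switching frames. I will do some research. – user3788728

Thank you so much for your help! I never knew about switching frames. I was able to solve the problem by calling driver.switch_to_frame! – user3788728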
