Python/Selenium webscraping

for link in data_links:
    driver.get(link)
    review_dict = {}
    # get the size of company
    size = driver.find_element_by_xpath('//[@id="EmpBasicInfo"]//span')

    # location = ???  I need to get this part as well.

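My issue:

I am trying to scrape a website. I am using Selenium/Python to scrape "501 to 1000 employees" and "Biotech & Pharmaceuticals" from the spans, but I can't extract the text of those elements with XPath. I have tried getText, get attribute, all of it. Please help!

This is the output for each iteration: I am not getting the text values.

Thanks in advance!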

2017-07-29 Fun-zin

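1. What text do you expect to get? 2. Please post your code as text rather than as an image; it helps everyone who wants to help.

Thanks for the quick response. I'm trying to get "501 to 1000 employees" and "Biotech & Pharmaceuticals" from the spans.

If you know you want whatever comes after the 'Size' label, use bs4's find().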

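It seems you only want the text rather than to interact with any element. One solution is to let BeautifulSoup parse the HTML for you while Selenium fetches the code built by JavaScript: first get the HTML content with html = driver.page_source, and then you can do: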

from bs4 import BeautifulSoup

html = '''
<div id="CompanyContainer"> 
<div id="EmpBasicInfo"> 
<div class=""> 
<div class="infoEntity"></div> 
<div class="infoEntity"> 
<label>Industry</label> 
<span class="value">Woodcliff</span> 
</div> 
<div class="infoEntity"> 
<label>Size</label> 
<span class="value">501 to 1000 employees</span> 
</div> 
</div> 
</div> 
</div> 
''' # Just a sample, since I don't have the actual page to interact with. 
soup = BeautifulSoup(html, 'html.parser') 
>>> soup.find("div", {"id":"EmpBasicInfo"}).findAll("div", {"class":"infoEntity"})[2].find("span").text 
'501 to 1000 employees'

Or, of course, avoid the hard-coded index and look for <label>Size</label> instead, which should be more readable:

>>> [a.span.text for a in soup.findAll("div", {"class":"infoEntity"}) if (a.label and a.label.text == 'Size')] 
['501 to 1000 employees']
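The same label-to-value lookup can also be done without bs4, using only the standard library's html.parser. This is a minimal sketch, assuming (as in the sample markup above) that each value is the first <span> following its <label>; LabelValueParser and label_values are hypothetical names, and the sample HTML below is invented to mirror the values from the question.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect {label text: text of the next <span>} pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.pairs = {}
        self._in_label = False
        self._in_span = False
        self._label = None  # most recent label text still waiting for its value
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._in_label, self._buf = True, []
        elif tag == "span" and self._label is not None:
            self._in_span, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            self._in_label = False
            self._label = "".join(self._buf).strip()
        elif tag == "span" and self._in_span:
            self._in_span = False
            self.pairs[self._label] = "".join(self._buf).strip()
            self._label = None

    def handle_data(self, data):
        # Only buffer text that sits inside a <label> or its value <span>.
        if self._in_label or self._in_span:
            self._buf.append(data)


def label_values(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = """
<div id="EmpBasicInfo">
  <div class="infoEntity"><label>Industry</label>
    <span class="value">Biotech &amp; Pharmaceuticals</span></div>
  <div class="infoEntity"><label>Size</label>
    <span class="value">501 to 1000 employees</span></div>
</div>
"""

print(label_values(sample))
# {'Industry': 'Biotech & Pharmaceuticals', 'Size': '501 to 1000 employees'}
```

Returning every label/value pair at once also covers the second field the question asks for ("Biotech & Pharmaceuticals") without a separate lookup.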

With Selenium you can do:

>>> driver.find_element_by_xpath("//*[@id='EmpBasicInfo']/div[1]/div/div[3]/span").text 
'501 to 1000 employees'
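A sketch of the label-based idea carried over to Selenium, so the lookup does not depend on div positions. It assumes the live page uses the same infoEntity/label/span structure (with class exactly "infoEntity") as the sample. Two further points, stated here as assumptions to verify against your Selenium version: in Selenium 4 the find_element_by_xpath helpers were removed in favour of find_element(By.XPATH, ...), where By.XPATH is the string "xpath"; and WebElement.text returns only rendered, visible text, so get_attribute("textContent") is used as a fallback. The helper names are hypothetical, and the XPath is checked offline with ElementTree's limited XPath support.

```python
import xml.etree.ElementTree as ET


def value_xpath(label):
    """XPath 1.0: the <span> in the infoEntity div whose <label> text equals `label`."""
    return f"//div[@class='infoEntity'][label='{label}']/span"


def element_text(element):
    """Visible text first; fall back to textContent for hidden or not-yet-rendered nodes."""
    text = (element.text or "").strip()
    if not text:
        text = (element.get_attribute("textContent") or "").strip()
    return text


def company_value(driver, label):
    # Selenium 4 style; "xpath" is the value of By.XPATH, so no selenium import is needed here.
    return element_text(driver.find_element("xpath", value_xpath(label)))


# Offline check of the XPath against sample markup. ElementTree supports a subset of
# XPath and needs a leading "." for a search relative to the root element.
sample = ET.fromstring(
    "<div id='EmpBasicInfo'>"
    "<div class='infoEntity'><label>Industry</label>"
    "<span class='value'>Biotech &amp; Pharmaceuticals</span></div>"
    "<div class='infoEntity'><label>Size</label>"
    "<span class='value'>501 to 1000 employees</span></div>"
    "</div>"
)
print(sample.find("." + value_xpath("Size")).text)  # 501 to 1000 employees

# With a real driver (not run here):
# size = company_value(driver, "Size")
# industry = company_value(driver, "Industry")
```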

2017-07-29 23:32:41

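I'd like to use Selenium for the whole project instead of Soup. The site has some heavy AJAX behaviour, and I need to extract most of the information from that section. Thanks for your help!

@Fun-zin Please check my edit!

Thank you very much for the quick reply and your patience. I used your Selenium version and it worked.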
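On an AJAX-heavy page, a span may be missing or still empty at the moment .text is read. Selenium's built-in tool for this is WebDriverWait(driver, timeout).until(...) together with the conditions in selenium.webdriver.support.expected_conditions. The stdlib-only sketch below shows the same poll-until-truthy loop so it can run without a browser; wait_until and SIZE_XPATH are hypothetical names, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5, ignored=()):
    """Poll condition() until it returns something truthy; raise TimeoutError after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # e.g. the element has not been inserted into the DOM yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Simulated AJAX field: empty for the first two polls, filled on the third.
attempts = []


def fake_span_text():
    attempts.append(1)
    return "501 to 1000 employees" if len(attempts) >= 3 else ""


print(wait_until(fake_span_text, timeout=2, interval=0.01))  # 501 to 1000 employees

# With Selenium (not run here), retrying while the span is missing or still empty:
# from selenium.common.exceptions import NoSuchElementException
# size = wait_until(lambda: driver.find_element("xpath", SIZE_XPATH).text.strip(),
#                   ignored=(NoSuchElementException,))
```

In real code WebDriverWait is the better choice, since it already ignores NoSuchElementException by default; the hand-rolled loop only makes the mechanism visible.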