2017-03-27 44 views
1

How to scrape nested data using Selenium and Python

I basically want to grab "Feb 2016 – Present" under <span class="visually-hidden">, but I can't get at it. Here is the HTML:

<div class="pv-entity__summary-info"> 

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3> 

<h4> 
    <span class="visually-hidden">Company Name</span> 
    <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span> 
</h4> 


    <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%"> 
     <span class="visually-hidden">Dates Employed</span> 
     <span>Feb 2016 – Present</span> 
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0"> 
     <span class="visually-hidden">Employment Duration</span> 
     <span class="pv-entity__bullet-item">1 yr 2 mos</span> 
     </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block"> 
     <span class="visually-hidden">Location</span> 
     <span class="pv-entity__bullet-item">London, United Kingdom</span> 
    </h4></div> 

</div> 

Here is what I am currently doing with Selenium in my code:

date = browser.find_element_by_xpath('.//div[@class = "pv-entity__duration de Sans-15px-black-55% ml0"]').text
print(date)

But this gives no result. How would I go about pulling the dates?

+0

Which text are you trying to extract? 'Feb 2016 – Present' or '1 yr 2 mos'? – Mangohero1

+0

Updated the original post. 'Feb 2016 – Present' is what I'm trying to scrape – semiflex

Answers

2

There is no div with class="pv-entity__duration de Sans-15px-black-55% ml0"; that class is on an h4. If you want the text of the enclosing div, try:

date = browser.find_element_by_xpath('.//div[@class = "pv-entity__position-info detail-facet m0"]').text
print(date)

If you want to get just "Feb 2016 – Present", try:

date = browser.find_element_by_xpath('//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span[2]').text
print(date)
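
One caveat on the XPaths above: @class="..." compares the whole attribute string, so the match breaks as soon as the page adds, drops, or reorders a class. A minimal sketch of a token-based match, assuming the class name pv-entity__date-range stays stable; class_token_xpath is a hypothetical helper written for this answer, not part of Selenium:

```python
def class_token_xpath(tag, token, tail=""):
    """Build an XPath matching <tag> elements whose class list contains token.

    Hypothetical helper for illustration; it only builds a string.
    """
    # Pad the normalized class attribute with spaces so the token matches
    # as a whole class name, not as a substring of a longer one.
    return (
        "//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {token} ')]{tail}"
    ).format(tag=tag, token=token, tail=tail)


# Second <span> inside the date-range <h4>, as in the XPath above.
date_xpath = class_token_xpath("h4", "pv-entity__date-range", "/span[2]")
print(date_xpath)
```

With the browser object from the question this would be used as browser.find_element_by_xpath(date_xpath).text. On newer Selenium releases that no longer ship the find_element_by_* helpers, the equivalent call is browser.find_element("xpath", date_xpath).text.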
0

You can rewrite your XPath code like this:

# -*- coding: utf-8 -*- 
from lxml import html


html_str = """ 
<div class="pv-entity__summary-info"> 

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3> 

<h4> 
    <span class="visually-hidden">Company Name</span> 
    <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span> 
</h4> 


    <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%"> 
     <span class="visually-hidden">Dates Employed</span> 
     <span>Feb 2016 – Present</span> 
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0"> 
     <span class="visually-hidden">Employment Duration</span> 
     <span class="pv-entity__bullet-item">1 yr 2 mos</span> 
     </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block"> 
     <span class="visually-hidden">Location</span> 
     <span class="pv-entity__bullet-item">London, United Kingdom</span> 
    </h4></div> 

</div> 
""" 

root = html.fromstring(html_str)
# For fetching "Feb 2016 – Present":
txt = root.xpath('//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span/text()')[1]
# For fetching "1 yr 2 mos":
txt1 = root.xpath('//h4[@class="pv-entity__duration de Sans-15px-black-55% ml0"]/span/text()')[1]
print(txt)
print(txt1)

This will output:

Feb 2016 – Present
1 yr 2 mos
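
If lxml is not installed, the same XPath idea can be checked with the standard library. This is only a sketch, assuming the markup parses as well-formed XML (true for the fragment in the question, not for arbitrary pages). The snippet is trimmed to the date block:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question (assumed well-formed XML).
snippet = """
<div class="pv-entity__position-info detail-facet m0">
  <h4 class="pv-entity__date-range Sans-15px-black-55%">
    <span class="visually-hidden">Dates Employed</span>
    <span>Feb 2016 – Present</span>
  </h4>
  <h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
    <span class="visually-hidden">Employment Duration</span>
    <span class="pv-entity__bullet-item">1 yr 2 mos</span>
  </h4>
</div>
"""

root = ET.fromstring(snippet)

# ElementTree implements only a subset of XPath: exact attribute matches
# and 1-based positional predicates, which is all these two paths need.
date_range = root.find('.//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span[2]').text
duration = root.find('.//h4[@class="pv-entity__duration de Sans-15px-black-55% ml0"]/span[2]').text

print(date_range)
print(duration)
```

Because span[2] picks the second span by position, the visually-hidden label in the first span is skipped without having to match on its class.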