2014-03-24
<td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**" > 
    <div class="inner"> 
    <div class="item"> 
    <div class="view-item view-item-aisd_calendar"> 
    <div class="calendar monthview"> 
     <div class="calendar.4168.field_date.8.0 contents"> 
         <a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>      <span class="date-display-single">7:00 pm</span>   </div> 
     <div class="cutoff">&nbsp;</div> 
     </div> 
    </div> 
</div> </div> 
</td> 

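Hi! I have the HTML above. I want to extract the `date` attribute (2014-04-28) and the `a href` link (Regular Board Meeting) from it. How can I do this in Python? Can it be done with Beautiful Soup? Any help would be appreciated.

Navigating an HTML tree in Python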

Yes, this can easily be done with BeautifulSoup. I strongly recommend reading the documentation [here](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) – TerryA

Answers

Here is how you can do it with BeautifulSoup:

from bs4 import BeautifulSoup 


data = """ 
<html> 
    <body> 
     <td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**" > 
      <div class="inner"> 
      <div class="item"> 
      <div class="view-item view-item-aisd_calendar"> 
      <div class="calendar monthview"> 
       <div class="calendar.4168.field_date.8.0 contents"> 
           <a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>      <span class="date-display-single">7:00 pm</span>   </div> 
       <div class="cutoff">&nbsp;</div> 
       </div> 
      </div> 
     </div> </div> 
     </td> 
    </body> 
</html> 
""" 
soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly

td = soup.body.td  # or soup.find('td', id='aisd_calendar-2014-04-28-0')
print(td['date'].strip('*'))  # drop the literal ** around the attribute value

link = soup.find('div', {'class': 'contents'}).a
print(link['href'])

This prints:

2014-04-28 
/event/2013/regular-board-meeting 
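
The question also asks for the link text ("Regular Board Meeting"), not only the `href`. A minimal sketch of how that might look, assuming `bs4` is installed and using a trimmed copy of the question's markup (the variable names here are illustrative, not from the original answer):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the calendar cell from the question; the literal ** markers are kept.
html = """
<td id="aisd_calendar-2014-04-28-0" class="single-day future" date="**2014-04-28**">
  <div class="calendar.4168.field_date.8.0 contents">
    <a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>
    <span class="date-display-single">7:00 pm</span>
  </div>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any one of the space-separated classes on the tag.
td = soup.find("td", class_="single-day")
link = td.find("a")

event_date = td["date"].strip("*")                # attribute value without the ** markers
title = link.get_text(strip=True).strip("*")      # visible link text
href = link["href"]                               # link target
time_text = td.find("span", class_="date-display-single").get_text(strip=True)

print(event_date, title, href, time_text)
```

Searching from `td` rather than from `soup` keeps each title paired with its own date, which matters once the page contains more than one calendar cell.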

Also, if you need the date converted to a Python `datetime`, you can use `strptime()`:

from datetime import datetime 

... 

datetime.strptime(td['date'].strip('*'), '%Y-%m-%d') 
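
As a self-contained sketch of that conversion, the snippet below hard-codes the two strings from the question's HTML instead of parsing them. Combining the date with the "7:00 pm" text goes one step beyond the answer and assumes the time always has that form (and an English/C locale for `%p`):

```python
from datetime import datetime

raw_date = "**2014-04-28**"   # value of td['date'] in the question's HTML
raw_time = "7:00 pm"          # text of the date-display-single span

# Date only, as in the answer above.
event_day = datetime.strptime(raw_date.strip("*"), "%Y-%m-%d")

# Date and time together; %I is the 12-hour clock, %p the am/pm marker.
start = datetime.strptime(
    raw_date.strip("*") + " " + raw_time, "%Y-%m-%d %I:%M %p"
)

print(event_day.date().isoformat())
print(start.isoformat())
```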

Hope that helps.