2015-10-14
Multiple values inside an HTML tag: selecting with BeautifulSoup

I am scraping an HTML page that contains multiple blocks of code like the following:

<div data-pnref="all" class="clearfix _5qo4"> 
<a data-hovercard="/ajax/hovercard/user.php?id=671948073&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D" ... /> 

I want to retrieve the value of data-hovercard, and specifically the id in its URL: "671948073".

I have tried findAll and select from the BeautifulSoup module, but so far without success.

Yes, but I retrieve the whole block and then I cannot extract the id – stochazesthai

Answers

Find the <div>, then find the <a> inside it:

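from bs4 import BeautifulSoup

html = '<div data-pnref="all" class="clearfix _5qo4"><a data-hovercard="/ajax/hovercard/user.php?id=671948073&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D"/></div>'
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# Locate the <div> first, then the <a> nested inside it
div = soup.find('div')
anchor = div.find('a')

# Tag attributes are read like dictionary keys; &amp; is decoded to &
data_hovercard = anchor['data-hovercard']

print(data_hovercard)
# /ajax/hovercard/user.php?id=671948073&extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D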

To get the value of id, parse the URL's query string with urlparse:

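try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

parsed = urlparse(data_hovercard)
# parse_qs maps each query parameter to a list of its values
parsed_dict = parse_qs(parsed.query)
hovercard_id = parsed_dict['id']

print(hovercard_id)
# ['671948073']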
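
Since the question mentions that the page contains several such blocks, the two steps above can be combined into one loop. What follows is only a minimal sketch, assuming Python 3 and the bs4 package. The helper name extract_hovercard_ids, the SAMPLE_HTML string, and the second id (123456789) are made up for illustration and do not come from the original page.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Hypothetical sample page: two blocks shaped like the one in the question.
# The second id (123456789) is invented for this example.
SAMPLE_HTML = """
<div data-pnref="all" class="clearfix _5qo4">
  <a data-hovercard="/ajax/hovercard/user.php?id=671948073&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D"></a>
</div>
<div data-pnref="all" class="clearfix _5qo4">
  <a data-hovercard="/ajax/hovercard/user.php?id=123456789&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D"></a>
</div>
"""


def extract_hovercard_ids(html):
    """Return the id query parameter of every <a data-hovercard=...> in html."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    # The CSS attribute selector matches only anchors that carry the attribute.
    for anchor in soup.select("a[data-hovercard]"):
        query = urlparse(anchor["data-hovercard"]).query
        # parse_qs maps each key to a list of values; anchors without an id add nothing.
        ids.extend(parse_qs(query).get("id", []))
    return ids


print(extract_hovercard_ids(SAMPLE_HTML))
# ['671948073', '123456789']
```

Using .get("id", []) rather than indexing with ["id"] means an anchor whose URL has no id parameter is skipped instead of raising a KeyError.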