Beautiful Soup - how to get href

I can't seem to extract the href (there is only one <strong>Website:</strong> on the page) from the following HTML soup:

<div id='id_Website'> 
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a> 
</div></div><div>

This is what I thought should work:

href = soup.find("strong" ,text=re.compile(r'Website')).next["href"]

2011-09-12 howtodothis

.next in this case is a NavigableString containing the whitespace between the <strong> tag and the <a> tag. Also, the text= argument is for matching NavigableStrings, not elements.
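To see why the node after </strong> is whitespace rather than the link, the sketch below walks the same markup with Python 3's standard-library html.parser (swapped in for Beautiful Soup only so it runs without third-party packages) and prints the two items that follow </strong>. EventLogger is a hypothetical helper written for this illustration, and the sample string drops the trailing spaces from the question's HTML.

```python
from html.parser import HTMLParser

# The markup from the question (trailing spaces dropped).
HTML = (
    "<div id='id_Website'>\n"
    "<strong>Website:</strong>\n"
    "<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>\n"
    "</div></div><div>"
)


class EventLogger(HTMLParser):
    """Hypothetical helper: record start tags, end tags and text in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


logger = EventLogger()
logger.feed(HTML)
logger.close()

# Look at the two items that come straight after </strong>.
i = logger.events.index(("end", "strong"))
print(logger.events[i + 1])  # ('text', '\n')  <- whitespace, not the link
print(logger.events[i + 2])  # ('start', 'a')
```

Beautiful Soup keeps that newline as a NavigableString in the tree, which is why the answer below steps over it with nextSibling.nextSibling.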

The following does what you want, I think:

import re 
from BeautifulSoup import BeautifulSoup 

html = '''<div id='id_Website'> 
<strong>Website:</strong> 
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a> 
</div></div><div>''' 

soup = BeautifulSoup(html) 

for t in soup.findAll(text=re.compile(r'Website:')): 
    # Find the parent of the NavigableString, and see 
    # whether that's a <strong>: 
    s = t.parent 
    if s.name == 'strong':
        print s.nextSibling.nextSibling['href']

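...but that isn't very robust. If the enclosing div has a predictable id, it would be better to find that div and then find the first <a> element inside it.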
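In Beautiful Soup that lookup would be along the lines of soup.find('div', id='id_Website').find('a')['href']. As a dependency-free sketch of the same "find the container by id, then its first link" idea, the code below uses Python 3's standard-library html.parser instead of Beautiful Soup; FirstLinkInDiv and href_in_div are hypothetical names, and it assumes the div's id is known in advance.

```python
from html.parser import HTMLParser

# The markup from the question (trailing spaces dropped).
HTML = (
    "<div id='id_Website'>\n"
    "<strong>Website:</strong>\n"
    "<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>\n"
    "</div></div><div>"
)


class FirstLinkInDiv(HTMLParser):
    """Hypothetical helper: remember the href of the first <a> inside the
    <div> whose id equals div_id."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.depth = 0      # > 0 while we are inside the target div
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1          # a div nested inside the target
            elif tag == "a" and self.href is None:
                self.href = attrs.get("href")
        elif tag == "div" and attrs.get("id") == self.div_id:
            self.depth = 1               # entered the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1


def href_in_div(html, div_id):
    """Return the first link inside the div with this id, or None."""
    parser = FirstLinkInDiv(div_id)
    parser.feed(html)
    parser.close()
    return parser.href


print(href_in_div(HTML, "id_Website"))  # http://google.com
```

Because the search is anchored on the div rather than on the label text, it does not care how much whitespace sits between <strong> and <a>.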

2011-09-12 13:39:31

That's what I was looking for. Thanks. How would I search by id to get the next href value? – howtodothis

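You could use something like soup.findAll('div', id=re.compile('Website$')) to get all the divs to consider. Without seeing other examples, though, it's not clear how you would then pick out the right one. –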
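Beautiful Soup tests a compiled regular expression against an attribute value with search() rather than match(), so id=re.compile('Website$') accepts any id that ends in "Website", whatever comes before it. The sketch below uses only the re module to show which id values that pattern would accept; the list of ids is invented for illustration.

```python
import re

# Same pattern as in the comment: ids that end with "Website".
pattern = re.compile('Website$')

# Hypothetical id values, made up for this illustration.
ids = ['id_Website', 'id_CompanyWebsite', 'id_Website_old', 'id_Phone']

# search() scans the whole string; "$" anchors the pattern to the end.
matches = [i for i in ids if pattern.search(i)]
print(matches)  # ['id_Website', 'id_CompanyWebsite']

# match() is anchored at position 0, so it finds nothing here.
print([i for i in ids if pattern.match(i)])  # []
```

If more than one div matches, you still need some other rule (position, the <strong> label, and so on) to choose between them, which is the open question in the comment.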
