BeautifulSoup: getting the href

105

<a href="some_url">next</a> 
<span class="class">...</span>

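From this I want to extract the href, "some_url".

I can do this if I only have one tag, but here there are two tags. I can also get the text 'next', but that's not what I want.

Also, is there a good description of the API with examples somewhere? I am using the standard documentation, but I am looking for something a little more organized.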

Source

2011-04-28 dkgirl

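Please post a code sample that shows how you tried to do this. – seb 2011-04-28 08:33:04

OK, I figured it out: what confused me was that I was using django (html) to look at it, and it actually stripped the href before presenting it: soup.find('a') turned into 'next'. – dkgirl 2011-04-28 08:38:20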

157

You can use find_all in the following way to find every a element that has an href attribute, and print each one:

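from bs4 import BeautifulSoup  # find_all is the BeautifulSoup 4 method name

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> elements that actually have an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])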

The output would be:

Found the URL: some_url 
Found the URL: another_url

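Note that if you're using an older version of BeautifulSoup (before version 4), the name of this method is findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.

If you want all tags with an href, you can omit the name parameter: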

href_tags = soup.find_all(href=True)
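
As a rough sketch of what omitting the name does, the snippet below runs the same call against slightly extended markup. The `<link>` element and the `<a name=...>` anchor are hypothetical additions for illustration, and `html.parser` is simply the parser assumed here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <link> tag and the href-less anchor are
# illustrative additions, not part of the original question.
html = '''<link rel="stylesheet" href="style.css">
<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<a name="anchor_without_href">skipped</a>'''

soup = BeautifulSoup(html, "html.parser")

# With no tag name given, href=True matches any tag that has an href attribute.
href_tags = soup.find_all(href=True)

# Collect (tag name, href value) pairs in document order.
pairs = [(tag.name, tag["href"]) for tag in href_tags]

for name, href in pairs:
    print(name, href)
```

Tags without an href, such as the last anchor, are left out, so `tag["href"]` should not raise a KeyError inside this loop.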

Source

2011-04-28 08:38:59

Can you get the one with class="class"? – yoshiserry 2014-05-19 01:24:34

@yoshiserry soup.find('a', {'class': 'class'})['href'] – rleelr 2017-01-08 16:15:32

How do you weed out false positives and unwanted results (i.e. 'javascript:void(0)', '/en/support/index.html', '#smp-navigationList')? – user3155368 2018-02-12 10:28:29
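
One possible way to approach the filtering asked about in the last comment, offered only as a sketch: post-process the href strings with the standard library's urllib.parse. The base URL, the `keep_link` helper, and the `skip_prefixes` parameter are all hypothetical names chosen for illustration. Which paths count as "unwanted" depends on the site.

```python
from urllib.parse import urljoin, urlparse

# Assumed base URL of the scraped page (hypothetical).
BASE_URL = "https://example.com/en/"


def keep_link(href, base_url=BASE_URL, skip_prefixes=("/en/support/",)):
    """Return an absolute http(s) URL, or None if the href should be skipped.

    skip_prefixes is a hypothetical, site-specific list of unwanted paths.
    """
    href = href.strip()
    # Skip empty values and same-page fragments such as '#smp-navigationList'.
    if not href or href.startswith("#"):
        return None
    # Resolve relative links against the page URL.
    absolute = urljoin(base_url, href)
    parts = urlparse(absolute)
    # Skip non-web schemes such as 'javascript:' or 'mailto:'.
    if parts.scheme not in ("http", "https"):
        return None
    # Skip paths the caller has declared unwanted.
    if parts.path.startswith(skip_prefixes):
        return None
    return absolute


# Example href values, as they might come from a['href'] in a find_all loop.
hrefs = ["javascript:void(0)", "/en/support/index.html",
         "#smp-navigationList", "some_url"]
kept = [url for url in map(keep_link, hrefs) if url is not None]
print(kept)
```

Filtering after extraction keeps the find_all call simple. The rules live in one small function that can be adjusted for each site.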
