BeautifulSoup doesn't give me the correct link

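I want to parse some HTML and extract the links that match a particular pattern. I'm using the find method with a regular expression, but it doesn't give me the correct link. Here is my snippet. Can someone tell me what I'm doing wrong?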

from BeautifulSoup import BeautifulSoup 
import re 

html = """ 
<div class="entry"> 
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a> 
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div> 
""" 

soup = BeautifulSoup(html) 
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

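I should get the third link (the IMDB title URL), but BS always returns the first link. The first link's href doesn't even match my regular expression, so why is it returned?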
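
One way to rule out the regular expression itself is to apply it to each href directly, without going through BeautifulSoup. The following is only a sketch under stated assumptions and is not from the original post: it uses Python 3 and the standard-library html.parser module, and HrefCollector is a hypothetical helper name introduced for illustration.

```python
import re
from html.parser import HTMLParser

HTML = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


collector = HrefCollector()
collector.feed(HTML)

# Same pattern as in the question, applied to each href on its own.
pattern = re.compile(r".*title/tt.*")
matching = [href for href in collector.hrefs if pattern.search(href)]
print(matching)
```

If only the IMDB title URL is printed, the pattern is doing what was intended, and the problem is more likely in the installed BeautifulSoup version or in how it is imported.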

Thanks.

Source

2010-07-23 Mridang Agarwalla

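I've corrected the import to "from BeautifulSoup import BeautifulSoup". It shouldn't work otherwise. With that change it returns the third link. I think it works fine. – luc 2010-07-23 08:17:04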

For some reason it doesn't seem to work. It always gives the first link - 'http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/' – 2010-07-23 08:42:55

It works for me too (BS v3.1.0.1); I get the third link. What version are you using? – tokland 2010-07-23 13:17:47

find only returns the first <a> tag. You want findAll.
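
The difference between the two calls can be sketched as follows. This is an illustration under assumptions rather than the code from the thread: it uses Python 3 with the bs4 package (BeautifulSoup 4) instead of the BeautifulSoup 3 import shown in the question, and find_all is the bs4 spelling of findAll. In bs4, at least, find returns the first tag that satisfies all the given filters, not simply the first <a> in the document.

```python
import re

from bs4 import BeautifulSoup  # assumption: bs4 is installed (pip install beautifulsoup4)

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"title/tt")

# find: the first <a> whose href matches the pattern (or None if nothing matches).
first_match = soup.find("a", href=pattern)

# find_all: a list of every <a> whose href matches the pattern.
all_matches = soup.find_all("a", href=pattern)

print(first_match["href"])
print([a["href"] for a in all_matches])
```

Since find returns None when nothing matches, it is worth checking the result before indexing it with ['href'] in real code.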

Source

2010-07-23 09:03:29 katrielalex

I can't answer your question, but in any case your (originally) posted code has a typo in the import. Change

import BeautifulSoup

to

from BeautifulSoup import BeautifulSoup

Then your output (using BeautifulSoup version 3.1.0.1) will be:

http://www.imdb.com/title/tt1196141/

Source

2010-07-23 08:13:57 miku

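My bad. When I tested on my computer, my BS was in a different location, and when I copy-pasted the code here I hastily changed the import, hence the error. I'll edit the question. The problem still persists: it doesn't give me the right link. – 2010-07-23 08:41:17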
