Beautifulsoup4 does not return all links on the page

I am writing a web crawler in Python 3.5 using requests and Beautifulsoup4. I am trying to get the links to all topics on the first page of a forum and add them to a list.

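I have 2 problems:

1) I don't know how to get the links with Beautifulsoup. I can't reach the link itself, only the div that contains it.
2) It seems that Beautifulsoup returns only a few of the topics, not all of them.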

import requests
from bs4 import BeautifulSoup

def getTopics():
    topics = []
    url = 'http://forum.jogos.uol.com.br/pc_f_40'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')

    # Loop over the elements whose class attribute is exactly "topicos"
    for link in soup.select('[class="topicos"]'):
        a = link.find_all('a href')  # the call in question
        print(a)

getTopics()


2015-10-28 Legos

First of all, the loop actually does iterate over all 38 topics on the rendered page.

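The actual problem is how the link is extracted for each topic: link.find_all('a href') will not find anything, because find_all treats the string as a tag name and there is no element named "a href" on the page. Replace it with link.select('a[href]'), which finds all a elements that have an href attribute.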
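To make the difference concrete, here is a minimal sketch that runs both calls against a small inline document. The markup in SAMPLE_HTML is a hypothetical stand-in for the forum page, whose real structure may differ, and the helper names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the forum page (an assumption, not the real HTML).
SAMPLE_HTML = """
<div class="topicos"><a href="/topic-1">First topic</a></div>
<div class="topicos"><a href="/topic-2">Second topic</a><a name="anchor">no href</a></div>
"""

def links_by_tag_name(div):
    # find_all() matches on tag name, so this looks for a tag literally named "a href".
    return div.find_all("a href")

def links_by_css(div):
    # select() takes a CSS selector: <a> elements that carry an href attribute.
    return [a["href"] for a in div.select("a[href]")]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for div in soup.select('[class="topicos"]'):
    print(len(links_by_tag_name(div)), links_by_css(div))
```

In this sample the first call comes back empty for every div, while the second returns only the anchors that have an href.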

In fact, you can solve this with a single list comprehension:

topics = [a["href"] for a in soup.select('.topicos a[href]')]
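As a rough sketch of how that one-liner could replace the loop, the function below wraps it so that it takes HTML text and returns the list. The name extract_topic_links and the sample markup are assumptions made for illustration; only the selector comes from the answer.

```python
from bs4 import BeautifulSoup

def extract_topic_links(html):
    """Return the href of every <a href=...> nested inside an element with class "topicos"."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select(".topicos a[href]")]

# Hypothetical sample markup; the real forum page may be structured differently.
sample = (
    '<div class="topicos"><a href="/t/1">One</a><a href="/t/2">Two</a></div>'
    '<div class="other"><a href="/x">Skip</a></div>'
)
print(extract_topic_links(sample))
```

With a live page, the same function could be called on the text of a requests.get(url) response, as in the question's code.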


2015-10-28 02:03:13 alecxe
