How do I get the next page link in Python BeautifulSoup?

I have this link:

http://www.brothersoft.com/windows/categories.html

I want to get the links of the items inside the div. For example:

http://www.brothersoft.com/windows/mp3_audio/midi_tools/

I have tried this code:

import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'

pageHtml = urllib.request.urlopen(url).read()

soup = BeautifulSoup(pageHtml)

# first link inside each <div class="brLeft">
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]

for i in sAll:
    print("http://www.brothersoft.com" + i['href'])

But I only get this output:

http://www.brothersoft.com/windows/mp3_audio/

How can I get the output I need?

Source

2013-08-22 wan mohd payed

It works perfectly, what is the problem? – dorvak

The output should be http://www.brothersoft.com/windows/mp3_audio/midi_tools/ –

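The URL http://www.brothersoft.com/windows/mp3_audio/midi_tools/ is not inside a <div class='brLeft'> tag, so if the output is http://www.brothersoft.com/windows/mp3_audio/, that is the correct result for your code.

If you want to get the URL you are after, change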

sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]

to

sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
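To illustrate why the class name matters, here is a minimal sketch run against a made-up HTML fragment. The markup is an assumption about how the categories page is laid out (parent category in `brLeft`, subcategories in `brRight`), not a copy of the real page. The `audio_players` path and the helper name `hrefs_in` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the real page's markup may differ.
SAMPLE_HTML = """
<div class="brLeft"><a href="/windows/mp3_audio/">MP3 &amp; Audio</a></div>
<div class="brRight">
  <a href="/windows/mp3_audio/midi_tools/">MIDI Tools</a>
  <a href="/windows/mp3_audio/audio_players/">Audio Players</a>
</div>
"""

BASE = "http://www.brothersoft.com"


def hrefs_in(soup, css_class):
    """Return absolute URLs for every <a href> inside divs of the given class."""
    links = []
    for div in soup.find_all("div", class_=css_class):
        for a in div.find_all("a", href=True):
            links.append(BASE + a["href"])
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
left_links = hrefs_in(soup, "brLeft")    # parent category only
right_links = hrefs_in(soup, "brRight")  # subcategories
print(left_links)
print(right_links)
```

Note that `div.find('a')` returns only the first link in each div, whereas `find_all('a')` collects every link, which may matter if one `brRight` div holds several subcategories.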

UPDATE:

An example that gets the information inside 'midi_tools':

import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'
pageHtml = urllib.request.urlopen(url).read()
soup = BeautifulSoup(pageHtml)
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
for i in sAll:
    suburl = "http://www.brothersoft.com" + i['href']  # a subcategory URL such as the 'midi_tools' page

    content = urllib.request.urlopen(suburl).read()
    anosoup = BeautifulSoup(content)
    ablock = anosoup.find('table', {'id':'courseTab'})
    if ablock is None:  # skip pages that have no listing table
        continue
    for atr in ablock.findAll('tr', {'class':'border_bot'}):
        print(atr.find('dt').a.string)  # name
        print("http://www.brothersoft.com" + atr.find('a', {'class':'tabDownload'})['href'])  # link

Source

2013-08-22 10:42:16

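What if I want to get the application names and links inside midi_tools? –

@wan mohd payed, this is similar to what you already did: fetch the content of the midi_tools page, work out which tags hold that information, then use 'BeautifulSoup' to extract it. –

@ Davd.Zheng Do I need to use "join" or something? –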
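If "join" here means combining the site address with a relative `href`, one option is `urljoin` from the standard library instead of string concatenation. This is only a sketch of that reading of the question. The helper name `absolute_links` and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

# The page the hrefs were scraped from; relative links resolve against it.
PAGE_URL = "http://www.brothersoft.com/windows/categories.html"


def absolute_links(page_url, hrefs):
    """Resolve each href (root-relative, relative, or absolute) against page_url."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical hrefs of the three common shapes.
sample_hrefs = [
    "/windows/mp3_audio/midi_tools/",  # root-relative
    "mp3_audio/",                      # relative to the page's directory
    "http://example.com/x",            # already absolute, left unchanged
]
links = absolute_links(PAGE_URL, sample_hrefs)
for link in links:
    print(link)
```

Unlike `"http://www.brothersoft.com" + href`, this also handles hrefs that are already absolute or that are relative to the current directory.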
