How do I get only the MP3 links? Here is my code (using BeautifulSoup and Python):
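from bs4 import BeautifulSoup
import urllib.request
import re

url = urllib.request.urlopen("http://www.djmaza.info/Abhi-Toh-Party-Khubsoorat-Full-Song-MP3-2014-Singles.html")
content = url.read()
# Name the parser explicitly; without it, bs4 guesses one and emits a warning.
soup = BeautifulSoup(content, "html.parser")
# Print every <a> href that contains "http" (i.e. every absolute link).
for a in soup.findAll('a', href=True):
    if re.findall('http', a['href']):
        print("URL:", a['href'])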
Output of this code:
URL: http://twitter.com/mp3khan
URL: http://www.facebook.com/pages/MP3KhanCom-Music-Updates/233163530138863
URL: https://plus.google.com/114136514767143493258/posts
URL: http://www.djhungama.com
URL: http://www.djhungama.com
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3
URL: http://www.htmlcommentbox.com
URL: http://www.djmaza.com
URL: http://www.djhungama.com
I only need the MP3 links.
So how should I rewrite the code?
Thanks
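One way to keep only the MP3 links is to filter on how the href ends rather than on whether it contains 'http'. Simply searching for 'mp3' would not be enough, because 'http://twitter.com/mp3khan' in the output above also contains that string. A minimal sketch, assuming every MP3 link on the page ends in '.mp3' (optionally followed by a query string); the function name 'mp3_links' is hypothetical:

```python
import re

from bs4 import BeautifulSoup

# Matches hrefs ending in ".mp3" (case-insensitive), optionally followed by "?query".
MP3_RE = re.compile(r"\.mp3(\?.*)?$", re.IGNORECASE)


def mp3_links(html):
    """Return the href of every <a> tag that points at an .mp3 file (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # When href is given a compiled regex, find_all keeps tags whose href matches it.
    return [a["href"] for a in soup.find_all("a", href=MP3_RE)]


if __name__ == "__main__":
    sample = (
        '<a href="http://twitter.com/mp3khan">Twitter</a>'
        '<a href="http://songs.djmazadownload.com/music/Singles/'
        'Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3">320 Kbps</a>'
    )
    for link in mp3_links(sample):
        print("URL:", link)
```

Passing it the 'content' read in the question's code should return only the songs.djmazadownload.com links; judging by the output above, each of them will appear twice because the page links to each file twice.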
Thank you very much... :D – 2014-08-29 09:22:05
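@MuneebK You're welcome. On another note, since you're using 'bs4', you may want to use '.find_all' instead of 'findAll': the latter is BS3 style and is kept only for backwards compatibility, so it may be removed at some point. It's better to get into the habit of using the 'something_something' functions rather than the 'somethingSomething' ones. – 2014-08-29 09:24:59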
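In the same snake_case style, bs4 (4.7 and later, through its soupsieve dependency) also offers 'select', which takes a CSS selector. A short sketch of filtering for MP3 links with the attribute-suffix selector '$='; unlike a case-insensitive regex, this match is case-sensitive, so an href ending in '.MP3' would be missed. The name 'mp3_links_css' is hypothetical:

```python
from bs4 import BeautifulSoup


def mp3_links_css(html):
    """Return hrefs of <a> tags whose href ends with ".mp3" (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # 'a[href$=".mp3"]' selects <a> elements whose href value ends with ".mp3".
    return [a["href"] for a in soup.select('a[href$=".mp3"]')]


if __name__ == "__main__":
    page = '<a href="http://x/song.mp3">song</a><a href="http://twitter.com/mp3khan">t</a>'
    print(mp3_links_css(page))
```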
I need to store these links in an array so I can download them. How can I do that? – 2014-08-29 09:36:46
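In Python a plain list already serves as the "array" here. A minimal sketch, assuming the links should be kept in page order with duplicates dropped (the output above shows each file linked twice) and saved with the standard library's 'urllib.request.urlretrieve'. The function names 'collect_mp3_links' and 'download_all' and the 'downloads' folder are hypothetical, and the download step has not been tried against this site:

```python
import os
import re
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

# Matches hrefs ending in ".mp3" (case-insensitive), optionally followed by "?query".
MP3_RE = re.compile(r"\.mp3(\?.*)?$", re.IGNORECASE)


def collect_mp3_links(html):
    """Return .mp3 hrefs as a list, dropping duplicates but keeping page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=MP3_RE):
        if a["href"] not in links:
            links.append(a["href"])
    return links


def download_all(links, dest_dir="downloads"):
    """Download each link into dest_dir, naming the file after the last URL path segment."""
    os.makedirs(dest_dir, exist_ok=True)
    for link in links:
        # The hrefs contain spaces, parentheses and brackets, so percent-encode them
        # (assumes they are not already encoded; keeps URL delimiters intact).
        safe_url = urllib.parse.quote(link, safe=":/?=&")
        filename = os.path.basename(urllib.parse.urlsplit(link).path)
        path = os.path.join(dest_dir, filename)
        urllib.request.urlretrieve(safe_url, path)
        print("Saved", path)
```

For example, 'download_all(collect_mp3_links(content))' with 'content' from the question's code would save each file once under its original name, such as "Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3".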