2017-05-05 64 views

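Get audio source links from a website with Python

I am writing a script to get the audio source links from a website. By scraping the main page I get a list of the available links. But when I scrape the resulting links, I cannot find the source (it should be the href inside the <audio> tag).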

Here is my code:

# -*- coding: utf-8 -*-
import urllib.request
from bs4 import BeautifulSoup


def getHTML(st):
    # Fetch the raw HTML of the given URL
    with urllib.request.urlopen(st + '/', timeout=100) as response:
        return response.read()


site = 'http://www.e-radio.gr'
soup = BeautifulSoup(getHTML(site), 'html.parser')

# Parse the main page and collect the player links
lst = list()

for a in soup.body.find_all('a', {'class': 'erplayer'}):
    item = a.get('href')
    if site in item:
        # Already an absolute URL
        lst.append(item)
    else:
        # Relative link: prefix it with the site root
        lst.append(site + item)

print("\n".join(lst))

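It seems the site does not load completely, and with urllib.request the audio source is never loaded. Is there something I can use instead of urllib.request so that it waits for the whole page to load? I also thought about using an external web browser to generate the HTML, but I don't know how to do that.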

Can you post the HTML of the link you need? The HTML of the audio link. – Exprator

The site link is in the code. This is my code, you can run it –

Right, but if we run the code, we can see the audio links printed. What is the problem? – alecxe

Answer


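This is a bit tricky, but we can approach it step by step. First, get the player's HTML by following the iframe link. Then get the Flash player link and follow it. Finally, extract the link to the mp3 and download the stream. All of this happens within the same web-scraping session: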

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def download_file(session, link, path):
    # Write the response to disk chunk by chunk instead of loading it into memory.
    # A live stream may keep sending data, so this can run until interrupted.
    r = session.get(link, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            for chunk in r:
                f.write(chunk)


base_url = "http://www.e-radio.gr"
url = "http://www.e-radio.gr/Rainbow-89-Thessaloniki-i92/live"

with requests.Session() as session:
    # Send a browser-like User-Agent with every request in this session
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.96 Safari/537.36'}

    # Step 1: load the station page and find the player iframe
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    frame = soup.find(id="playerControls1")
    frame_url = urljoin(base_url, frame["src"])

    # Step 2: load the iframe and follow the link to the Flash player
    response = session.get(frame_url)
    soup = BeautifulSoup(response.content, "html.parser")
    link = soup.select_one(".onerror a")['href']
    flash_url = urljoin(response.url, link)

    # Step 3: load the Flash player page and take the mp3 link from its flashvars
    response = session.get(flash_url)
    soup = BeautifulSoup(response.content, "html.parser")
    mp3_link = soup.select_one("param[name=flashvars]")['value'].split("url=", 1)[-1]
    print(mp3_link)

    download_file(session, mp3_link, "download.mp3")
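
As a side note on the last step: splitting on "url=" works as long as that parameter comes last and no other key ends in "url". A minimal sketch of an alternative, assuming the flashvars value is formatted like an ordinary query string (key=value pairs joined by "&"), is to parse it with urllib.parse.parse_qs. The helper name stream_url_from_flashvars and the sample value are made up for illustration; they do not come from the site.

```python
from urllib.parse import parse_qs


def stream_url_from_flashvars(flashvars, key="url"):
    """Return the first value stored under `key` in a flashvars string, or None."""
    # Assumption: flashvars looks like a URL query string, e.g. "a=1&b=2".
    params = parse_qs(flashvars, keep_blank_values=True)
    values = params.get(key)
    return values[0] if values else None


# Hypothetical flashvars value, invented for this example only.
sample = "skinurl=/skins/default.swf&url=http://example.com/live.mp3&autoplay=true"
print(stream_url_from_flashvars(sample))  # prints http://example.com/live.mp3
```

Note that parse_qs percent-decodes values and turns "+" into a space, and that a stream URL carrying its own unescaped "&" would be cut short, so check the real flashvars value before relying on this.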