Python (BeautifulSoup) - getting an href from a `<script>`

I'm working on a "Video Downloader" and I have a BeautifulSoup4 question.
Here is the part of the HTML that contains the `a href` I want to get:
```html
<script src="/static/common.js?v7"></script>
<script type="text/javascript">
    var c = 6;
    window.onload = function() {
        count();
    }
    function closeAd(){
        $("#easy-box").hide();
    }
    function notLogedIn(){
        $("#not-loged-in").html("You need to be logged in to download this movie!");
    }
    function count() {
        if(document.getElementById('countdown') != null){
            c -= 1;
            //If the counter is within range we put the seconds remaining to the <span> below
            if (c >= 0)
                if(c == 0){
                    document.getElementById('countdown').innerHTML = '';
                }
                else {
                    document.getElementById('countdown').innerHTML = c;
                }
            else {
                document.getElementById('download-link').innerHTML = '<a style="text-decoration:none;" href="http://s896.vshare.io/download,9999999999999999999999999999999999999999-f6192405453bf5ff3cfe41a488d8390d,5944ed28,4d948c5.avi">Click here</a> to download requested file.';
                return;
            }
            //setTimeout('count()', 1000);
        }
    }
</script>
<script type="text/javascript" src="/static/flowplayer/flowplayer-3.2.13.min.js"></script>
```
This is the `href` I want to print:

```text
href="http://s896.vshare.io/download,9999999999999999999999999999999999999999-f6192405453bf5ff3cfe41a488d8390d,5944ed28,4d948c5.avi"
```
I tried this, but it doesn't work:

```python
for a in soup3.find_all('a'):
    if 'href' in a.attrs:
        print(a['href'])
```
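A likely reason the loop prints nothing for this link: BeautifulSoup stores the body of a `<script>` element as one raw text string rather than parsing it as markup, so the `<a ...>` inside the JavaScript string literal never becomes a tag that `find_all('a')` can return. A minimal sketch showing this, assuming bs4 with the built-in `html.parser` backend; the `html` string and its `example.com` URL are a hypothetical, trimmed-down stand-in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt: the link exists only inside a JavaScript string literal.
html = """
<div id="download-link"></div>
<script type="text/javascript">
function count() {
    document.getElementById('download-link').innerHTML = '<a href="http://example.com/file.avi">Click here</a>';
}
</script>
"""

soup = BeautifulSoup(html, "html.parser")

# No <a> Tag is created, because the script body is kept as plain text.
anchors = soup.find_all("a")

# The markup is still present, but only as part of the script's text.
script_text = soup.find("script").string

print(anchors)
print('href="' in script_text)
```

So the link has to be pulled out of `script.string` as text, not found as an element.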
The href is inside the JavaScript. You can grab the JS part and extract the href with the help of a [regex](https://docs.python.org/3/howto/regex.html). Have a look at this [question](https://stackoverflow.com/questions/24333189/parsing-js-with-beautiful-soup). – trotta
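Following that suggestion, a minimal regex sketch using only the standard library. It assumes the URL is always wrapped in double quotes, which holds for the `innerHTML` string in the question but may not hold on other pages. `HREF_RE` and `extract_hrefs` are hypothetical names:

```python
import re

# Captures the value of href="..."; assumes double quotes around the URL,
# as in the innerHTML string in the question.
HREF_RE = re.compile(r'href="([^"]+)"')


def extract_hrefs(js_text):
    """Return every double-quoted href value found in a chunk of JavaScript source."""
    return HREF_RE.findall(js_text)


# The URL and the assignment line are copied from the question's script.
URL = "http://s896.vshare.io/download,9999999999999999999999999999999999999999-f6192405453bf5ff3cfe41a488d8390d,5944ed28,4d948c5.avi"
js = (
    "document.getElementById('download-link').innerHTML = "
    "'<a style=\"text-decoration:none;\" href=\"" + URL + "\">Click here</a>"
    " to download requested file.';"
)

for href in extract_hrefs(js):
    print(href)
```

With BeautifulSoup, the input would be `script.string` for each tag returned by `soup3.find_all("script")`, skipping tags such as `<script src="...">` whose `.string` is `None`.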