2013-07-22

I want to scrape the page http://209.105.250.69:8382/ with Python to get the current listener count:

<td>Current Listeners:</td> 
<td class="streamdata">28</td> 

Here is the full HTML of the page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 
<title>Icecast Streaming Media Server</title> 
<link rel="stylesheet" type="text/css" href="style.css"> 
</head> 
<body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0"> 
<h2>Icecast2 Status</h2> 
<br><div class="roundcont"> 
<div class="roundtop"><img src="/corner_topleft.jpg" class="corner" style="display: none"></div> 
<table border="0" width="100%" id="table1" cellspacing="0" cellpadding="4"><tr><td bgcolor="#656565"> 
<a class="nav" href="admin/">Administration</a><a class="nav" href="status.xsl">Server Status</a><a class="nav" href="server_version.xsl">Version</a> 
</td></tr></table> 
<div class="roundbottom"><img src="/corner_bottomleft.jpg" class="corner" style="display: none"></div> 
</div> 
<br><br><div class="roundcont"> 
<div class="roundtop"><img src="/corner_topleft.jpg" class="corner" style="display: none"></div> 
<div class="newscontent"> 
<div class="streamheader"><table cellspacing="0" cellpadding="0"> 
<colgroup align="left"></colgroup> 
<colgroup align="right" width="300"></colgroup> 
<tr> 
<td><h3>Mount Point /listen.mp3</h3></td> 
<td align="right"> 
<a href="/listen.mp3.m3u">M3U</a><a href="/listen.mp3.xspf">XSPF</a> 
</td> 
</tr> 
</table></div> 
<table border="0" cellpadding="4"> 
<tr> 
<td>Stream Title:</td> 
<td class="streamdata">Quran Kareem Radio</td> 
</tr> 
<tr> 
<td>Stream Description:</td> 
<td class="streamdata">Quran Kareem Radio</td> 
</tr> 
<tr> 
<td>Content Type:</td> 
<td class="streamdata">audio/mpeg</td> 
</tr> 
<tr> 
<td>Mount started:</td> 
<td class="streamdata">Wed, 17 Jul 2013 05:40:46 -0400</td> 
</tr> 
<tr> 
<td>Bitrate:</td> 
<td class="streamdata">60</td> 
</tr> 
<tr> 
<td>Current Listeners:</td> 
<td class="streamdata">28</td> 
</tr> 
<tr> 
<td>Peak Listeners:</td> 
<td class="streamdata">202</td> 
</tr> 
<tr> 
<td>Stream Genre:</td> 
<td class="streamdata">Islam</td> 
</tr> 
<tr> 
<td>Stream URL:</td> 
<td class="streamdata"><a target="_blank" href="http://qkradio.com.au">http://qkradio.com.au</a></td> 
</tr> 
<tr> 
<td>Current Song:</td> 
<td class="streamdata"></td> 
</tr> 
</table> 
</div> 
<div class="roundbottom"><img src="/corner_bottomleft.jpg" class="corner" style="display: none"></div> 
</div> 
<br><br>&nbsp; 


<div class="poster">Support icecast development at <a class="nav" target="_blank" href="http://www.icecast.org">www.icecast.org</a> 
</div> 
</body> 
</html> 
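Before the page can be parsed it has to be downloaded. A minimal sketch using only the standard library's urllib.request, assuming the server at the question's address still answers plain HTTP (it may not). The name fetch_status_html and the 10-second timeout are illustrative choices, not part of the original thread:

```python
from urllib.request import urlopen

# Address taken from the question; the server may no longer be reachable.
STATUS_URL = "http://209.105.250.69:8382/"


def fetch_status_html(url=STATUS_URL, timeout=10):
    """Download a page and return its body decoded as text."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header when the server sends
        # one; otherwise fall back to UTF-8, which the page's <meta> declares.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The returned string can then be handed to BeautifulSoup (or any other HTML parser) as the page source.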

So what have you tried? – zhangyangyu


Working hard on it with BeautifulSoup – Ossama

Answers

>>> from bs4 import BeautifulSoup 
>>> soup = BeautifulSoup(s, 'html.parser')  # s holds the page HTML 
>>> td1 = soup.find('td', string='Current Listeners:') 
>>> td2 = td1.find_next_sibling('td') 
>>> td2.text 
'28' 
>>> 
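If installing bs4 is not an option, the same "find the label cell, take the next cell" idea can be sketched with the standard library's html.parser.HTMLParser instead of BeautifulSoup. ListenerCountParser and current_listeners are hypothetical names; the sketch assumes the label cell and its value cell are consecutive <td> elements, as in the page above, and returns the count for the first mount point only:

```python
from html.parser import HTMLParser


class ListenerCountParser(HTMLParser):
    """Stdlib stand-in for soup.find('td', string=label).find_next_sibling('td')."""

    def __init__(self, label="Current Listeners:"):
        super().__init__()
        self.label = label
        self.in_td = False      # currently inside a <td>?
        self.cell_text = []     # text pieces collected for the current <td>
        self.grab_next = False  # True once the label cell has been seen
        self.value = None       # text of the cell that follows the label

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cell_text = []

    def handle_data(self, data):
        if self.in_td:
            self.cell_text.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self.in_td:
            return
        self.in_td = False
        text = "".join(self.cell_text).strip()
        if self.grab_next and self.value is None:
            # This is the cell right after the label: keep its text.
            self.value = text
            self.grab_next = False
        elif text == self.label:
            self.grab_next = True


def current_listeners(html):
    """Return the listener count as an int, or None if the label is absent."""
    parser = ListenerCountParser()
    parser.feed(html)
    parser.close()
    return int(parser.value) if parser.value is not None else None
```

Fed the page HTML from the question, current_listeners should return 28.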

You could have at least given the OP a chance – TerryA


Indeed, haha ;) – tamasgal


I just saw that the OP is already working with bs, so I assumed he must have tried something. @Haidro – zhangyangyu


You'll want to use an HTML parser such as BeautifulSoup. I'm not going to post a complete solution (since it looks like you haven't tried anything yet), but here is a demo:

from bs4 import BeautifulSoup as BS 
html = the_above  # placeholder: the page HTML from the question 
soup = BS(html, 'html.parser') 
print(soup.find_all('tr'))

This prints every <tr> tag in the page, as a list.
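Following that hint through: a sketch that groups the text of each row's <td> cells and keeps the two-cell "Label:" rows as a dictionary. It again swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages; StreamTableParser and stream_fields are hypothetical names, and the sketch assumes the stream tables are not nested inside one another, as in the page above:

```python
from html.parser import HTMLParser


class StreamTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text pieces of the open <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def stream_fields(html):
    """Map each 'Label:' cell to the value cell beside it, e.g. {'Bitrate': '60'}."""
    parser = StreamTableParser()
    parser.feed(html)
    parser.close()
    return {
        row[0].rstrip(":"): row[1]
        for row in parser.rows
        if len(row) == 2 and row[0].endswith(":")
    }
```

On the page from the question, stream_fields(html)["Current Listeners"] should be the string "28"; wrap it in int() if a number is needed. Rows that are not "Label: value" pairs, such as the navigation bar and the mount-point header, are skipped.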