2013-09-24 88 views

I'm using Python 2.7 with BeautifulSoup 3.2, and I wrote the scraper below to get stream URLs. It uses a loop inside a loop; can I do it better?

# Import the classes that are needed 
import urllib2 
from BeautifulSoup import BeautifulSoup 

# URL to scrape and open it with the urllib2 
url = 'http://www.wiziwig.tv/broadcast.php?matchid=219751&part=sports' 
source = urllib2.urlopen(url) 

# Turn the saved source into a BeautifulSoup object 
soup = BeautifulSoup(source) 

for tr in soup.findAll('tr', {'class': ['broadcast']}):
    stationName = tr.findAll('td')[1].text

    for trBelow in tr.findAllNext('tr'):
        curClass = trBelow['class']
        if curClass == 'broadcast':
            break

        kindStream = trBelow.findAll('td')[0].text
        streamUrl = trBelow.find('a', {'class': 'broadcast go'})['href']
        streamQuality = trBelow.findAll('td')[2].text
        streamRating = trBelow.find('div', {'class': 'rating'})['rel']

        print stationName, kindStream, streamQuality, streamRating, streamUrl

This works fine and gives the following output:

BWIN Flash 650 Kbps 100 http://forum.wiziwig.eu/threads/1847-BWIN-Info 
BWIN Flash 675 Kbps 100 https://sports.bwin.com/en/sports?wm=3448325&zoneId=1068792 
Bet365 Flash 650 Kbps 100 http://forum.wiziwig.eu/threads/6258-Bet365 
Bet365 Flash 675 Kbps 100 http://www.bet365.com/?affiliate=365_014110 
TRK Ukraine+ AceStream 1250 Kbps 100 acestream://94879770520f2e9db2146d0eca59204bfbd72cbe 
TRK Ukraine+ AceStream 1251 Kbps 75 http://aviatortv.org/football_ua_plus/ 
Arenavision1 Sopcast 2000 Kbps 75 sop://broker.sopcast.com:3912/143876 
Arenavision3 AceStream 2000 Kbps 75 acestream://a53a380706846bfc6667e21a1485dedb78b9674b 
Arenavision3 AceStream 2001 Kbps 75 http://avod.me/play/a53a380706846bfc6667e21a1485dedb78b9674b 
Dazsports Ace2 AceStream 850 Kbps 100 acestream://d293c82146aa6c2904e45ff305ae0f38dc5b329d 
Dazsports Ace2 AceStream 851 Kbps 75 http://dazsports.org/ace2.html 
Digi Sport1 [RO] Sopcast 1500 Kbps 100 sop://broker.sopcast.com:3912/146141 
Digi Sport1 [RO] Sopcast 1500 Kbps 100 sop://broker.sopcast.com:3912/124992 
Digi Sport1 [RO] Sopcast 1501 Kbps 100 sop://broker.sopcast.com:3912/139777 
Digi Sport1 [RO] Sopcast 1501 Kbps 100 sop://broker.sopcast.com:3912/110152 
Pole Position1 [NL] AceStream 1000 Kbps 100 acestream://86fd521d30e9319198b75121761eccf260fef0cb 
Pole Position1 [NL] AceStream 1001 Kbps 75 http://polepositionweb.org/?page_id=6 popup 
Solodeportes Veetle Veetle 850 Kbps 100 http://veetle.com/index.php/widget/index/E47CFF6CB6A770852515B8B30C2E30F6/0/true/default/false 
Livesports4u4 Flash 225 Kbps 75 http://livesport4u.com/stream4.html 
Cricfree Flash2 Flash 175 Kbps 75 http://cricfree.tv/live-golf-streaming-ch2.php 
Njtvx9 Flash 175 Kbps 75 http://nutjob.eu/njtvx9.html 
Igoal C+ Liga Flash 175 Kbps 75 http://ana1.me/liga+.html 
Soccertoall2 [PT] Flash 175 Kbps 75 http://soccertoall.net/index.php?channel=2 
Tugalive1 Flash 175 Kbps 75 http://www.tugalive.eu/p/live-1.html 
Diresport1 Flash 175 Kbps 75 http://diresportt.blogspot.com.es/ 
Footstream11 Flash 175 Kbps 75 http://www.footstream.tv/channel11.html 
Lag10 (8) Flash 150 Kbps 50 http://lag10.com/channel8 
ANA STV2 Flash 400 Kbps 75 http://ana1.me/STV2.html 
ANA STV2 Flash 400 Kbps 75 http://bliner.tv/sporttv2pt.html 
Livesoccerhd4 Flash 225 Kbps 75 http://livesoccerhd.tv/l4.html 
Stvstreams Ace HD1 AceStream 1500 Kbps 100 acestream://750acfc788e12220dbd57188505eae08f566281e 
Stvstreams Ace HD1 AceStream 1500 Kbps 100 http://stvstreams.com/acestreams/stv-hd/ 
Btsportshd12 Flash 200 Kbps 75 http://www.btsportshd.com/stream12.php 
Ana Stream1 Flash 175 Kbps 75 http://ana3.me/STREAM1.html 
Onlinesoccer2all (13) Flash 175 Kbps 75 http://online--soccer.eu/channel13.html 
Hdfoots6 Flash 175 Kbps 75 http://hdfoots.com/stream6.html 

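But I'm wondering whether this is how it should be done, or whether there is a better way that avoids running a findAllNext loop and then breaking out of it when it hits a specific class.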

Answers


I would probably just iterate over the <tr> items:

station_name = ''
for tr in soup.findAll('tr'):
    if tr['class'] == 'broadcast':
        # Header row: remember the station name for the stream rows below it
        station_name = tr.findAll('td')[1].text
    else:
        # Your current extraction code
        print station_name, kindStream, ....

The code is a bit clearer this way, I guess.

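On the other hand... it looks like you have a quick script that works. It is more likely to break because the page's HTML output changes than because of a bug in your code. So if it works, it works, I'd say.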
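As a rough illustration of the single-pass idea in this answer, here is a minimal sketch that runs without BeautifulSoup. The `flatten_streams` helper, the `(css_class, cells)` row shape, and the `'streamrow'` class name are assumptions made up for the example; in the real script each row would be a parsed `<tr>` tag and the cell values would come from `tr.findAll('td')`.

```python
def flatten_streams(rows):
    """Yield (station, kind, quality) for each stream row, in one pass.

    `rows` is a hypothetical stand-in for the parsed <tr> tags: each item
    is a (css_class, cells) pair, where cells is the list of <td> texts.
    """
    station_name = ''
    for css_class, cells in rows:
        if css_class == 'broadcast':
            # Header row: remember the station for the rows that follow.
            station_name = cells[1]
        else:
            # Stream row: pair it with the most recently seen station.
            yield station_name, cells[0], cells[2]


# Made-up rows shaped like the output in the question
# ('streamrow' is an invented class name for non-header rows).
sample_rows = [
    ('broadcast', ['', 'BWIN']),
    ('streamrow', ['Flash', '', '650 Kbps']),
    ('streamrow', ['Flash', '', '675 Kbps']),
    ('broadcast', ['', 'Bet365']),
    ('streamrow', ['Flash', '', '650 Kbps']),
]
streams = list(flatten_streams(sample_rows))
```

Keeping the "current station" in a variable means every row is visited exactly once, so there is no inner loop to break out of.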


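I think your implementation is already great. Just one quick question: what if I want to reuse some of the content I retrieved? As far as I can tell, soup has no built-in cache for this, so if I ran this loop again it would traverse the nodes again.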

Here is my take:

# Run each query once and keep the results so they can be reused
tr_elements = soup.findAll('tr', {'class': ['broadcast']})
tr_belows = [tr.findAllNext('tr') for tr in tr_elements]
collection = {}
collection['station_names'] = [tr.findAll('td')[1].text for tr in tr_elements]
# tr_belows holds one list of following rows per broadcast row
collection['kind_streams'] = [[trb.findAll('td')[0].text for trb in rows]
                              for rows in tr_belows]
## and so forth.
print collection

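This still needs some work, because it does not account for another 'broadcast' node appearing among the rows it scans. Also, the complexity of my approach could use some work.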
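One possible way to address the limitation mentioned above is to cut each scan off at the next 'broadcast' row with `itertools.takewhile`, and to store the grouped rows in a dict so they can be reused without querying the page again. This is only a sketch under assumptions: `streams_by_station`, the `(css_class, cells)` row shape, and the `'streamrow'` class name are invented for the example and stand in for the real `<tr>` tags.

```python
from itertools import takewhile


def streams_by_station(rows):
    """Map each station name to the cell lists of the rows listed under it.

    `rows` is a hypothetical stand-in for the parsed <tr> tags: a list of
    (css_class, cells) pairs, where cells is the list of <td> texts.
    """
    collection = {}
    for index, (css_class, cells) in enumerate(rows):
        if css_class != 'broadcast':
            continue
        # Take the following rows only until the next 'broadcast' header.
        below = takewhile(lambda row: row[0] != 'broadcast',
                          rows[index + 1:])
        collection.setdefault(cells[1], []).extend(c for _, c in below)
    return collection


# Made-up rows ('streamrow' is an invented class name for non-header rows).
sample_rows = [
    ('broadcast', ['', 'BWIN']),
    ('streamrow', ['Flash', '', '650 Kbps']),
    ('streamrow', ['Flash', '', '675 Kbps']),
    ('broadcast', ['', 'Bet365']),
    ('streamrow', ['Flash', '', '650 Kbps']),
]
collection = streams_by_station(sample_rows)
```

The slice copies the remainder of the list once per header row, which is likely fine for a page of this size but is not optimized.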