2017-07-24 38 views

How do I web scrape a (JavaScript?) website?

I tried to scrape data from a website called flightradar24.

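My code looks up the airport name, and I also want to scrape the "Arrivals" table. Scraping the name works, because it is just an H1 element in the HTML. When I try to scrape the table with the same code, though, I do not get any values, only the object names (maybe because JavaScript is involved?).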

Is there a way to scrape this part of the page? (Python 2.7)

I tried this:

import urllib2, sys 
from BeautifulSoup import BeautifulSoup 

site= "https://www.flightradar24.com/data/airports/bud/arrivals" 
hdr = {'User-Agent': 'Mozilla/5.0'} 
req = urllib2.Request(site,headers=hdr) 
page = urllib2.urlopen(req) 
soup = BeautifulSoup(page) 
name = soup.find('h1' , attrs={'class' : 'airport-name'}) 
print name 

table = soup.find('div', { "class" : "row cnt-schedule-table" }) 
print table 

This is what I get when I print the table:

<div class="row cnt-schedule-table"><label class="m-b-m">ARRIVALS</label><table class="table table-condensed table-hover data-table m-n-t-15"><thead><tr class="hidden-xs hidden-sm"><th class="w-80">TIME</th><th class="w-90">FLIGHT</th><th>FROM</th><th>AIRLINE</th><th class="w-120">AIRCRAFT</th><th class="w-10"></th><th class="w-160">STATUS</th></tr><tr ng-cloak="ng-cloak" data-ng-class="{hidden: btnLoadEarlier === false}" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length &gt; 0)"> 0)"&gt;<td colspan="7" class="text-center"><button data-mode="arrivals" data-page="-1" data-timestamp="{{currentUtcTimestampRender/1000}}" ng-click="loadMoreFlights($event)" data-current-page="{{airportView.schedule.arrivals.page.current}}" data-loading-text='&lt;i class="fa fa-circle-o-notch fa-spin"&gt;&lt;/i&gt; Loading earlier flights...' class="btn btn-table-action btn-flights-load">Load earlier flights</button></td></tr></thead><tbody><tr ng-cloak="ng-cloak" class="loader" ng-show="(isFetching == true)"><td colspan="7" class="text-center"><i class="fa fa-spinner fa-pulse"></i> Loading...</td></tr><tr ng-cloak="ng-cloak" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length == 0)"><td colspan="7" class="text-center">Sorry, we don't have any information about flights for this airport</td></tr><tr ng-cloak="ng-cloak" class="hidden-md hidden-lg" ng-repeat="objFlight in airportView.schedule.arrivals.data track by $index" ng-show="(isFetching == false)"><td colspan="7" class="state-block-{{objFlight.flight.status.generic.status.color || 'gray'}}"><div class="row"><div class="col-xs-12 col-sm-12 p-xxs"><span ng-bind-html="objFlight.flight.statusMessage.text | unsafe"></span> {{objFlight.flight.status.generic.eventTime.utc * 1000 || '' | date: timeFormat: timeZone}}</div></div><div class="row"><div class="col-xs-3 col-sm-3 p-xxs"><i class="fa fa-clock-o"></i> <span>{{objFlight.flight.time.scheduled.arrival * 1000 || '-' | date: 
timeFormat : timeZone}}</span></div><div class="col-xs-3 col-sm-3 p-xxs"><i class="fa fa-tag"></i> <a class="notranslate" ng-href="/data/flights/{{objFlight.flight.identification.number.default | lowercase}}">{{objFlight.flight.identification.number.default}}</a></div><div class="col-xs-6 col-sm-6 p-xxs"><i class="fa fa-map-marker"></i> <span ng-bind-html="objFlight.flight.airport.origin.position.region.city || '-' | unsafe">{{objFlight.flight.airport.origin.position.region.city}} </span><a class="notranslate" ng-href="/data/airports/{{objFlight.flight.airport.origin.code.iata | lowercase}}" title="{{objFlight.flight.airport.origin.name}}, {{objFlight.flight.airport.origin.position.country.name}}">({{objFlight.flight.airport.origin.code.iata}})</a></div></div><div class="row"><div class="col-xs-3 col-sm-3 p-xxs" title="{{objFlight.flight.aircraft.model.text || ''}}"><i class="fa fa-plane"></i> {{objFlight.flight.aircraft.model.code || '-'}}</div><div class="col-xs-3 col-sm-3 p-xxs"><a ng-show="(objFlight.flight.aircraft.registration)" class="notranslate" ng-href="/data/aircraft/{{objFlight.flight.aircraft.registration | lowercase}}">{{objFlight.flight.aircraft.registration}}</a></div><div class="col-xs-6 col-sm-6 p-xxs">{{ objFlight.flight.airline.name || '-'}}</div></div></td></tr><tr ng-cloak="ng-cloak" class="hidden-xs hidden-sm" ng-repeat="objFlight in airportView.schedule.arrivals.data track by $index" ng-show="(isFetching == false)" data-date="{{(objFlight.flight.time.scheduled.arrival * 1000) | date: 'EEEE, MMM dd' : timeZone}}" tbl-render-directive="tbl-render-directive"><td>{{objFlight.flight.time.scheduled.arrival * 1000 || '-' | date: timeFormat : timeZone}}</td><td class="p-l-s cell-flight-number"><a class="chevron-toggle" ng-if="(objFlight.flight.identification.codeshare != null)" data-codeshare="{{objFlight.flight.identification.codeshare}}"></a> <a class="notranslate" ng-href="/data/flights/{{objFlight.flight.identification.number.default | 
lowercase}}">{{objFlight.flight.identification.number.default}}</a></td><td><div ng-show="(objFlight.flight.airport.origin)"><span class="hide-mobile-only">{{objFlight.flight.airport.origin.position.region.city}} </span><a class="fs-10 fbold notranslate" ng-href="/data/airports/{{objFlight.flight.airport.origin.code.iata | lowercase}}" title="{{objFlight.flight.airport.origin.name}}, {{objFlight.flight.airport.origin.position.country.name}}">({{objFlight.flight.airport.origin.code.iata}})</a></div><div ng-show="!(objFlight.flight.airport.origin)">-</div></td><td ng-bind-html=" objFlight.flight.airline.name || '-' | unsafe" title="{{ objFlight.flight.airline.name || ''}}" class="cell-airline"></td><td><span class="notranslate" ng-show="(objFlight.flight.aircraft.model.code)">{{objFlight.flight.aircraft.model.code}} </span><a ng-show="(objFlight.flight.aircraft.registration)" class="fs-10 fbold notranslate" ng-href="/data/aircraft/{{objFlight.flight.aircraft.registration | lowercase}}">({{objFlight.flight.aircraft.registration}}) </a><span ng-if="(!objFlight.flight.aircraft.model.code &amp;&amp; !objFlight.flight.aircraft.registration)">-</span></td><td><div class="state-block {{objFlight.flight.status.generic.status.color || 'gray'}}"></div></td><td><span ng-bind-html="objFlight.flight.statusMessage.text | unsafe"></span> {{objFlight.flight.status.generic.eventTime.utc * 1000 || '' | date: timeFormat: timeZone}}</td></tr></tbody><tfoot><tr ng-cloak="ng-cloak" data-ng-class="{hidden: btnLoadLater === false }" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length &gt; 0 &amp;&amp; airportView.schedule.arrivals.page.current &lt; airportView.schedule.arrivals.page.total)"> 0 &amp;&amp; airportView.schedule.arrivals.page.current &lt; airportView.schedule.arrivals.page.total)"&gt;<td colspan="7" class="text-center"><button data-mode="arrivals" data-page="2" data-timestamp="{{currentUtcTimestampRender/1000 | int}}" 
ng-click="loadMoreFlights($event)" data-current-page="{{airportView.schedule.arrivals.page.current}}" data-loading-text='&lt;i class="fa fa-circle-o-notch fa-spin"&gt;&lt;/i&gt; Loading later flights...' class="btn btn-table-action btn-flights-load">Load later flights</button></td></tr><tr ng-cloak="ng-cloak" ng-show="(isFetching == false)"><td colspan="7">* All times are in {{(airportView.schedule.arrivals.data &amp;&amp; timeZone.toUpperCase() == 'UTC' ? 'UTC' : 'local')}} timezone</td></tr></tfoot></table></div>
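The `{{...}}` expressions and `ng-repeat` attributes in this output suggest that the server returns an AngularJS template and that the browser fills in the rows afterwards. That would explain why only binding names show up. A minimal sketch for checking whether a fetched fragment is still an unrendered template, using only the standard `re` module (the function names are hypothetical, chosen for illustration):

```python
import re

# Matches AngularJS-style interpolation markers such as {{ expr }}.
BINDING_RE = re.compile(r"\{\{\s*(.*?)\s*\}\}", re.DOTALL)


def find_template_bindings(html):
    """Return the expressions found inside {{ ... }} markers in html."""
    return BINDING_RE.findall(html)


def looks_unrendered(html):
    """Heuristic: True if html still contains template markers."""
    return bool(find_template_bindings(html)) or "ng-repeat" in html


# One cell copied from the printed table, and one made-up rendered cell.
template_cell = "<td>{{objFlight.flight.identification.number.default}}</td>"
rendered_cell = "<td>12:30</td>"

print(find_template_bindings(template_cell))
print(looks_unrendered(template_cell))
print(looks_unrendered(rendered_cell))
```

If the check reports an unrendered template, the values are not in the HTML at all, so no BeautifulSoup selector will find them.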

The code from the article linked in the answer does not work either:

import urllib2
from bs4 import BeautifulSoup
import json

# new url
url = 'https://www.flightradar24.com/data/airports/bud/arrivals'

# read all data
page = urllib2.urlopen(url).read()

# convert json text to python dictionary
data = json.loads(page)

print(data['row cnt-schedule-table'])


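"I only get the object names" - what do you mean? What exact output do you get? What HTML are you checking this against? (The value of `page`, or some subset of it.) Why do you think JavaScript is involved? – David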


I am new to programming, so it was just a guess. I edited my question so you can see what I get when I print the result. – tardos93

Answers


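Here is another Stack Overflow post with a solution to a very similar problem. It looks like you need to change the URL to the one the rendered data is actually loaded from, rather than the one you normally use in the browser.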
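The idea can be sketched as follows: instead of parsing the HTML page, request the URL that the page's script loads its data from (visible in the browser's network tab) and read the JSON it returns. This is only a sketch under stated assumptions. `DATA_URL` is a hypothetical placeholder, not a documented flightradar24 endpoint, and the nested key layout is inferred from the template bindings in the question (`objFlight.flight.time.scheduled.arrival` and so on), so the real response may be shaped differently. The code is written for Python 3; on Python 2.7 the `urllib` import would differ.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the data URL seen in the
# browser's network tab. This is NOT a documented endpoint.
DATA_URL = "https://example.com/path/to/arrivals.json"


def fetch_json(url):
    """Download url and decode the response body as JSON."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dig(obj, *keys):
    """Follow nested dict keys, returning None if any level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def arrival_rows(flights):
    """Turn flight records into (arrival time, flight number, origin) tuples.

    The key paths are assumptions copied from the Angular bindings.
    """
    rows = []
    for item in flights:
        flight = item.get("flight", {})
        rows.append((
            dig(flight, "time", "scheduled", "arrival"),
            dig(flight, "identification", "number", "default"),
            dig(flight, "airport", "origin", "code", "iata"),
        ))
    return rows


# Made-up record shaped like the bindings, for illustration only.
sample = [{"flight": {
    "time": {"scheduled": {"arrival": 1500900000}},
    "identification": {"number": {"default": "XX123"}},
    "airport": {"origin": {"code": {"iata": "AAA"}}},
}}]
print(arrival_rows(sample))
```

With a real data URL, the list passed to `arrival_rows` would come from wherever the flight records sit inside the result of `fetch_json(DATA_URL)`. That location has to be read off the actual response.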


I also tried the code from that post (I added it to my question), but it still does not work. – tardos93
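A likely reason the `json.loads(page)` attempt in the question fails: the URL is still the ordinary HTML page, so the response body is HTML rather than JSON. Also, `'row cnt-schedule-table'` is a CSS class name, not a key in any JSON document. A small sketch for checking whether a response body is JSON before indexing into it (`is_json` is a hypothetical helper name):

```python
import json


def is_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Made-up bodies: an HTML page versus a JSON document.
html_body = "<!DOCTYPE html><html><body><h1>Airport</h1></body></html>"
json_body = '{"arrivals": []}'

print(is_json(html_body))
print(is_json(json_body))
```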