2017-05-09 57 views
0

Python requests returns different data

I am trying to retrieve historical weather data using this code:

import requests

url = 'https://www.wunderground.com/history/airport/KDCA/2017/05/07/DailyHistory.html'
querystring = {'format': '1'}
# Send the same User-Agent string that Safari sends
headers = {'cache-control': 'no-cache',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8'}
response = requests.get(url, headers=headers, params=querystring)
print(response.text)

The request returns the following:

TimeEDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,WindDirDegrees,DateUTC<br /> 
12:52 AM,50.0,43.0,77,29.63,10.0,WSW,6.9,-,N/A,,Partly Cloudy,240,2017-05-07 04:52:00<br /> 
1:52 AM,51.1,42.1,71,29.64,10.0,WSW,10.4,-,N/A,,Scattered Clouds,250,2017-05-07 05:52:00<br /> 
2:52 AM,50.0,41.0,71,29.65,10.0,WSW,10.4,-,N/A,,Partly Cloudy,240,2017-05-07 06:52:00<br /> 

However, if I open the same URL in my browser (Safari), I get this:

TimeEDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC 
12:52 AM,50.0,43.0,77,29.63,10.0,WSW,6.9,-,N/A,,Partly Cloudy,METAR KDCA 070452Z 24006KT 10SM FEW050 10/06 A2963 RMK AO2 SLP034 T01000061 401830100,240,2017-05-07 04:52:00 
1:52 AM,51.1,42.1,71,29.64,10.0,WSW,10.4,-,N/A,,Scattered Clouds,METAR KDCA 070552Z 25009KT 10SM SCT080 11/06 A2964 RMK AO2 SLP037 T01060056 10128 20100 53012,250,2017-05-07 05:52:00 
2:52 AM,50.0,41.0,71,29.65,10.0,WSW,10.4,-,N/A,,Partly Cloudy,METAR KDCA 070652Z 24009KT 10SM FEW050 10/05 A2965 RMK AO2 SLP040 T01000050,240,2017-05-07 06:52:00 

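Note that the "FullMetar" column is returned in Safari but is missing from the requests output. (Interestingly, Chrome also omits the "FullMetar" column.)

I would like to retrieve the data with Python, including the "FullMetar" column.

(This is a very simple page with no authentication, CSS, JavaScript, etc., which, based on the other SO questions I have read, usually seem to be the cause of this kind of problem.)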
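A quick way to check which columns a given response contains is to parse the header row. The following is only a minimal sketch: the helper name `parse_history` is made up for illustration, and the sample text is copied from the requests output shown above, where each line ends with `<br />`.

```python
import csv
import io


def parse_history(text):
    """Split the comma-separated body into a header list and a list of rows.

    Hypothetical helper, not part of requests or of the site's API.
    """
    # Drop the trailing "<br />" seen in the requests output and skip blank lines.
    cleaned = [line.replace("<br />", "").strip() for line in text.splitlines()]
    cleaned = [line for line in cleaned if line]
    rows = list(csv.reader(io.StringIO("\n".join(cleaned))))
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Sample copied from the requests output in the question.
sample = (
    "TimeEDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,"
    "Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,"
    "WindDirDegrees,DateUTC<br />\n"
    "12:52 AM,50.0,43.0,77,29.63,10.0,WSW,6.9,-,N/A,,Partly Cloudy,240,"
    "2017-05-07 04:52:00<br />\n"
)

header, rows = parse_history(sample)
print("FullMetar" in header)
```

Passing `response.text` instead of `sample` would run the same check against a live response.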

+0

This looks like an issue with how the page handles the user agent or headers. It has nothing to do with Python or requests. – Alvaro

+1

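At the bottom of the page (in my browser) there is a link, [Show full METARS](http://www.wunderground.com/cgi-bin/findweather/getForecast?setpref=SHOWMETAR&value=1). You could set up a session and fetch that URI first, then get the actual data in a second step. It looks like you have already visited the first URL in your browser (and probably stored the corresponding cookie). See the [requests documentation](http://docs.python-requests.org/en/master/user/advanced/). –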

+0

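I was thinking this might be cookie-related, but I wasn't sure where to start looking. I think I have found the problem, so I will post an answer below. –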
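The two-step session approach suggested in the comments could look roughly like the sketch below. It assumes, without having verified it, that requesting the "Show full METARS" link causes the site to set the relevant cookie. The function name `fetch_history_with_metar` is made up for illustration; the URLs are the ones from the question and the comment.

```python
PREF_URL = "http://www.wunderground.com/cgi-bin/findweather/getForecast"
HISTORY_URL = "https://www.wunderground.com/history/airport/KDCA/2017/05/07/DailyHistory.html"


def fetch_history_with_metar(session=None):
    """Fetch the preference link first, then the data, using one session.

    Hypothetical helper. `session` is expected to behave like requests.Session.
    """
    if session is None:
        # Imported here so the helper can also be exercised with a stub session.
        import requests
        session = requests.Session()

    # Step 1: request the "Show full METARS" link. If the server answers with a
    # cookie, the session stores it (assumption: this is how the preference is kept).
    session.get(PREF_URL, params={"setpref": "SHOWMETAR", "value": "1"})

    # Step 2: request the data with the same session so stored cookies are sent.
    response = session.get(HISTORY_URL, params={"format": "1"})
    return response.text
```

Calling `print(fetch_history_with_metar())` would perform both requests with a fresh `requests.Session`.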

Answer

2

After digging through the browsers' developer inspectors, I found that the Prefs cookie differs between Chrome and Safari:

Chrome: FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|

Safari: FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|SHOWMETAR:1|

So adding a Prefs cookie that contains SHOWMETAR:1 to my request solved my problem:

import requests

url = 'https://www.wunderground.com/history/airport/KDCA/2017/05/07/DailyHistory.html'
# Prefs cookie with SHOWMETAR:1 appended
cookies = {'Prefs': 'FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|PHOTOTHUMBS:50|HISTICAO:NULL|EXPFCT:1|SHOWMETAR:1|'}
querystring = {'format': '1'}
response = requests.get(url, params=querystring, cookies=cookies)
print(response.text)
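If the Prefs value is copied from a browser that lacks the flag, the entry can be appended programmatically instead of by hand. This is a small sketch under the assumption that the cookie is a list of `KEY:VALUE` entries, each terminated by `|`, as in the two values shown above. The helper name `with_showmetar` is made up for illustration.

```python
def with_showmetar(prefs):
    """Return the Prefs cookie value with SHOWMETAR:1 as its last entry.

    Hypothetical helper. Assumes "KEY:VALUE|" entries as seen in the browser cookies.
    """
    entries = [entry for entry in prefs.split("|") if entry]
    # Drop any existing SHOWMETAR entry so the flag is not duplicated.
    entries = [entry for entry in entries if not entry.startswith("SHOWMETAR:")]
    entries.append("SHOWMETAR:1")
    return "|".join(entries) + "|"


# The Chrome value from the answer above.
chrome_prefs = (
    "FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|"
    "GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|"
)
cookies = {"Prefs": with_showmetar(chrome_prefs)}
print(cookies["Prefs"])
```

The resulting dictionary can be passed as the `cookies` argument of `requests.get`, as in the code above.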