2011-11-02 64 views
1

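Unable to parse WeatherUnderground using BeautifulSoup

I am trying to adapt some code to extract information from wunderground. However, the script I am adapting was written in 2008, and the format of the Weather Underground pages has changed since then. I am having trouble with soup.body.nobr.b.string. I want to extract the daily precipitation data from a given page, for example:

http://www.wunderground.com/history/airport/KBUF/2011/5/2/DailyHistory.html

import urllib2
from BeautifulSoup import BeautifulSoup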

# Create/open a file called wunder-data.txt (which will be a comma-delimited file)
f = open('wunder-data.txt', 'w')

# Iterate through year, month, and day
for y in range(1980, 2007):
    for m in range(1, 13):
        for d in range(1, 32):

            # Check if leap year
            if y % 400 == 0:
                leap = True
            elif y % 100 == 0:
                leap = False
            elif y % 4 == 0:
                leap = True
            else:
                leap = False

            # Skip days that do not exist in this month
            if m == 2 and leap and d > 29:
                continue
            elif m == 2 and not leap and d > 28:
                continue
            elif m in [4, 6, 9, 11] and d > 30:
                continue

            # Open wunderground.com url
            url = "http://www.wunderground.com/history/airport/KBUF/" + str(y) + "/" + str(m) + "/" + str(d) + "/DailyHistory.html"
            page = urllib2.urlopen(url)

            # Get temperature from page
            # (this lookup is the part that no longer matches the page layout)
            soup = BeautifulSoup(page)
            dayTemp = soup.body.nobr.b.string

            # Format month for timestamp
            if len(str(m)) < 2:
                mStamp = '0' + str(m)
            else:
                mStamp = str(m)

            # Format day for timestamp
            if len(str(d)) < 2:
                dStamp = '0' + str(d)
            else:
                dStamp = str(d)

            # Build timestamp
            timestamp = str(y) + mStamp + dStamp

            # Write timestamp and temperature to file
            f.write(timestamp + ',' + dayTemp + '\n')

# Done getting data! Close file.
f.close()

Answer

3

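Don't bother parsing the HTML; it is likely to change again without notice. Get one of their CSV files (there is a link at the bottom of the HTML page) and parse it with the csv module.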
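A minimal sketch of that approach, written for Python 3 (where urllib.request replaces urllib2). The column names and sample rows below are invented for illustration and are not taken from a real Weather Underground export. The stray "<br />" cleanup is likewise only an assumption about what such an export might contain. The csv_url argument is whatever the CSV link at the bottom of the page points to.

```python
import csv
import io
from urllib.request import urlopen


def fetch_text(csv_url):
    """Download a CSV export and return it as text.

    csv_url is supplied by the caller; no particular URL scheme is assumed.
    """
    with urlopen(csv_url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # Assumption: the export may carry stray "<br />" tags; drop them.
    cleaned = csv_text.replace("<br />", "").strip()
    reader = csv.DictReader(io.StringIO(cleaned))
    return [dict(row) for row in reader]


def column_values(rows, column):
    """Return the non-empty values of one column, in row order."""
    return [row[column] for row in rows if row.get(column) not in (None, "")]


# Hypothetical sample data standing in for a downloaded file.
SAMPLE = """TimeEST,TemperatureF,PrecipitationIn
12:54 AM,46.0,N/A
1:54 AM,45.0,0.01
"""

rows = parse_rows(SAMPLE)
print(column_values(rows, "PrecipitationIn"))
```

With real data, you would call parse_rows(fetch_text(csv_url)) inside the date loop and pick the precipitation column by whatever name the header row of the actual file uses.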

+0

Ha, that's genius. Nice find :P