Web scraping Wunderground data with BeautifulSoup

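OK, I'm at my wits' end. For my class, we're supposed to scrape data from the wunderground.com website. We keep running into problems (error messages), or the code runs fine but the .txt file contains NO data. This is frustrating, because I need to get this done! So here is my code.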

f = open('wunder-data1.txt', 'w') 
for m in range(1, 13): 
for d in range(1, 32): 
    if (m == 2 and d > 28): 
     break 
    elif (m in [4, 6, 9, 11] and d > 30): 
     break 
    url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html" 
    page = urllib2.urlopen(url) 
    soup = BeautifulSoup(page, "html.parser") 
    dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True) 
    if len(str(m)) < 2: 
     mStamp = '0' + str(m) 
    else: 
     mStamp = str(m) 
    if len(str(d)) < 2: 
     dStamp = '0' +str(d) 
    else: 
     dStamp = str(d) 
    timestamp = '2009' + mStamp +dStamp 
    f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n') 
    f.close()

Also, sorry, this code may not be indented correctly, and indentation matters in Python. I'm not very good at this.

UPDATE: Someone answered the question below and it works, but I realized I was pulling the wrong data (oops). So I put this in:

import codecs
import urllib2
from bs4 import BeautifulSoup

f = codecs.open('wunder-data2.txt', 'w', 'utf-8')

for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break

        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")

        dayTemp = soup.findAll(attrs={"class":"wx-value"})[5].span.string
        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)

        timestamp = '2009' + mStamp + dStamp

        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')

f.close()

So I'm quite unsure. What I'm trying to do is scrape the data


2017-01-15 Sierra Thomander

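Please edit your post to fix your indentation so that the posted code actually runs. Also, please add the **full text** of any errors or tracebacks. – MattDMo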

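Explain which months and days you want to get data for. Also, instead of two loops, build a list of URLs and process them one at a time; just a suggestion. Your code is quite messy... – firephil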

There aren't any errors, it just doesn't put anything into the .txt file. Also, I'm sorry. I really don't know what I'm doing. This is for a class. –

I ran into the following errors when trying to execute your code (and fixed them in the version below):

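1. The indentation of the nested loops was invalid.
2. The imports were missing (the lines at the top), but maybe you just left them out of your paste.
3. The code tried to write a "utf-8" encoded string to an "ascii" file. To fix this, I used the codecs module to open the file f as "utf-8".
4. The file was being closed inside the loop, which means it would be closed after the first write and the next write would fail (because the file is already closed). I moved the line that closes the file to outside the loop.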
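The file-closing problem is the easiest one to trip over, so here is a minimal sketch of it in isolation. It assumes Python 3 and writes to a temporary file; the function name `write_rows` and the sample rows are made up for illustration and are not part of the original code. Opening the file in a `with` block closes it exactly once, after the loop has finished:

```python
import os
import tempfile


def write_rows(path, rows):
    """Write (timestamp, value) pairs as CSV lines; return how many were written."""
    count = 0
    # The with block keeps the file open for the whole loop and closes it
    # once afterwards, so no write can hit an already-closed file.
    with open(path, "w", encoding="utf-8") as f:
        for timestamp, value in rows:
            f.write(timestamp + "," + value + "\n")
            count += 1
    return count


# Hypothetical sample data standing in for scraped values.
sample_rows = [("20090101", "26"), ("20090102", "30")]
demo_path = os.path.join(tempfile.mkdtemp(), "wunder-demo.txt")
written = write_rows(demo_path, sample_rows)
```

If `f.close()` sat inside the `for` loop instead, the second `f.write` would raise `ValueError: I/O operation on closed file`.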

Now, as far as I can tell (without you telling us what you actually want this code to do), it works? At least no errors pop up immediately...

import codecs
import urllib2
from bs4 import BeautifulSoup

f = codecs.open('wunder-data1.txt', 'w', 'utf-8')

for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break

        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")

        dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True)

        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)

        timestamp = '2009' + mStamp + dStamp

        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')

f.close()

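As the comments on your question suggest, there are other areas for improvement that I haven't touched on here; I was only trying to get the code you posted to execute.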
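One of those suggestions was to build the list of URLs up front instead of nesting two loops. A rough sketch of that idea, assuming Python 3 and the same URL pattern as the code above; `URL_TEMPLATE` and `daily_urls` are names invented for this example. Stepping through real calendar dates removes the need for the month-length checks, and `strftime` handles the zero-padding of the timestamp:

```python
from datetime import date, timedelta

# URL pattern copied from the code above; KBUF and 2009 are the values used there.
URL_TEMPLATE = ("http://www.wunderground.com/history/airport/KBUF/"
                "{y}/{m}/{d}/DailyHistory.html")


def daily_urls(year):
    """Return a list of (timestamp, url) pairs, one per calendar day of `year`."""
    pairs = []
    day = date(year, 1, 1)
    while day.year == year:
        timestamp = day.strftime("%Y%m%d")  # zero-padded year, month, day
        url = URL_TEMPLATE.format(y=day.year, m=day.month, d=day.day)
        pairs.append((timestamp, url))
        day += timedelta(days=1)
    return pairs


urls_2009 = daily_urls(2009)
```

Each pair can then be fetched and parsed in a single flat loop, with the timestamp already in the form written to the output file.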


2017-01-15 01:34:45

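OK, your code works so far, Bilal Akil, so thank you! Sorry I'm so clueless. I've never used Python before, and the class had no prerequisites. It's for a class, but I don't think our teacher realized how difficult this would be. I really appreciate your help! –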

Also, what does import codecs do? –

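`import codecs` is needed to fix the third problem I mentioned. Four lines further down, I use the imported `codecs` module to change how you open the file: `codecs.open('wunder-data.txt', 'w', 'utf-8')`. It opens the file the same way as before, but this time with UTF-8 encoding. –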
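To see what the encoding argument changes, here is a small sketch that can be run on its own. It assumes Python 3, writes to a temporary file, and uses a made-up line of data containing a degree sign; none of these values come from the scraped site:

```python
import codecs
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "codecs-demo.txt")

# codecs.open takes the encoding as its third argument, as in the answer.
f = codecs.open(demo_path, "w", "utf-8")
f.write(u"20090101,26 \u00b0F\n")  # \u00b0 is the degree sign
f.close()

# Reading the raw bytes back shows the degree sign stored as two UTF-8 bytes.
with open(demo_path, "rb") as raw:
    raw_bytes = raw.read()

# Decoding with the same encoding recovers the original text.
text = raw_bytes.decode("utf-8")
```

A file opened without an encoding under Python 2 would try to convert such a character to ASCII on write and fail, which is the error the third point describes.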
