How to extract data from tags using Beautiful Soup

I am trying to retrieve data from a website. My code is as follows:
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup

# gets a file-like object using urllib2.urlopen
url = 'http://ecal.forexpros.com/e_cal.php?duration=weekly'
html = urlopen(url)
soup = BeautifulSoup(html)

# loops over all <tr> elements with class 'ec_bg1_tr' or 'ec_bg2_tr'
for tr in soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')}):
    # finds desired data by looking up <td> elements with class names
    event = tr.find('td', {'class': 'ec_td_event'}).text
    currency = tr.find('td', {'class': 'ec_td_currency'}).text
    actual = tr.find('td', {'class': 'ec_td_actual'}).text
    forecast = tr.find('td', {'class': 'ec_td_forecast'}).text
    previous = tr.find('td', {'class': 'ec_td_previous'}).text
    time = tr.find('td', {'class': 'ec_td_time'}).text
    importance = tr.find('td', {'class': 'ec_td_importance'}).img.get('alt')
    # the returned strings are unicode, so to print them we need to use a unicode string
    if importance == 'High':
        print(u'\t{:5}\t{}\t{:3}\t{:40}\t{:8}\t{:8}\t{:8}'.format(time, importance, currency, event, actual, forecast, previous))
The first few records in the result set look like this:
05:00 High EUR CPI (YoY) 1.3% 1.3% 1.3%
10:00 High USD Pending Home Sales (MoM) 1.5% 0.7% -0.7%
21:45 High CNY Caixin Manufacturing PMI 51.1 50.4 50.4
00:30 High AUD RBA Interest Rate Decision 1.50% 1.50% 1.50%
00:30 High AUD RBA Rate Statement
03:55 High EUR German Manufacturing PMI 58.1 58.3 58.3
03:55 High EUR German Unemployment Change -9K -5K 6K
I now want to retrieve similar data from the following website:
https://www.fxstreet.com/economic-calendar
To do this, I modified the code above as follows:
import re
from urllib2 import urlopen
from bs4 import BeautifulSoup

# gets a file-like object using urllib2.urlopen
url = 'https://www.fxstreet.com/economic-calendar'
html = urlopen(url)
soup = BeautifulSoup(html)

for tr in soup.find_all('tr', {'class': re.compile('fxst-tr-event fxst-oddRow fxit-eventrow fxst-evenRow ')}):
    # finds desired data by looking up <div> elements with class names
    event = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': ' fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    # importance = tr.find('td', {'class': 'ec_td_importance'}).img.get('alt')
    # the returned strings are unicode, so to print them we need to use a unicode string
    if importance == 'High':
        print(u'\t{:5}\t{:3}\t{:40}\t{:8}\t{:8}\t{:8}'.format(time, currency, event, actual, forecast, previous))
This code does not return any results (presumably because I am referencing the wrong tags and/or classes). Can anyone see where my mistake is?
Thanks!
I took a look at the website, and there is no _class_ named 'fxst-tr-event fxst-oddRow fxit-eventrow fxst-evenRow' – ksai
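To illustrate the point in the comment, here is a minimal sketch of how Beautiful Soup matches the class attribute. The HTML snippet is made up for illustration: it only assumes that each row carries several separate classes like the ones quoted in the question, and the real page's markup may well differ. Beautiful Soup treats class as a list of values, so a regex spelling out several classes at once (odd and even together, plus a trailing space) is unlikely to match any row, while a single class shared by all rows does.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, NOT copied from the real site: two rows that
# share some classes and differ in the odd/even one.
html = """
<table>
  <tr class="fxst-tr-event fxst-oddRow fxit-eventrow">
    <td><div class="fxit-event-name">CPI (YoY)</div></td>
  </tr>
  <tr class="fxst-tr-event fxst-evenRow fxit-eventrow">
    <td><div class="fxit-event-name">Pending Home Sales (MoM)</div></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The pattern from the question: no row has all four classes in this
# exact order followed by a space, so nothing is found.
broken = soup.find_all(
    'tr',
    {'class': re.compile('fxst-tr-event fxst-oddRow fxit-eventrow fxst-evenRow ')},
)

# Searching for one class that every row carries matches both rows.
rows = soup.find_all('tr', class_='fxit-eventrow')

# Same idea for the inner <div>: look it up by a single class name.
names = [tr.find('div', class_='fxit-event-name').get_text(strip=True) for tr in rows]

print(len(broken), names)
```

If odd and even rows did not share a class, a regex such as re.compile('fxst-(odd|even)Row') would follow the same approach as the 'ec_bg[12]_tr' pattern in the original script.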