Looking to scrape a website daily and set alerts

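I need to run a script that scrapes the following site every day (when the script runs, it should scrape that day's calendar, which is equivalent to clicking the "Daily" button):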

http://www.fxempire.com/economic-calendar/

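I want to extract all of that day's data/events, filter by the relevant currencies (where applicable), and then create some kind of alert or pop-up 10 minutes before each of these events is due to occur.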

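So far I have used the code below to scrape the page and then view/print the variable 'html', but I cannot find the calendar information I need in it.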

import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *


class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

url = 'http://www.fxempire.com/economic-calendar/'
r = Render(url)
html = r.frame.toHtml()

Source

2014-01-08 Cubix

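Can you show us what you have so far? –

Stack Overflow is not a repository of free scripts. Show us some code you have tried and we will gladly help, but please don't just write "this is what I need, help!" –

Apologies, I have now updated the original post to include the code I was trying to use. – Cubix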

In my opinion, the best way to scrape data from a web page is to use BeautifulSoup. Here is a quick script that gets the data you want.

import re 
from urllib2 import urlopen 
from bs4 import BeautifulSoup 


# Get a file-like object using urllib2.urlopen 
url = 'http://ecal.forexpros.com/e_cal.php?duration=daily' 
html = urlopen(url) 

# BS accepts a lot of different data types, so you don't have to do e.g. 
# urlopen(url).read(). It accepts file-like objects, so we'll just send in html 
# as a parameter. 
soup = BeautifulSoup(html) 

# Loop over all <tr> elements with class 'ec_bg1_tr' or 'ec_bg2_tr' 
for tr in soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')}): 
    # Find the event, currency and actual price by looking up <td> elements 
    # with class names. 
    event = tr.find('td', {'class': 'ec_td_event'}).text 
    currency = tr.find('td', {'class': 'ec_td_currency'}).text 
    actual = tr.find('td', {'class': 'ec_td_actual'}).text 

    # The strings returned by BS are unicode, so to print them we need to
    # use a unicode format string.
    print u'{:3}\t{:6}\t{}'.format(currency, actual, event)
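The script above prints the events but does not cover the filtering and alert part of the question. Below is a minimal sketch of that part, assuming Python 3 and assuming each scraped row has already been reduced to a (time, currency, event) tuple with the time as an 'HH:MM' string in local time. The sample rows, the WATCHED set, and the helper names alert_times and alerts_due are all made up for illustration; the real page's time format and timezone would need checking.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows; real ones would come from the scraping loop.
EVENTS = [
    ('08:30', 'USD', 'Example event A'),
    ('09:00', 'JPY', 'Example event B'),
    ('13:15', 'EUR', 'Example event C'),
]

WATCHED = {'USD', 'EUR'}            # currencies to keep (assumption)
LEAD_TIME = timedelta(minutes=10)   # alert this long before each event


def alert_times(events, watched, day, lead=LEAD_TIME):
    """Return sorted (alert_datetime, currency, event) for watched currencies."""
    alerts = []
    for time_str, currency, event in events:
        if currency not in watched:
            continue
        hour, minute = map(int, time_str.split(':'))
        event_dt = datetime(day.year, day.month, day.day, hour, minute)
        alerts.append((event_dt - lead, currency, event))
    return sorted(alerts)


def alerts_due(alerts, now, lead=LEAD_TIME):
    """Alerts whose time has arrived but whose event has not yet started."""
    return [a for a in alerts if a[0] <= now < a[0] + lead]


if __name__ == '__main__':
    today = datetime.now()
    for when, currency, event in alert_times(EVENTS, WATCHED, today):
        print('{:%H:%M}  {:3}  {}'.format(when, currency, event))
```

One possible setup is to start the script once a day with cron or the Windows Task Scheduler and call alerts_due about once a minute to decide when to show a pop-up.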

To give you some hints on how to tackle problems like this in the future, I have written down the steps I took when solving yours. I hope it helps.

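I opened the web page in Chrome, right-clicked and selected Inspect Element.
I found the iframe containing the information by looking through the Elements tab, and opened that URL.
I inspected this page and found that all elements containing data were <tr> elements with the class ec_bg1_tr or ec_bg2_tr.
I knew from earlier encounters with BS that it can find all tr elements with the class ec_bg1_tr by using soup.find_all('tr', {'class': 'ec_bg1_tr'}). My initial idea was to loop over these elements first, and then loop over the ec_bg2_tr elements.
I then figured BS might be clever enough to accept regular expressions as input, so I checked their docs, and it looked like this should not be a problem.
Following the recipe in the docs, I tried the simple regex 'ec_bg[12]_tr'.
Ka-ching!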
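The row-matching step above can be tried without a network connection. The following is a minimal sketch using only the standard library, assuming Python 3 (the script in the answer targets Python 2). The HTML fragment is invented for illustration and only mimics the class names mentioned above, and RowCollector is a hypothetical helper, not part of BeautifulSoup.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the answer: matches ec_bg1_tr and ec_bg2_tr.
ROW_CLASS = re.compile(r'ec_bg[12]_tr')

# Invented fragment that only mimics the class names described above.
SAMPLE = '''
<table>
  <tr class="ec_head"><td>Header</td></tr>
  <tr class="ec_bg1_tr"><td class="ec_td_currency">USD</td><td class="ec_td_event">Event A</td></tr>
  <tr class="ec_bg2_tr"><td class="ec_td_currency">EUR</td><td class="ec_td_event">Event B</td></tr>
</table>
'''


class RowCollector(HTMLParser):
    """Collect {td class: text} dicts for <tr> rows whose class matches ROW_CLASS."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # dict for the row being read, or None
        self._cell = None   # class of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class') or ''
        if tag == 'tr' and ROW_CLASS.search(cls):
            self._row = {}
        elif tag == 'td' and self._row is not None:
            self._cell = cls
            self._row[cls] = ''

    def handle_data(self, data):
        if self._row is not None and self._cell is not None:
            self._row[self._cell] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'td':
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


collector = RowCollector()
collector.feed(SAMPLE)
for row in collector.rows:
    print(row['ec_td_currency'], row['ec_td_event'])
```

The header row is skipped because its class does not match the pattern, which is the same effect the single regex has in the find_all call.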

Source

2014-01-09 01:00:13

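This is a really good solution. I am now using it for fundamental analysis, alongside other tools such as the ystockquote Python library, which I use with some of my own code to do technical analysis on my stocks! It is very nice and as customizable as can be, @Steinar Lima. Thank you! – toufikovich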
