2014-02-07 51 views
2

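Web scraping with VBA using XMLHTTP

I want to get some data from the web page http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures.

If I use the old InternetExplorer object (first code block below), I can walk through the HTML document. But I would like to use the XMLHTTP object instead (second code block).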

Sub IEZagon() 
    'we define the essential variables 
    Dim ie As Object 
    Dim TDelement, TDelements 
    Dim AnhorLink, AnhorLinks 

    'late binding via CreateObject, so no "Microsoft Internet Controls" reference is required 
    Set ie = CreateObject("InternetExplorer.Application") 
    With ie 
     .Visible = True 
     .navigate ("[URL]http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures[/URL]") 
     While ie.ReadyState <> 4 
      DoEvents 
     Wend 
     Set AnhorLinks = .document.getElementsbytagname("a") 
     Set TDelements = .document.getElementsbytagname("td") 
     For Each AnhorLink In AnhorLinks 
      Debug.Print AnhorLink.innertext 
     Next 
     For Each TDelement In TDelements 
      Debug.Print TDelement.innertext 
     Next 
    End With 
    Set ie = Nothing 
End Sub 

Code using the XMLHTTP object:

Sub FuturesScrap(ByVal URL As String) 
    Dim XMLHttpRequest As XMLHTTP 
    Dim HTMLDoc As New HTMLDocument 

    Set XMLHttpRequest = New MSXML2.XMLHTTP 
    XMLHttpRequest.Open "GET", URL, False 
    XMLHttpRequest.send 
    While XMLHttpRequest.readyState <> 4 
     DoEvents 
    Wend 

    Debug.Print XMLHttpRequest.responseText 
    HTMLDoc.body.innerHTML = XMLHttpRequest.responseText 

    With HTMLDoc.body 
     Set AnchorLinks = .getElementsByTagName("a") 
     Set TDelements = .getElementsByTagName("td") 

     For Each AnchorLink In AnchorLinks 
      Debug.Print AnchorLink.innerText 
     Next 

     For Each TDelement In TDelements 
      Debug.Print TDelement.innerText 
     Next 
    End With 
End Sub 

I only get this basic HTML back:

<html> 
<head> 
<title>Resource Not found</title> 
<link rel= 'stylesheet' type='text/css' href='/blueprint/css/errorpage.css'/> 
</head> 
<body> 
<table class="header"> 
<tr> 
<td class="CMTitle CMHFill"><span class="large">Resource Not found</span></td> 
</tr> 
</table> 
<div class="body"> 
<p style="font-weight:bold;">The requested resource does Not exist.</p> 
</div> 
<table class="footer"> 
<tr> 
<td class="CMHFill"> </td> 
</tr> 
</table> 
</body> 
</html> 

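I want to walk through the tables and the corresponding data. Eventually I also want to select the different time intervals, from Year to Month.

I would really appreciate any help! Thanks!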

+2

It looks like you are requesting an incorrect URL ... –

+0

I am calling the right URL: – Figlio

+0

See @brettdj's answer [here](http://stackoverflow.com/questions/8798260/html-parsing-of-cricinfo-scorecards) –

Answers

3

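I can confirm that when I run your code (with or without the url tags) I get the same HTML as you do. I found a useful post here. I have modified your code using the method found there, and it now seems to download the correct information.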

Sub test() 
    Call FuturesScrap1("http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures") 
End Sub 

I have included the calling sub because the url tags seem to cause an error for the MSXML request.

Sub FuturesScrap1(ByVal URL As String) 
    Dim HTMLDoc As New HTMLDocument 
    Dim oHttp As MSXML2.XMLHTTP 
    Dim sHTML As String 
    Dim AnchorLinks As Object 
    Dim TDelements As Object 
    Dim TDelement As Object 
    Dim AnchorLink As Object 

    On Error Resume Next 
    Set oHttp = New MSXML2.XMLHTTP 
    If Err.Number <> 0 Then 
     Set oHttp = CreateObject("MSXML.XMLHTTPRequest") 
     MsgBox "Error 0 has occurred while creating a MSXML.XMLHTTPRequest object" 
    End If 
    On Error GoTo 0 
    If oHttp Is Nothing Then 
     MsgBox "For some reason I wasn't able to make a MSXML2.XMLHTTP object" 
     Exit Sub 
    End If 

    'Send a synchronous GET request for the URL (no browser is involved) 
    oHttp.Open "GET", URL, False 
    oHttp.send 
    sHTML = oHttp.responseText 

    Debug.Print sHTML 

    HTMLDoc.body.innerHTML = sHTML 

    With HTMLDoc.body 
     Set AnchorLinks = .getElementsByTagName("a") 
     Set TDelements = .getElementsByTagName("td") 

     For Each AnchorLink In AnchorLinks 
      Debug.Print AnchorLink.innerText 
     Next 

     For Each TDelement In TDelements 
      Debug.Print TDelement.innerText 
     Next 
    End With 

End Sub 
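
Because the request in the question came back as a "Resource Not found" page, it can help to look at the HTTP status before parsing anything. The following is only a minimal sketch and not part of the original answer: the names FetchHtml and TestFetchHtml are hypothetical, and it assumes late binding with the "MSXML2.XMLHTTP" ProgID so that no project reference is needed.

```vba
'Hypothetical helper: returns the response body, or an empty string when
'the request does not come back with HTTP status 200.
Function FetchHtml(ByVal URL As String) As String
    Dim oHttp As Object

    Set oHttp = CreateObject("MSXML2.XMLHTTP")
    oHttp.Open "GET", URL, False   'False = synchronous, so no readyState loop is needed
    oHttp.send

    If oHttp.Status = 200 Then
        FetchHtml = oHttp.responseText
    Else
        Debug.Print "Request failed: " & oHttp.Status & " " & oHttp.statusText
        FetchHtml = vbNullString
    End If
End Function

Sub TestFetchHtml()
    Dim sHTML As String

    sHTML = FetchHtml("http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures")

    'If a page builds its tables with JavaScript, the raw source
    'may contain few or no <td> tags even when the request succeeds.
    Debug.Print "Length: " & Len(sHTML)
    Debug.Print "Contains <td: " & (InStr(1, sHTML, "<td", vbTextCompare) > 0)
End Sub
```

Running TestFetchHtml prints the status line when the request fails, which makes it easier to tell a wrong URL apart from a page whose content is simply not in the raw source.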

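Edit following the comments:

I have not been able to find the table elements using the MSXML2 object; the page source does not seem to contain them. In Firebug the td tags are present, so I think the table is generated by JavaScript. I don't know whether MSXML2 can run JavaScript, so I have modified the sub to use Internet Explorer. It is not fast code, but it does find the td elements and it does allow the tabs to be clicked. I found that the td elements take some time to become available (presumably because IE needs to run the JavaScript), so I have put in a couple of wait steps before Excel downloads the data.

I have also put in some code that writes the contents of the td elements to the active worksheet, so be careful if you run it in a workbook that holds useful data.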

Sub FuturesScrap3(ByVal URL As String) 

    Dim HTMLDoc As New HTMLDocument 
    Dim AnchorLinks As Object 
    Dim tdElements As Object 
    Dim tdElement As Object 
    Dim AnchorLink As Object 
    Dim lRow As Long 
    Dim oElement As Object 

    Dim oIE As InternetExplorer 

    Set oIE = New InternetExplorer 

    oIE.navigate URL 
    oIE.Visible = True 

    Do Until (oIE.readyState = 4 And Not oIE.Busy) 
     DoEvents 
    Loop 

    'Wait one minute for the JavaScript to run 
    Application.Wait (Now + TimeValue("0:01:00")) 

    HTMLDoc.body.innerHTML = oIE.document.body.innerHTML 

    With HTMLDoc.body 
     Set AnchorLinks = .getElementsByTagName("a") 
     Set tdElements = .getElementsByTagName("td") 

     For Each AnchorLink In AnchorLinks 
      Debug.Print AnchorLink.innerText 
     Next AnchorLink 

    End With 

    lRow = 1 
    For Each tdElement In tdElements 
     Debug.Print tdElement.innerText 
     Cells(lRow, 1).Value = tdElement.innerText 
     lRow = lRow + 1 
    Next 

    'Clicking the Month tab 
    For Each oElement In oIE.document.all 
     If Trim(oElement.innerText) = "Month" Then 
      oElement.Focus 
      oElement.Click 
     End If 
    Next oElement 

    Do Until (oIE.readyState = 4 And Not oIE.Busy) 
     DoEvents 
    Loop 

    'Wait one minute for the JavaScript to run 
    Application.Wait (Now + TimeValue("0:01:00")) 

    HTMLDoc.body.innerHTML = oIE.document.body.innerHTML 

    With HTMLDoc.body 
     Set AnchorLinks = .getElementsByTagName("a") 
     Set tdElements = .getElementsByTagName("td") 

     For Each AnchorLink In AnchorLinks 
      Debug.Print AnchorLink.innerText 
     Next AnchorLink 
    End With 

    lRow = 1 
    For Each tdElement In tdElements 
     Debug.Print tdElement.innerText 
     Cells(lRow, 2).Value = tdElement.innerText 
     lRow = lRow + 1 
    Next tdElement 

End sub 
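
The fixed one-minute Application.Wait could possibly be replaced by polling the page until the td elements show up. This is only a sketch under assumptions: the function name WaitForCells and its TimeoutSeconds parameter are hypothetical, and it assumes that the presence of at least one td element means the table has been rendered.

```vba
'Hypothetical helper: waits until the page loaded in oIE contains at least
'one <td> element, or until TimeoutSeconds have passed.
'Returns True if a <td> element was found in time, otherwise False.
Function WaitForCells(ByVal oIE As Object, ByVal TimeoutSeconds As Long) As Boolean
    Dim dtEnd As Date
    Dim lCount As Long

    'A Date is stored in days, so convert the seconds to a fraction of a day
    dtEnd = Now + TimeoutSeconds / 86400#

    Do
        DoEvents
        lCount = 0
        On Error Resume Next    'the document may not be available yet
        lCount = oIE.document.getElementsByTagName("td").Length
        On Error GoTo 0

        If lCount > 0 Then
            WaitForCells = True
            Exit Function
        End If
    Loop While Now < dtEnd
End Function
```

It would be called as, for example, `If WaitForCells(oIE, 60) Then` in place of the first Application.Wait line. After the Month tab is clicked the old td elements are still in the page, so this check would return immediately there; that step would need a different condition (for instance a change in the cell contents) or the original fixed wait.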
+0

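I wrote the same code on Saturday, but I still have problems with this web page. With both your code and mine I cannot list the names of the 6 buttons (anchors), Year through Day. If I want to walk through the different tables depending on the time window (year, quarter, etc.), I need to click one of these buttons. And that is not the only problem: in our code we cannot list the table data with `For Each TDelement In TDelements: Debug.Print TDelement.innerText: Next` – Figlio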

+1

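@Figlio I have modified the answer so that it gets the TD elements and allows the table to be changed, but it uses Internet Explorer rather than MSXML2, which is probably required because of the JavaScript. –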

+0

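Thanks. It works with the IE object. I know, I wrote the same code as you, and I had the same problem of needing the Application.Wait method. If that is how it is, and XMLHTTP cannot be used, I will continue with IE. Thanks again! – Figlio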