2015-10-04 22 views

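Scrape JavaScript-generated web page content into Excel

I have been using VBA and MSXML to scrape some web content, so I know the basics. Now I want to get data from a page that is generated by JavaScript. I can't give you the exact link because it is private, but I can describe it: there are div containers holding a title and some images, and below them are tables that load dynamically (a loading circle is shown) but do not update afterwards, so they load only once. If you open the source view in the browser, you can't find these tables, only the containers with the titles and image src attributes. But if you click a table and choose "Inspect element", you can see the typical <th>, <tr>, <td> structure.

The approaches I know of: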

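1) Save the page, then scrape it. This is probably not the best solution.

If I have a list of the URLs, is there a quick way to save all the pages?

2) Drive an Internet Explorer control from VBA, wait until the page has loaded, then get the elements as usual. This seems slow to me (?): one page takes 25 s even though it loads in 0.5 s.

Maybe I should turn off something that slows down loading?
Can you check what is wrong?

Here is the code I found: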

Sub FuturesScrap3(ByVal URL As String) 

Dim HTMLDoc As New HTMLDocument 
Dim AnchorLinks As Object 
Dim tdElements As Object 
Dim tdElement As Object 
Dim AnchorLink As Object 
Dim lRow As Long 
Dim oElement As Object 

Dim oIE As InternetExplorer 

Set oIE = New InternetExplorer 

oIE.navigate URL 
oIE.Visible = True 

Do Until (oIE.readyState = 4 And Not oIE.Busy) 
    DoEvents 
Loop 

'Wait for Javascript to run 
Application.Wait (Now + TimeValue("0:01:00")) 

HTMLDoc.body.innerHTML = oIE.document.body.innerHTML 

With HTMLDoc.body 
    Set AnchorLinks = .getElementsByTagName("a") 
    Set tdElements = .getElementsByTagName("td") ' 

    For Each AnchorLink In AnchorLinks 
     Debug.Print AnchorLink.innerText 
    Next AnchorLink 

End With 

lRow = 1 
For Each tdElement In tdElements 
    Debug.Print tdElement.innerText 
    Cells(lRow, 1).Value = tdElement.innerText 
    lRow = lRow + 1 
Next 

'Clicking the Month tab 
For Each oElement In oIE.document.all 
    If Trim(oElement.innerText) = "Month" Then 
     oElement.Focus 
     oElement.Click 
    End If 
Next oElement 

Do Until (oIE.readyState = 4 And Not oIE.Busy) 
    DoEvents 
Loop 

'Wait for Javascript to run 
Application.Wait (Now + TimeValue("0:01:00")) 

HTMLDoc.body.innerHTML = oIE.document.body.innerHTML 

With HTMLDoc.body 
    Set AnchorLinks = .getElementsByTagName("a") 
    Set tdElements = .getElementsByTagName("td") ' 

    For Each AnchorLink In AnchorLinks 
     Debug.Print AnchorLink.innerText 
    Next AnchorLink 
End With 

lRow = 1 
For Each tdElement In tdElements 
    Debug.Print tdElement.innerText 
    Cells(lRow, 2).Value = tdElement.innerText 
    lRow = lRow + 1 
Next tdElement 
End Sub

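3) Use a web driver such as Selenium. I couldn't find a proper example. It would be great if you could give me something from scratch, such as getting data from elements by class name.

4) Unknown to me, but possibly the fastest: get the data directly from the JS variables/arrays used to build these tables. I have heard that you can connect VBA with JavaScript, but I haven't found a proper example of getting data that way.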
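
For approach 4, one possible route with the Internet Explorer control is to run a small script inside the loaded page and hand the result back to VBA as a JSON string. The following is only a sketch under assumptions: `ReadJsVariableAsJson` and the element id `vbaDataHolder` are made-up names, the page must keep its data in a global variable whose name you have found in the page's scripts, and `JSON.stringify` is only available when IE renders the page in IE8+ standards mode.

```vba
' Sketch only: returns a page-level JavaScript variable as a JSON string.
' jsVarName is a hypothetical name; look up the real one in the page's scripts.
Function ReadJsVariableAsJson(ByVal oIE As Object, ByVal jsVarName As String) As String
    Dim doc As Object
    Dim holder As Object

    Set doc = oIE.document

    ' Create a hidden textarea that the page script can write into
    Set holder = doc.createElement("textarea")
    holder.ID = "vbaDataHolder"
    holder.Style.display = "none"
    doc.body.appendChild holder

    ' Run JavaScript inside the page and store the serialized variable
    doc.parentWindow.execScript _
        "document.getElementById('vbaDataHolder').value = " & _
        "JSON.stringify(window['" & jsVarName & "']);", "JavaScript"

    ReadJsVariableAsJson = holder.Value
End Function
```

It could be called after the page has finished loading, for example `Debug.Print ReadJsVariableAsJson(oIE, "tableData")`, where `tableData` is a placeholder name. VBA has no built-in JSON parser, so the returned string still has to be parsed separately.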

All solutions should stay within VBA. I would like to know which way is the fastest.
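
For approach 3, a from-scratch example by class name might look like the sketch below. It assumes the third-party SeleniumBasic wrapper is installed, that a reference to "Selenium Type Library" is set in the VBA editor, and that a matching ChromeDriver is available. The procedure name `ScrapeByClassName` and the class name `data-cell` are placeholders, not anything taken from the real page.

```vba
' Sketch only: requires SeleniumBasic and a reference to "Selenium Type Library".
Sub ScrapeByClassName(ByVal URL As String)
    Dim driver As New Selenium.ChromeDriver
    Dim elements As Selenium.WebElements
    Dim element As Selenium.WebElement
    Dim lRow As Long

    driver.Get URL

    ' "data-cell" is a hypothetical class name; replace it with the real one.
    ' Looking up a single element first gives the JavaScript time to render,
    ' because the lookup retries until the driver's timeout expires.
    Set element = driver.FindElementByClass("data-cell")

    ' Collect every element with that class and write the text to column A
    Set elements = driver.FindElementsByClass("data-cell")
    lRow = 1
    For Each element In elements
        Cells(lRow, 1).Value = element.Text
        lRow = lRow + 1
    Next element

    driver.Quit
End Sub
```

Whether this is faster than the Internet Explorer control depends on the page; starting a browser per URL is costly, so for a list of URLs it is probably better to create the driver once and call `driver.Get` in a loop.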


Can you use Excel's built-in web data retrieval? – Marc


If your code is working and you are looking for improvements, consider posting it on [codereview](http://codereview.stackexchange.com) – 2015-10-05 08:05:15

Answers


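Thank you for the comments. @Marc, no, it is not possible to use the web query / Power Query "From Web" import; it only retrieves the titles.

I edited the code a bit: it had a 1-minute (!) delay (perhaps the author made a mistake when adding the delay that waits for the page to load).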

Sub FuturesScrap3(ByVal URL As String) 

Dim HTMLDoc As New HTMLDocument 
Dim AnchorLinks As Object 
Dim tdElements As Object 
Dim tdElement As Object 
Dim AnchorLink As Object 
Dim lRow As Long 
Dim oElement As Object 

Dim oIE As InternetExplorer 

Set oIE = New InternetExplorer 

oIE.navigate URL 
oIE.Visible = True 

Do Until (oIE.readyState = 4 And Not oIE.Busy) 
    DoEvents 
Loop 

'Wait for Javascript to run - 1 second is enough in my case 
Application.Wait (Now + TimeValue("0:00:01")) 

HTMLDoc.body.innerHTML = oIE.document.body.innerHTML 

With HTMLDoc.body 
    Set AnchorLinks = .getElementsByTagName("a") 
    Set tdElements = .getElementsByTagName("td") ' 

    For Each AnchorLink In AnchorLinks 
     Debug.Print AnchorLink.innerText 
    Next AnchorLink 

End With 

lRow = 1 
For Each tdElement In tdElements 
    Debug.Print tdElement.innerText 
    Cells(lRow, 1).Value = tdElement.innerText 
    lRow = lRow + 1 
Next 

'Clicking the Month tab 
For Each oElement In oIE.document.all 
    If Trim(oElement.innerText) = "Month" Then 
     oElement.Focus 
     oElement.Click 
    End If 
Next oElement 

Do Until (oIE.readyState = 4 And Not oIE.Busy) 
    DoEvents 
Loop 


HTMLDoc.body.innerHTML = oIE.document.body.innerHTML 

With HTMLDoc.body 
    Set AnchorLinks = .getElementsByTagName("a") 
    Set tdElements = .getElementsByTagName("td") ' 

    For Each AnchorLink In AnchorLinks 
     Debug.Print AnchorLink.innerText 
    Next AnchorLink 
End With 

lRow = 1 
For Each tdElement In tdElements 
    Debug.Print tdElement.innerText 
    Cells(lRow, 2).Value = tdElement.innerText 
    lRow = lRow + 1 
Next tdElement 
End sub
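
A fixed `Application.Wait` either wastes time or is too short on a slow page. One alternative, sketched here under assumptions, is to poll until the elements you need actually exist and give up after a timeout. `WaitForTag` and its parameters are made-up names, not part of the code above.

```vba
' Sketch only: waits until at least one element with the given tag exists,
' or until the timeout passes. Returns True if the elements appeared in time.
Function WaitForTag(ByVal oIE As Object, ByVal tagName As String, _
                    Optional ByVal timeoutSeconds As Double = 10) As Boolean
    Dim deadline As Date

    ' A Date counts days, so convert seconds to a fraction of a day
    deadline = Now + timeoutSeconds / 86400#

    Do
        DoEvents
        If oIE.readyState = 4 And Not oIE.Busy Then
            If oIE.document.getElementsByTagName(tagName).Length > 0 Then
                WaitForTag = True
                Exit Function
            End If
        End If
    Loop While Now < deadline
End Function
```

In the procedure above, `If WaitForTag(oIE, "td") Then ...` could stand in for the first `Application.Wait` line. After clicking the Month tab the old td elements are already present, so this check would return immediately; there a different condition would be needed, such as waiting for the cell text to change.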