2017-06-16 114 views
0

無法弄清楚如何從我的代碼中使用的頁面獲取所有公司鏈接。運行我的腳本我只有20個鏈接。該頁面有lazyloading方法,這就是爲什麼我不能得到所有這些。任何對此的意見將不勝感激。我已經嘗試到目前爲止:無法解析網頁中的所有鏈接

Sub Company_links() 
Const lnk = "http://fortune.com" 
Dim http As New XMLHTTP60, html As New HTMLDocument 
Dim topic As Object 

With http 
    .Open "GET", "http://fortune.com/fortune500/list/", False 
    .send 
    html.body.innerHTML = .responseText 
End With 

For Each topic In html.getElementsByClassName("small-12 column row") 
    x = x + 1 
    With topic.getElementsByTagName("a") 
     If .Length Then Cells(x, 1) = lnk & Split(.item(0).href, "about:")(1) 
    End With 
Next topic 

Set html = Nothing: Set topics = Nothing 
End Sub 
+0

如果該網站使用ajax加載其餘鏈接。您需要首先使頁面加載這些剩餘鏈接。 – Max08

回答

1

運行在一個新的工作簿下面的代碼。它將輸出到Sheet1的結果,而不管它們是否爲空,所以如果您有數據,請小心。您可以稍後更改這部分代碼。

首先您需要在VBA編輯器中激活Microsoft HTML Object LibraryMicrosoft Internet ControlsTools -> References。然後運行下面的代碼,坐下來放鬆,直到看到「全部完成!」消息:

Sub Company_links() 
    Dim i As Long 
    Dim aIE As InternetExplorer 
    Dim Rank As IHTMLElement, Company As IHTMLElement, Revenues As IHTMLElement 
    Set aIE = New InternetExplorer 
    With aIE 
     .navigate "http://fortune.com/fortune500/list/" 
     .Visible = True 
    End With 

    Do While (aIE.Busy Or aIE.ReadyState <> READYSTATE_COMPLETE) 
     DoEvents 
    Loop 

    For i = 1 To 50 

     On Error Resume Next 
     Set Rank = aIE.document.getElementsByClassName("column small-2 company-rank")(999) 
     If Rank Is Nothing Then 
      GoTo Skip 
     End If 
     Exit For 
Skip: 
    SendKeys "{end}" 
    Application.Wait (Now() + TimeValue("00:00:005")) 
    Next i 

    With Sheet1 
     .Range("A1") = "RANK" 
     .Range("B1") = "COMPANY" 
     .Range("C1") = "REVENUE" 

     For i = 0 To 999 
      Set Rank = aIE.document.getElementsByClassName("column small-2 company-rank")(i) 
      Set Company = aIE.document.getElementsByClassName("column small-5 company-title")(i) 
      Set Revenues = aIE.document.getElementsByClassName("column small-5 company-revenue")(i) 
      .Range("A" & i + 2) = Rank.innerText 
      .Range("B" & i + 2) = Company.innerText 
      .Range("C" & i + 2) = Revenues.innerText 
     Next i 

    End With 

    SendKeys "%{F4}" 
    Set aIE = Nothing 
    Set Rank = Nothing 
    Set Company = Nothing 
    Set Revenues= Nothing 
    MsgBox "All Done!" 
End Sub 
+0

謝謝Tehscript,爲您解答。我完成後會回到你身邊。順便說一句,你一直對我很有幫助。 – SIM

+0

@ SMth80沒問題,但我剛剛注意到你想要公司鏈接。你能設法改變這個代碼嗎?因爲這個腳本給出了排名,公司名稱和收入。 – Tehscript

+0

沒問題,我會管理的。你的代碼像往常一樣工作。現在,我會用xmlhttp做一些小抽搐,因爲我有一個想法。萬分感謝。 – SIM

0

如果該網站使用ajax加載其餘鏈接。您需要先讓頁面加載剩餘的鏈接。我的建議是使用硒加載頁面,然後使用您的代碼來獲取鏈接。

http://selenium-python.readthedocs.io/

+0

對不起ANKIT GAUR。我嘗試過硒。那也不能帶來所有的環節。這也帶來了20個鏈接,就像我使用我的第一個代碼一樣。問題在別處。鏈接中必須有一個分頁選項(通常有),但無法弄清楚如何放置。 – SIM

0

我會這樣做。

Option Explicit 

Sub Sample() 
    Dim ie As Object 
    Dim links As Variant, lnk As Variant 
    Dim rowcount As Long 

    Set ie = CreateObject("InternetExplorer.Application") 
    ie.Visible = True 
    ie.navigate "http://fortune.com" 

    'Wait for site to fully load 
    'ie.Navigate2 URL 
    Do While ie.Busy = True 
     DoEvents 
    Loop 

    Set links = ie.document.getElementsByTagName("a") 

    rowcount = 1 

    With Sheets("Sheet1") 
     For Each lnk In links 
     'Debug.Print lnk.innerText 
      'If lnk.classname Like "*Real Statistics Examples Part 1*" Then 
       .Range("A" & rowcount) = lnk.innerText 
       rowcount = rowcount + 1 
       'Exit For 
      'End If 
     Next 
    End With 
End Sub