2016-01-18 22 views
0

所以我嘗試從多個網站抓取數據到excel。我認爲代碼在理論上運行良好,雖然我有「對象變量或塊變量未設置錯誤」。網絡抓取vba「對象變量或塊變量未設置錯誤」和瀏覽器崩潰

我面對的第一個問題是,有時它會從2到10完美循環並插入每一位數據,但有時會出現錯誤,並看到只插入1或2行數據。我真的無法弄清楚現在可能是什麼原因。

其次,這是某種演示代碼。我只使用一小批數據並循環播放。我的真正目標是找到一種方法,可以在不使計算機或瀏覽器崩潰的情況下抓取多達100行的網頁。如果我將我的代碼轉換爲XMLHTTP類型的刮取,會更好嗎?如果是這樣,我該怎麼做。

由於提前

Private Sub CommandButton1_Click() 

    Dim ie As Object 
    Dim iexp As Object 
    Dim firstname(1 To 10), lastname(1 To 10) As Variant 
    Dim mm(1 To 10), dd(1 To 10), yyyy(1 To 10) As Integer 
    Dim PhoneNumber(1 To 10) As Variant 
    Dim Address(1 To 10) As Variant 
    Dim HomeValue(1 To 10) As Variant 


    Dim i As Integer 



     For i = 2 To 10 

     'get variables from excel sheet1 and search on peoplefinders.com 

       firstname(i) = Sheet1.Cells(i, 1).Value 
       lastname(i) = Sheet1.Cells(i, 2).Value 
       mm(i) = Sheet1.Cells(i, 3).Value 
       dd(i) = Sheet1.Cells(i, 4).Value 
       yyyy(i) = Sheet1.Cells(i, 5).Value 

       Set ie = CreateObject("InternetExplorer.Application") 
       ie.Visible = True 
       ie.Height = 1000 
       ie.Width = 1000 
       ie.navigate ("http://www.peoplefinders.com/peoplesearch/searchresults?search=People&fn=" & firstname(i) & "&mn=&ln=" & lastname(i) & "&city=&state=&age=&dobmm=" & mm(i) & "&dobdd=" & dd(i) & "&doby=" & yyyy(i)) 

       Do While ie.Busy: DoEvents: Loop 
       Dim Doc As HTMLDocument 

       Set Doc = ie.document 

      'get elements and insert into cells in sheet 1 

       PhoneNumber(i) = Doc.getElementsByTagName("td")(2).getElementsByTagName("a")(0).innerText 
       Address(i) = Doc.getElementsByTagName("td")(1).getElementsByTagName("a")(0).innerText 

       Sheet1.Cells(i, 6).Value = PhoneNumber(i) 
       Sheet1.Cells(i, 7).Value = Address(i) 

      'modify address for next search 

       a = Split(Address(i), " ") 
       b = Join(a, "-") 

      'search home value on zillow.com 

       Set iexp = CreateObject("InternetExplorer.Application") 
       iexp.Visible = True 
       iexp.Height = 1000 
       iexp.Width = 1000 
       iexp.navigate ("http://www.zillow.com/homes/" & b & "_rb/") 

       Do While iexp.Busy: DoEvents: Loop 
       Dim Doc2 As HTMLDocument 

       Set Doc2 = iexp.document 

       iexp.navigate ("http://www.zillow.com/homes/" & b & "_rb/") 

      'insert home value into cells in sheet 1 

       HomeValue(i) = Doc2.getElementsByClassName("home-summary-row")(1).getElementsByTagName("span")(1).innerText 

       Sheet1.Cells(i, 8).Value = HomeValue(i) 



     Next 


    End Sub 
+0

檢查就緒狀態以及忙碌,並在每次瀏覽時間。你導航檢查,然後再次導航,並沒有得到文件第二次 –

+0

我相信你遇到的時機問題有時反應尚未完成。是的,使用XMLDOC或IE。我所做的是等待國家= 4;通過'lSize = Len(IE.Document.body.innerhtml)'檢查響應的長度以查看是否低於好的回報;我搜索一個已知的值,如果沒有找到,請等待一秒鐘,然後重試。你沒有提到錯誤發生在哪裏? –

回答

0

正如凱雷姆圖爾古特盧尖歐,檢查是否忙通常是不夠的,你必須要檢查的readyState爲好。下面是我如何做到這一點:

Sub WaitBrowser(browser As Object) 
    Do While browser.Busy 
      DoEvents 
    Loop 
    Do While browser.readyState <> 4 
      DoEvents 
    Loop 
End Sub 

我那麼每個導航後調用WaitBrowser IE(其中IE是我InternetExplorer.Application對象),與文檔元素插手之前。對於其他方法,爲了提高效率和可預測性,我更願意直接使用API​​發送HTTP消息(我通常使用WinHTTP,但也可能使用XMLHTTP或winInet,我相信)。兩個值得注意的例外:1)我想在處理過程中或處理後將用戶引導至瀏覽器,或者2)複雜的腳本涉及找出要發送的數據以獲取所需的數據(在這種情況下,讓瀏覽器更容易工作)。

這是改編自最近的一個項目爲例(沒有錯誤檢查):

Function FindLink() As String 
    Dim Request as Object 
    Set Request = CreateObject("WinHttp.WinHttpRequest.5.1") 
    Request.Open "GET", "http://example.com/pagewithinfo" 
    Request.Send 
    Dim resp as String 
    resp = Request.ResponseText 
    'create html tree with response 
    Dim h As Object 
    Set h = CreateObject("htmlfile") 
    h.body.innerHTML = respA 
    'get the info 
    FindLink = h.DocumentElement.GetElementsByTagName("a")(0).GetAttribute("href") 
    Set h = Nothing 
    Set Request = Nothing 
End Function 
相關問題