2017-05-13 40 views
1

是否有可能使爬蟲在VBA中進行遞歸?我嘗試過使用一段代碼,但一旦找到代碼中虛線標記區域內的行,它將引發錯誤,顯示「參數數量錯誤或屬性賦值無效」。由於我不是VBA的專家,我不能這樣做,但我想可能有什麼方法可以應用。如何讓我的爬蟲做遞歸調用?

Sub NEWAPPS() 
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument 
Dim Items As Object, Item As Object, Newitem As Object, elem As Object 
Dim Z As String 

With http 
    .Open "GET", "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8", False 
    .send 
    html.body.innerHTML = .responseText 
End With 

Set Items = html.getElementsByClassName("left") 
Set Newitem = html.getElementsByClassName("name") 
    For Each Item In Items 
     x = x + 1 
     If Item.getElementsByTagName("h1").Length Then _ 
     Cells(x, 1) = Item.getElementsByTagName("h1")(0).innerText 

     If Item.getElementsByTagName("h2").Length Then _ 
     Cells(x, 2) = Item.getElementsByTagName("h2")(0).innerText 
    Next Item 

    For Each elem In Newitem 
     Z = elem.href 
    '--------------------- 
     NEWAPPS (Z) 
    '--------------------- 
    Next elem 
End Sub 
+0

因爲您沒有在您的sub中聲明Z作爲參數,所以您不能像這樣(帶參數)調用'NEWAPPS(Z)'。你的代碼似乎也有重複的輸出。你想要不同的輸出,並停止沒有更多獨特的輸出?或者你是否想從不同的網址中提取相同的輸出,比如* candy crush saga *,反覆播放? – Tehscript

+0

感謝Tehscript的評論。當然,輸出必須是唯一的。我所做的和你已經指出的嚴重錯誤是,即使我已經能夠讓我的代碼運行,輸出將會一次又一次地被糖果粉碎,因爲主URL是硬編碼的。無論如何,這是一個粗略的草圖,雖然我不知道遞歸是否可能在vba – SIM

回答

1

您可以創建一個遞歸子並從另一個子項調用它。但是,你正在攫取iTunes應用程序,所以它必須是一個巨大的來源,需要很長時間。

爲了跳過訪問相同的URL並避免惡性循環,我使用了字典,並且您正在查找的值存儲在其中以及excel單元格中。

以下是您開始使用的工作代碼。您可能需要根據您需要停止的方式或時間來更改代碼。

Public dict As Object 

Sub NEWAPPS(Z As String) 
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument 
Dim Items As Object, Item As Object, Newitem As Object, elem As Object 

With http 
    .Open "GET", Z, False 
    .send 
    html.body.innerHTML = .responseText 
End With 

Set Newitem = html.getElementsByClassName("name") 
Set Items = html.getElementsByClassName("left") 

If Not dict.Exists(Z) Then 
    dict(Z) = Items(1).innerText 'key is url and value is app name and developer 
    Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = Z 'url 
    Cells(Rows.Count, 2).End(xlUp).Offset(1, 0) = Split(Items(1).innerText, vbLf)(0) 'app name 
    Cells(Rows.Count, 3).End(xlUp).Offset(1, 0) = Split(Items(1).innerText, vbLf)(1) 'developer 
End If 

For Each elem In Newitem 
    If Not dict.Exists(elem.href) Then 'skip visiting same urls and avoid vicious circle 
     NEWAPPS (elem.href) 
    End If 
Next elem 
End Sub 

Sub RecursiveCrawler() 
Set dict = CreateObject("Scripting.Dictionary") 
NEWAPPS ("https://itunes.apple.com/us/app/toy-blast/id890378044?mt=8") 
'###You can get stored keys and values once the scraping is finished. If it ever finishes:)### 
'Dim key As Variant 
'For Each key In dict.Keys 
' Debug.Print key 'url 
' Debug.Print Split(dict(key), vbLf)(0) 'app name 
' Debug.Print Split(dict(key), vbLf)(0) 'developer 
'Next key 
End Sub 
+0

謝謝Tehscript,這是一個我從未夢寐以求的獨特解決方案,直到我看到它的工作。非常不幸的是,我無法按下upvote按鈕數十億次。再次感謝。 – SIM

+0

你很歡迎,我很高興能夠幫到你。 – Tehscript