2013-10-11 94 views
1

I am trying to find a way to get listing data from a website, yelp.com, into Excel using VBA

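I have a spreadsheet that contains several keywords and locations. I want to extract data from Yelp listings based on the keywords and locations already in my spreadsheet.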

I have written the code below, but it seems to return meaningless data rather than the exact information I am looking for.

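I want to get the business name, address, and phone number, but what I get back contains none of that. I would appreciate any help in solving this.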

Sub find()

    Dim ie As Object
    Dim mDoc As Object
    Dim peopleData As String

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ' Don't show window
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"

        ' Wait until IE is done loading page
        Do While .Busy
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
        Loop

        ' Make a string from IE content
        Set mDoc = .Document
        peopleData = mDoc.body.innerText
        ActiveSheet.Cells(1, 1).Value = peopleData
    End With

    peopleData = "" 'Nothing
    Set mDoc = Nothing
End Sub
+0

Have you had a chance to try my answer? –

Answers

5

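If you right-click in IE and choose View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I have noticed this is often the case with dynamically served data, and that approach is too simple for most web scraping.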

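I opened the page in Google Chrome and inspected the elements to get an idea of what I was really looking for and how to find it using a DOM/HTML parser; you will need to add a reference to the Microsoft HTML Object Library (or use late binding, as the code below does).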


I think you can have it return a collection of the <DIV> tags, and then check those for the class name with an If statement inside a loop.
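As a rough illustration of that idea, the sketch below collects the text of every <DIV> whose class list contains a given class name. It runs against an in-memory document created with `CreateObject("htmlfile")`, so it can be tried without loading a page. The function name `TextOfDivsWithClass`, the class name `biz-listing`, and the sample markup are made up for the example; they are not Yelp's actual markup.

```vba
Option Explicit

' Returns the innerText of every <DIV> in doc whose class attribute
' contains className as a whole, space-separated token.
' (Hypothetical helper, not part of any library.)
Function TextOfDivsWithClass(ByVal doc As Object, ByVal className As String) As Collection
    Dim results As New Collection
    Dim el As Object 'IHTMLElement

    For Each el In doc.getElementsByTagName("DIV")
        ' Pad with spaces so "biz-listing" does not match "biz-listing-ad"
        If InStr(1, " " & el.className & " ", " " & className & " ", vbTextCompare) > 0 Then
            results.Add el.innerText
        End If
    Next el

    Set TextOfDivsWithClass = results
End Function

Sub DemoTextOfDivsWithClass()
    Dim doc As Object 'HTMLDocument
    Dim item As Variant

    ' Build a small in-memory document; the markup is invented for the demo
    Set doc = CreateObject("htmlfile")
    doc.write "<html><body>" & _
              "<div class=""biz-listing main"">Shop A</div>" & _
              "<div class=""sidebar"">Ignore me</div>" & _
              "<div class=""biz-listing"">Shop B</div>" & _
              "</body></html>"
    doc.Close

    ' Print each matching DIV's text to the Immediate window
    For Each item In TextOfDivsWithClass(doc, "biz-listing")
        Debug.Print item
    Next item
End Sub
```

With a live page, the `doc` argument would be the `.Document` of the IE instance once it has finished loading, and the class name would be whatever the element inspector shows for a listing.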

I made some revisions to my original answer; this should print each record in a new cell:

Option Explicit

' Sleep comes from the Windows API; PtrSafe is required on 64-bit Office (VBA7)
#If VBA7 Then
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub find()
    ' Uses late binding; alternatively, add a reference to the Microsoft HTML Object Library
    ' and change the variable types to get IntelliSense
    Dim ie As Object 'InternetExplorer.Application
    Dim html As Object 'HTMLDocument
    Dim Listings As Object 'IHTMLElementCollection
    Dim l As Object 'IHTMLElement
    Dim r As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ' Don't show window
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
        ' Wait until IE is done loading the page (readyState 4 = complete)
        Do While .readyState <> 4
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
            Sleep 200 ' yield the CPU instead of spinning
        Loop
        Set html = .Document
    End With

    Set Listings = html.getElementsByTagName("LI") ' ## returns every list item on the page
    For Each l In Listings
        '## make sure this list item contains the listing's Div class,
        ' then put its text in the next free cell of column A
        If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
            Range("A1").Offset(r, 0).Value = l.innerText
            r = r + 1
        End If
    Next

    Application.StatusBar = False ' restore the default status bar
    ie.Quit ' close the hidden IE instance so it does not stay running
    Set html = Nothing
    Set ie = Nothing
End Sub
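Since the question mentions keywords and locations stored in the spreadsheet, here is a minimal sketch of how the search URL might be built from them. It only reproduces the `find_desc` and `find_loc` parameters visible in the URL above. The helper names `EncodeQueryValue` and `BuildSearchUrl`, the assumption that keywords are in column A and locations in column B, and the ASCII-only encoding are all illustrative choices, not anything stated in the thread.

```vba
Option Explicit

' Percent-encodes a query-string value, writing spaces as "+".
' Assumption: the input is plain ASCII; other characters would need
' to be converted to UTF-8 bytes first, which this sketch does not do.
Function EncodeQueryValue(ByVal rawText As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(rawText)
        ch = Mid$(rawText, i, 1)
        Select Case ch
            Case "A" To "Z", "a" To "z", "0" To "9", "-", "_", ".", "~"
                result = result & ch          ' unreserved characters pass through
            Case " "
                result = result & "+"         ' form-style encoding of a space
            Case Else
                result = result & "%" & Right$("0" & Hex$(Asc(ch)), 2)
        End Select
    Next i

    EncodeQueryValue = result
End Function

' Builds a search URL from a keyword and a location (hypothetical helper).
Function BuildSearchUrl(ByVal keyword As String, ByVal location As String) As String
    BuildSearchUrl = "http://www.yelp.com/search?find_desc=" & EncodeQueryValue(keyword) & _
                     "&find_loc=" & EncodeQueryValue(location)
End Function

Sub DemoBuildSearchUrls()
    ' Assumed layout: keyword in column A, location in column B, starting at row 1
    Dim r As Long
    r = 1
    Do While Len(ActiveSheet.Cells(r, 1).Value) > 0
        Debug.Print BuildSearchUrl(ActiveSheet.Cells(r, 1).Value, ActiveSheet.Cells(r, 2).Value)
        r = r + 1
    Loop
End Sub
```

For example, `BuildSearchUrl("boutique", "New York, NY")` yields `http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY`. The `ns` and `ls` parameters and the `#start` fragment from the original URL are left out because the thread does not say what they mean.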
+1

This is a [busy-wait loop](http://stackoverflow.com/a/19019200/1768303); if handling `ie_DocumentComplete` is not possible, consider adding `Sleep(delay)` inside it. – Noseratio

+0

@Noseratio I just noticed that, actually, and changed the loop to `Do While .readyState <> 4`; I also made some tweaks to the code to make it a complete solution. –

+0

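Hmm, I don't see the change. I meant something like `DoEvents: Sleep(200)` (if this is VBA, you first need to declare `Declare Sub Sleep Lib "kernel32" Alias "Sleep" (ByVal dwMilliseconds As Long)`), so that it is not just eating CPU while waiting. In general, `DoEvents` can cause re-entrancy issues; here is a good explanation of [why](http://stackoverflow.com/a/5183623/1768303). – Noseratio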