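VB.NET HTML loop
Hello, I am trying to build a web scraper for Craigslist. The code below works well for what I am trying to do. The problem is that I am using the WebBrowser control, and I want to pass in more URLs to parse data from. That means I will have a list of, say, 100 URLs, and I am not sure I can do what I want with the WebBrowser control.
I looked at WebRequest, but if I use WebRequest it seems I would have to parse the data as if it were a text file rather than HTML, and I could not get at the HTML attributes the way I do below. Any help would be great.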
Private Sub btnGetData_Click(sender As Object, e As EventArgs) Handles btnGetData.Click
    clsScrape.ScrapeHTML(WebBrowser1, dgvData, "http://newyork.craigslist.org")
End Sub
Public Shared Sub ScrapeHTML(ByVal webBrows As WebBrowser, ByVal DataGridView1 As DataGridView, ByVal strCityLink As String)
    'Nothing to scrape until the control has loaded a page
    If webBrows.Document Is Nothing Then Return
    'Change list box to datagridview to add rows. Will be passing multiple cities
    For Each element As HtmlElement In webBrows.Document.All
        Dim WebDate As String = ""
        If element.GetAttribute("className") = "result-info" Then
            'Loop through the child elements
            For Each child As HtmlElement In element.Children
                'If the date is today, capture it; otherwise stop processing this result
                If child.GetAttribute("className") = "result-date" Then
                    If child.InnerHtml = "Dec 30" Then
                        WebDate = child.InnerHtml
                    Else
                        Exit For
                    End If
                End If
                If child.GetAttribute("className") = "result-title hdrlnk" Then
                    Dim input As String = child.OuterHtml
                    'Split the tag on double quotes; index 3 is expected to hold the relative link
                    Dim result As String() = input.Split(""""c)
                    Dim link As String = strCityLink & result(3)
                    Dim Title As String = child.InnerHtml
                    DataGridView1.Rows.Add(New String() {WebDate, Title, link})
                End If
            Next
        End If
    Next
End Sub
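You know Craigslist does not allow this, and they frown on it. That is why ***they have an API you can use*** to get this stuff. – Codexer
Do they? The only thing I see is the RSS feed? – CodeMonger
You would be better off checking the RSS feed than the HTML. –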