2013-12-14 26 views

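Scraping specific text from a website in a VB application

I am trying to create a simple application whose main job is to compare items across several websites. I have seen some ways to pull all of a page's text into an application, but is there a way to extract only, say, the title and description?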

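Take a book website as an example. Is there a way to search for a book title and then display the different reviews, synopses, and prices without any of the unwanted text?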

Answers


A quick and simple solution is to use a WebBrowser control, which exposes the loaded page as an HtmlDocument through its .Document property:

Imports System
Imports System.Windows.Forms

Public Class Form1

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Suppress script error dialogs, then start loading the page.
        Me.WebBrowser1.ScriptErrorsSuppressed = True
        Me.WebBrowser1.Navigate(New Uri("http://stackoverflow.com/"))
    End Sub

    ' Raised when a document has finished loading. Note that it can fire
    ' more than once for pages that contain frames.
    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted

        Dim document As HtmlDocument = Me.WebBrowser1.Document
        Dim title As String = Me.GetTitle(document)
        Dim description As String = Me.GetMeta(document, "description")
        Dim keywords As String = Me.GetMeta(document, "keywords")
        Dim author As String = Me.GetMeta(document, "author")

    End Sub

    ' Returns the text of the first <title> element in <head>, or an empty string.
    Private Function GetTitle(document As HtmlDocument) As String
        Dim head As HtmlElement = Me.GetHead(document)
        If head IsNot Nothing Then
            For Each el As HtmlElement In head.GetElementsByTagName("title")
                Return el.InnerText
            Next
        End If
        Return String.Empty
    End Function

    ' Returns the content of the first <meta> element whose name attribute
    ' matches (ignoring case), or an empty string.
    Private Function GetMeta(document As HtmlDocument, name As String) As String
        Dim head As HtmlElement = Me.GetHead(document)
        If head IsNot Nothing Then
            For Each el As HtmlElement In head.GetElementsByTagName("meta")
                If String.Equals(el.GetAttribute("name"), name, StringComparison.OrdinalIgnoreCase) Then
                    Return el.GetAttribute("content")
                End If
            Next
        End If
        Return String.Empty
    End Function

    ' Returns the first <head> element, or Nothing if the document has none.
    Private Function GetHead(document As HtmlDocument) As HtmlElement
        For Each el As HtmlElement In document.GetElementsByTagName("head")
            Return el
        Next
        Return Nothing
    End Function

End Class
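
The code above only reads the title and meta tags from the page head. For the reviews, synopses, and prices mentioned in the question, the same HtmlDocument could be searched in the page body. The following is only a rough sketch: the module name ScrapeHelpers, the function GetTextByClass, and the tag and class names in the usage comment ("span", "price") are all hypothetical, and the real ones depend on each site's markup, which you would need to inspect first.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Module ScrapeHelpers

    ' Hypothetical helper: collects the trimmed text of every element with the
    ' given tag name whose class attribute contains the given class name.
    Public Function GetTextByClass(document As HtmlDocument, tagName As String, className As String) As List(Of String)
        Dim results As New List(Of String)
        For Each el As HtmlElement In document.GetElementsByTagName(tagName)
            ' The WebBrowser control exposes the HTML class attribute as "className".
            Dim raw As String = If(el.GetAttribute("className"), String.Empty)
            Dim classes As String() = raw.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            ' Case-sensitive match on a single class token.
            If Array.IndexOf(classes, className) >= 0 AndAlso el.InnerText IsNot Nothing Then
                results.Add(el.InnerText.Trim())
            End If
        Next
        Return results
    End Function

    ' Possible usage inside WebBrowser1_DocumentCompleted, assuming the site
    ' marks prices with <span class="price"> (an assumption, not a given):
    '
    '   Dim prices As List(Of String) =
    '       ScrapeHelpers.GetTextByClass(Me.WebBrowser1.Document, "span", "price")

End Module
```

Because every site structures its pages differently, a comparison application would probably need one tag and class pair per site for each field it wants to extract.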