2013-08-25

VB.NET link extraction with HtmlAgilityPack

I can extract URLs from simple href tags like this one:

<a href="http://www.samplesite.com"> 

My question is: how can I extract the link from an href tag that looks like this?

<a href="http://www.wherecreativitygoestoschool.com/vancouver/left_right/rb_test.htm" onmousedown="return rwt(this,'','','','1','AFQjCNHvlwTxfBVEYcqGUnilAZN0uY2IXw','','0CCsQFjAA','','',event)"> 
Right Brain vs Left Brain Creativity <em>Test</em> at The Art Institute of <b>...</b></a> 

Here is my full code:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim webClient As New System.Net.WebClient
    Dim WebSource As String = webClient.DownloadString("http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA")

    ' Parse the downloaded page and list every matching link in the ListBox.
    Dim doc = New HtmlAgilityPack.HtmlDocument()
    doc.LoadHtml(WebSource)
    Dim links = GetLinks(doc, "test")
    For Each lnk In links
        ListBox1.Items.Add(lnk.ToString())
    Next
End Sub


Public Class Link
    Public Sub New(Uri As Uri, Text As String)
        Me.Uri = Uri
        Me.Text = Text
    End Sub

    Public Property Text As String
    Public Property Uri As Uri

    ' Returns the URL, or an empty string when no Uri is set.
    ' (Passing the URL to String.Format as a format string would throw on "{" or "}".)
    Public Overrides Function ToString() As String
        Return If(Uri Is Nothing, "", Uri.ToString())
    End Function
End Class


Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument, linkContains As String) As List(Of Link)
    ' Scratch variable for Uri.TryCreate; only the Boolean result is used in the query.
    Dim parsed As Uri = Nothing

    ' Keep <a> elements whose href contains linkContains and parses as an absolute URL.
    Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                      Where link.Name = "a" _
                          AndAlso link.Attributes("href") IsNot Nothing
                      Let text = link.InnerText.Trim()
                      Let url = link.Attributes("href").Value
                      Where url.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0 _
                          AndAlso Uri.TryCreate(url, UriKind.Absolute, parsed)

    Dim Uris As New List(Of Link)()
    For Each item In linksOnPage
        Uris.Add(New Link(New Uri(item.url, UriKind.Absolute), item.text))
    Next

    Return Uris
End Function

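I noticed that my code does not extract links like the second one above, which end with </a>. Is there anything I can change in my code so that it also extracts links ending with </a>?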
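One possible cause, offered as an assumption and not confirmed anywhere in the thread: the HTML that WebClient downloads may differ from what a browser shows, and some of its href values may be relative (for example something like /url?q=...). GetLinks keeps a link only when Uri.TryCreate(url, UriKind.Absolute, ...) succeeds, so any relative href is silently dropped. The sketch below resolves an href against the page URL using the standard Uri.TryCreate(Uri, String, ByRef Uri) overload; the module and function names (LinkHelpers, ResolveHref) are hypothetical.

```vbnet
Imports System

' Hypothetical helper, not part of the original question or answer.
Public Module LinkHelpers

    ' Turns an href value into an absolute Uri.
    ' An absolute href is returned as-is; a relative one is resolved against pageUrl.
    ' Returns Nothing if pageUrl is not absolute or the href cannot be parsed.
    Public Function ResolveHref(pageUrl As String, href As String) As Uri
        Dim baseUri As Uri = Nothing
        If Not Uri.TryCreate(pageUrl, UriKind.Absolute, baseUri) Then
            Return Nothing
        End If

        Dim resolved As Uri = Nothing
        If Uri.TryCreate(baseUri, href, resolved) Then
            Return resolved
        End If
        Return Nothing
    End Function

End Module
```

To use it, GetLinks would need the page URL as an extra parameter, and its condition could test ResolveHref(pageUrl, url) IsNot Nothing instead of the absolute-only check, with the resolved Uri passed to New Link.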

Answer


Use the following code to get every link on the page that has an href attribute:

Dim hNodeCol As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
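A minimal sketch of consuming that collection, assuming the HtmlAgilityPack package is referenced. By default SelectNodes returns Nothing rather than an empty collection when nothing matches, so the result is checked first. Nested markup such as <em> or <b> inside the link does not affect the XPath match. The names HrefExtractor and GetHrefs are hypothetical; the HtmlAgilityPack types are fully qualified, as in the question's code.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical wrapper around the answer's SelectNodes call.
Public Module HrefExtractor

    ' Returns the href value of every <a> element that has one, in document order.
    Public Function GetHrefs(html As String) As List(Of String)
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        Dim hrefs As New List(Of String)()
        Dim hNodeCol As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If hNodeCol Is Nothing Then
            ' No matching <a> elements on the page.
            Return hrefs
        End If

        For Each node As HtmlAgilityPack.HtmlNode In hNodeCol
            ' The XPath guarantees the attribute exists; "" is only a fallback default.
            hrefs.Add(node.GetAttributeValue("href", ""))
        Next
        Return hrefs
    End Function

End Module
```

For example, GetHrefs(WebSource) in Button1_Click would list the raw href of every link in the downloaded page; filtering and URL parsing can then be applied to that list.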

...if you still need it, of course ;)