2013-09-16 261 views
-1

I'm new to VB.NET 2008. What I want to do is extract HTML element attributes from the HTML below:

<TABLE> 
    <TR> 
    <TD></TD> 
    <TH scope="col">PAT. NO.</TH><TD></TD><TH scope="col">Title</TH> 
    </TR> 
    <TR> 
    <TD valign=top> 
     10 
    </TD> 
    <TD valign=top> 
     <A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a>8,519,110</A> 
    </TD> 
    <TD valign=baseline> 
     <IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"> 
    </TD> 
    <TD valign=top> 
     <A HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a>mRNA cap analogs</A> 
    </TD> 

So I want my text box to display the following:

/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a 

8,519,110 

mRNA cap analogs 
Extract some elements

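The HTML tags above repeat for many more table rows, and I want to get all of them. I have read that we can use GetAttribute to read HTML element attributes, but I only want to extract the specific parts mentioned above.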

+0

You might want to look at HtmlAgilityPack.

Answers

1

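I have a routine that I've been using to extract data from HTML tables (sorry, I can't credit the original author; I found this code and don't know where it came from). It parses an HTML string for tables and loads the cells into a DataSet.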

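' Requires these imports at the top of the file:
'   Imports System.Data
'   Imports System.Text.RegularExpressions
Public Shared Function ConvertHtmlTablesToDataSet(html As String) As DataSet 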
    Dim dt As DataTable 
    Dim ds As New DataSet() 
    dt = New DataTable() 
    Dim tableExpression As String = "<table[^>]*>(.*?)</table>" 
    Dim headerExpression As String = "<th[^>]*>(.*?)</th>" 
    Dim rowExpression As String = "<tr[^>]*>(.*?)</tr>" 
    Dim columnExpression As String = "<td[^>]*>(.*?)</td>" 
    Dim headersExist As Boolean = False 
    Dim iCurrentColumn As Integer = 0 
    Dim iCurrentRow As Integer = 0 

    Dim tables As MatchCollection = Regex.Matches(html, tableExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase) 


    For Each table As Match In tables 
     iCurrentRow = 0 
     headersExist = False 
     dt = New DataTable() 

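     ' Case-insensitive check, so uppercase <TH> cells (as in the question) are detected too.
     If Regex.IsMatch(table.Value, "<th[\s>]", RegexOptions.IgnoreCase) Then 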
      headersExist = True 

      Dim headers As MatchCollection = Regex.Matches(table.Value, headerExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase) 

      For Each header As Match In headers 
       dt.Columns.Add(header.Groups(1).ToString()) 
      Next 
     Else 

      ' No header cells: count the <td> cells in the first row and generate column names.
      Dim firstRow As String = Regex.Match(table.Value, rowExpression, RegexOptions.Singleline Or RegexOptions.IgnoreCase).Value
      Dim columnCount As Integer = Regex.Matches(firstRow, columnExpression, RegexOptions.Singleline Or RegexOptions.IgnoreCase).Count

      For iColumns As Integer = 1 To columnCount
       dt.Columns.Add("Column " + System.Convert.ToString(iColumns)) 

      Next 
     End If 

     Dim rows As MatchCollection = Regex.Matches(table.Value, rowExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase) 
     Try 

      For Each row As Match In rows 
       If Not ((iCurrentRow = 0) And headersExist) Then 
        Dim dr As DataRow = dt.NewRow() 
        iCurrentColumn = 0 

        Dim columns As MatchCollection = Regex.Matches(row.Value, columnExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase) 

        For Each column As Match In columns 
         dr(iCurrentColumn) = column.Groups(1).ToString() 
         iCurrentColumn += 1 
         If iCurrentColumn = dt.Columns.Count Then Exit For 
        Next 

        dt.Rows.Add(dr) 
       End If 
       iCurrentRow += 1 
      Next 

      ds.Tables.Add(dt) 
     Catch ex As Exception 
      ' Stop suspends execution for debugging; a table that fails to parse is not added to the DataSet.
      Stop 
     End Try 
    Next 

    Return ds 
End Function 
1

Without understanding why you want to do this, it is a bit hard to give you a good solution.

I'll offer two options:

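1) VB.NET - It's not clear how you are setting the attributes in your HTML. You should be able to do something like the following (note: this is from my memory of VB.NET, typed directly here without using VS.NET):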

HTML view:

<asp:HyperLink id="FirstLink" runat="server" /> 
... 

Code-behind:

FirstLink.NavigateUrl = yourUrlVariableHere 
... 
YourInputBox.Text = String.Concat(yourUrlVariableHere, yourOtherVariablesHere ...) 

2) jQuery -

Essentially, you want to get your attributes and then display them:

$(function(){ 
    var anchor1 = $("#firstAnchor").attr("href"); 
    var imageSrc = $("#my-image").attr("src"); 

    $("#my-display").html(anchor1+ "<br/>" + imageSrc); 
}); 

Full sample here
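
The jQuery snippet above looks elements up by id, but the anchors in the question have no ids and repeat once per table row. As a rough sketch of the same idea applied to every link, the plain JavaScript below collects the href and the link text of each anchor in an HTML string. It uses a regular expression rather than a DOM, so it runs without a browser. The function name `extractLinks` and the shortened sample URLs are assumptions made for illustration, and a regular expression like this only copes with simple, regular markup such as the question's.

```javascript
// Hypothetical helper (not from the original answers): collect the href and
// the link text of every <a> element found in an HTML string.
// Accepts double-quoted, single-quoted, and unquoted href values.
function extractLinks(html) {
  var anchorPattern = /<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>([\s\S]*?)<\/a>/gi;
  var links = [];
  var match;
  while ((match = anchorPattern.exec(html)) !== null) {
    links.push({
      // Exactly one of the three href groups is set, depending on the quoting style.
      href: match[1] || match[2] || match[3] || "",
      // Drop any nested tags and surrounding whitespace from the link text.
      text: match[4].replace(/<[^>]*>/g, "").trim()
    });
  }
  return links;
}

// Abbreviated sample in the style of the question's table cells.
var sample =
  "<TD valign=top><A HREF=/netacgi/nph-Parser?r=10&p=1>8,519,110</A></TD>" +
  "<TD valign=top><A HREF=/netacgi/nph-Parser?r=10&p=1>mRNA cap analogs</A></TD>";

extractLinks(sample).forEach(function (link) {
  console.log(link.href);
  console.log(link.text);
});
```

Each entry in the returned array carries one `href` and one `text` value, so the results can be joined line by line for display in a text box.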