2013-07-22

XPath and HtmlAgilityPack: iterating through similar nodes

The HTML I scraped is below. It contains one original post and two replies:

<div class="share_buttons noprint">...</div> 

<strong>Dan</strong> Says:<br/> 
<span class="small soft"><time datetime="2009-10-05T02:27:38Z">Sun, Oct 04 '09, 7:27 PM</time></span> 
<div class="quote_top">&nbsp;</div> 
<div class="quote_item">Hello all, this is my original post.<br/></div> 

<form class="action_heading noprint"> 
<strong>Page</strong> 
... 
</form> 

<div class="post_number" id="r_140626">1</div> 
<strong>AnnieMae</strong> Says:<br/> 
<span class="small soft"><time datetime="2009-10-05T02:30:27Z">Sun, Oct 04 '09, 7:30 PM</time></span> 
<div class="quote_top clear_float">&nbsp;</div> 
<div class="quote_item">What do you think of it?<br/></div> 

<div class="post_number" id="r_140627">2</div> 
<strong>Thomas77</strong> Says:<br/> 
<span class="small soft"><time datetime="2009-10-05T02:32:32Z">Sun, Oct 04 '09, 7:32 PM</time></span> 
<div class="quote_top clear_float">&nbsp;</div> 
<div class="quote_item">Not really sure, can't see this pic?<br/> 
</div> 

So I have figured out how to get the original post...

'Get the AUTHOR and DATE of the original post
Dim divOriginalPostAuthor As HtmlNode = threadDoc.DocumentNode.SelectSingleNode("//div[@class='share_buttons noprint']/following-sibling::strong")
Dim divOriginalPostDate As HtmlNode = threadDoc.DocumentNode.SelectSingleNode("//div[@class='share_buttons noprint']/following-sibling::span/time")

'Turn "Sun, Oct 04 '09, 7:27 PM" into "Oct 04 2009, 7:27 PM" so CDate can parse it
Dim strDate As String = divOriginalPostDate.InnerText.Trim
strDate = strDate.Remove(0, InStr(strDate, ", ")).Trim 'drop the leading day name
strDate = Replace(strDate, "'", "20") 'expand the two-digit year
Dim strAuthor As String = divOriginalPostAuthor.InnerText.Trim
dtPosted = CDate(strDate)
'Get the TEXT of the original post
divOriginalPostText = threadDoc.DocumentNode.SelectSingleNode("//div[@class='share_buttons noprint']/following-sibling::div[@class='quote_item']")
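
As a possible alternative to the string surgery on the visible date, the `<time>` elements in the scraped HTML also carry a machine-readable `datetime` attribute. The sketch below, which assumes the HtmlAgilityPack NuGet package is referenced, reads that attribute and parses it. `ParsePostDate` and the module name are hypothetical names introduced for illustration. Note that the attribute value is in UTC, while the visible text appears to be a local time.

```vb
Imports System
Imports System.Globalization
Imports HtmlAgilityPack

Module DateAttributeSketch
    ' Hypothetical helper: parses the ISO 8601 value of a <time datetime="..."> node.
    ' Throws FormatException if the attribute is missing or malformed.
    Function ParsePostDate(timeNode As HtmlNode) As DateTime
        Dim raw As String = timeNode.GetAttributeValue("datetime", "")
        ' RoundtripKind keeps the trailing "Z" as UTC instead of converting to local time.
        Return DateTime.Parse(raw, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind)
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        ' Sample markup copied from the question.
        doc.LoadHtml("<span class=""small soft""><time datetime=""2009-10-05T02:27:38Z"">Sun, Oct 04 '09, 7:27 PM</time></span>")
        Dim timeNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//span/time")
        Console.WriteLine(ParsePostDate(timeNode).ToString("u"))
    End Sub
End Module
```

This avoids depending on the display format (day name, two-digit year, suffixes such as "via mobile"), at the cost of having to convert from UTC if local times are wanted.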

Now I am just trying to work out how to get the replies. I was thinking of getting the current line position like this:

Dim currentNodePosition As Integer = threadDoc.DocumentNode.SelectSingleNode("//form[@class='action_heading noprint']").Line 

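Then I would use that to iterate through the replies, incrementing the current line position as I go. What makes this tricky is that the replies have no "container" HTML element that I could collect in one go. Any ideas?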

Answer


Just for the record, I figured this out and wanted to post the answer for anyone who needs it in the future.

'Then get the thread replies
Dim nodesPostNumber As HtmlNodeCollection = threadDoc.DocumentNode.SelectNodes("//form[@class='action_heading noprint']/following-sibling::div[contains(@id, 'r_')]")
Dim replies As New List(Of ThreadReply)

If nodesPostNumber IsNot Nothing Then

Dim intNumberOfReplies As Integer = nodesPostNumber.Count
For i = 1 To intNumberOfReplies
    'The i-th date span after the form belongs to the i-th reply
    Dim nodeReplyDate As HtmlNode = threadDoc.DocumentNode.SelectSingleNode("//form[@class='action_heading noprint']/following-sibling::span[@class='small soft'][" & i.ToString & "]")
    Dim strXPathForDate As String = nodeReplyDate.XPath
    '[1] picks the nearest match on each axis, i.e. the one belonging to this reply
    Dim strReplyText As String = threadDoc.DocumentNode.SelectSingleNode(strXPathForDate & "/following-sibling::div[@class='quote_item'][1]").InnerHtml
    'Cut off any trailing noprint block; InStr returns 0 when there is none
    Dim intCut As Integer = InStr(strReplyText, "<div class=""noprint""")
    If intCut > 0 Then strReplyText = Left(strReplyText, intCut - 1)
    Dim strReplyAuthor As String = threadDoc.DocumentNode.SelectSingleNode(strXPathForDate & "/preceding-sibling::strong[1]").InnerText
    Dim strReplyDate As String = nodeReplyDate.InnerText.Trim
    strReplyDate = strReplyDate.Remove(0, InStr(strReplyDate, ", ")).Trim
    strReplyDate = Replace(strReplyDate, "'", "20")
    strReplyDate = Replace(strReplyDate, "via mobile", "").Trim
    Dim thisReply As New ThreadReply With {.Author = strReplyAuthor, .DatePosted = strReplyDate, .ThreadID = thisThread.ThreadID, .Text = strReplyText}
    replies.Add(thisReply)
Next
End If

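So the trick is to "grab" one node that belongs to a single reply and reuse it in further XPath queries, so that you only get the reply content that comes after the node you grabbed. I did this with HtmlNode.XPath, which gives you the XPath string of any given HtmlAgilityPack HtmlNode, and then appended "/following-sibling" to it.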
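
Another way to cope with replies that have no container element is to walk the siblings directly instead of building XPath strings. The following is only a sketch, not the method used above. It assumes the HtmlAgilityPack NuGet package is referenced and that every reply starts with a `div` of class `post_number`, as in the sample HTML. `IsReplyMarker` and the module name are hypothetical, and the sample markup is trimmed to the parts the loop needs.

```vb
Imports System
Imports HtmlAgilityPack

Module SiblingWalkSketch
    ' Hypothetical helper: True for the <div class="post_number"> that opens a reply.
    Function IsReplyMarker(node As HtmlNode) As Boolean
        Return node.Name = "div" AndAlso node.GetAttributeValue("class", "") = "post_number"
    End Function

    Sub Main()
        ' Trimmed copy of the two replies from the question.
        Dim html As String =
            "<div class=""post_number"" id=""r_140626"">1</div>" &
            "<strong>AnnieMae</strong> Says:<br/>" &
            "<div class=""quote_item"">What do you think of it?<br/></div>" &
            "<div class=""post_number"" id=""r_140627"">2</div>" &
            "<strong>Thomas77</strong> Says:<br/>" &
            "<div class=""quote_item"">Not really sure, can't see this pic?<br/></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim markers As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='post_number']")
        If markers Is Nothing Then Return ' SelectNodes returns Nothing when there is no match

        For Each marker As HtmlNode In markers
            Dim author As String = ""
            Dim replyText As String = ""
            Dim current As HtmlNode = marker.NextSibling
            ' Collect siblings until the next reply marker or the end of the sibling list.
            While current IsNot Nothing AndAlso Not IsReplyMarker(current)
                If current.Name = "strong" AndAlso author = "" Then
                    author = current.InnerText.Trim()
                ElseIf current.Name = "div" AndAlso current.GetAttributeValue("class", "") = "quote_item" Then
                    replyText = current.InnerText.Trim()
                End If
                current = current.NextSibling
            End While
            Console.WriteLine(marker.InnerText & ": " & author & " -> " & replyText)
        Next
    End Sub
End Module
```

Because each inner loop stops at the next `post_number` marker, the nodes of one reply are never mixed with those of the next, and the document is not re-queried from the root for every field.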