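HTML Scrape Table to CSV (using HAP?)

So I'm trying to parse the following data into CSV. From what I've read, HAP (Html Agility Pack) sounds like the best way to do it, since it has a robust parser.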
dynamic doc = this.wbControl.Document;
Content:
<div class="content">
<fieldset>
<ul class="fieldsetr">
<li class="row medium">
<div class="field">
<div class="shell">
<em class="disable">Sender:</em>
</div>
</div>
<div>
<div class="clip">
<em>[email protected]</em>
</div>
</div>
</li>
<li class="row medium alt">
<div class="field">
<div class="shell">
<em class="disable">Recipient:</em>
</div>
</div>
<div>
<div class="clip">
<em>[email protected]</em>
</div>
</div>
</li>
<li class="row medium">
<div class="field">
<div class="shell">
<em class="disable">Message ID:</em>
</div>
</div>
<div>
<div class="clip">
<em>2342342345235</em>
</div>
</div>
</li>
<li class="row medium alt">
<div class="field">
<div class="shell">
<em class="disable">Message size:</em>
</div>
</div>
<div>
<div class="clip">
<em>18.74 KB
</em>
</div>
</div>
</li>
<li class="row medium">
<div class="field">
<div class="shell">
<em class="disable">Date and time received:</em>
</div>
</div>
<div>
<div class="clip">
<em>11/27/2012 6:17:22 AM</em>
</div>
</div>
</li>
<li class="row medium alt">
<div class="field">
<div class="shell">
<em class="disable">Date and time filtered:</em>
</div>
</div>
<div>
<div class="clip">
<em>11/27/2012 6:17:22 AM</em>
</div>
</div>
</li>
<li class="row medium">
<!-- Connector Details -->
</li>
<li class="row medium alt">
<div class="field">
<div class="shell">
<em class="disable">First delivery attempt:</em>
</div>
</div>
<div>
<div class="clip">
<em>11/27/2012 6:17:23 AM</em>
</div>
</div>
</li>
<li class="row medium">
<div class="field">
<div class="shell">
<em class="disable">Final delivery attempt:</em>
</div>
</div>
<div>
<div class="clip">
<em>11/27/2012 6:17:23 AM</em>
</div>
</div>
</li>
<li class="row medium alt">
<div class="field">
<div class="shell">
<em class="disable">From IP address:</em>
</div>
</div>
<div>
<div class="clip">
<em>1.2.3.4 &lt;unknown&gt;</em>
</div>
</div>
</li>
<li class="row medium">
<div class="field">
<div class="shell">
<em class="disable">To IP address:</em>
</div>
</div>
<div>
<div class="clip">
<em>4.3.2.1 &lt;mail.example2.com&gt; </em>
</div>
</div>
</li>
<li class="row medium alt">
<div class="field">
<div class="shell">
<em class="disable">Filtering results:</em>
</div>
</div>
<div>
<div class="clip">
<em>Passed Filtering</em>
</div>
</div>
</li>
<li class="row medium">
<div class="field">
<div class="shell">
<em class="disable">Delivery result:</em>
</div>
</div>
<div>
<div class="clip">
<span><em>Delivered: 470 2.4.0 &lt;2342342345235&gt; [InternalId=2321233] Queued mail for delivery</em></span>
</div>
</div>
</li>
</ul>
</fieldset>
</div>
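What's the best way for me to convert this data? Right now the content is being read from a WPF WebBrowser control. This is only one record, but more records will be added.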
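Whatever does the parsing, the CSV side can be kept separate. Below is a minimal sketch, assuming each record has already been reduced to label/value pairs (for example "Sender:" mapped to the sender address). `CsvSketch`, `Escape` and `ToCsv` are hypothetical names. The quoting follows RFC 4180 (quote a field that contains a comma, a double quote or a line break, and double any embedded quotes), which matters here because free-text values such as the delivery result may contain commas or quotes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: writes label/value records as CSV text with RFC 4180 style quoting.
public static class CsvSketch
{
    // Quote the field only if it contains a comma, a double quote, CR or LF;
    // embedded double quotes are escaped by doubling them.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // One header line, then one line per record. Cells are emitted in header order;
    // a label that is missing from a record becomes an empty cell.
    public static string ToCsv(IList<string> headers, IEnumerable<IDictionary<string, string>> records)
    {
        var sb = new StringBuilder();
        sb.Append(string.Join(",", headers.Select(Escape))).Append("\r\n");
        foreach (var record in records)
        {
            var cells = headers.Select(h => record.TryGetValue(h, out var value) ? Escape(value) : "");
            sb.Append(string.Join(",", cells)).Append("\r\n");
        }
        return sb.ToString();
    }
}
```

Passing the header list explicitly keeps the column order fixed as more records are added, and a record that lacks one of the labels gets an empty cell instead of shifting the later columns.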
EDIT
Ended up using the code below to test it:
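using HtmlAgilityPack; // at the top of the file; doc is the dynamic mshtml document from above
HtmlAgilityPack.HtmlDocument docHAP = new HtmlAgilityPack.HtmlDocument();
docHAP.LoadHtml(doc.Body.InnerHtml.ToString()); // HAP needs the page's HTML as a string
HtmlNodeCollection emNodes = docHAP.DocumentNode.SelectNodes("//em");
if (emNodes != null) // SelectNodes returns null, not an empty collection, when nothing matches
{
    foreach (HtmlNode emNode in emNodes)
    {
        MessageBox.Show(emNode.InnerText);
    }
}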
If anyone has a more efficient solution, please feel free to let me know.
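The `//em` query returns labels and values as one flat list, so they still have to be paired up by position. A sketch that keys on the markup instead, assuming the class names `fieldsetr`, `disable` and `clip` from the sample stay stable and that the project references the HtmlAgilityPack NuGet package; `MessageDetailsSketch` and `ExtractFields` are hypothetical names. It walks each `<li>` row and returns a dictionary from label (colon removed) to value. One caveat: if the page really contains a raw `<unknown>` rather than `&lt;unknown&gt;`, the parser will treat it as a tag and it will not appear in the text.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: pairs each label <em class="disable"> with the value
// <em> found under <div class="clip"> in the same <li> row.
public static class MessageDetailsSketch
{
    public static Dictionary<string, string> ExtractFields(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//ul[@class='fieldsetr']/li");
        if (rows == null) return fields;

        foreach (HtmlNode row in rows)
        {
            HtmlNode label = row.SelectSingleNode(".//em[@class='disable']");
            HtmlNode value = row.SelectSingleNode(".//div[@class='clip']//em");
            // Rows without both parts (e.g. the "Connector Details" placeholder) are skipped.
            if (label == null || value == null) continue;

            // Decode entities such as &lt; and strip whitespace and the trailing colon.
            string key = HtmlEntity.DeEntitize(label.InnerText).Trim().TrimEnd(':');
            fields[key] = HtmlEntity.DeEntitize(value.InnerText).Trim();
        }
        return fields;
    }
}
```

In the WPF case the input string can come from the same `doc.Body.InnerHtml` call used in the edit; each returned dictionary then corresponds to one CSV row, so handling more records means calling it once per page.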
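Well, you hit the nail on the head with the XPath and XSLT stuff. That part always trips me up. Could you provide an example, even for just one section of the above? – lordzero
See my edited answer. Hope this helps! –
This is definitely pointing me in the right direction. However, since this.wbControl.Document is an mshtml document, the LoadHtml method doesn't work with it. I'm trying to figure out how to convert it to a HAP HtmlDocument atm so I can test further. That said, it also threw an error on DocumentElement, but it looks like DocumentNode replaces that. – lordzero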