2012-03-30 46 views

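I have a page from which I need to extract the innerHTML of a div. The only thing I have to identify the div is its class. How can I extract the div's InnerHtml?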

<div class="os-box unround"> 
: 
: 
: 
</div> 

I need to extract the innerHTML of the div with class "os-box unround". Assume the page comes from the URL http://abc.com/xyz.html, and that I am using C# in the Page Load event.

**Input:** 

<div class="os-box unround"> 

    <div class="os-list" id="os-list-6.1 x64"> 



    <div class="item-box"> 

     <p class="item-title"><a href="http://devid.info/en/p127116/Atheros+AR5B95+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5B95 Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p> 

     <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p> 

     <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p> 

    <p class="item-date"><span>Driver Date: </span>2010-09-26</p> <p class="item-version"><span>Version: </span>8.0.0.372</p>  <p class="download"><a href="http://devid.info/p127116/Atheros+AR5B95+Wireless+Network+Adapter">Download</a></p> 

    </div> 



    <div class="adv-box"> 



    </div> 



    <div class="item-box"> 

     <p class="item-title"><a href="http://devid.info/en/p145532/Atheros+AR5005G+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5005G Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p> 

     <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p> 

     <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p> 

    <p class="item-date"><span>Driver Date: </span>2010-07-08</p> <p class="item-version"><span>Version: </span>9.0.0.222</p>  <p class="download"><a href="http://devid.info/p145532/Atheros+AR5005G+Wireless+Network+Adapter">Download</a></p> 

    </div> 





    <div class="item-box"> 

     <p class="item-title"><a href="http://devid.info/en/p134802/Atheros+AR5008X+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5008X Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p> 

     <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p> 

     <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p> 

    <p class="item-date"><span>Driver Date: </span>2010-06-24</p> <p class="item-version"><span>Version: </span>9.0.0.208</p>  <p class="download"><a href="http://devid.info/p134802/Atheros+AR5008X+Wireless+Network+Adapter">Download</a></p> 

    </div> 

</div> 
</div> 

Some URL, say http://abc.com/xyz.html, serves HTML that contains the div shown above. I want to read it and display it on my own page in that page's load event.

**Output:**

A string containing the inner HTML of the "os-box unround" div.

Ehhh, is this using JavaScript? – Neal 2012-03-30 18:46:53

Try the [jQuery class selector](http://api.jquery.com/class-selector/) – 2012-03-30 18:47:33

He didn't mention jQuery or tag the question with it, though – Rodolfo 2012-03-30 18:48:11

Answers

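Have you tried HtmlAgilityPack? It lets you parse and query (using XPath) much of the malformed HTML you will find in the wild.

If I understand your question correctly, you could use: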

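// Requires the HtmlAgilityPack NuGet package.
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb(); 
HtmlAgilityPack.HtmlDocument doc = web.Load("http://abc.com/xyz.html"); 

// "//div" matches the div at any depth, not only as a direct child of <body>.
HtmlAgilityPack.HtmlNode div = doc.DocumentNode 
    .SelectSingleNode("//div[@class=\"os-box unround\"]"); 
// SelectSingleNode returns null when nothing matches, so guard before reading InnerHtml.
string contentYouWantedToDisplayOnYourOwnPage = div != null ? div.InnerHtml : string.Empty;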
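
An exact comparison on @class only matches when the attribute is spelled exactly "os-box unround", in that order and with no extra classes. If the remote page might vary that, a token-based XPath test may be more forgiving. Below is a minimal sketch of that idea, not a definitive implementation. The `DivExtractor` class and `ExtractInnerHtml` helper are hypothetical names introduced for illustration, and the inline HTML is a made-up stand-in for the real page so the example runs without network access.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the first div whose class
    // attribute contains every given class name, or null if no div matches.
    public static string ExtractInnerHtml(string html, params string[] classNames)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Add one predicate per class token so that the order of the classes
        // and any additional classes on the div do not matter.
        var xpath = "//div";
        foreach (var name in classNames)
        {
            xpath += "[contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')]";
        }

        HtmlNode div = doc.DocumentNode.SelectSingleNode(xpath);
        return div == null ? null : div.InnerHtml;
    }

    public static void Main()
    {
        // Assumed sample markup: the target div is nested inside a wrapper and
        // lists its classes in the opposite order.
        const string html =
            "<html><body><div id=\"wrap\">" +
            "<div class=\"unround os-box\"><p class=\"item-title\">Atheros AR5B95</p></div>" +
            "</div></body></html>";

        Console.WriteLine(ExtractInnerHtml(html, "os-box", "unround"));
    }
}
```

In a Web Forms page, the same helper could be called from Page_Load with the markup of the document returned by HtmlWeb.Load, and the result assigned to a control such as a Literal. Keep in mind that the returned string is raw HTML from a third-party site, so it may be worth sanitizing before rendering it on your own page.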