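PHP: parsing some code out of HTML
I'm having trouble parsing a Google News RSS feed with PHP. The XML description contains a lot of clutter and I only need 2 small parts of it, but I don't know how to extract just those parts. I've been trying to get them with PHP preg_match, but without success.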
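Please see the code below; I've added comments in the text marking the parts I want to get.
P.S. Sorry that this looks a bit messy, but that's how the Google News RSS is: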
<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;">
<tr>
<td width="80" align="center" valign="top">
<font style="font-size:85%;font-family:arial,sans-serif">
<a href="http://">
<!-- i need only this img src -->
<img src="http://nt3.ggpht.com/news/tbn/ExvkIyaCiPpZwM/6.jpg" /><br />
<!-- /till here -->
<font size="-2">Moneycontrol.com</font></a>
</font>
</td>
<td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br />
<div style="padding-top:0.8em;">
<img alt="" height="1" width="1" /></div>
<div class="lh">
<a href="http://">
<b>Microsoft's Office 365 to take on Google Apps in cloud software race</b>
</a><br />
<font size="-1">
<b><font color="#6f6f6f">Los Angeles Times</font></b>
</font><br />
<font size="-1">
<!----------------- i need only the following text ----------->
Microsoft Corp., the 800-pound gorilla of the software world, is hoping it can lift itself into the cloud. In announcing the general release of Office 365, the online version of its ubiquitous Microsoft Office suite that includes Word, <b>...</b>
<!------------------------- -till here ------------------>
</font><br />
<font size="-1">
<a href="http://">Office 365: Microsoft Pitches Cloud, Eyes Profit</a>
<font size="-1" color="#6f6f6f"><nobr>InformationWeek</nobr></font>
</font>
<br />
<font size="-1">
<a href="http://">Microsoft Battles for Sky Supremacy With Office 365 Launch</a>
<font size="-1" color="#6f6f6f"><nobr>TechNewsWorld</nobr></font>
</font><br />
...
Thanks a lot for taking the time to read this and help me.
As usual, don't use regular expressions to parse HTML (or even XML). Use DOM. –
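Following that advice, here is a minimal sketch using PHP's DOMDocument and DOMXPath instead of preg_match. It assumes the description HTML has exactly the layout shown in the question: the thumbnail is the first img that has a src attribute, and the summary is the second font size="-1" element directly inside div class="lh" (the first one holds the source name). The function name extractNewsParts and the returned array keys are hypothetical.

```php
<?php
// Hypothetical helper: pull the thumbnail URL and the summary text out of
// one Google News description. The XPath expressions assume the table
// layout shown in the question and will need updating if Google changes it.
function extractNewsParts(string $descriptionHtml): array
{
    $doc = new DOMDocument();

    // Feed HTML is not well-formed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells libxml to read the fragment as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $descriptionHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // First <img> that has a src attribute (the 1x1 spacer image has none).
    $img = $xpath->query('//img[@src]')->item(0);

    // Direct <font size="-1"> children of div.lh: 1st = source name, 2nd = summary.
    $summary = $xpath->query('//div[@class="lh"]/font[@size="-1"][2]')->item(0);

    return [
        'image'   => $img instanceof DOMElement ? $img->getAttribute('src') : null,
        // textContent skips HTML comments; collapse the remaining whitespace.
        'summary' => $summary !== null
            ? trim(preg_replace('/\s+/u', ' ', $summary->textContent))
            : null,
    ];
}
```

Each item's description string from the feed would be passed to the function in turn. Because the selection is by document structure rather than by text patterns, extra attributes, line breaks, or nested tags in the feed do not affect it the way they would a regular expression.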
This isn't XML, maybe it's CDATA; anyway, you could try http://www.php.net/manual/en/class.simplexmliterator.php – streetparade
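To expand on the CDATA point: in an RSS 2.0 feed the description is the text content of an XML element (either entity-escaped or wrapped in CDATA), so SimpleXML returns it as one string of HTML rather than as child XML elements. That string then has to be parsed separately as HTML. A minimal sketch of the RSS layer, assuming the standard rss/channel/item layout; the function name readFeedItems and the array keys are hypothetical. SimpleXMLIterator, mentioned in the comment, is a subclass of SimpleXMLElement, so the same reading applies to it.

```php
<?php
// Hypothetical helper: list each <item>'s title and raw description HTML.
function readFeedItems(string $rssXml): array
{
    // LIBXML_NOCDATA turns CDATA sections into plain text nodes;
    // the (string) casts below would read CDATA content either way.
    $rss = simplexml_load_string($rssXml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($rss === false) {
        throw new RuntimeException('Feed is not well-formed XML');
    }

    $items = [];
    foreach ($rss->channel->item as $item) {
        $items[] = [
            'title'           => trim((string) $item->title),
            // Escaped entities are already decoded here: this is plain HTML text.
            'descriptionHtml' => (string) $item->description,
        ];
    }
    return $items;
}
```

The two layers stay separate on purpose: the feed itself is well-formed XML and suits SimpleXML, while the description HTML is not guaranteed to be well-formed and is better handled by DOMDocument::loadHTML.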