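I'm new to PHP and am trying to scrape data from a website with regular expressions, but finding the content inside the divs (the rental and the details) is a problem. Here is my code. Can anyone help me?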
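<?php
header('content-type: text/plain');
$contents = file_get_contents('http://www.hassconsult.co.ke/index.php?option=com_content&view=article&id=22&Itemid=29');
if ($contents === false) {
    exit("Could not fetch the page\n");
}
// Collapse runs of whitespace into one space. The original '/\s(1,)/' matched a
// whitespace character followed by the literal text "1,"; the quantifier is {1,} (or +).
// Replacing with '' instead of ' ' would also glue tags together ("<divalign=...").
$contents = preg_replace('/\s+/', ' ', $contents);
// Assumption: the second replace was meant to turn &nbsp; entities into spaces.
$contents = str_replace('&nbsp;', ' ', $contents);
//print $contents."\n";
// Each listing starts with its title in <span class="style8">.
$records = preg_split('/<span class="style8"/', $contents);
for ($ix = 1; $ix < count($records); $ix++) {
    $tmp = $records[$ix];
    preg_match('/href="(.*?)"/', $tmp, $match_url);
    preg_match('/>(.*?)<\/span>/', $tmp, $match_name);
    // In the original pattern, group 2 could only match by containing another group 2
    // via (?2), so it never matched and the nested <div> made the whole pattern fail.
    // No recursion is needed: take the first inner <div> of the div with class "style10".
    preg_match('/<div[^>]*class\s*=\s*"style10"[^>]*>\s*<div[^>]*>(.*?)<\/div>/s', $tmp, $match_rental);
    print_r($match_url);
    print_r($match_name);
    print_r($match_rental);
    print $tmp."\n";
    exit(); // debugging: stop after the first record
}
//print count($records)."\n";
//print_r($records);
//if ($contents===false)
//print 'FALSE';
//print_r(htmlentities($contents));
?>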
Here is the content of one record:
>HILLVIEW CROSSROADS4 BED HOUSE</span></div></td>
</tr>
<tr>
<td width="57%" style="padding-left:20px;"><div align="left" class="style10" style="color:#007AC7;">
<div align="left">
Rental;
USD 4,500
</div>
</div></td>
<td width="43%" align="right"><div align="right" class="style10" style="color:#007AC7;">
<div align="right">
No.
834
</div>
</div></td>
</tr>
<tr>
<td colspan="2" style="padding-left:20px;color:#000000;">
<div align="justify" style="font-family:Arial, Helvetica, sans-serif;font-size:11px;color:333300;">Artistically designed 4 bed (all
ensuite) house on half acre of well-tended gardens. Lounge with fireplace opening to terrace, opulent master suite, family room, study. Good finishes, SQ, carport, extra water storage
and generator. <a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&t=2">....Details</a> </div></td>
</tr>
</table></td>
</tr>
</table>
<br>
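For a record with exactly the layout above, the nested `<div>` needs no recursive pattern: each value sits in the first inner `<div>` after an opening `<div ... class="style10">`. The following is a minimal regex sketch under that assumption; the function name `extract_listing` and its field keys are hypothetical, and the sample is an abbreviated copy of the record above.

```php
<?php
// Hypothetical helper: pulls the fields out of one record produced by
// preg_split('/<span class="style8"/', ...). Assumes the markup layout shown above.
function extract_listing(string $record): array
{
    // Collapse runs of whitespace so multi-line values come out on one line.
    $record = preg_replace('/\s+/', ' ', $record);

    $fields = ['title' => null, 'rental' => null, 'ref_no' => null, 'details_url' => null];

    // Title: the text between the end of the split-off <span ...> tag and </span>.
    if (preg_match('/^[^>]*>(.*?)<\/span>/', $record, $m)) {
        $fields['title'] = trim($m[1]);
    }

    // Each outer div with class "style10" wraps one inner div holding a value:
    // match the opening outer tag, then capture the first inner div's text.
    if (preg_match_all('/class="style10"[^>]*>\s*<div[^>]*>(.*?)<\/div>/', $record, $m)) {
        $fields['rental'] = trim($m[1][0]);
        $fields['ref_no'] = isset($m[1][1]) ? trim($m[1][1]) : null;
    }

    // Details link: the href of the anchor whose text ends in "Details".
    if (preg_match('/<a href="([^"]*)">[^<]*Details<\/a>/', $record, $m)) {
        $fields['details_url'] = $m[1];
    }

    return $fields;
}

// Usage on an abbreviated copy of the record posted in the question.
$sample = <<<'HTML'
>HILLVIEW CROSSROADS4 BED HOUSE</span></div></td>
</tr>
<tr>
<td width="57%" style="padding-left:20px;"><div align="left" class="style10" style="color:#007AC7;">
<div align="left">
Rental;
USD 4,500
</div>
</div></td>
<td width="43%" align="right"><div align="right" class="style10" style="color:#007AC7;">
<div align="right">
No.
834
</div>
</div></td>
</tr>
<tr>
<td colspan="2"><div align="justify">Artistically designed 4 bed house.
<a href="/index.php?option=com_content&view=article&id=27&Itemid=74&send=5&ref_no=834/II&t=2">....Details</a> </div></td>
</tr>
HTML;

print_r(extract_listing($sample));
```

This works only while the site keeps this exact markup; any change to attribute order or nesting breaks the patterns.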
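Why are you using regular expressions to parse HTML? PHP has several HTML parsers available that can handle all sorts of things regular expressions can't. An HTML parser knows which constructs are valid in which versions of HTML and XHTML, and uses the doctype to determine which version the page is using. – 2012-03-24 19:58:46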
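A minimal sketch of that suggestion using PHP's bundled DOM extension (`DOMDocument::loadHTML` plus `DOMXPath`). It assumes the markup from the question: a `span` with class `style8` for the title and, in the same table, two `div` elements with class `style10` whose inner `div` holds the rental and the reference number. The function name `parse_listings` is hypothetical, the sample HTML is a cut-down stand-in for the real page, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: extracts listings with the DOM extension instead of regexes.
function parse_listings(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Collapse whitespace inside a node's text content.
    $clean = fn (string $s): string => trim(preg_replace('/\s+/', ' ', $s));

    $listings = [];
    foreach ($xpath->query('//span[@class="style8"]') as $span) {
        // Assumption: the title span and its details share the nearest enclosing
        // table that contains style10 divs ([1] on the ancestor axis = nearest).
        $table = $xpath->query('ancestor::table[.//div[@class="style10"]][1]', $span)->item(0);
        if ($table === null) {
            continue;
        }
        $values = $xpath->query('.//div[@class="style10"]/div', $table);
        $link = $xpath->query('.//a[contains(., "Details")]', $table)->item(0);
        $listings[] = [
            'title'       => $clean($span->textContent),
            'rental'      => $values->length > 0 ? $clean($values->item(0)->textContent) : null,
            'ref_no'      => $values->length > 1 ? $clean($values->item(1)->textContent) : null,
            // getAttribute() returns the decoded value, so &amp; becomes &.
            'details_url' => $link !== null ? $link->getAttribute('href') : null,
        ];
    }
    return $listings;
}

// Cut-down stand-in for the real page, following the structure of the posted record.
$html = <<<'HTML'
<html><body><table><tr><td><table>
<tr><td colspan="2"><div><span class="style8">HILLVIEW CROSSROADS4 BED HOUSE</span></div></td></tr>
<tr>
<td><div align="left" class="style10"><div align="left">Rental;
USD 4,500</div></div></td>
<td><div align="right" class="style10"><div align="right">No.
834</div></div></td>
</tr>
<tr><td colspan="2"><div>Artistically designed 4 bed house.
<a href="/index.php?option=com_content&amp;view=article&amp;id=27&amp;Itemid=74&amp;send=5&amp;ref_no=834/II&amp;t=2">....Details</a></div></td></tr>
</table></td></tr></table></body></html>
HTML;

print_r(parse_listings($html));
```

For the live page, pass the string returned by `file_get_contents()` in the question's code instead of `$html`. The XPath queries depend on the element structure rather than on exact whitespace or attribute order, which is what makes this sturdier than the regex version.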
Could you give me a link to a tutorial? I'd really appreciate it, I'm fairly new to this. – user1207576 2012-03-25 05:15:36