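Limiting the rows extracted by a web crawler

So, I have this awesome web crawler code. It fetches the requested data from the target site and outputs the links associated with it (good boy).
Now the question is: how do I limit the extracted data to, say, 5 rows? I tried adding "LIMIT 5" (the way we usually do in a PHP SQL query), but it did not work.
My code is as follows: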
<div class="news-entry">
<div class="newsblock">
<div style="clear:both"></div>
<h2>
<a rel="nofollow" target="_blank" href="http://www.usmle-forums.com/usmle-step-3-forum/">
USMLE-Forums :: STEP-3
</a>
</h2>
<ul>
<?php
function get_datafour($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
curl_close($ch);
return $result;
}
$returned_content = get_datafour('http://www.usmle-forums.com/usmle-step-3-forum/');
$first_step = explode('<tbody id="threadbits_forum_30">' , $returned_content);
$second_step = explode('</tbody>', $first_step[1]);
$third_step = explode('<tr>', $second_step[0]);
// print_r($third_step);
foreach ($third_step as $element) {
$child_first = explode('<td class="alt1"' , $element);
$child_second = explode('</td>' , $child_first[1]);
$child_third = explode('<a href=' , $child_second[0]);
$child_fourth = explode('</a>' , $child_third[1]);
$final = "<a href=".$child_fourth[0]."</a><br>";
?>
<li target="_blank" class="itemtitle">
<span class="item_new"></span><?php echo $final?>
</li>
<?php
}
?>
</ul>
<div style="clear:both"></div>
</div>
</div>
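Any suggestions are appreciated.
Break after the fifth result.
Break out of the foreach loop after 5 iterations. –
And how do I do that? – harishk
See Manvir singh's answer. –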
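
For reference, here is a minimal sketch of the "stop after 5 iterations" idea from the comments. It is not taken from any posted answer. The helper name `take_first`, the `$limit` variable, and the sample `$rows` array are all hypothetical; `$rows` stands in for `$third_step` so the snippet runs without a network request. "LIMIT 5" is SQL syntax and has no effect on a PHP array, so the cap has to be applied in PHP itself, either by counting inside the loop or by trimming the array first with `array_slice`.

```php
<?php
// Hypothetical helper: collect at most $limit items using a counter and break.
function take_first(array $rows, int $limit): array
{
    $kept = [];
    foreach ($rows as $row) {
        if (count($kept) >= $limit) {
            break; // stop as soon as the cap is reached
        }
        $kept[] = $row;
    }
    return $kept;
}

// Sample data standing in for $third_step (assumption, not real scraped rows).
$rows  = ['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6', 'row 7'];
$limit = 5; // assumed cap

// Option 1: counter plus break inside the loop.
$shown = take_first($rows, $limit);

// Option 2: trim the array before looping; array_slice keeps the first $limit items.
$sliced = array_slice($rows, 0, $limit);

foreach ($shown as $row) {
    echo $row, PHP_EOL;
}
```

In the code from the question, note that `explode('<tr>', ...)` returns whatever precedes the first `<tr>` as element 0, so that element is probably not a thread row. If so, count only the rows that actually yield a link, or slice from offset 1 instead of 0.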