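PHP isset() check still passes when the variable should not be set
Hopefully this has a really simple solution. I'm new to PHP, so I may be missing something obvious. I'm building a scraper with ScraperWiki (although this is a PHP problem and has nothing to do with SW). The code is as follows: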
<?php
require 'scraperwiki/simple_html_dom.php';
$allLinks = array();
function nextPage($nextUrl, $y)
{
    getLinks($nextUrl, $y);
}
function getLinks($url) // gets links from product list page
{
    global $allLinks;
    $html_content = scraperwiki::scrape($url);
    $html = str_get_html($html_content);
    if (isset($y)) {
        $x = $y;
    } else {
        $x = 0;
    }
    foreach ($html->find("div.views-row a.imagecache-product_list") as $el) {
        $url = $el->href . "\n";
        $allLinks[$x] = 'http://www.foo.com';
        $allLinks[$x] .= $url;
        $x++;
    }
    $next = $html->find("li.pager-next a", 0)->href . "\n";
    print_r("Printing $next:");
    print_r($next);
    if (isset($next)) {
        $nextUrl = 'http://www.foo.com';
        $nextUrl .= $next;
        print_r($nextUrl);
        $y = $x;
        print_r("Printing X:");
        print_r($x);
        print_r("Printing Y:");
        print_r($y);
        nextPage($nextUrl, $y);
    } else {
        return;
    }
}
getLinks("http://www.foo.com/department/accessories");
print_r($allLinks);
?>
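Desired output: the script should scrape all the links from the first page, find the "next page" button, scrape the links from its URL, find the "next page" link on that page, and so on. It should stop when there are no more "next page" links.
Current output: the code runs fine, but it doesn't stop when it should. The key lines here are: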
$next = $html->find("li.pager-next a", 0)->href . "\n";
if (isset($next)) { }
I only want the nextPage() function to run if li.pager-next a exists on the page. Here is the console output:
http://www.foo.com/department/accessories?page=1
http://www.foo.com/department/accessories?page=2
http://www.foo.com/department/accessories?page=3
http://www.foo.com/department/accessories?page=4
http://www.foo.com/department/accessories?page=5
http://www.foo.com/department/accessories?page=6
http://www.foo.com/department/accessories?page=7
http://www.foo.com/department/accessories?page=8
http://www.foo.com/department/accessories?page=9
http://www.foo.com/department/accessories?page=10
PHP Notice: Trying to get property of non-object in /home/scriptrunner/script.php on line 31
// THE LOOP SHOULD BREAK HERE BUT DOESN'T
http://www.foo.com
http://www.foo.com/home?page=1
http://www.foo.com/home?page=2
http://www.foo.com/home?page=3
http://www.foo.com/home?page=4
http://www.foo.com/home?page=5
http://www.foo.com/home?page=6
http://www.foo.com/home?page=7
$next = $html->find("li.pager-next a", 0)->href . "\n"; will at least be "\n", so it will always be set. – mpm 2013-02-22 23:56:40
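To illustrate the comment above, here is a minimal, self-contained sketch. It assumes that find("li.pager-next a", 0) yields a non-object (most likely null) when the pager link is missing, which the "Trying to get property of non-object" notice suggests. buildNextUrl() is a hypothetical helper, not part of simple_html_dom, and the stdClass object only stands in for a matched element.

```php
<?php
// Hypothetical helper, not part of simple_html_dom: builds the absolute
// "next page" URL, or returns null when the pager link is missing.
function buildNextUrl($anchor, $base)
{
    // Test the element itself before reading ->href.
    if (!is_object($anchor)) {
        return null;
    }
    return $base . $anchor->href;
}

// 1) Why the original check never fails: once "\n" is appended,
//    $next is a string even when the href was null.
$href = null;               // stands in for the href of a missing link
$next = $href . "\n";
var_dump(isset($next));     // bool(true)

// 2) Checking the element instead of the concatenated string.
$anchor = new stdClass();   // stands in for a matched <a> element
$anchor->href = '/department/accessories?page=1';

echo buildNextUrl($anchor, 'http://www.foo.com'), "\n";
// http://www.foo.com/department/accessories?page=1

var_dump(buildNextUrl(null, 'http://www.foo.com'));   // NULL
```

In the scraper itself, the same idea would probably mean storing the result of $html->find("li.pager-next a", 0) in its own variable and calling nextPage() only when that variable holds an object, instead of testing the string built from ->href.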