2013-02-22
0

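PHP isset() still returns true when the variable should not be set

Hopefully this has a very simple solution. I'm new to PHP, so I may be missing something obvious. I'm building a scraper with ScraperWiki (although this is a PHP question and has nothing to do with SW). The code is as follows: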

<?php 
require 'scraperwiki/simple_html_dom.php'; 

$allLinks = array(); 

function nextPage($nextUrl, $y) 
{ 
    getLinks($nextUrl, $y);  
} 

function getLinks($url) // gets links from product list page 
{ 
    global $allLinks; 
    $html_content = scraperwiki::scrape($url); 
    $html   = str_get_html($html_content); 

    if (isset($y)) { 
     $x = $y; 
    } else { 
     $x = 0; 
    } 

    foreach ($html->find("div.views-row a.imagecache-product_list") as $el) { 
     $url   = $el->href . "\n"; 
     $allLinks[$x] = 'http://www.foo.com'; 
     $allLinks[$x] .= $url; 
     $x++; 
    } 

    $next = $html->find("li.pager-next a", 0)->href . "\n"; 
    print_r("Printing $next:"); 
    print_r($next); 

    if (isset($next)) { 
     $nextUrl = 'http://www.foo.com'; 
     $nextUrl .= $next; 
     print_r($nextUrl); 
     $y = $x; 
     print_r("Printing X:"); 
     print_r($x); 
     print_r("Printing Y:"); 
     print_r($y); 

     nextPage($nextUrl, $y); 
    } else { 
     return; 
    } 

} 

getLinks("http://www.foo.com/department/accessories"); 

print_r($allLinks); 

?> 

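Desired output: the script should scrape all the links from the first page, find the "next page" button, scrape the links from the page at that URL, find the "next page" link there, and so on. When there are no more "next page" links, it should stop.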

Current output: the code runs fine, but it does not stop when it should. The key lines are:

$next = $html->find("li.pager-next a", 0)->href . "\n"; 
if (isset($next)) { } 

I only want the nextPage() function to run if li.pager-next a exists on the page. Here is the console output:

 http://www.foo.com/department/accessories?page=1 
     http://www.foo.com/department/accessories?page=2 
     http://www.foo.com/department/accessories?page=3 
     http://www.foo.com/department/accessories?page=4 
     http://www.foo.com/department/accessories?page=5 
     http://www.foo.com/department/accessories?page=6 
     http://www.foo.com/department/accessories?page=7 
     http://www.foo.com/department/accessories?page=8 
     http://www.foo.com/department/accessories?page=9 
     http://www.foo.com/department/accessories?page=10 

    PHP Notice: Trying to get property of non-object in /home/scriptrunner/script.php on line 31 
// THE LOOP SHOULD BREAK HERE BUT DOESN'T 

     http://www.foo.com 
     http://www.foo.com/home?page=1 
     http://www.foo.com/home?page=2 
     http://www.foo.com/home?page=3 
     http://www.foo.com/home?page=4 
     http://www.foo.com/home?page=5 
     http://www.foo.com/home?page=6 
     http://www.foo.com/home?page=7 
+2

$next = $html->find("li.pager-next a", 0)->href . "\n"; will be at least "\n", so it will always be set. – mpm 2013-02-22 23:56:40
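To make the comment above concrete, here is a minimal sketch of what happens on the last page. It assumes that find($selector, 0) returns null when nothing matches; $element is a hypothetical stand-in for that return value, so no scraping library is needed to run it.

```php
<?php
// Stand-in for a missing element (assumption: find($selector, 0)
// returns null when nothing matches).
$element = null;

// Reading a property of null yields null (plus a notice or warning,
// silenced here with @); the concatenation then produces a string.
$next = @$element->href . "\n";

var_dump($next === "\n"); // bool(true): the variable holds only the newline
var_dump(isset($next));   // bool(true): isset() sees a non-null string
```

Because $next is always a string after the concatenation, the isset() check can never fail, which matches the notice in the console output followed by further requests.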

Answers

1

How about this:

$next = $html->find("li.pager-next a", 0); 

if (isset($next)) { 
    $nextUrl = 'http://www.foo.com'; 
    $nextUrl .= $next->href; // move ->href here 
    print_r($nextUrl . "\n"); // put \n here since we don't actually want that char in the url 
    $y = $x; 
    print_r("Printing X:"); 
    print_r($x); 
    print_r("Printing Y:"); 
    print_r($y); 

    nextPage($nextUrl, $y); 
} else { 
    return; 
} 
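The same stop condition can also be written as a plain loop instead of recursion. This is only a sketch under stated assumptions, not the asker's code: $pages is a hypothetical in-memory stand-in for the site, collectLinks() is an invented name, and each page's 'next' entry is null when there is no "next page" link.

```php
<?php
// Hypothetical stand-in for the scraped site: each page lists its product
// links and the href of its "next page" link (null on the last page).
$pages = [
    '/department/accessories' => [
        'links' => ['/p/1', '/p/2'],
        'next'  => '/department/accessories?page=1',
    ],
    '/department/accessories?page=1' => [
        'links' => ['/p/3'],
        'next'  => null,
    ],
];

// Collects links page by page and stops as soon as no "next" link exists.
function collectLinks(array $pages, ?string $path, string $base = 'http://www.foo.com'): array
{
    $allLinks = [];
    while ($path !== null) {
        $page = $pages[$path];
        foreach ($page['links'] as $href) {
            $allLinks[] = $base . $href; // [] appends, so no manual index is needed
        }
        $path = $page['next']; // null ends the loop
    }
    return $allLinks;
}

$allLinks = collectLinks($pages, '/department/accessories');
print_r($allLinks);
```

Returning the array avoids the global, and appending with [] removes the need to pass a running index between calls.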
+0

This is so simple it hurts my head. As I said, I'm a PHP newbie; I assumed the \n wouldn't affect the output if find() returned null! – Jascination 2013-02-23 00:04:23

0

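No matter what value is returned by

$html->find("li.pager-next a", 0)->href

isset($next) will not return false, because you append "\n" to it.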

Use something like this:

$nextElement = $html->find("li.pager-next a", 0); 

if(isset($nextElement)) 
{ 
    $nextUrl = 'http://www.foo.com' . $nextElement->href; // keep the line break out of the URL

    print_r($nextUrl . PHP_EOL);
    $y = $x; 
    print_r("Printing X:"); 
    print_r($x); 
    print_r("Printing Y:"); 
    print_r($y); 

    nextPage($nextUrl, $y); 
} 
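One further detail in the question's code is worth a sketch: getLinks() is declared with only $url, so the $y passed through nextPage() never reaches it, isset($y) is false at the top of every call, and $x restarts at 0 on each page. The example below shows one way to carry the index forward with a default parameter. addLinks() and the sample hrefs are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: stores $hrefs in $allLinks starting at index $x
// and returns the next free index. $x defaults to 0 for the first page.
function addLinks(array &$allLinks, array $hrefs, int $x = 0): int
{
    foreach ($hrefs as $href) {
        $allLinks[$x] = 'http://www.foo.com' . $href;
        $x++;
    }
    return $x;
}

$allLinks = [];
$x = addLinks($allLinks, ['/a', '/b']); // first page starts at index 0
$x = addLinks($allLinks, ['/c'], $x);   // second page continues at index 2
print_r($allLinks);
```

With the index passed explicitly, links from later pages no longer overwrite the entries stored for earlier ones.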
-2

Just remove the isset() call:

 
    if($next){ 
    }
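Whether a bare truthiness test helps depends on what $next holds. A small sketch, assuming find($selector, 0) returns null for a missing element: if $next is still built as in the question (href plus "\n"), it is a non-empty string and therefore truthy, so the check only works once $next is the element itself. The variable names below are hypothetical.

```php
<?php
$fromMissingElement = null . "\n"; // what the question's $next holds on the last page
$element = null;                   // assumed return value of find(..., 0) there

var_dump((bool) $fromMissingElement);       // bool(true): "\n" is a non-empty string
var_dump((bool) $element);                  // bool(false): null is falsy
var_dump(trim($fromMissingElement) !== ''); // bool(false): nothing but whitespace
```

In PHP only "" and "0" are falsy strings, so testing the element before reading ->href is the more reliable check.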