2013-05-16 35 views

PHP scraper seems to be stuck in an infinite loop

(By the way, I am in the process of asking the site for permission to scrape this content.)

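This is a pretty simple web scraper. It works fine when I hard-code all the links by hand, but when I try to load them through JSON and variables (so that one script can handle many scrapes, and the process becomes more modular because I only need to add more links to the JSON), it runs in an infinite loop.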

(The page has been loading for about 15 minutes now.)

Here is my JSON. It contains only one store for testing purposes, but there will eventually be about 15 stores.

[ 
    { 
     "store":"Incu Men", 
     "cat":"Accessories", 
     "general_cat":"Accessories", 
     "spec_cat":"accessories", 
     "url":"http://www.incuclothing.com/shop-men/accessories/", 
     "baseurl":"http://www.incuclothing.com", 
     "next_select":"a.next", 
     "prod_name_select":".infobox .fn", 
     "label_name_select":".infobox .brand", 
     "desc_select":".infobox .description", 
     "price_select":"#price", 
     "mainImg_select":"", 
     "more_imgs":".product-images", 
     "product_url":".hproduct .photo-link" 
    } 
] 

Here is the PHP code for the scraper:

<?php 
//Set infinite time limit 
set_time_limit (0); 
// Include simple html dom 
include('simple_html_dom.php'); 
// Defining the basic cURL function 
function curl($url) { 
    $ch = curl_init(); 
    // Initialising cURL 
    curl_setopt($ch, CURLOPT_URL, $url); 
    // Setting cURL's URL option with the $url variable passed into the function 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
    // Setting cURL's option to return the webpage data 
    $data = curl_exec($ch); 
    // Executing the cURL request and assigning the returned data to the $data variable 
    curl_close($ch); 
    // Closing cURL 
    return $data; 
    // Returning the data from the function 
} 

function getLinks($catURL, $prodURL, $baseURL, $next_select) { 
    $urls = array(); 

    while($catURL) { 
     echo "Indexing: $url" . PHP_EOL; 
     $html = str_get_html(curl($catURL)); 

     foreach ($html->find($prodURL) as $el) { 
      $urls[] = $baseURL . $el->href; 
     } 

     $next = $html->find($next_select, 0); 
     $url = $next ? $baseURL . $next->href : null; 

     echo "Results: $next" . PHP_EOL; 
    } 

    return $urls; 
} 

$string  = file_get_contents("jsonWorkers/incuMens.json"); 
$json_array = json_decode($string,true); 

foreach ($json_array as $value){ 

    $baseURL = $value['baseurl']; 
    $catURL = $value['url']; 
    $store = $value['store']; 
    $general_cat = $value['general_cat']; 
    $spec_cat = $value['spec_cat']; 
    $next_select = $value['next_select']; 
    $prod_name = $value['prod_name_select']; 
    $label_name = $value['label_name_select']; 
    $description = $value['desc_select']; 
    $price = $value['price_select']; 
    $prodURL = $value['product_url']; 

    if (!is_null($value['mainImg_select'])){ 
     $mainImg = $value['mainImg_select']; 
    } 
    $more_imgs = $value['more_imgs']; 



    $allLinks = getLinks($catURL, $prodURL, $baseURL, $next_select); 

} 

?> 

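Any ideas why this script runs indefinitely instead of returning anything, stopping, or printing anything to the screen? I am just letting it run until it stops. When I do it by hand it only takes a minute or so, sometimes less, so I am sure the problem lies in my variables or the JSON, but I can't for the life of me see where.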

Can anyone take a quick look and point me in the right direction?

Answers


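The problem is in your while($catURL) loop. What are you trying to do there? Also, you can call flush() to force output to be sent to the browser as the script runs.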
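To spell out what goes wrong in the loop: the condition tests `$catURL`, but the next-page URL is assigned to `$url`, so `$catURL` never changes and the first page is fetched over and over. Below is a minimal sketch of the fix, assuming an in-memory stand-in for the site so that it runs without network access. The `$fetchPage` callback, the `$pages` array, the `$seen` guard and the `example.test` URLs are all hypothetical and not part of the original script; in the real scraper the page data would still come from `curl()` and `simple_html_dom`.

```php
<?php
// Hypothetical stand-in for the remote site: each page lists product hrefs
// and an optional relative link to the next page (null on the last page).
$pages = array(
    'http://example.test/cat' => array(
        'products' => array('/p/1', '/p/2'),
        'next'     => '/cat?page=2',
    ),
    'http://example.test/cat?page=2' => array(
        'products' => array('/p/3'),
        'next'     => null,
    ),
);

// Hypothetical fetcher; the original script would call curl() and parse the HTML here.
$fetchPage = function ($url) use ($pages) {
    return $pages[$url];
};

function getLinks($catURL, $baseURL, callable $fetchPage) {
    $urls = array();
    $seen = array(); // extra guard (an assumption): stop if a "next" link points at a visited page

    while ($catURL && !isset($seen[$catURL])) {
        $seen[$catURL] = true;
        $page = $fetchPage($catURL);

        foreach ($page['products'] as $href) {
            $urls[] = $baseURL . $href;
        }

        // The fix: update the same variable that the while condition tests.
        $catURL = $page['next'] ? $baseURL . $page['next'] : null;
    }

    return $urls;
}

$links = getLinks('http://example.test/cat', 'http://example.test', $fetchPage);
print_r($links);
```

In the original `getLinks()`, the equivalent change would be to assign the next-page URL to `$catURL` instead of `$url`, and to echo `$catURL` in the "Indexing" line.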


+1 for pointing out flush – Orangepill


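Ah! I renamed a variable ($catURL used to be $url) and accidentally didn't change it everywhere. Cheers, mate! I'll look up `flush()`; I'm new to PHP, so I had probably just missed something simple. – Jascination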
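For anyone else looking up `flush()`, here is a rough sketch of how progress output might be pushed out while a long-running script is still working. The `progress()` helper and the page labels are hypothetical and not part of the original script. Note that the web server, compression, or a proxy may still buffer the response, so this is best effort rather than a guarantee.

```php
<?php
// Hypothetical helper: print one progress line and try to send it immediately.
function progress($message) {
    echo $message . PHP_EOL;

    // If PHP output buffering is active, pass the innermost buffer's contents on first;
    // flush() on its own does not touch PHP's userspace output buffers.
    if (ob_get_level() > 0) {
        ob_flush();
    }

    // Ask PHP to send whatever output it is holding to the web server or terminal.
    flush();
}

// Usage: report each page as it is processed instead of only at the end.
foreach (array('page 1', 'page 2') as $label) {
    progress("Indexing: $label");
}
```

With this kind of output in place, a loop that keeps printing the same "Indexing" line is visible within seconds instead of after a 15-minute wait.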