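PHP scraper seems to be stuck in an infinite loop
(By the way, I am asking the sites in question for permission to scrape this content.)
I have a very simple web scraper that works fine when I hard-code all the links by hand. When I try to load them through JSON and variables instead (so that one script can run many scrapes, and the process becomes more modular because I only need to add more links to the JSON), it runs in an infinite loop.
(The page has been loading for about 15 minutes now.)
Here is my JSON. It contains only one store for testing purposes, but there will be about 15 stores.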
[
{
"store":"Incu Men",
"cat":"Accessories",
"general_cat":"Accessories",
"spec_cat":"accessories",
"url":"http://www.incuclothing.com/shop-men/accessories/",
"baseurl":"http://www.incuclothing.com",
"next_select":"a.next",
"prod_name_select":".infobox .fn",
"label_name_select":".infobox .brand",
"desc_select":".infobox .description",
"price_select":"#price",
"mainImg_select":"",
"more_imgs":".product-images",
"product_url":".hproduct .photo-link"
}
]
And here is the PHP code for the scraper:
<?php
// Set infinite time limit
set_time_limit(0);
// Include simple html dom
include('simple_html_dom.php');

// Defining the basic cURL function
function curl($url) {
    $ch = curl_init();                              // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);            // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch);                         // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);                                // Closing cURL
    return $data;                                   // Returning the data from the function
}

function getLinks($catURL, $prodURL, $baseURL, $next_select) {
    $urls = array();
    while($catURL) {
        echo "Indexing: $url" . PHP_EOL;
        $html = str_get_html(curl($catURL));
        foreach ($html->find($prodURL) as $el) {
            $urls[] = $baseURL . $el->href;
        }
        $next = $html->find($next_select, 0);
        $url = $next ? $baseURL . $next->href : null;
        echo "Results: $next" . PHP_EOL;
    }
    return $urls;
}

$string = file_get_contents("jsonWorkers/incuMens.json");
$json_array = json_decode($string, true);
foreach ($json_array as $value) {
    $baseURL = $value['baseurl'];
    $catURL = $value['url'];
    $store = $value['store'];
    $general_cat = $value['general_cat'];
    $spec_cat = $value['spec_cat'];
    $next_select = $value['next_select'];
    $prod_name = $value['prod_name_select'];
    $label_name = $value['label_name_select'];
    $description = $value['desc_select'];
    $price = $value['price_select'];
    $prodURL = $value['product_url'];
    if (!is_null($value['mainImg_select'])) {
        $mainImg = $value['mainImg_select'];
    }
    $more_imgs = $value['more_imgs'];
    $allLinks = getLinks($catURL, $prodURL, $baseURL, $next_select);
}
?>
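Any idea why this script would run forever without returning, stopping, or printing anything to the screen? I am just letting it run until it stops. When I do it by hand it only takes a minute or so, sometimes less, so I am sure the problem is in my variables or my JSON, but I can't for the life of me see where.
Could anyone take a quick look and point me in the right direction?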
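+1 for pointing out flush – Orangepill
Ah! I renamed a variable ($url became $catURL) and accidentally didn't change it everywhere. Cheers, mate! I'll look up `flush()`. I'm new to PHP, so it's probably something simple that I missed. – Jascination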
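For anyone who lands here with the same symptom: the loop in `getLinks()` tests `$catURL` but the "next page" link is assigned to `$url`, so the loop condition never changes. Below is a minimal sketch of the corrected pagination pattern, not the original scraper. `collectLinks()`, the `$fetchPage` callable, the `$maxPages` limit, and the in-memory `$fakeSite` are all hypothetical names introduced for illustration; in the real script the callable would wrap the `curl()` and `str_get_html()` calls. The sketch also guards against pages whose "next" link points back to a page that was already visited, and calls `flush()` after each progress line (with output buffering enabled, `ob_flush()` may be needed as well; that is an assumption about your setup).

```php
<?php
// Hypothetical helper illustrating the fix. $fetchPage is an assumed callable
// that takes a URL and returns ['links' => string[], 'next' => string|null].
function collectLinks(string $startUrl, callable $fetchPage, int $maxPages = 500): array
{
    $urls = [];
    $seen = [];          // URLs already indexed, to stop on circular "next" links
    $catURL = $startUrl;

    while ($catURL !== null && !isset($seen[$catURL]) && count($seen) < $maxPages) {
        $seen[$catURL] = true;

        echo "Indexing: $catURL" . PHP_EOL;
        flush();         // push progress output to the client as we go

        $page = $fetchPage($catURL);
        foreach ($page['links'] as $link) {
            $urls[] = $link;
        }

        // The fix: update the same variable that the while condition tests.
        $catURL = $page['next'];
    }

    return $urls;
}

// Usage with an in-memory stand-in for the site (made-up data).
$fakeSite = [
    '/p1' => ['links' => ['/a', '/b'], 'next' => '/p2'],
    '/p2' => ['links' => ['/c'], 'next' => null],
];

$links = collectLinks('/p1', function (string $url) use ($fakeSite): array {
    return $fakeSite[$url];
});

print_r($links);
```

Passing the page fetcher in as a callable keeps the loop logic testable without network access; the page limit and the visited set are optional safety nets rather than part of the original fix.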