2011-03-29 126 views

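I have the following code, which takes URLs and proxies as input from textarea fields, uses curl to fetch the page source, extracts specific links from the page, and inserts them into a database. It worked for a single URL, but stopped working once I added proxies and two loops to handle multiple URLs and proxies. Now it just times out with no error message and says it cannot find the file. I am getting the proxies from proxy-list.org. Any pointers would be appreciated. Why does my PHP curl script time out?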

<html> 
<body> 

<? 
$urls=explode("\n", $_POST['url']); 
$proxies=explode("\n", $_POST['proxy']); 

$allurls=count($urls); 
$allproxies=count($proxies); 

for ($counter = 0; $counter <= $allurls; $counter++) { 
for ($count = 0; $count <= $allproxies; $count++) { 

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL,$urls[$counter]); 
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0); 
curl_setopt($ch, CURLOPT_PROXY,$proxies[$count]); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET'); 
curl_setopt ($ch, CURLOPT_HEADER, 1); 
curl_exec ($ch); 
$curl_scraped_page=curl_exec($ch); 

//use the new tool box 
require "ToolBoxA4.php"; 

//call the new function parseA1 
$arrOut = parseA1 ($curl_scraped_page); 

//the output is an array with 3 items: $arrOut[0] is RHS, $arrOut[1] is TOP, $arrOut[2] is NAT 
//to look at the RHS 

//$arrLookAt = explode(",", $arrOut[0]); 
//print_r ($arrLookAt); 
//echo "<br><hr><br>"; 
//foreach ($arrLookAt as $value){ 
//  echo $value; 
//  echo "<br>"; 
//} 

$FileName = abs(rand(0,1000000000000)); 
$FileHandle = fopen($FileName, 'w') or die("can't open file"); 
fwrite($FileHandle, $curl_scraped_page); 

//$dom = new DOMDocument(); 
//@$dom->loadHTML($curl_scraped_page); 
//$xpath = new DOMXPath($doc); 
//$hrefs = $xpath->query('//a[@href][@id]'); 

$hostname="****"; 
$username="****"; 
$password="****"; 
$dbname="****"; 
$usertable="****"; 

$con=mysql_connect($hostname,$username, $password) or die ("<html><script language='JavaScript'>alert('Unable to connect to database! Please try again later.'),history.go(-1)</script></html>"); 
mysql_select_db($dbname ,$con); 

//function storeLink($url) { 
// $query = "INSERT INTO **** (time, ad1, ad2) VALUES ('$FileName','$url', '$gathered_from')"; 
// mysql_query($query) or die('Error, insert query failed'); 
//} 
//for ($i = 0; $i < $hrefs->length; $i++) { 
// $href = $hrefs->item($i); 
// $url = $href->getAttribute('href'); 
// storeLink($url); 
// 
//} 

//function storeLink($top, $right) { 
//$query = "INSERT INTO happyturtle (time, ad1, ad2) VALUES ('$FileName','$top', '$right')"; 
//mysql_query($query) or die('Error, insert query failed'); 

$right = explode(",", $arrOut[0]); 
$top = explode(",", $arrOut[1]); 

for ($countforme = 0; $countforme <= 5; $countforme++) { 

$topnow=$top[$countforme]; 

$query = "INSERT INTO **** (time, ad1) VALUES ('$FileName','$topnow')"; 
mysql_query($query) or die('Error, insert query failed'); 

} 

for ($countforme = 0; $countforme <= 15; $countforme++) { 

$rightnow = $right[$countforme]; 


$query = "INSERT INTO **** (time, ad1) VALUES ('$FileName','$rightnow')"; 
mysql_query($query) or die('Error, insert query failed'); 

} 


mysql_close($con); 

fclose($FileHandle); 

curl_close($ch); 

//echo $FileName; 

//echo "<br/>"; 

} 
} 

?> 

</body> 
</html> 

Answer


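Your code fetches each URL in turn, one after another, so it can take a long time to run. One possible solution is to use cURL's "multi" interface, which allows several requests to run at the same time: http://www.php.net/manual/en/function.curl-multi-exec.php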
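As a rough illustration of the multi interface, here is a minimal sketch. The function name `fetchAll`, the job array shape, and the 20-second default are my own assumptions, not anything from your script; only the `curl_*` calls are real PHP functions.

```php
<?php
// Hypothetical helper: run several transfers at once.
// Each job is ['url' => string, 'proxy' => string or null].
function fetchAll(array $jobs, int $timeoutSeconds = 20): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($jobs as $key => $job) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $job['url']);
        if (!empty($job['proxy'])) {
            curl_setopt($ch, CURLOPT_PROXY, $job['proxy']);
        }
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(100000); // select could not wait; avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Read the per-transfer result codes (CURLE_OK means success).
    $codes = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        if ($key !== false) {
            $codes[$key] = $info['result'];
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $code = $codes[$key] ?? null;
        if ($code === CURLE_OK) {
            $results[$key] = ['body' => curl_multi_getcontent($ch), 'error' => ''];
        } else {
            $message = $code === null ? 'transfer did not complete' : curl_strerror($code);
            $results[$key] = ['body' => null, 'error' => $message];
        }
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

With this shape the total wall-clock time is roughly that of the slowest single transfer rather than the sum of all of them, and each result carries either a body or an error string, so a dead proxy does not stop the rest.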

Another option, if this is essentially a batch process, is to raise the PHP timeout on the server you are using. Details are at http://php.net/manual/en/function.set-time-limit.php
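For example, something along these lines at the top of the script would raise the limit. The 300-second value is an arbitrary figure chosen for illustration, and the web server in front of PHP may enforce a separate timeout of its own that this call does not affect.

```php
<?php
// Assumption: the script is a long-running batch job, so a higher limit is acceptable.
// 300 seconds is an illustrative value; passing 0 removes the limit entirely.
$limitSeconds = 300;
$changed = set_time_limit($limitSeconds);

if (!$changed) {
    // set_time_limit() returns false when the limit could not be changed.
    error_log('Could not raise the PHP execution time limit');
}
```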

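One observation I would make is that public proxies (such as those from proxy-list.org) can be very slow to respond. Because you are sending requests through several of them, your script will always take at least as long as the slowest proxy needs to respond, which may be longer than your server's PHP timeout setting.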
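One way to keep a slow proxy from stalling the whole run is to put a time limit on every request. The sketch below assumes two hypothetical helpers, `fetchViaProxy` and `cleanLines`, and arbitrary timeout values of 5 and 15 seconds; none of these names or numbers come from your script.

```php
<?php
// Hypothetical helper: fetch one URL through one proxy, giving up after a fixed time.
function fetchViaProxy(string $url, string $proxy, int $connectTimeout = 5, int $totalTimeout = 15): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // limit on connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // limit on the whole transfer

    $body = curl_exec($ch); // executed once; the result is kept
    $error = $body === false ? curl_error($ch) : '';

    // The handle is released when it goes out of scope at the end of the function.
    return ['body' => $body === false ? null : $body, 'error' => $error];
}

// Hypothetical helper: split textarea input into clean lines.
// Browsers usually submit "\r\n" line endings, so each line is trimmed
// and empty lines are dropped.
function cleanLines(string $text): array
{
    $lines = array_map('trim', explode("\n", $text));
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}
```

A few details in the posted script would also add to the delay. Each iteration calls `curl_exec()` twice, so every page is fetched twice. Both loops use `<=` against `count()`, so each runs one extra pass with an undefined URL or proxy. And if `ToolBoxA4.php` declares `parseA1()` as an ordinary function, the `require` inside the loop would fail on the second pass; I cannot see that file, so that part is a guess, but moving it above the loops as `require_once` would rule it out.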