2013-07-17

I am using the following code to check for broken links in a given set of URLs, but the process is very slow and I need to speed it up. curl_multi is very slow checking 100 URLs.

$url_list = array(
"http://goog528le.com", 
"http://facebook.com", 
"http://google.com", 
"http://youtube.com", 
"http://yahoo.com", 
"http://amazon.com", 
"http://baidu.com", 
"http://wikipedia.org", 
"http://live.com", 
"http://qq.com", 
"http://taobao.com", 
"http://google.co.in", 
"http://twitter.com", 
"http://blogspot.com", 
"http://yahoo.co.jp", 
"http://linkedin.com", 
"http://bing.com", 
"http://sina.com.cn", 
"http://yandex.ru"); 

// 1. multi handle 
$mh = curl_multi_init(); 
$max_connections = 10; 
$dead_urls = array(); 
$not_found_urls = array(); 
$working_urls = array(); 

// 2. add multiple URLs to the multi handle 
for ($i = 0; $i < $max_connections; $i++) { 
    add_url_to_multi_handle($mh, $url_list); 
} 

// 3. initial execution 
do { 
$mrc = curl_multi_exec($mh, $active); 

} while ($mrc == CURLM_CALL_MULTI_PERFORM);

// 4. main loop 
while ($active && $mrc == CURLM_OK) { 

// 5. there is activity 
if (curl_multi_select($mh) != -1) { 

    // 6. do work 
    do { 
     $mrc = curl_multi_exec($mh, $active); 
    } while ($mrc == CURLM_CALL_MULTI_PERFORM); 

    // 7. is there info? 
    if ($mhinfo = curl_multi_info_read($mh)) { 
     // this means one of the requests was finished 
     // 8. get the info on the curl handle 
     $chinfo = curl_getinfo($mhinfo['handle']); 

     // 9. dead link? 
     if (!$chinfo['http_code']) { 
      $dead_urls[] = $chinfo['url']; 

      // 10. 404? 
     } else if ($chinfo['http_code'] == 404) { 
      $not_found_urls[] = $chinfo['url']; 

      // 11. working 
     } else { 
      $working_urls[] = $chinfo['url']; 
     } 

     // 12. remove the handle 
     curl_multi_remove_handle($mh, $mhinfo['handle']); 
     curl_close($mhinfo['handle']); 

     // 13. add a new url and do work 
     if (add_url_to_multi_handle($mh, $url_list)) { 

      do { 
       $mrc = curl_multi_exec($mh, $active); 
      } while ($mrc == CURLM_CALL_MULTI_PERFORM); 
     } 
    } 
} 

}

// 14. done 
curl_multi_close($mh);

echo "==Dead URLs==\n"; 
echo implode("\n", $dead_urls) . "\n\n"; 

echo "==404 URLs==\n"; 
echo implode("\n", $not_found_urls) . "\n\n"; 

echo "==Working URLs==\n"; 
echo implode("\n", $working_urls); 

// 15. adds a url to the multi handle 
function add_url_to_multi_handle($mh, $url_list) 
{ 
static $index = 0; 

// if we have another url to get 
if (isset($url_list[$index]) && $url_list[$index]) { 

    // new curl handle 
    $ch = curl_init(); 

    // set the url 
    curl_setopt($ch, CURLOPT_URL, $url_list[$index]); 
    // to prevent the response from being outputted 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    // follow redirections 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
    // do not need the body. this saves bandwidth and time 
    curl_setopt($ch, CURLOPT_NOBODY, 1); 

    // add it to the multi handle 
    curl_multi_add_handle($mh, $ch); 
    // increment so next url is used next time 
    $index++; 

    return true; 
} else { 

    // we are done adding new URLs 
    return false; 
} 
} 
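One likely reason the loop above is slow is that dead hosts such as `http://goog528le.com` occupy a connection slot until cURL's default timeouts expire. A minimal sketch of a handle factory with tighter timeouts (the function name is illustrative, and the 5s/10s values are assumptions to tune for your network):

```php
<?php
// Sketch: cap how long each handle may spend connecting and transferring,
// so dead hosts fail fast instead of stalling the rolling window.
// The 5s/10s values are assumptions; tune them for your network.
function configure_fast_handle($url) {
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_NOBODY         => true,  // HEAD-style request, no body
        CURLOPT_CONNECTTIMEOUT => 5,     // give up connecting after 5s
        CURLOPT_TIMEOUT        => 10,    // hard cap on the whole request
    ));
    return $ch;
}
```

Creating handles this way inside `add_url_to_multi_handle()` means an unreachable host costs at most a few seconds rather than the library default.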
The code doesn't do anything – DevZer0

How fast are you expecting? [This once loaded 100 domains with their full content in 15 seconds](http://stackoverflow.com/a/13461652/1226894) – Baba

Thank you for replying, Baba, but it still doesn't solve my problem; this approach also takes too long to return results at some point. – Dumidu

Answer

3

The solution is to process each request as soon as it completes. This eliminates the CPU cycles wasted on busy waiting. It is also a good idea to create a queue of cURL requests for maximum throughput: every time a request completes, I add a new one from the queue. By dynamically adding and removing handles, we keep a constant number of downloads running at all times. This gives us a way to throttle the number of simultaneous requests we send. The result is faster, more efficient parallel processing of a large number of cURL requests.

Source: onlineaspect.com

Here is a function for reference:

function rolling_curl($urls, $callback, $custom_options = null) { 

    // make sure the rolling window isn't greater than the # of urls 
    $rolling_window = 5; 
    $rolling_window = (sizeof($urls) < $rolling_window) ? sizeof($urls) : $rolling_window; 

    $master = curl_multi_init(); 
    $curl_arr = array(); 

    // add additional curl options here 
    $std_options = array(CURLOPT_RETURNTRANSFER => true, 
    CURLOPT_FOLLOWLOCATION => true, 
    CURLOPT_MAXREDIRS => 5); 
    $options = ($custom_options) ? ($std_options + $custom_options) : $std_options; 

    // start the first batch of requests 
    for ($i = 0; $i < $rolling_window; $i++) { 
     $ch = curl_init(); 
     $options[CURLOPT_URL] = $urls[$i]; 
     curl_setopt_array($ch,$options); 
     curl_multi_add_handle($master, $ch); 
    } 

    do { 
     while(($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM); 
     if($execrun != CURLM_OK) 
      break; 
     // a request was just completed -- find out which one 
     while($done = curl_multi_info_read($master)) { 
      $info = curl_getinfo($done['handle']); 
      if ($info['http_code'] == 200) { 
       $output = curl_multi_getcontent($done['handle']); 

       // request successful. process output using the callback function. 
       $callback($output); 
      } else { 
       // request failed. add error handling here. 
      } 

      // start a new request if any urls remain 
      // (it's important to do this before removing the old one) 
      if ($i < sizeof($urls)) { 
       $ch = curl_init(); 
       $options[CURLOPT_URL] = $urls[$i++]; // increment i 
       curl_setopt_array($ch, $options); 
       curl_multi_add_handle($master, $ch); 
      } 

      // remove and close the curl handle that just completed 
      curl_multi_remove_handle($master, $done['handle']); 
      curl_close($done['handle']); 
     } 
    } while ($running); 

    curl_multi_close($master); 
    return true; 
} 
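The `// request failed` branch above is left empty. One way to fill it, mirroring the dead/404/working buckets from the question, is a small classifier that takes the array returned by `curl_getinfo()` (the function name is an assumption, and any status other than 0 or 404 is treated as working):

```php
<?php
// Sketch: classify a finished handle's result by HTTP status.
// Takes the curl_getinfo() array so the logic is easy to unit-test
// without any network access.
function classify_result(array $info, array &$dead, array &$not_found, array &$working) {
    if (!$info['http_code']) {               // no response at all: dead host
        $dead[] = $info['url'];
    } elseif ($info['http_code'] == 404) {   // reachable but missing
        $not_found[] = $info['url'];
    } else {                                 // any other status counts as working
        $working[] = $info['url'];
    }
}
```

Calling this from both branches of the `http_code == 200` check records every URL, which addresses the follow-up comment about needing all URLs, not only the ones that return 200.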

Hope this helps!

I need all of the URLs, not only those that respond with 200, so I edited the code accordingly, but it also takes a long time to return results – Dumidu

This also takes a long time to return results. Do you have anything else? – Dumidu