2016-01-24 35 views

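Load an external XML file and get the HTTP header info in one call

I have a PHP file that grabs an XML file from another site and then inserts that information into my database.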

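The problem I have is that their site only allows 360 requests in any 1-hour period, so I am trying to check the header information when I grab the file.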

I have it working using:

$requesttest = 'http://www.footballwebpages.co.uk/teams.xml'; 
if($requesttest == NULL) return false; 
$ch = curl_init($requesttest); 
curl_setopt($ch, CURLOPT_TIMEOUT, 5); 
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
$data = curl_exec($ch); 
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE); 
curl_close($ch); 

if($httpcode == 429){ 
    return 'Try again later, too many requests received.'; 
} else if($httpcode>=200 && $httpcode<300){ 
    /* run code to grab xml file */ 
    $comps = array ( 0 => 1, /* Premier_League */ 
        1 => 2 /* Championship */ 
        ); 
    $comps_total = count($comps); 
    $comps_no = 0; 

    while ($comps_no < $comps_total) { 
     $url = 'http://www.footballwebpages.co.uk/teams.xml?comp=' . $comps[$comps_no]; 
     $full_list = simplexml_load_file($url); 
     /* Code for grabbing and storing info from XML */ 
} else { 
    return 'Football Web Pages Offline'; 
} 

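At the moment it checks the status of the page first: it requests the main "teams" page to see whether the request limit has been reached, and then fetches the XML for each competition in the set. The problem is that if only one request is left at the time of the first check, the next stage will fail. How can I check the header information while loading the XML file, without having to call the page once to check the headers and then call it again to get the XML file?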

Basically, in one call, it should load the XML file if the status code is between 200 and 300, so that I don't waste 2 requests to grab 1 XML page.


`while ($comps_no < $comps_total) {` ~ there is no incrementer in the loop, so it will go on and on... and you don't close the loop – RamRaider


Yes, I cut the code down as it is rather long :) There is an incrementer in the full code – Dean84

Answer


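Perhaps you could use something like the following. Forget the first call to the base URL, as it is redundant, and instead use the return value from the function to determine whether any further processing should be done: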

<?php 
    /* utility function to get data and return an object */ 
    function getxml($comp=1){ 
     global $ch; 
     global $url; 

     curl_setopt($ch, CURLOPT_URL, $url . '?comp=' . $comp); 
     $data = curl_exec($ch); 
     $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); 

     return (object)array(
      'xmldata' => $data, 
      'status' => $status 
     ); 
    } 
    /* All the comps available - more than specified! */ 
    $comps=array( 
     'Barclays_Premier_League' => 1, 
     'Sky_Bet_Championship' => 2, 
     'Sky_Bet_League_One' => 3, 
     'Sky_Bet_League_Two' => 4, 
     'National_League' => 5, 
     'National_League_North' => 6, 
     'National_League_South' => 7, 
     'Evo-Stik_Southern_League_Premier_Division' => 8, 
     'Evo-Stik_Southern_League_Division_One_Central' => 9, 
     'Evo-Stik_Southern_League_Division_One_South_&_West' => 10, 
     'Ryman_League_Premier_Division' => 11, 
     'Ryman_League_Division_One_North' => 12, 
     'Ryman_League_Division_One_South' => 13, 
     'Evo-Stik_League_Premier_Division' => 14, 
     'Evo-Stik_League_Division_One_North' => 15, 
     'Evo-Stik_League_Division_One_South' => 16, 
     'Scottish_Premiership' => 17, 
     'Scottish_Championship' => 18, 
     'Scottish_League_One' => 19, 
     'Scottish_League_Two' => 20 
    ); 
    /* only interested in first two */ 
    $comps=array_slice($comps, 0, 2, true); 


    /* I don't use SimpleXML - DOMDocument is used to process the xml data */ 
    $dom=new DOMDocument; 

    /* base url */ 
    $url= 'http://www.footballwebpages.co.uk/teams.xml'; 

    /* 
     initialise curl request object but 
     set the url for each $comp in the function 
    */ 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 5); 
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 

    /* 
    If there have been too many requests when launching 
    the 429 condition should break out of the entire loop - 
    thus using only 1 request 
    */ 
    foreach($comps as $key => $comp){ 
     $xml=getxml($comp); 
     switch($xml->status){ 
      case 429: echo 'Try again later, too many requests received.'; break 2; 
      case 200: 
       /* if everything is ok, process $xml */ 
       $dom->loadXML($xml->xmldata); 


       /* example of processing xml data */ 
       echo ' 
       <h1>'.$dom->getElementsByTagName('competition')->item(0)->nodeValue.'</h1> 
        <ul>'; 

       $col=$dom->getElementsByTagName('team'); 
       if($col->length > 0){ 
        foreach($col as $team) echo '<li>'.$team->childNodes->item(1)->nodeValue.', '.$team->childNodes->item(3)->nodeValue.'</li>'; 
       } 
       echo ' 
        </ul>'; 
      break; 
      default:/* If no response or an unknown response exit */ 
       echo 'Football Web Pages Offline'; 
      break 2; 
     } 
    } 

    curl_close($ch); 
    $dom=$ch=$comps=null; 
?>
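
Since the question uses SimpleXML and accepts any status from 200 to 299, the same one-request idea could also be sketched as below. This is only a rough illustration under a few assumptions: `fetchOnce()` and `parseIfOk()` are hypothetical helper names that do not appear in the code above, and the body is parsed with `simplexml_load_string()` in place of `simplexml_load_file()` so that no second request is made.

```php
<?php
/*
 * Sketch only: fetchOnce() and parseIfOk() are hypothetical helpers,
 * not part of the original question or answer.
 */

/* Perform a single request and return the status code and body together. */
function fetchOnce(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    /* curl_exec() returns false on failure; normalise that to an empty body */
    return array(
        'status' => $status,
        'body'   => ($body === false) ? '' : $body
    );
}

/* Parse the body only when the status is 2xx; return null otherwise. */
function parseIfOk(int $status, string $body): ?SimpleXMLElement
{
    if ($status < 200 || $status >= 300) {
        return null;
    }

    /* collect libxml errors internally instead of emitting warnings */
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ($xml === false) ? null : $xml;
}
```

In a loop over the competition ids you would call `fetchOnce($url . '?comp=' . $comp)` once per competition, stop as soon as the status is 429, and pass the status and body to `parseIfOk()` otherwise; a `null` result can be treated as the "Football Web Pages Offline" case.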