2012-09-26 123 views
1

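curl (PHP script) downloads an incomplete file

I am using a PHP script to download an XML file from an external URL with curl, but I am running into a problem: curl sometimes fails to download the complete file. The problem occurs more often when I run the script on my hosting server via cron.

Here is the script: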

<?php 
header('Content-type:text/html; charset=utf-8'); 

//initialize the counter of xml download attempts
$xml_dl_attempts = 0; 

//set filename of output xml file 
$findex = 0; 
while(file_exists("xml".$findex.".xml")) 
{ 
    $findex++; 
} 
$filename = "xml".$findex.".xml"; 

//filename for log file
$logfilename = "log.txt"; 

//Open (append) logfile for write. 
$logfileout = fopen($logfilename, 'a'); 
fwrite($logfileout, "Starting attempts to download the xml file at ".date("H:i:s Y-m-d")."\r\n"); 

//Attempt to download the xml file up to 8 times
do {
    //Sleep 3 seconds before retrying the download
    if($xml_dl_attempts > 0) sleep(3);

    //Increase the number of download attempts
    $xml_dl_attempts++;
    //Write to logfile 
    fwrite($logfileout, date("H:i:s Y-m-d").": Download attempt number ".$xml_dl_attempts.": "); 

    //Download xml file using curl 
    $ch = curl_init(); 
    $url = 'http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.xml?localeId=el_GR'; 

    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_HEADER, false); 
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 

    set_time_limit(300); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 300); 

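    //Open the output file; stop if it cannot be created
    $outfile = fopen($filename, 'w');
    if (!$outfile)
    {
        exit;
    }
    curl_setopt($ch, CURLOPT_FILE, $outfile);

    //curl_exec returns false only when curl itself reports an error
    if(curl_exec($ch) === false)
    {
        fwrite($logfileout, "curl_error: ".curl_error($ch));
    }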
    fclose($outfile); 
    curl_close($ch); 

    //Clear errors 
    libxml_use_internal_errors(true); 
    libxml_clear_errors(); 

    //Parse xml file 
    $xml = simplexml_load_file($filename); 

    //Check for errors
    if($err = libxml_get_last_error())
    {
        fwrite($logfileout, "failed\r\n");
    }
} while($err !== false && $xml_dl_attempts < 8); //repeat if xml was not completely downloaded 

//Check if the last attempt parsed without errors
if(!$err) 
{ 
    fwrite($logfileout, "successfull\r\n"); 
} 
fwrite($logfileout, "End.\r\n"); 
fclose($logfileout); 
?> 

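As you can see, I check whether the SimpleXML parser reports an error when parsing the downloaded XML file. If an error occurs, I repeat the process, with a limit of 8 attempts. I also write a log file.

Here is the log file for a whole day: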
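The validate-and-retry idea described here can be separated from the download itself, which makes it easier to test. The following is only a sketch under assumptions: `isWellFormedXml()` and `fetchXmlWithRetries()` are hypothetical helper names that do not appear in the script above, and `$fetch` stands for any callable that returns the response body as a string (for example a wrapper around `curl_exec()` with `CURLOPT_RETURNTRANSFER`).

```php
<?php
// Returns true when $content parses as well-formed XML.
function isWellFormedXml(string $content): bool
{
    if ($content === '') {
        return false;
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $xml = simplexml_load_string($content);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml !== false && count($errors) === 0;
}

// Hypothetical helper: calls $fetch until it returns well-formed XML,
// at most $maxAttempts times, sleeping $delaySeconds between attempts.
// Returns ['attempts' => int, 'body' => string] on success, false otherwise.
function fetchXmlWithRetries(callable $fetch, int $maxAttempts = 8, int $delaySeconds = 3)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($attempt > 1 && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
        $body = $fetch();
        if (is_string($body) && isWellFormedXml($body)) {
            return ['attempts' => $attempt, 'body' => $body];
        }
    }
    return false;
}
```

Because the body is validated in memory, the caller can write it to disk only after a successful attempt, so a truncated download never leaves a broken file behind.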

Starting attempts to download the xml file at 18:35:00 2012-09-25 

18:35:00 2012-09-25: Download attempt number : failed 

18:35:03 2012-09-25: Download attempt number : failed 

18:35:07 2012-09-25: Download attempt number : successfull 

End. 

Starting attempts to download the xml file at 19:35:00 2012-09-25 

19:35:00 2012-09-25: Download attempt number 1: failed 

19:35:03 2012-09-25: Download attempt number 2: failed 

19:35:06 2012-09-25: Download attempt number 3: failed 

19:35:10 2012-09-25: Download attempt number 4: failed 

19:35:13 2012-09-25: Download attempt number 5: failed 

19:35:16 2012-09-25: Download attempt number 6: failed 

19:35:20 2012-09-25: Download attempt number 7: failed 

19:35:23 2012-09-25: Download attempt number 8: successfull 

End. 

Starting attempts to download the xml file at 20:35:00 2012-09-25 

20:35:00 2012-09-25: Download attempt number 1: failed 

20:35:04 2012-09-25: Download attempt number 2: failed 

20:35:08 2012-09-25: Download attempt number 3: successfull 

End. 

Starting attempts to download the xml file at 21:35:00 2012-09-25 

21:35:00 2012-09-25: Download attempt number 1: failed 

21:35:04 2012-09-25: Download attempt number 2: failed 

21:35:07 2012-09-25: Download attempt number 3: failed 

21:35:11 2012-09-25: Download attempt number 4: successfull 

End. 

Starting attempts to download the xml file at 22:35:00 2012-09-25 

22:35:00 2012-09-25: Download attempt number 1: failed 

22:35:04 2012-09-25: Download attempt number 2: failed 

22:35:07 2012-09-25: Download attempt number 3: successfull 

End. 

Starting attempts to download the xml file at 23:35:00 2012-09-25 

23:35:00 2012-09-25: Download attempt number 1: failed 

23:35:03 2012-09-25: Download attempt number 2: failed 

23:35:07 2012-09-25: Download attempt number 3: failed 

23:35:10 2012-09-25: Download attempt number 4: failed 

23:35:14 2012-09-25: Download attempt number 5: failed 

23:35:17 2012-09-25: Download attempt number 6: failed 

23:35:21 2012-09-25: Download attempt number 7: successfull 

End. 

Starting attempts to download the xml file at 00:35:00 2012-09-26 

00:35:00 2012-09-26: Download attempt number 1: successfull 

End. 

Starting attempts to download the xml file at 01:35:00 2012-09-26 

01:35:00 2012-09-26: Download attempt number 1: failed 

01:35:04 2012-09-26: Download attempt number 2: failed 

01:35:07 2012-09-26: Download attempt number 3: failed 

01:35:11 2012-09-26: Download attempt number 4: failed 

01:35:14 2012-09-26: Download attempt number 5: failed 

01:35:18 2012-09-26: Download attempt number 6: failed 

01:35:21 2012-09-26: Download attempt number 7: failed 

01:35:30 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 02:35:00 2012-09-26 

02:35:00 2012-09-26: Download attempt number 1: failed 

02:35:03 2012-09-26: Download attempt number 2: failed 

02:35:07 2012-09-26: Download attempt number 3: failed 

02:35:10 2012-09-26: Download attempt number 4: failed 

02:35:13 2012-09-26: Download attempt number 5: failed 

02:35:17 2012-09-26: Download attempt number 6: failed 

02:35:20 2012-09-26: Download attempt number 7: failed 

02:35:24 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 03:35:00 2012-09-26 

03:35:00 2012-09-26: Download attempt number 1: failed 

03:35:04 2012-09-26: Download attempt number 2: failed 

03:35:07 2012-09-26: Download attempt number 3: failed 

03:35:10 2012-09-26: Download attempt number 4: failed 

03:35:14 2012-09-26: Download attempt number 5: failed 

03:35:17 2012-09-26: Download attempt number 6: failed 

03:35:21 2012-09-26: Download attempt number 7: failed 

03:35:30 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 04:35:00 2012-09-26 

04:35:00 2012-09-26: Download attempt number 1: failed 

04:35:03 2012-09-26: Download attempt number 2: failed 

04:35:07 2012-09-26: Download attempt number 3: failed 

04:35:10 2012-09-26: Download attempt number 4: failed 

04:35:14 2012-09-26: Download attempt number 5: failed 

04:35:17 2012-09-26: Download attempt number 6: failed 

04:35:21 2012-09-26: Download attempt number 7: failed 

04:35:24 2012-09-26: Download attempt number 8: successfull 

End. 

Starting attempts to download the xml file at 05:35:00 2012-09-26 

05:35:00 2012-09-26: Download attempt number 1: failed 

05:35:04 2012-09-26: Download attempt number 2: failed 

05:35:08 2012-09-26: Download attempt number 3: failed 

05:35:11 2012-09-26: Download attempt number 4: failed 

05:35:15 2012-09-26: Download attempt number 5: failed 

05:35:18 2012-09-26: Download attempt number 6: failed 

05:35:22 2012-09-26: Download attempt number 7: failed 

05:35:25 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 06:35:00 2012-09-26 

06:35:00 2012-09-26: Download attempt number 1: failed 

06:35:03 2012-09-26: Download attempt number 2: failed 

06:35:07 2012-09-26: Download attempt number 3: failed 

06:35:10 2012-09-26: Download attempt number 4: failed 

06:35:14 2012-09-26: Download attempt number 5: failed 

06:35:17 2012-09-26: Download attempt number 6: failed 

06:35:21 2012-09-26: Download attempt number 7: failed 

06:35:24 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 07:35:00 2012-09-26 

07:35:00 2012-09-26: Download attempt number 1: failed 

07:35:04 2012-09-26: Download attempt number 2: failed 

07:35:07 2012-09-26: Download attempt number 3: failed 

07:35:11 2012-09-26: Download attempt number 4: failed 

07:35:14 2012-09-26: Download attempt number 5: failed 

07:35:18 2012-09-26: Download attempt number 6: failed 

07:35:21 2012-09-26: Download attempt number 7: failed 

07:35:24 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 08:35:00 2012-09-26 

08:35:00 2012-09-26: Download attempt number 1: failed 

08:35:03 2012-09-26: Download attempt number 2: failed 

08:35:06 2012-09-26: Download attempt number 3: failed 

08:35:10 2012-09-26: Download attempt number 4: failed 

08:35:13 2012-09-26: Download attempt number 5: failed 

08:35:16 2012-09-26: Download attempt number 6: failed 

08:35:20 2012-09-26: Download attempt number 7: failed 

08:35:23 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 09:35:00 2012-09-26 

09:35:00 2012-09-26: Download attempt number 1: failed 

09:35:04 2012-09-26: Download attempt number 2: failed 

09:35:07 2012-09-26: Download attempt number 3: successfull 

End. 

Starting attempts to download the xml file at 10:35:00 2012-09-26 

10:35:00 2012-09-26: Download attempt number 1: failed 

10:35:03 2012-09-26: Download attempt number 2: failed 

10:35:06 2012-09-26: Download attempt number 3: failed 

10:35:10 2012-09-26: Download attempt number 4: failed 

10:35:13 2012-09-26: Download attempt number 5: failed 

10:35:17 2012-09-26: Download attempt number 6: failed 

10:35:20 2012-09-26: Download attempt number 7: successfull 

End. 

Starting attempts to download the xml file at 11:35:00 2012-09-26 

11:35:00 2012-09-26: Download attempt number 1: failed 

11:35:03 2012-09-26: Download attempt number 2: failed 

11:35:07 2012-09-26: Download attempt number 3: successfull 

End. 

Starting attempts to download the xml file at 12:35:00 2012-09-26 

12:35:00 2012-09-26: Download attempt number 1: failed 

12:35:04 2012-09-26: Download attempt number 2: failed 

12:35:07 2012-09-26: Download attempt number 3: failed 

12:35:11 2012-09-26: Download attempt number 4: failed 

12:35:14 2012-09-26: Download attempt number 5: failed 

12:35:17 2012-09-26: Download attempt number 6: failed 

12:35:21 2012-09-26: Download attempt number 7: successfull 

End. 

Starting attempts to download the xml file at 13:35:00 2012-09-26 

13:35:00 2012-09-26: Download attempt number 1: failed 

13:35:03 2012-09-26: Download attempt number 2: successfull 

End. 

Starting attempts to download the xml file at 14:35:00 2012-09-26 

14:35:00 2012-09-26: Download attempt number 1: failed 

14:35:03 2012-09-26: Download attempt number 2: failed 

14:35:07 2012-09-26: Download attempt number 3: failed 

14:35:10 2012-09-26: Download attempt number 4: successfull 

End. 

Starting attempts to download the xml file at 15:35:00 2012-09-26 

15:35:00 2012-09-26: Download attempt number 1: failed 

15:35:03 2012-09-26: Download attempt number 2: failed 

15:35:07 2012-09-26: Download attempt number 3: failed 

15:35:10 2012-09-26: Download attempt number 4: failed 

15:35:13 2012-09-26: Download attempt number 5: failed 

15:35:17 2012-09-26: Download attempt number 6: failed 

15:35:20 2012-09-26: Download attempt number 7: failed 

15:35:24 2012-09-26: Download attempt number 8: failed 

End. 

Starting attempts to download the xml file at 16:35:00 2012-09-26 

16:35:00 2012-09-26: Download attempt number 1: failed 

16:35:03 2012-09-26: Download attempt number 2: failed 

16:35:07 2012-09-26: Download attempt number 3: successfull 

End. 

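The thing is, sometimes it manages to get the complete file after a few attempts, and other times it fails completely. Another thing to note is that curl_exec does not return an error when the XML is incomplete.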
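Since `curl_exec` reports no error here, one additional check that might help is to compare the size the server declared with the number of bytes actually received. This is a sketch only: `transferLooksComplete()` is a hypothetical helper, and it assumes the server sends a `Content-Length` header at all, which the question does not confirm. The two inputs are the values returned by `curl_getinfo()` for `CURLINFO_CONTENT_LENGTH_DOWNLOAD` (which is -1 when the length is unknown) and `CURLINFO_SIZE_DOWNLOAD`.

```php
<?php
// Hypothetical helper: judge a finished transfer from the numbers cURL reports.
//   $declaredLength = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD)
//   $receivedBytes  = curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD)
// Returns true or false when the server declared a length,
// and null when it did not, so the check cannot decide.
function transferLooksComplete(float $declaredLength, float $receivedBytes): ?bool
{
    if ($declaredLength < 0) {
        return null; // no Content-Length was sent
    }
    return $receivedBytes >= $declaredLength;
}
```

A `null` result means this check is inconclusive, in which case parsing the XML remains the only way to detect truncation.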

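Unfortunately, the server hosting the XML does not support ranges, so I cannot resume the download when the file is incomplete. I could raise the attempt limit to, say, 50, but a failed attempt still downloads some data. For a 1 MB XML file, if the download fails 30 times at 500 KB each, the script will have downloaded 16 MB of data by the time an attempt succeeds. I want to run this script every hour, so I believe this would hurt my server's bandwidth.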
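The bandwidth estimate is simple arithmetic and can be written out as a small function. This is an illustration only: `worstCaseTransferKb()` is a hypothetical name, and it takes 1 MB as 1000 KB so that the result matches the 16 MB figure.

```php
<?php
// Total data transferred when $failedAttempts partial downloads of
// $partialKb each are followed by one complete download of $fullKb.
function worstCaseTransferKb(int $failedAttempts, int $partialKb, int $fullKb): int
{
    return $failedAttempts * $partialKb + $fullKb;
}
```

With the figures from the question, `worstCaseTransferKb(30, 500, 1000)` gives 16000 KB, that is 16 MB per run. If every hourly run were that bad, a day would come to 24 × 16 = 384 MB.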

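Why does curl fail to download the complete file? Is there some option that makes it behave like a browser, so that it eventually always gets the file?

Thanks.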
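For reference, these are the kinds of options usually meant by "behaving like a browser". This is a sketch with assumed values: `browserLikeCurlOptions()` is a hypothetical helper, the user-agent string and `Accept` header are placeholders, and nothing here is confirmed to change how this particular server responds.

```php
<?php
// Hypothetical helper: a set of cURL options that mimic a few browser habits.
// Requires the PHP cURL extension for the CURLOPT_* constants.
function browserLikeCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_ENCODING       => '',     // accept every encoding cURL supports
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)', // placeholder
        CURLOPT_HTTPHEADER     => ['Accept: application/xml,text/xml;q=0.9,*/*;q=0.8'],
        CURLOPT_CONNECTTIMEOUT => 15,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 300,    // same overall limit as the script
    ];
}
```

The array would be applied with `curl_setopt_array($ch, browserLikeCurlOptions($url))` before calling `curl_exec($ch)`. If the server itself drops the connection, these options will not prevent that.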

+1

Have you tried a longer timeout? –

+0

@MattS Maybe that is not the problem. Check the timestamps. – Prasanth

+0

@goldenparrot Oh, good point! –

Answers

1

The problem lies with your source: the server.

I tried running a scraper on scraperwiki, and here is what it showed:

1st screenshot

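The same problem also occurred when I tried loading the XML myself; it worked for me on the third try.

In the picture below you can see that the server closes the connection on the first two requests, but not on the third (successful) one.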

2nd screenshot

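So the problem is with the server, and there is nothing you can do about it if the server is not yours. (Apart from notifying their server admin, of course!)

Note: I believe scraperwiki has a very good internet connection, since a lot depends on it. So you can safely blame this on a server fault #jboss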

+0

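I was afraid that was the case. Unfortunately, there is nothing I can do. The only thing that puzzles me is that when I run the script from localhost, it has a better chance of getting the file on the first attempt, but when I run it from the hosting server, it really does get worse. What could cause this difference on my hosting server? – Spahar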

+0

It could be the internet connection. But I don't know. Seriously. – Prasanth