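PHP Curl download problem
I have a function that takes an array of URLs as input. I have verified that the URLs are correct, and I can loop through them without any problem. I have also used curl_getinfo to verify that curl is downloading the correct page. However, the output of curl (the HTML) is the same for every page. Here is my code: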
$urls = array();
$urls = getpages($mainpage);
print_r($urls);
foreach ($urls as $link) {
    echo $link . '<br><br><br>';
    $circdl = my_curl($link);
    echo $circdl . '<br><br><br>';
    $circdl = NULL;
}
The output of the URL array is as follows:
Array ([0] => http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=¤tPageNumber=1 [1] => http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=¤tPageNumber=2
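$link also prints correctly, and so does the URL reported by curl_getinfo. I have run another array of URLs through this loop and they worked fine, so I suspect the problem here is the format of the URLs (the ampersands). I am really stumped as to why these pages are not downloading as expected.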
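One way to check the ampersand suspicion is to look at what each link really contains rather than at how the browser renders it. When a raw URL is echoed into an HTML page, a browser can interpret `&curren` as the `¤` entity, so the text on screen may not match the string passed to curl. The sketch below uses a hypothetical helper, `describe_link()`, which is not part of the original code; it only relies on the standard `htmlspecialchars()`, `strlen()`, and `strpos()` functions.

```php
<?php
// Hypothetical helper (not from the original post): reports what a link
// really contains, independent of how a browser renders it.
function describe_link($link)
{
    return array(
        // Escaped form: safe to echo into an HTML page, every "&" shows literally.
        'escaped'        => htmlspecialchars($link, ENT_QUOTES, 'UTF-8'),
        // Byte length: differs when "&" is stored as "&amp;".
        'length'         => strlen($link),
        // True when the string still carries HTML-encoded ampersands.
        'has_amp_entity' => strpos($link, '&amp;') !== false,
    );
}

// Sample value for illustration only; in the loop this would be $link.
$sample = 'http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=&currentPageNumber=1';
print_r(describe_link($sample));
```

If `has_amp_entity` comes back true for the links returned by `getpages()`, that would support the idea that the URLs were copied from HTML without being decoded.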
Here is the my_curl function:
function my_curl($url)
{
    $timeout = 10;
    $error_report = TRUE;
    $curl = curl_init();
    $cookiepath = drupal_get_path('module', 'mymodule') . '/cookies.txt';

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiepath);
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookiepath);
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // Check for page request
    //$info = curl_getinfo($curl);
    //echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML/HTML STRING
    curl_close($curl);
    return $htm;
}
What is really interesting is that if I run this:
echo my_curl('http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=&currentPageNumber=2');
the output is correct!! ?? :(
Thanks for your help!
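Since a hard-coded URL works while the same URL taken from the loop does not, comparing the two strings byte by byte may show where they diverge. The following is only a sketch: `first_difference()` is a hypothetical helper, and the value assigned to `$from_loop` is an assumed example of an HTML-encoded link, not data taken from the actual `getpages()` output.

```php
<?php
// Hypothetical helper: returns the index of the first differing byte,
// or -1 when both strings are identical.
function first_difference($a, $b)
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // Same prefix: identical if the lengths match, otherwise they
    // diverge where the shorter string ends.
    return strlen($a) === strlen($b) ? -1 : $limit;
}

$known_good = 'http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=&currentPageNumber=2';
// Assumed example of what a link scraped from HTML might look like.
$from_loop  = 'http://www.site.com/savings/viewcircular?promotionId=81498&amp;sneakpeek=&amp;currentPageNumber=2';

$pos = first_difference($known_good, $from_loop);
if ($pos === -1) {
    echo "Strings are identical\n";
} else {
    // bin2hex avoids any browser or terminal re-interpretation of the bytes.
    echo "First difference at byte $pos\n";
    echo bin2hex(substr($known_good, $pos, 8)) . "\n";
    echo bin2hex(substr($from_loop, $pos, 8)) . "\n";
}
```

In the real script, `$from_loop` would be replaced by the `$link` value from the `foreach` loop.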
Can you post the code for the `my_curl()` method, since it looks like that is the function holding the relevant code? – newfurniturey
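I just created an array containing those two pages and ran it through the loop, and it worked fine. The only difference I can see is that the $link variable shows this: http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=¤tPageNumber=1 instead of this: http://www.site.com/savings/viewcircular?promotionId=81498&sneakpeek=&currentPageNumber=1. I definitely think this is an encoding issue. –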
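If the encoding suspicion in the comment above is right, and the links were scraped from HTML where `&` is written as `&amp;`, then decoding them before handing them to curl might be enough. This is a minimal sketch under that assumption; `clean_link()` is a hypothetical name, and whether `getpages()` really returns entity-encoded URLs has not been confirmed in the thread.

```php
<?php
// Hypothetical helper: turns an HTML-encoded link back into a plain URL.
// html_entity_decode() only converts complete entities such as "&amp;",
// so an already-plain "&currentPageNumber" is left untouched.
function clean_link($link)
{
    return html_entity_decode(trim($link), ENT_QUOTES, 'UTF-8');
}

// Assumed example input, not actual getpages() output.
$encoded = 'http://www.site.com/savings/viewcircular?promotionId=81498&amp;sneakpeek=&amp;currentPageNumber=1';
echo clean_link($encoded) . "\n";

// Possible use inside the original loop (my_curl() is defined in the question):
// foreach ($urls as $link) {
//     $circdl = my_curl(clean_link($link));
// }
```

Decoding a URL that is already plain should be harmless in this case, so the helper can be applied to every link without first checking its format.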