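Page opened with cURL gives a different result than in the browser
Well, I'm trying to open the page http://gratka.pl with cURL to fetch its content, but unfortunately they seem to protect it quite well. My code (in Zend Framework):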
$client = new \Zend\Http\Client;
$client->setHeaders($options);
$adapter = new \Zend\Http\Client\Adapter\Curl();
$agent = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; chromeframe/13.0.782.218; chromeframe; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)';
//$agent = 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)';
$header=array(
'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-us,en;q=0.5',
'Accept-Encoding: gzip,deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Keep-Alive: 115',
'Connection: keep-alive',
);
$clientOptions = array(
'curloptions' => array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
//CURLOPT_ENCODING => "gzip",
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_USERAGENT => $agent,
CURLOPT_VERBOSE => true,
CURLOPT_AUTOREFERER => true,
CURLOPT_COOKIEJAR => 'cookie.txt',
CURLOPT_COOKIEFILE => 'cookie.txt',
CURLOPT_HTTPHEADER => $header,
CURLOPT_REFERER => "http://google.com",
//CURLOPT_COOKIE => 'sesja_gratka=065ad930ce08fa203b39e2599f19e345; __gfp_64b=8IF3rdQKeCJiUBB.P4vNx3KWyCYii.16iOnjxq.C6tz.77; PHPSESSID=26f34d5c637c9db9c752695b2a2db427; __utmc=239465948',
),
);
$client->setOptions($clientOptions);
$client->setAdapter($adapter);
$client->setUri($url);
$result = $client->send();
$cookies = $client->getCookies();
$header = $result->getHeaders();
$body = $result->getBody();
var_dump($body);die;
var_dump(htmlspecialchars($header));die;
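I've tried many options and tweaks, but the result is always the same: no cookies and no body. Instead of the actual site content, every time the page shows 'You have been temporarily blocked', while opening the same site in a browser works fine.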
You're setting different 'User-Agent' values in the 'HTTPHEADER' and 'USERAGENT' options. Which one wins? – MrWhite
@w3d Well, both: one shows up as ["HTTP_USERAGENT"], the other as ["HTTP_USER_AGENT"]. This is the $_SERVER output for my curl request: http://bwdesign.sldc.pl/zend/public/import – b4rt3kk
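Well, there can only be one User-Agent HTTP request header (or at least only one is read by the server). My only idea is that if the server sees "Firefox/3.6", it might consider it "too old to be a real user" and block it? – MrWhite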
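One way to act on the single-User-Agent point is to make sure the custom header list carries no User-Agent line of its own, so the value passed through CURLOPT_USERAGENT is the only one left. A minimal sketch, assuming PHP 7.4+ (for the arrow function); `withoutUserAgent` is a hypothetical helper, not part of Zend Framework or cURL, and whether this alone stops the site from blocking the request is not established in this thread.

```php
<?php
// Hypothetical helper (not part of Zend or cURL): remove any "User-Agent:"
// line from a list of raw header strings, so the only User-Agent sent is
// the one configured elsewhere (e.g. via CURLOPT_USERAGENT).
function withoutUserAgent(array $headers): array
{
    return array_values(array_filter(
        $headers,
        // Header names are case-insensitive, so match without regard to case.
        fn (string $h): bool => stripos(ltrim($h), 'user-agent:') !== 0
    ));
}

$headers = withoutUserAgent([
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
    'Accept-Language: en-us,en;q=0.5',
]);
// $headers is now ['Accept-Language: en-us,en;q=0.5']
```

In the question's code this would mean calling `$header = withoutUserAgent($header);` before building `$clientOptions`; if the "too old" theory holds, `$agent` would also need to be a more recent browser string.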