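PHP curl returns 403, but the shell command does not
I am trying to implement a Facebook-like feature: when you paste a link, it fetches some information from the page (h1, description, images, ...) and displays it.
I have already run into several problems that I managed to solve (gzip, cookies, user agent, ...), but with this one I have no idea what is blocking my request.
The problematic link is http://www.mixcloud.com
Here is my PHP script: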
protected function getContent()
{
    $ch = curl_init();
    $headers = array(
        'Accept: */*',
        // 'Accept-Encoding: gzip,deflate,sdch',
        // 'Accept-Language: en-US,en;q=0.8,es;q=0.6,fr;q=0.4,pt;q=0.2',
        // 'Cache-Control: no-cache',
        // 'Connection: keep-alive'
    );
    $debug = TRUE;
    // Print the verbose connection trace while debugging
    curl_setopt($ch, CURLOPT_VERBOSE, $debug);
    // Set the request type
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
    curl_setopt($ch, CURLOPT_NOBODY, FALSE);
    curl_setopt($ch, CURLOPT_URL, $this->url);
    curl_setopt($ch, CURLOPT_USERAGENT, $this->userAgent);
    curl_setopt($ch, CURLOPT_REFERER, $this->referrer);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    // Include the response headers in the returned string while debugging
    curl_setopt($ch, CURLOPT_HEADER, $debug);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
    // Note: this second CURLOPT_USERAGENT call overrides $this->userAgent set above
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt');
    $data = curl_exec($ch);
    // Debug only: dump the raw response and stop
    var_dump($data);die;
    // Return the response already fetched instead of sending a second request
    return $data;
}
Here is the verbose output:
* Adding handle: conn: 0x7f937504e400
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f937504e400) send_pipe: 1, recv_pipe: 0
* About to connect() to www.mixcloud.com port 80 (#0)
* Trying 46.23.65.210...
* Connected to www.mixcloud.com (46.23.65.210) port 80 (#0)
> GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5
Host: www.mixcloud.com
Accept-Encoding: gzip
Referer: https://www.google.com.au
Accept: */*
< HTTP/1.1 403 Forbidden
* Server nginx/1.5.8 is not blacklisted
< Server: nginx/1.5.8
< Date: Tue, 18 Feb 2014 06:39:45 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< Vary: Accept-Encoding
< Content-Encoding: gzip
<
* Connection #0 to host www.mixcloud.com left intact
string(376) "HTTP/1.1 403 Forbidden\r\nServer: nginx/1.5.8\r\nDate: Tue, 18 Feb 2014 06:39:45 GMT\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\n\r\n<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx/1.5.8</center>\r\n</body>\r\n</html>\r\n"
Now, if I run the curl command from the shell, it works fine:
$ curl -i 'http://www.mixcloud.com' -v
* Adding handle: conn: 0x7fe28b004000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fe28b004000) send_pipe: 1, recv_pipe: 0
* About to connect() to www.mixcloud.com port 80 (#0)
* Trying 46.23.65.210...
* Connected to www.mixcloud.com (46.23.65.210) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.30.0
> Host: www.mixcloud.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Tue, 18 Feb 2014 06:41:30 GMT
Date: Tue, 18 Feb 2014 06:41:30 GMT
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< Content-Length: 194847
Content-Length: 194847
< Connection: keep-alive
Connection: keep-alive
< Vary: Accept-Encoding
Vary: Accept-Encoding
* Server gunicorn/0.17.4 is not blacklisted
< Server: gunicorn/0.17.4
Server: gunicorn/0.17.4
< Vary: Cookie, User-Agent, X-Requested-With, X-Ignore-Block
Vary: Cookie, User-Agent, X-Requested-With, X-Ignore-Block
< x-xss-protection: 1; mode=block
x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
x-content-type-options: nosniff
< Set-Cookie: csrftoken=ciOosbUNp5EL8t5tiQQzkoeaJIDJ3VfO; Domain=.mixcloud.com; expires=Tue, 17-Feb-2015 06:41:30 GMT; Max-Age=31449600; Path=/
Set-Cookie: csrftoken=ciOosbUNp5EL8t5tiQQzkoeaJIDJ3VfO; Domain=.mixcloud.com; expires=Tue, 17-Feb-2015 06:41:30 GMT; Max-Age=31449600; Path=/
< Set-Cookie: eventstream=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: eventstream=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
<
<!DOCTYPE html> ...
I know that PHP's cURL and the command-line cURL are different, but I can't see what I'm missing. Anyone?
Cheers, Maxime
One obvious difference is the Referer request header, but I'm not sure whether any server would bother about that. –
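To check whether Referer is the only difference, the request headers from the two verbose traces in the question can be compared mechanically. The following is a minimal sketch, not a diagnosis: `parseRequestHeaders` and `diffRequestHeaders` are hypothetical helper names introduced here for illustration, and the header lines are copied from the traces in the question.

```php
<?php
/**
 * Parse "Name: value" request header lines into a map keyed by lowercase name.
 * Lines without a colon (such as the "GET / HTTP/1.1" request line) are skipped.
 */
function parseRequestHeaders(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        // Drop the "> " prefix that curl's verbose mode puts on sent lines.
        $line = trim(preg_replace('/^>\s?/', '', trim($line)));
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

/**
 * Return the headers that are missing on one side or carry different values.
 * Each entry is [value in $a or null, value in $b or null].
 */
function diffRequestHeaders(array $a, array $b): array
{
    $diff = [];
    $names = array_unique(array_merge(array_keys($a), array_keys($b)));
    foreach ($names as $name) {
        $left = $a[$name] ?? null;
        $right = $b[$name] ?? null;
        if ($left !== $right) {
            $diff[$name] = [$left, $right];
        }
    }
    return $diff;
}

// Request headers copied from the PHP trace (403) and the shell trace (200).
$phpRequest = parseRequestHeaders([
    '> GET / HTTP/1.1',
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5',
    'Host: www.mixcloud.com',
    'Accept-Encoding: gzip',
    'Referer: https://www.google.com.au',
    'Accept: */*',
]);
$shellRequest = parseRequestHeaders([
    '> GET / HTTP/1.1',
    '> User-Agent: curl/7.30.0',
    '> Host: www.mixcloud.com',
    '> Accept: */*',
]);

$diff = diffRequestHeaders($phpRequest, $shellRequest);
print_r($diff);
```

For these two traces the differing headers are User-Agent, Accept-Encoding and Referer, so Referer is one of three candidates rather than the only one.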
Yes, but no... I was just trying to add some headers, thinking nginx was detecting a bot. – maxwell2022
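Since adding headers did not help, one way to narrow this down is to go the other direction: start from a PHP request that mirrors what the shell sent, then reintroduce one difference at a time and watch which variant turns the 200 into a 403. The sketch below assumes the PHP curl extension is loaded. `buildShellLikeOptions` and `fetchStatus` are hypothetical helper names, and nothing here asserts which header (if any) the server is actually reacting to. The requests are only sent when the script is run with a `--run` argument.

```php
<?php
/**
 * Build cURL options that mirror the plain shell request from the question
 * (curl -i 'http://www.mixcloud.com'): curl's User-Agent, no Referer,
 * no Accept-Encoding, no cookies. libcurl adds "Accept: */*" by default.
 */
function buildShellLikeOptions(string $url, string $userAgent = 'curl/7.30.0'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HEADER         => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

/** Send one request and return the HTTP status code (0 on transport failure). */
function fetchStatus(array $options): int
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $response = curl_exec($ch);
    return $response === false ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

$base = buildShellLikeOptions('http://www.mixcloud.com');

// Each variant changes exactly one thing relative to the shell-like baseline.
// With the array union operator, keys on the left-hand side win.
$variants = [
    'shell-like baseline' => $base,
    'plus Referer'        => $base + [CURLOPT_REFERER => 'https://www.google.com.au'],
    'plus gzip encoding'  => $base + [CURLOPT_ENCODING => 'gzip'],
    'Firefox User-Agent'  => [
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5',
    ] + $base,
];

// Only hit the network when explicitly asked to: php script.php --run
if (in_array('--run', $_SERVER['argv'] ?? [], true)) {
    foreach ($variants as $label => $options) {
        echo $label, ': ', fetchStatus($options), PHP_EOL;
    }
}
```

If the baseline already returns 403 from PHP, the difference lies outside these three headers (for example, the machine or IP address the PHP script runs from), and the header variants can be ruled out.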