2014-02-18 44 views
1

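PHP curl returns 403 but the shell command does not

I'm trying to implement a Facebook-like feature: when you paste a link, it fetches some information from the page (h1, description, images, ...) and displays it.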

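I have already run into several problems that I managed to solve (gzip, cookies, user agent, ...), but for this one I can't tell what is blocking my request.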

The problematic link is http://www.mixcloud.com

Here is my PHP script:

protected function getContent() 
{ 
    $ch = curl_init(); 
    $headers = array(
     'Accept: */*', 
     // 'Accept-Encoding: gzip,deflate,sdch', 
     // 'Accept-Language: en-US,en;q=0.8,es;q=0.6,fr;q=0.4,pt;q=0.2', 
     // 'Cache-Control: no-cache', 
     // 'Connection: keep-alive' 
    ); 

    $debug = TRUE; 

    // Set the request type 
    curl_setopt($ch, CURLOPT_VERBOSE, $debug); 
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET'); 
    curl_setopt($ch, CURLOPT_NOBODY, FALSE); 
    curl_setopt($ch, CURLOPT_URL, $this->url); 
    curl_setopt($ch, CURLOPT_USERAGENT, $this->userAgent); 
    curl_setopt($ch, CURLOPT_REFERER, $this->referrer); 
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); 
    curl_setopt($ch, CURLOPT_HEADER, $debug); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); 
    curl_setopt($ch, CURLOPT_ENCODING , 'gzip'); 
    // Note: this second CURLOPT_USERAGENT call overrides $this->userAgent set above. 
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5'); 
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt'); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt'); 

    $data = curl_exec($ch); 

    var_dump($data);die; 

    return $data; // reuse the result instead of calling curl_exec() a second time 
} 

Here is the verbose response:

* Adding handle: conn: 0x7f937504e400 
* Adding handle: send: 0 
* Adding handle: recv: 0 
* Curl_addHandleToPipeline: length: 1 
* - Conn 0 (0x7f937504e400) send_pipe: 1, recv_pipe: 0 
* About to connect() to www.mixcloud.com port 80 (#0) 
* Trying 46.23.65.210... 
* Connected to www.mixcloud.com (46.23.65.210) port 80 (#0) 
> GET / HTTP/1.1 
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5 
Host: www.mixcloud.com 
Accept-Encoding: gzip 
Referer: https://www.google.com.au 
Accept: */* 

< HTTP/1.1 403 Forbidden 
* Server nginx/1.5.8 is not blacklisted 
< Server: nginx/1.5.8 
< Date: Tue, 18 Feb 2014 06:39:45 GMT 
< Content-Type: text/html 
< Transfer-Encoding: chunked 
< Connection: keep-alive 
< Vary: Accept-Encoding 
< Content-Encoding: gzip 
< 
* Connection #0 to host www.mixcloud.com left intact 
string(376) "HTTP/1.1 403 Forbidden\r\nServer: nginx/1.5.8\r\nDate: Tue, 18 Feb 2014 06:39:45 GMT\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\n\r\n<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx/1.5.8</center>\r\n</body>\r\n</html>\r\n" 

Now, if I run the curl command from the shell, it works fine:

$ curl -i 'http://www.mixcloud.com' -v 
* Adding handle: conn: 0x7fe28b004000 
* Adding handle: send: 0 
* Adding handle: recv: 0 
* Curl_addHandleToPipeline: length: 1 
* - Conn 0 (0x7fe28b004000) send_pipe: 1, recv_pipe: 0 
* About to connect() to www.mixcloud.com port 80 (#0) 
* Trying 46.23.65.210... 
* Connected to www.mixcloud.com (46.23.65.210) port 80 (#0) 
> GET / HTTP/1.1 
> User-Agent: curl/7.30.0 
> Host: www.mixcloud.com 
> Accept: */* 
> 
< HTTP/1.1 200 OK 
HTTP/1.1 200 OK 
< Date: Tue, 18 Feb 2014 06:41:30 GMT 
Date: Tue, 18 Feb 2014 06:41:30 GMT 
< Content-Type: text/html; charset=utf-8 
Content-Type: text/html; charset=utf-8 
< Content-Length: 194847 
Content-Length: 194847 
< Connection: keep-alive 
Connection: keep-alive 
< Vary: Accept-Encoding 
Vary: Accept-Encoding 
* Server gunicorn/0.17.4 is not blacklisted 
< Server: gunicorn/0.17.4 
Server: gunicorn/0.17.4 
< Vary: Cookie, User-Agent, X-Requested-With, X-Ignore-Block 
Vary: Cookie, User-Agent, X-Requested-With, X-Ignore-Block 
< x-xss-protection: 1; mode=block 
x-xss-protection: 1; mode=block 
< x-content-type-options: nosniff 
x-content-type-options: nosniff 
< Set-Cookie: csrftoken=ciOosbUNp5EL8t5tiQQzkoeaJIDJ3VfO; Domain=.mixcloud.com; expires=Tue, 17-Feb-2015 06:41:30 GMT; Max-Age=31449600; Path=/ 
Set-Cookie: csrftoken=ciOosbUNp5EL8t5tiQQzkoeaJIDJ3VfO; Domain=.mixcloud.com; expires=Tue, 17-Feb-2015 06:41:30 GMT; Max-Age=31449600; Path=/ 
< Set-Cookie: eventstream=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/ 
Set-Cookie: eventstream=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/ 

< 
<!DOCTYPE html> ... 

I know that PHP's cURL and the command-line curl are different, but I can't see what I'm missing. Anyone?

Cheers, Maxime

+0

One obvious difference is the Referer request header, but I'm not sure any server would care about that. –

+0

Yes, but no... I was just trying to add some headers, thinking nginx was detecting a bot. – maxwell2022
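
To make the header comparison from the comments concrete, here is a minimal sketch that diffs the two request header blocks copied from the verbose logs in the question. The function names `parseRequestHeaders` and `diffRequestHeaders` are hypothetical helpers, not part of PHP or cURL, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: turn a request-header block (as printed by curl -v)
// into a lowercase name => value map. Leading "> " markers, blank lines and
// the request line are skipped.
function parseRequestHeaders(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        $line = trim(preg_replace('/^>\s?/', '', trim($line)));
        if ($line === '' || preg_match('/^[A-Z]+ \S+ HTTP\/\d/', $line)) {
            continue; // blank line or request line such as "GET / HTTP/1.1"
        }
        if (strpos($line, ':') === false) {
            continue; // not a header line
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Hypothetical helper: list every header whose value differs between the two
// requests. A null value means the header was not sent at all.
function diffRequestHeaders(array $php, array $shell): array
{
    $diff = [];
    $names = array_unique(array_merge(array_keys($php), array_keys($shell)));
    foreach ($names as $name) {
        $left = $php[$name] ?? null;
        $right = $shell[$name] ?? null;
        if ($left !== $right) {
            $diff[$name] = ['php' => $left, 'shell' => $right];
        }
    }
    ksort($diff);
    return $diff;
}

// Request headers copied from the two verbose logs in the question.
$phpRequest = "> GET / HTTP/1.1\n"
    . "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5\n"
    . "Host: www.mixcloud.com\n"
    . "Accept-Encoding: gzip\n"
    . "Referer: https://www.google.com.au\n"
    . "Accept: */*";
$shellRequest = "> GET / HTTP/1.1\n"
    . "> User-Agent: curl/7.30.0\n"
    . "> Host: www.mixcloud.com\n"
    . "> Accept: */*";

$diff = diffRequestHeaders(
    parseRequestHeaders($phpRequest),
    parseRequestHeaders($shellRequest)
);
print_r($diff);
```

For these two logs the differing headers are Accept-Encoding, Referer and User-Agent, so those are the candidates to change one at a time until the 403 goes away.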

Answers

2

OK, I found the problem: it was the user agent. This is really odd. I was using this user agent:

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5 

With that user agent I get a 403. I've replaced it with the one below:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36 

And now it works fine. I can't believe people still reject requests based on specific user agents...
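
As a rough sketch of how this kind of problem can be narrowed down, the script below requests the same URL with several User-Agent strings and prints the status code for each. `buildCurlOptions` and `fetchStatus` are hypothetical names introduced for illustration, the timeout value is an arbitrary assumption, and what the server returns today may differ from the 2014 logs above.

```php
<?php
// Hypothetical helper: cURL options for a plain GET with an explicit User-Agent.
function buildCurlOptions(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '', // empty string: advertise every encoding libcurl supports
        CURLOPT_TIMEOUT        => 10, // assumed value, adjust as needed
    ];
}

// Hypothetical helper: perform the request and return the final HTTP status
// code (0 if the transfer itself failed).
function fetchStatus(string $url, string $userAgent): int
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $userAgent));
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// The three user agents that appear in this thread.
$userAgents = [
    'firefox' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5',
    'chrome'  => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36',
    'curl'    => 'curl/7.30.0',
];

// Only hit the network when a URL is passed on the command line,
// e.g. php check_ua.php http://www.mixcloud.com
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach ($userAgents as $label => $userAgent) {
        printf("%-8s %d\n", $label, fetchStatus($argv[1], $userAgent));
    }
}
```

Keeping every other option identical between runs means any change in status code can be attributed to the User-Agent alone.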