2012-12-13 40 views
1

Hi, I have a domain that I want to parse with cURL, and the domain looks like this: it has no http://www prefix.

When I go to http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201

it redirects me to [register.metsad.ee/avalik/info_teatis.php?too_id=2942704201]

which has no http://www. The code I use to parse it is:

function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$src = 'http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201';

Then I call $c = get_data($src); echo $c; and as the result I get a blank page. I also tried the Simple_Html_Dom parser like this:

echo file_get_html($src)->plaintext;

But I still get a blank page. When I try to parse without http://, I get this:

Warning: file_get_contents(register.metsad.ee/avalik/info_teatis.php?too_id=2942704201) [function.file-get-contents]: failed to open stream: Result too large in C:\xampp\htdocs\Trash\metsakontroll\system\c_simple_html_dom.php on line 70

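cURL also gives a white screen, with no error message. When I try to parse it as a folder like this:

http://www.metsad.ee/register/avalik/info_teatis.php?too_id=2942704201 the server says Not Found.

I have searched the whole internet =/ Any ideas how to read the page with cURL or Simple_html_dom?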


Try 'http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201'


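I get a blank white screen because it does not redirect =( Edit: if I fetch the headers with cURL, I get this: HTTP/1.1 200 OK Date: Thu, 13 Dec 2012 20:12:27 GMT Server: Apache Content-Length: 0 Content-Type: text/html; charset=UTF-8. This means it connects to the server, but since the content length is 0 it returns a blank page, and it does not redirect to the address without http://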

Answer

1

The register.metsad.ee site has some kind of protection. It returns an empty response unless the User-Agent header is set.

Failed call (empty response):

$ telnet register.metsad.ee 80
Trying 213.184.43.115... 
Connected to register.metsad.ee. 
Escape character is '^]'. 
GET /avalik/info_teatis.php?too_id=2942704201 HTTP/1.1 
Host: register.metsad.ee 

HTTP/1.1 200 OK 
Date: Thu, 13 Dec 2012 20:07:11 GMT 
Server: Apache 
Content-Length: 0 
Content-Type: text/html; charset=UTF-8 

Successful call (HTML page returned):

$ telnet register.metsad.ee 80
GET http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201 HTTP/1.1 
Host: register.metsad.ee 
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0 

HTTP/1.1 200 OK 
Date: Thu, 13 Dec 2012 20:13:07 GMT 
Server: Apache 
Expires: Thu, 19 Nov 1981 08:52:00 GMT 
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 
Pragma: no-cache 
Set-Cookie: SNS=a0e425c2aec17c38be3716b366f75749; path=/ 
Transfer-Encoding: chunked 
Content-Type: text/html; charset=UTF-8 

762 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
... 

So you need to add the following line to your code:

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0");

for example (or any other user agent string).
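
For reference, here is a minimal sketch of how the function from the question might look with the User-Agent option folded in. The helper name build_curl_options, the extra $userAgent parameter, and the error handling are my own additions for illustration, not part of the original code. It assumes the PHP cURL extension is loaded.

```php
<?php
// Hypothetical helper (not in the original question): collects the cURL
// options in one array so the User-Agent is set alongside the others.
function build_curl_options($url, $userAgent, $timeout = 5) {
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_USERAGENT      => $userAgent, // the header this server requires
    );
}

// Same idea as get_data() in the question, plus the User-Agent and a
// check for transport-level failures (curl_exec() returns false on error).
function get_data($url, $userAgent) {
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));
    $data = curl_exec($ch);
    if ($data === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    curl_close($ch);
    return $data;
}

// Example usage (performs a real network request, so it is left commented out):
// $ua = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';
// echo get_data('http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201', $ua);
```

Note that an empty body with status 200, as in the failed call above, is not a cURL error, so curl_exec() returns an empty string rather than false in that case.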

Thanks! I added this part 'curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');' and it works! PS: This is my first question on Stackoverflow and I got an answer within 15 minutes, wow, love you all :P!


This is an interesting case :)
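
The question also asked about Simple_html_dom, whose warning above shows it calling file_get_contents. As a rough sketch, the same User-Agent idea can probably be applied on that route through an HTTP stream context. The function names build_http_context and fetch_with_user_agent are hypothetical, and this assumes allow_url_fopen is enabled and that the URL includes the http:// scheme.

```php
<?php
// Hypothetical helper: builds a stream context whose HTTP requests
// carry the given User-Agent header.
function build_http_context($userAgent, $timeout = 5) {
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => $timeout, // read timeout in seconds
        ),
    ));
}

// Hypothetical helper: fetches a page with file_get_contents() using
// the context above and fails loudly if the request cannot be made.
function fetch_with_user_agent($url, $userAgent) {
    $html = file_get_contents($url, false, build_http_context($userAgent));
    if ($html === false) {
        throw new RuntimeException('Could not fetch ' . $url);
    }
    return $html;
}

// Example usage (performs a real network request, so it is left commented out):
// $ua = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';
// $html = fetch_with_user_agent('http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201', $ua);
```

The returned string could then be handed to the parser (for example via its str_get_html function) instead of letting the parser fetch the URL itself.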
