2016-09-30 77 views
1

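I want to fetch a web page and parse some data out of it, but every time I try to scrape it I only get the HTTP response headers. Here is the code I have been using to fetch data from the site. How do I scrape a web page using PHP?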

$host = 'Host: dealnews.com'; 
$user_agent = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0'; 
$accept = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'; 
$accept_language = 'Accept-Language: en-US,en;q=0.5'; 
$accept_encoding = 'Accept-Encoding: gzip, deflate'; 
$connection = 'Connection: keep-alive'; 
$cookie = 'Cookie=front_page_sort=hotness; dnvta=%7B%22uid%22%3A%22VkA1VlBBb0tNcXdBQVF6UlJrTUFBQUJN%22%2C%22vid%22%3A%22VkA1bGx3b0tNcXdBQVF6bW53QUFBQUEt%22%2C%22fvts%22%3A1475237180%2C%22lvts%22%3A1475241453%2C%22ref%22%3A%22%22%2C%22usid%22%3A0%2C%22ct%22%3A2%2C%22cr%22%3A1475237180%7D; last_visit=1475241457; _ceg.s=oebjle; _ceg.u=oebjle; _ga=GA1.2.185245695.1475237222; __gads=ID=1921ec3c3fe54b1b:T=1475237222:S=ALNI_MZJZEuNpmg3Aq5e007E7iFjwuQ0nw; original_eref=DIRECT; _gat=1; mp_dealnews_mixpanel=%7B%22distinct_id%22%3A%20%221577afe52c549-01b1cfdcc8ca548-13666c4a-100200-1577afe52c620c%22%7D'; 

$requestHeaders = array ($host, $user_agent, $accept, $accept_encoding, $accept_language, $connection, $cookie); 

$ch = curl_init ('http://dealnews.com/2-LED-Window-Candles-w-Color-Changing-Bulbs-for-4-2-s-h/1797165.html?iref=rss-dealnews-todays-edition'); 
curl_setopt ($ch, CURLOPT_TIMEOUT, 100); 
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 100); 
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, false); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false); 
curl_setopt ($ch, CURLOPT_HEADER, TRUE); 
curl_setopt ($ch, CURLOPT_ENCODING, "gzip"); 
curl_setopt ($ch, CURLOPT_HTTPHEADER, $requestHeaders); 
$data = curl_exec ($ch); 
if (! $data) { 
    die ("Error: " . curl_error ($ch) . " Error no: " . curl_errno ($ch)); 
} 
curl_close ($ch); 
$htmlContent = str_get_html ($data); 
echo $htmlContent; 

But instead of the page, it gives me the following output:

HTTP/1.1 302 Found
Date: Fri, 30 Sep 2016 13:50:44 GMT
Server: Apache
X-Powered-By: PHP/5.5.9-1ubuntu4.19
Status: 302 Found
Location: /lw/landing.html?uri=%2F2-LED-Window-Candles-w-Color-Changing-Bulbs-for-4-2-s-h%2F1797165.html%3Firef%3Drss-dealnews-todays-edition
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 20
X-Cnection: close
Content-Type: text/html; charset=utf-8

Can someone help me figure out where I went wrong here?

+3

This is a redirect. Enable the `CURLOPT_FOLLOWLOCATION` option. – Barmar

+0

Oh, yes! Thanks @Barmar –

+0

Possible duplicate of [What does this debug-verbose-info mean?](http://stackoverflow.com/questions/39533450/what-does-this-debug-verbose-info-mean) – Henders

Answers

1

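You need:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

A 302 is a redirect response: it tells the client to fetch the URL given in the Location header instead. cURL does not follow redirects by default, so without this option you only get the 302 response itself.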
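As a rough sketch of how that option fits into a complete request, the snippet below wraps the call in two helpers. The function names `buildScrapeOptions` and `fetchPage`, the redirect limit, and the timeout values are assumptions made for illustration, not part of the original question. It also turns `CURLOPT_HEADER` off, because with that option enabled the response headers are prepended to the returned string and would end up being fed to the HTML parser.

```php
<?php
// Hypothetical helper: the cURL options for a simple scrape request.
function buildScrapeOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,   // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow 301/302 redirects automatically
        CURLOPT_MAXREDIRS      => 5,      // assumed limit, guards against redirect loops
        CURLOPT_HEADER         => false,  // keep response headers out of the returned body
        CURLOPT_ENCODING       => '',     // accept any encoding cURL supports and decode it
        CURLOPT_CONNECTTIMEOUT => 10,     // assumed values, in seconds
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0',
    ];
}

// Hypothetical helper: fetches a URL and returns the response body,
// throwing if the transfer itself fails.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, buildScrapeOptions());

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(
            'cURL error: ' . curl_error($ch) . ' (errno ' . curl_errno($ch) . ')'
        );
    }

    return $body;
}
```

Calling `fetchPage()` with the URL from the question should then return the HTML of the page the redirect points to rather than the 302 headers. Checking `$body === false` instead of `!$data` also avoids treating an empty but successful response as an error.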

0

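If you are looking to do screen scraping with PHP, I have had success with the PHP Simple HTML DOM Parser library. It is very simple and easy to use. I know its website looks somewhat dated, but the code I wrote last year still runs fine. No cron errors yet.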
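For readers who would rather avoid an extra dependency, here is a minimal sketch of the same parsing step using PHP's built-in `DOMDocument` and `DOMXPath` classes instead of Simple HTML DOM. The helper name `extractText`, the sample markup, and the XPath query are all made up for illustration; they do not reflect the real structure of the dealnews.com page.

```php
<?php
// Hypothetical helper: returns the trimmed text of every node matching an XPath query.
function extractText(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($xpathQuery);
    if ($nodes === false) {
        throw new InvalidArgumentException('Invalid XPath query: ' . $xpathQuery);
    }

    $results = [];
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }

    return $results;
}

// Made-up sample markup, standing in for the fetched page body.
$sample = '<html><body><h1 class="title">Sample deal</h1>'
        . '<span class="price">$4</span></body></html>';

print_r(extractText($sample, '//span[@class="price"]'));
```

In a real scraper the `$sample` string would be replaced by the body returned from cURL, and the query adjusted to whatever elements hold the data you need.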