I'm scraping this page:
http://kat.ph/search/example/?field=seeders&sorder=desc
using PHP's DOMDocument with preserveWhiteSpace = false, and I still get whitespace.
My code looks like this:
...
curl_setopt($curl, CURLOPT_URL, $url);
$header = array (
'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding:gzip,deflate,sdch',
'Accept-Language:en-US,en;q=0.8',
'Cache-Control:max-age=0',
'Connection:keep-alive',
'Host:kat.ph',
'User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19');
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_REFERER, 'http://kat.ph');
curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate,sdch');
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_TIMEOUT, 10);
$html = curl_exec($curl);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
@$dom->loadHTML($html);
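(I had to mimic a browser to get this to work, hence cURL.)
But I still get DOMNodes of type #text that contain only whitespace characters.
Any idea why this happens and how to avoid it?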
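For illustration, here is a minimal sketch of one possible workaround: strip the whitespace-only text nodes after parsing instead of relying on preserveWhiteSpace. As far as I understand, that property is consulted by the XML loaders (load() / loadXML()) rather than by loadHTML(), but treat that as an assumption to check against your PHP and libxml2 versions. The $html string below is a hypothetical stand-in for the result of curl_exec(), and the XPath expression is my own choice, not something from the original code.

```php
<?php
// Hypothetical stand-in for the page fetched with cURL in the question.
$html = "<html><body><ul>\n  <li>one</li>\n  <li>two</li>\n</ul></body></html>";

$dom = new DOMDocument;
// Collect parser warnings instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select text nodes made up only of spaces, tabs and line breaks.
// Copy them into an array first so removal does not disturb iteration.
$blank = iterator_to_array($xpath->query('//text()[normalize-space(.) = ""]'));
foreach ($blank as $node) {
    $node->parentNode->removeChild($node);
}

// After the cleanup, every remaining text node carries visible content.
foreach ($dom->getElementsByTagName('li') as $li) {
    echo $li->textContent, PHP_EOL;
}
```

Note that normalize-space() only treats space, tab, carriage return and line feed as whitespace, so text nodes holding a non-breaking space (&nbsp;) would survive this filter.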
Very nice! Thanks for the insight, I'll keep that in mind for the future. – flu 2012-05-24 09:28:42
Two of the links are dead. – ow3n 2013-10-09 03:50:03