2014-07-22 78 views

cURL error when encoding characters

I am trying to extract some data from a web page. The problem is that instead of returning, say:

64 × 191 × 75 cm 

it is echoed as:

64 × 191 × 75 cm 

My code:

<?php 

$url = "http://www.google.co.uk";
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1;  +http://www.google.com/bot.html)"); 
curl_setopt($ch, CURLOPT_ENCODING ,""); 

$html = curl_exec($ch); 
$dom = new DOMDocument(); 
@$dom->loadHTML($html); 
$xpath = new DOMXPath($dom); 
$q_Dimensions = "//tr/td[@class='FieldTitle'][contains(.,'Dimensions of packed product (W×H×D):')]/following-sibling::td/text()"; 
$dimentionsQ = $xpath->query($q_Dimensions); 
$dimentions = $dimentionsQ->item(0)->nodeValue; 
echo $dimentions; 
exit(); 

I believe this is some kind of character encoding problem, but I can't get any further. Any help is greatly appreciated.

Answers

0

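Alternatively, setting the charset to UTF-8 with header() works as well, since it tells the browser to decode the echoed output as UTF-8: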

// add this at the top of your PHP script, before any output
header('Content-Type: text/html; charset=utf-8'); 

$url = "google.co.uk"; 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1;  +http://www.google.com/bot.html)"); 
curl_setopt($ch, CURLOPT_ENCODING ,""); 

$html = curl_exec($ch); 
$dom = new DOMDocument(); 
@$dom->loadHTML($html); 
$xpath = new DOMXPath($dom); 
$q_Dimensions = "//tr/td[@class='FieldTitle'][contains(.,'Dimensions of packed product (W×H×D):')]/following-sibling::td/text()"; 
$dimentionsQ = $xpath->query($q_Dimensions); 
$dimentions = $dimentionsQ->item(0)->nodeValue; 
echo $dimentions; // 64 × 191 × 75 cm 
exit(); 
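
To see why the header matters, here is a minimal sketch of the mix-up, assuming the mbstring extension is loaded. DOMDocument hands strings back as UTF-8, where × is the two bytes C3 97; a client that assumes ISO-8859-1 reads each of those bytes as its own character. The variable names are only illustrative, and ISO-8859-1 is an assumption about the client's fallback, not something confirmed in the question.

```php
<?php
// "×" (U+00D7) encoded as UTF-8 is two bytes: C3 97.
$times = "\xC3\x97";

// The value as DOMDocument returns it: a UTF-8 byte string.
$utf8 = "64 {$times} 191 {$times} 75 cm";

// Simulate a client that assumes ISO-8859-1 (an assumption for this sketch):
// each byte is treated as one character and re-encoded on its own.
$misread      = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
$misreadTimes = mb_convert_encoding($times, 'UTF-8', 'ISO-8859-1');

echo bin2hex($times), "\n";             // c397
echo bin2hex($misreadTimes), "\n";      // c383c297 (two characters now)
echo mb_strlen($utf8, 'UTF-8'), "\n";   // 16
echo mb_strlen($misread, 'UTF-8'), "\n"; // 18 (each "×" became two characters)
```

The bytes coming out of the script are the same either way; the Content-Type header only tells the client how to decode them, so the data itself needs no conversion.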

Works flawlessly... thanks for your help and effort @Ghost, much appreciated. Saved me a lot of time –

@MaharshiRaval sure man, no problem – Ghost

0

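Set one more cURL option, CURLOPT_ENCODING, to "" (an empty string) so that cURL accepts every encoding it supports and decodes the response itself instead of returning garbage: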

curl_setopt($ch, CURLOPT_ENCODING ,""); 
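
For context, a rough sketch of what this option guards against, assuming the zlib extension is available. With an empty string, cURL advertises every content encoding it supports and decodes the response body itself. The snippet imitates a gzip-compressed body by hand (the sample markup is made up for illustration) to show what an undecoded payload looks like.

```php
<?php
// Stand-in for a response body; a real one would come from curl_exec().
$body = "<td>64 \xC3\x97 191 \xC3\x97 75 cm</td>";

// Roughly what a server sends when it answers with "Content-Encoding: gzip".
$compressed = gzencode($body);

// Undecoded, the payload is binary and starts with the gzip magic bytes.
$magic = bin2hex(substr($compressed, 0, 2));
echo $magic, "\n"; // 1f8b

// Decoding restores the original bytes. cURL performs the equivalent step
// on its own once CURLOPT_ENCODING is set.
$decoded = gzdecode($compressed);
echo $decoded === $body ? "restored\n" : "mismatch\n"; // restored
```

Note that this only covers compressed transfers; it does not change the character set of the decoded text.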

Hi @Anri, thanks for your reply, but as you can see in the code above, I have already added that option on line 8, and it still gives the same problem. –