2013-10-13 58 views

How to use cURL and preg_match_all to get DIV content

I'm trying to practice with cURL, but it isn't going well. Please tell me what is wrong. Here is my code:

<?php 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, "http://xxxxxxx.com/"); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
curl_setopt($ch, CURLOPT_USERAGENT, "Google Bot"); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 

$downloaded_page = curl_exec($ch); 
curl_close($ch); 
preg_match_all('/<div\s* class =\"abc\">(.*)<\/div>/', $downloaded_page, $title); 
echo "<pre>"; 
print($title[1]); 
echo "</pre>"; 

It produces the warning: Notice: Array to string conversion

The HTML I want to parse looks like this:

<div class="abc"> 
<ul> blablabla </ul> 
<ul> blablabla </ul> 
<ul> blablabla </ul> 
</div> 

$title is not a flat array but an array of arrays. See the examples on the manual page: http://php.net/manual/en/function.preg-match-all.php – Ashalynd

Answers

1

preg_match_all fills its third argument with an array of arrays (its return value is the number of matches).
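
To make that shape concrete, here is a minimal sketch using a made-up two-div string. The `$html` sample and the variable names are assumptions for illustration, not taken from the question. With the default PREG_PATTERN_ORDER flag, index 0 holds the full matches and index 1 holds the first capture group:

```php
<?php
// Hypothetical sample input, for illustration only.
$html = '<div class="abc">first</div><div class="abc">second</div>';

// preg_match_all returns the number of full matches and fills $matches.
$count = preg_match_all('/<div class="abc">(.*?)<\/div>/', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
print_r($matches);

// Each element of $matches is itself an array, so passing $matches[1]
// to print or echo triggers "Array to string conversion".
// Join the elements (or loop over them) instead.
echo implode(", ", $matches[1]), "\n";
```

Passing PREG_SET_ORDER as the fourth argument would group the results per match instead of per capture group, if that layout is easier to loop over.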

If your code is:

// The s modifier lets . match newlines; the lazy .*? stops at the first </div>.
preg_match_all('/<div\s+class="abc">(.*?)<\/div>/s', $downloaded_page, $title); 

then what you actually need to do is the following:

echo "<pre>"; 
foreach ($title[1] as $realtitle) { 
    echo $realtitle . "\n"; 
} 
echo "</pre>"; 

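This is because the pattern finds every div that has the class "abc", so $title[1] can hold more than one result. I'd also suggest tightening your regex to make it more robust: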

// [^>]+ and [^>]* allow other attributes around class; s lets . match newlines.
preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $downloaded_page, $title); 

This will match as well, including divs that carry other attributes besides class.
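
One thing worth checking against the HTML in the question: the div there spans several lines, and by default `.` does not match a newline, so the pattern needs the `s` modifier to capture it. A hedged sketch follows, with `$sample` standing in for the downloaded page (an assumption) and a lazy `.*?` so each match stops at the first closing tag:

```php
<?php
// Stand-in for the downloaded page; mirrors the HTML from the question.
$sample = <<<HTML
<div id="x" class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>
HTML;

// Without the s modifier, . stops at newlines and nothing matches.
$without = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/', $sample, $m1);

// With s, . also matches newlines; .*? stops at the first </div>.
$with = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $sample, $m2);

echo "without s: $without match(es)\n";
echo "with s: $with match(es)\n";
echo trim($m2[1][0]), "\n";
```

Note that a lazy match stops at the first `</div>`, so a div nested inside the target would cut the capture short. That is a known limit of the regex approach.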

BTW: DOMDocument is slow as hell. I've found that regex can sometimes give a 40x speed increase, depending on the size of your document. Just keep it simple.
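
The speed difference depends heavily on the input, so it is worth measuring on your own pages. Below is a rough timing sketch, not a rigorous benchmark: the generated `$html` page and the variable names are hypothetical, and the numbers it prints will vary by machine and PHP version.

```php
<?php
// Hypothetical test page: 2000 small divs, built in memory.
$html = '<html><body>'
      . str_repeat('<div class="abc"><ul><li>x</li></ul></div>', 2000)
      . '</body></html>';

// Time the regex approach.
$start = microtime(true);
$regexCount = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $html, $m);
$regexTime = microtime(true) - $start;

// Time the DOMDocument + XPath approach on the same input.
$start = microtime(true);
libxml_use_internal_errors(true); // collect parser warnings quietly
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$domCount = $xpath->query('//div[contains(@class, "abc")]')->length;
$domTime = microtime(true) - $start;

printf("regex: %d matches in %.4fs\n", $regexCount, $regexTime);
printf("DOM:   %d matches in %.4fs\n", $domCount, $domTime);
```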

Best, Nicolas

1

Don't parse HTML with regex.

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, 'http://www.lipsum.com/'); 
curl_setopt($ch, CURLOPT_HEADER, false); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
$html = curl_exec($ch); 
curl_close($ch); 

$dom = new DOMDocument; 
@$dom->loadHTML($html); 
$xpath = new DOMXPath($dom); 
# foreach ($xpath->query('//div') as $div) { // all div's in html 
foreach ($xpath->query('//div[contains(@class, "abc")]') as $div) { // all div's that have "abc" classname 
    // $div->nodeValue contains fetched DIV content 
}
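
As a self-contained variant of the loop above, the sketch below runs the same XPath approach on an inline string instead of a cURL download, so it can be tried offline. The `$html` sample, the `$items` array, and the stricter class test are assumptions added for illustration. `contains(@class, "abc")` would also match a class such as "abcd", so this query pads the attribute with spaces to match the whole token:

```php
<?php
// Inline stand-in for the downloaded page (hypothetical sample).
$html = '<html><body>'
      . '<div class="abc"><ul><li>one</li></ul><ul><li>two</li></ul></div>'
      . '<div class="abcd"><ul><li>other</li></ul></div>'
      . '</body></html>';

$dom = new DOMDocument;
// Collect parser warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "abc" as a whole class token, not as a substring of "abcd".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " abc ")]';

$items = [];
foreach ($xpath->query($query) as $div) {
    // Collect the text of each <ul> inside the matched div.
    foreach ($xpath->query('.//ul', $div) as $ul) {
        $items[] = trim($ul->textContent);
    }
}
print_r($items);
```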