Can't preg_match the following. What am I doing wrong?

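I want to extract the description from a page that uses the description format below. I believe my pattern is right, but it does not match and I don't understand why.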

$file_string = file_get_contents(''); 

preg_match('/<div class="description">(.*)<\/div>/i', $file_string, $descr); 
$descr_out = $descr[1]; 

echo $descr_out; 


<div class="description"> 
<p>some text here</p> 
</div>


2012-07-31 John Billy

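It looks like you need to turn on single-line mode in your regular expression. Modify it to add the s flag (PCRE_DOTALL) after the closing delimiter: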

preg_match('/<div class="description">(.*)<\/div>/si', $file_string, $descr);

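Single-line mode lets the . character match newlines. Without it, .* will not match the newlines you have between the opening and closing div tags.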
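The effect of the flag can be checked with a minimal sketch like the one below. It assumes the page contains exactly the markup shown in the question; the lazy quantifier `.*?` is an addition that is not part of the answer above, used so the match stops at the first closing div tag.

```php
<?php
// Sample markup from the question: the text sits on its own line inside the div.
$html = "<div class=\"description\">\n<p>some text here</p>\n</div>";

// Without the s flag, '.' stops at the newline right after the opening tag,
// so the pattern cannot reach </div> and nothing matches.
$without = preg_match('/<div class="description">(.*)<\/div>/i', $html, $m1);

// With the s flag (PCRE_DOTALL), '.' also matches newlines.
// Assumption: .*? (lazy) is used here instead of .* so the match ends at the first </div>.
$with = preg_match('/<div class="description">(.*?)<\/div>/si', $html, $m2);

var_dump($without);      // int(0)
var_dump(trim($m2[1]));  // string(21) "<p>some text here</p>"
```

Checking the return value of preg_match before reading $m2[1] avoids an undefined-index notice when the page does not contain the expected markup.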


2012-07-31 13:42:32 kingcoyote

Thanks for your answer, but it doesn't seem to make any difference – 2012-07-31 14:03:26

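This description of single-line mode is incorrect. By default, '.' does not match line separators, but when you turn on single-line mode it matches everything. What counts as a line separator varies with the regex flavor and its settings, but it always includes the linefeed ('\n', LF). (Some flavors call it DOTALL mode instead of single-line mode, which is better IMO.) – 2012-07-31 15:08:04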

Alan, you are absolutely right. I have updated my answer so that it is no longer misleading. – kingcoyote 2012-07-31 15:49:34

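I would suggest using the DOMDocument class and XPath to extract arbitrary pieces from HTML documents. Regex-based solutions are brittle when the input changes (extra attributes, whitespace in odd places, and so on), and the DOM approach stays more readable in more complex scenarios.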

$html = '<html><body><div class="description"><p>some text here</p></div></body></html>'; 
// or you could fetch external sites 
// $html = file_get_contents('http://example.com'); 

$doc = new DOMDocument(); 
// prevent parsing errors (frequent with HTML) 
libxml_use_internal_errors(true); 
$doc->loadHTML($html); 
// restore normal libxml error reporting now that the HTML is parsed and stored in $doc 
libxml_use_internal_errors(false); 
$xpath = new DOMXPath($doc); 

foreach ($xpath->query('//div[@class="description"]') as $el) { 
    var_dump($el->textContent); 
}
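As an illustration of the brittleness point, the exact comparison @class="description" stops matching once the div carries a second class. The sketch below uses a common XPath 1.0 idiom for matching one class token; the input markup with the extra id and class is a made-up example, not taken from the question.

```php
<?php
// Hypothetical input: the div has an extra attribute and a second class.
$html = '<html><body><div id="d1" class="main description"><p>some text here</p></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();              // discard the collected warnings
libxml_use_internal_errors(false);

$xpath = new DOMXPath($doc);
// Pad the normalized class list with spaces, then look for " description " as a whole token.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " description ")]';

$texts = [];
foreach ($xpath->query($query) as $el) {
    $texts[] = trim($el->textContent);
}
print_r($texts);
```

With this query the div is still found regardless of attribute order or additional classes, which a literal regex on the opening tag would not handle.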


2012-07-31 13:51:25 complex857

What is the correct code for using a URL? – 2012-07-31 13:58:13

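It doesn't matter where the input string for 'loadHTML' comes from; you can fetch it with 'curl' or 'file_get_contents' as usual. It will try to load even malformed HTML (possibly generating warnings). – complex857 2012-07-31 14:14:02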
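A rough sketch of what that could look like with a URL is below. The helper name extract_descriptions is made up for illustration, and http://example.com is only a placeholder address; the fetch is left commented out so the snippet runs without network access.

```php
<?php
// Hypothetical helper (the name is illustrative): returns the text of every
// <div class="description"> found in the given HTML string.
function extract_descriptions(string $html): array
{
    if ($html === '') {
        return [];                                   // loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);    // silence warnings from malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);           // restore the earlier setting

    $xpath = new DOMXPath($doc);
    $out = [];
    foreach ($xpath->query('//div[@class="description"]') as $el) {
        $out[] = trim($el->textContent);
    }
    return $out;
}

// Usage with a URL (placeholder address; file_get_contents returns false on failure):
// $html = file_get_contents('http://example.com');
// if ($html !== false) { print_r(extract_descriptions($html)); }

// Offline check with the markup from the question:
$result = extract_descriptions("<div class=\"description\">\n<p>some text here</p>\n</div>");
print_r($result);
```

Testing the fetched string against false and the empty string before parsing makes it easier to tell a failed download apart from a page that simply lacks the expected div.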
