2011-05-11 50 views
0

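PHP preg_match_all pattern

I want to capture matches for opening tags. I am running into a problem when a parent opening tag contains a child tag: the parent tag is captured, but the child tag is ignored.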

Example:

</p> 
<p>hello world</p> 
<p><img 

preg_match_all('/<(\/?[a-z]+)[^>]*\/?>/i', $trimmed_text, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER); 

gives the following output:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => </p>
                    [1] => 0
                )

            [1] => Array
                (
                    [0] => /p
                    [1] => 1
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => <p>
                    [1] => 5
                )

            [1] => Array
                (
                    [0] => p
                    [1] => 6
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => </p>
                    [1] => 19
                )

            [1] => Array
                (
                    [0] => /p
                    [1] => 20
                )

        )

    [3] => Array
        (
            [0] => Array
                (
                    [0] => <p>
                    [1] => 24
                )

            [1] => Array
                (
                    [0] => p
                    [1] => 25
                )

        )

)

Is it possible to get, for each parent, a sub-array of all the opening tags inside it?
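
One likely reason the child tag is missing from the output: the pattern only matches a tag that ends with `>`, and the last tag in the trimmed text is cut off before its `>`. The sketch below runs the original pattern next to a hypothetical variant (`$lenient`, not from the question) that also accepts a tag running to the end of the string. The helper `tagNames` and the exact input string are assumptions for illustration, and `PREG_OFFSET_CAPTURE` is left out to keep the output short.

```php
<?php
// The fragment from the question; the last tag is cut off before its '>'.
$trimmed_text = "</p>\n<p>hello world</p>\n<p><img";

// Original pattern: every match must end with '>'.
$strict = '/<(\/?[a-z]+)[^>]*\/?>/i';

// Hypothetical variant: also accept a tag that runs to the end of the string.
$lenient = '/<(\/?[a-z]+)[^>]*(?:>|$)/i';

// Hypothetical helper: return only the captured tag names, in match order.
function tagNames(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    // With PREG_SET_ORDER each row is one match; column 1 is the tag name group.
    return array_column($matches, 1);
}

print_r(tagNames($strict, $trimmed_text));
print_r(tagNames($lenient, $trimmed_text));
```

The second call should additionally report `img`. This only lists tags in order; it does not nest children under their parents, which is where a regex approach starts to become fragile.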

+4

And this is exactly why you don't parse HTML/XML with regular expressions. Use DOM instead to find what you want. – 2011-05-11 19:22:22
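
A minimal sketch of the DOM approach suggested in this comment, using PHP's built-in `DOMDocument` (the `dom` extension is assumed to be enabled). The function name `childTagsByParent` and the sample markup are hypothetical; the sample uses a complete `<img>` tag rather than the truncated one from the question, since how libxml repairs a tag cut off mid-name is not something to rely on.

```php
<?php
// Hypothetical helper: for every <$parentTag> element, list the names of its
// direct child elements (text nodes are skipped).
function childTagsByParent(string $html, string $parentTag): array
{
    $doc = new DOMDocument();

    // Collect parse warnings internally instead of emitting them;
    // libxml repairs unclosed tags where it can.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName($parentTag) as $parent) {
        $children = [];
        foreach ($parent->childNodes as $node) {
            if ($node instanceof DOMElement) {
                $children[] = $node->nodeName;
            }
        }
        $result[] = $children;
    }
    return $result;
}

// Assumed sample input: the second <p> is never closed.
$sample = "<p>hello world</p>\n<p><img src=\"a.png\">";
print_r(childTagsByParent($sample, 'p'));
```

Each entry of the returned array corresponds to one parent element, which gives the per-parent sub-array the question asks for without tracking nesting by hand.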

+0

I recommend you take a look at http://php.net/manual/en/book.tidy.php – 2011-06-12 07:54:01

Answers

1

You are doing this the hard way. Use PHP Simple HTML DOM Parser to parse the HTML,

for example:

// Create DOM from URL or file 
include('simple_html_dom.php'); 
$html = file_get_html('http://www.scroogle.org/'); 

// Find all images 
foreach($html->find('img') as $element) 
     echo $element->src . '<br>'; 

// Find all links 
foreach($html->find('a') as $element) 
     echo $element->href . '<br>';