2013-05-08

Regex to split some HTML content by a custom tag

I need to split my HTML at a custom HTML tag.

This is what my HTML looks like:

<div> 
    <div id="header"> 
     <h1>Document Title</h1> 
    </div> 

    <div id="content"> 
     <p>Lorem ipsum dolar sit</p> 
     <magicheader type="2" class="someClass">Header</magicheader> 
     <p>Lorem ipsum dolar sit</p> 
     <span><magicheader type="3" class="someClass">Header</magicheader></span> 
    </div> 

    <div id="footer"> 

    </div> 
</div> 

This is what I need:

Array 
(
    [0] => <div> 
    <div id="header"> 
     <h1>Document Title</h1> 
    </div> 

    <div id="content"> 
     <p>Lorem ipsum dolar sit</p> 
    [1] => <magicheader type="2" class="someClass">Header</magicheader> 
    [2] => <p>Lorem ipsum dolar sit</p> 
     <span> 
    [3] => <magicheader type="3" class="someClass">Header</magicheader> 
    [4] => </span> 
    </div> 

    <div id="footer"> 

    </div> 
</div> 
) 

Can anyone help me with the pattern?

[Regex can't parse HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – jbabey 2013-05-08 12:51:35

There doesn't seem to be any pattern to how the HTML is split. Can you explain the logic behind the split you described? – arijeet 2013-05-08 12:51:39

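It's wrong to say that regex can't cut up HTML, but it is accurate to say that regex can't parse HTML reliably and accurately. Unless you're after a quick-and-dirty fix for a specific, limited problem, it isn't a sensible approach, and even then there is usually a better or more appropriate solution. – 2013-05-08 13:05:08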
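As a parser-based alternative to the regex, here is a minimal sketch, assuming PHP 7+ with the bundled DOM extension (DOMDocument) enabled. The function name findMagicHeaders is hypothetical. It only locates the custom <magicheader> elements and reads their attributes and text; it does not rebuild the string split the question asks for.

```php
<?php
// Sketch: find the custom <magicheader> elements with PHP's DOM extension
// instead of a regex. Hypothetical helper; it does not reproduce the split.
function findMagicHeaders(string $html): array
{
    $doc = new DOMDocument();
    // libxml's HTML parser does not know <magicheader> and reports it as an
    // invalid tag; collect those errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headers = [];
    foreach ($doc->getElementsByTagName('magicheader') as $node) {
        $headers[] = [
            'type'  => $node->getAttribute('type'),
            'class' => $node->getAttribute('class'),
            'text'  => $node->textContent,
            // saveHTML() given a node returns that node's outer HTML
            'html'  => $doc->saveHTML($node),
        ];
    }
    return $headers;
}

$html = <<<'HTML'
<div id="content">
  <p>Lorem ipsum dolar sit</p>
  <magicheader type="2" class="someClass">Header</magicheader>
  <p>Lorem ipsum dolar sit</p>
  <span><magicheader type="3" class="someClass">Header</magicheader></span>
</div>
HTML;

foreach (findMagicHeaders($html) as $h) {
    echo $h['type'], ' ', $h['text'], "\n";
}
```

Because the parser understands quoting and nesting, a > inside an attribute value or a <magicheader> wrapped in other tags does not confuse it, which is exactly where hand-written patterns tend to break.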

Answer

You need to use preg_split with PREG_SPLIT_DELIM_CAPTURE:

$text=<<<EOD 
<div> 
    <div id="header"> 
     <h1>Document Title</h1> 
    </div> 

    <div id="content"> 
     <p>Lorem ipsum dolar sit</p> 
     <magicheader type="2" class="someClass">Header</magicheader> 
     <p>Lorem ipsum dolar sit</p> 
     <span><magicheader type="3" class="someClass">Header</magicheader></span> 
    </div> 

    <div id="footer"> 

    </div> 
</div> 
EOD; 

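// The parentheses capture the whole <magicheader ...>Header</magicheader>
// element; with PREG_SPLIT_DELIM_CAPTURE it is kept as its own array entry.
$regexp = '%(<magicheader [^>]*>Header</magicheader>)%'; 
// -1 means no limit on the number of pieces
$value = preg_split($regexp, $text, -1, PREG_SPLIT_DELIM_CAPTURE); 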

Then print_r($value) outputs:

Array 
(
    [0] => <div> 
    <div id="header"> 
     <h1>Document Title</h1> 
    </div> 

    <div id="content"> 
     <p>Lorem ipsum dolar sit</p> 

    [1] => <magicheader type="2" class="someClass">Header</magicheader> 
    [2] => 
     <p>Lorem ipsum dolar sit</p> 
     <span> 
    [3] => <magicheader type="3" class="someClass">Header</magicheader> 
    [4] => </span> 
    </div> 

    <div id="footer"> 

    </div> 
</div> 
)
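The pattern above only matches elements whose text is literally Header and whose attributes contain no >. A minimal sketch of a more general pattern follows, assuming the custom tags are never nested inside each other. splitOnMagicHeaders is a hypothetical helper name, not part of the original answer, and the sample input is made up for illustration.

```php
<?php
// Sketch: split on every <magicheader ...>...</magicheader> element, whatever
// its text, keeping each element as its own array entry.
//
//   <magicheader\b                the tag name (\b rejects <magicheaderfoo>)
//   (?:[^>"']|"[^"]*"|'[^']*')*   attributes; a quoted value may contain '>'
//   >.*?</magicheader>            content, matched lazily up to the first
//                                 closing tag (nesting is NOT supported)
//   s flag                        lets '.' match newlines inside the element
//   i flag                        HTML tag names are case-insensitive
function splitOnMagicHeaders(string $html): array
{
    $pattern = '%(<magicheader\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>.*?</magicheader>)%si';
    return preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Hypothetical input: the first element has a '>' inside a quoted attribute.
$html = '<p>a</p><magicheader type="2" title="a>b">Intro</magicheader>'
      . '<span><magicheader type="3">Details</magicheader></span>';

$parts = splitOnMagicHeaders($html);
print_r($parts);
```

Here the result has five entries: <p>a</p>, the first element (including the > inside title), <span>, the second element and </span>. If the input starts with a <magicheader>, entry [0] is an empty string; passing PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY as the flags drops it. Nested <magicheader> elements, or the tag appearing inside comments or scripts, will still be split incorrectly, which is the kind of input where a DOM parser is the safer choice.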