2013-07-15 61 views
3

Matching basic HTML using regex (or any other way)

I have some HTML as follows:

<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven) 
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four) 
    ... 
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four) 

Now, using PHP, I want to split this up and get two arrays as follows:

Array 1 - (this will hold everything in the <b> tags)

[0] -> <b>This is a title: </b> 
    [1] -> <b>Some more text: </b> 
    ... 
    [n] -> <b>Hello world!: </b> 

Array 2 - (this will hold everything after the <b> tags)

[0] -> 0091 + Two + 423 + Four + (Five, Six, Seven) 
    [1] -> Abc + Hi + Random + Text + (Hello, 522, Four) 
    ... 
    [n] -> Test + Foo + 1122 + (120, 122, Four) 

I tried using regex and preg_match_all, but I can't seem to figure it out. Any help would be appreciated.

Thanks!

+1

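**Do not use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration. As soon as the HTML changes from what you expect, your code will break. For examples of how to properly parse HTML with PHP modules that have already been written, tested, and debugged, see http://htmlparsing.com/php. –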

+0

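Hi Andy! I'm already using the 'simple_html_dom' library (mentioned in the link you posted). I ran into trouble with this particular piece, so I decided to go the regex route, just for this one case. Otherwise, I'm using the HTML parser library. Thanks for your input :) – wiseindy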
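
For comparison with the parser-based route suggested in the comments, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument) rather than simple_html_dom. The variable names `$html`, `$titles`, and `$values` are illustrative, and the sketch assumes each `<b>` element is followed directly by a plain text node, as in the sample above.

```php
<?php
// Sketch only: PHP's built-in DOM extension, not the simple_html_dom library.
$html = '<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a <div> so it has a single containing element.
$doc->loadHTML('<div>' . $html . '</div>');

$titles = [];
$values = [];
foreach ($doc->getElementsByTagName('b') as $b) {
    // Serialize the <b> element itself, tags included.
    $titles[] = trim($doc->saveHTML($b));
    // Assumption: the text belonging to this title is the next sibling node.
    $next = $b->nextSibling;
    $values[] = ($next instanceof DOMText) ? trim($next->nodeValue) : '';
}

print_r($titles);
print_r($values);
```

If the markup later gains nested tags inside the values, only the sibling-walking part would need to change, which is the main argument for a parser over a pattern here.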

Answers

1
<?php 
$string = ' <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven) 
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four) 
    ... 
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)'; 
preg_match_all("#(<b>[^<]+<\/b>)([^<]+)#", $string, $matches); 
print_r($matches); 
?> 

Output:

Array 
(
    [0] => Array 
     (
      [0] => <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven) 

      [1] => <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four) 
    ... 

      [2] => <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four) 
     ) 

    [1] => Array 
     (
      [0] => <b>This is a title: </b> 
      [1] => <b>Some more text: </b> 
      [2] => <b>Hello world!: </b> 
     ) 

    [2] => Array 
     (
      [0] => 0091 + Two + 423 + Four + (Five, Six, Seven) 

      [1] => Abc + Hi + Random + Text + (Hello, 522, Four) 
    ... 

      [2] => Test + Foo + 1122 + (120, 122, Four) 
     ) 

) 
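
To make the pattern above easier to read, the same expression can be written with the `x` (extended) modifier and comments, and the two requested arrays can be taken from capture groups 1 and 2. This is a sketch under a few assumptions: the `~` delimiter replaces `#` because `#` starts a comment in extended mode, the literal `...` placeholder lines are left out of the sample string, and `$titles` and `$values` are illustrative names.

```php
<?php
$string = ' <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)';

// Same pattern as in the answer, spread out with the x modifier.
$pattern = '~
    ( <b> [^<]+ </b> )   # group 1: a <b> element whose content contains no "<"
    ( [^<]+ )            # group 2: everything up to the next "<"
~x';

preg_match_all($pattern, $string, $matches);

// $matches[0] holds the full matches; groups 1 and 2 are the two arrays.
$titles = $matches[1];
// Group 2 still carries the surrounding spaces and newlines, so trim it.
$values = array_map('trim', $matches[2]);

print_r($titles);
print_r($values);
```

Because group 2 is `[^<]+`, it stops at the first `<` of any kind, so this only holds while the values contain no markup of their own.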
+0

Akram, this works perfectly! Thank you very much. :) Any good suggestions for where I can learn more about regular expressions? – wiseindy

+0

Just try doing it yourself; regular expressions are well worth learning... search Google – 2013-07-16 21:55:38

1

You can try this:

<pre> 
<?php 

$subject =<<<LOD 
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven) 
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four) 
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four) 
LOD; 

$pattern = '~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~'; 
preg_match_all($pattern, $subject, $matches); 

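// Drop $matches[0] (the full matches) so only the two capture groups remain.
array_shift($matches);
// Trim surrounding whitespace from every captured string.
array_walk_recursive($matches, function (&$val) { $val = trim($val); });
list($array1, $array2) = $matches;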

print_r($array1); 
print_r($array2); 
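
The second group in this pattern, `(?>[^<]+|<(?!b))*`, accepts a `<` as long as it is not followed by `b`, so a value may contain other tags without being cut short. The sketch below compares it with the simpler `[^<]+` pattern from the other answer; the input containing `<i>` is a made-up example, and `$simple` and `$tolerant` are illustrative names.

```php
<?php
// Hypothetical input: the first value contains an <i> element.
$subject = "<b>A: </b> 1 + <i>Two</i> + 3\n<b>B: </b> 4";

// Pattern from the first answer: the value ends at the first "<" of any kind.
preg_match_all('#(<b>[^<]+</b>)([^<]+)#', $subject, $m1);
$simple = array_map('trim', $m1[2]);

// Pattern from this answer: the value ends only at a "<" followed by "b".
// Note that this also stops at tags such as <br>, since they start with "<b".
preg_match_all('~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~', $subject, $m2);
$tolerant = array_map('trim', $m2[2]);

print_r($simple);   // first value is cut off at "<i>"
print_r($tolerant); // first value keeps the "<i>Two</i>" part
```

Neither pattern understands HTML structure; the second one is simply more forgiving about what can appear between two `<b>` elements.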