2013-01-13 90 views

PHP preg_match_all capturing

I want to use preg_match_all in PHP to capture each of the following in its own group:

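  1. The chapter, section, or page keyword
  2. The number (and the letter, if there is one) that identifies the chapter, section, or page. If there is a space between them, it should be allowed for
  3. The words "and" and "or"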

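Keep in mind that I want to ignore all the titles, and that the number of items in the string can vary. The regex should work for all of the following examples: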

  1. Ch1 and Sect2b
  2. Ch 4 x unwantedtitle and Sect 5y unwanted title and Sect6 z and Ch7 or Ch8

This is what I have managed to come up with so far:

$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3'; 
    preg_match_all ('/([a-z]+)(?=\d|\d\s)\s*(\d*)\s*(?<=\d|\d\s)([a-z]?).*?(and|or)?/i', $str, $matches); 

    Array 
    (
     [0] => Array 
      (
       [0] => Pg3 
      ) 

     [1] => Array 
      (
       [0] => Pg 
      ) 

     [2] => Array 
      (
       [0] => 3 
      ) 

     [3] => Array 
      (
       [0] => 
      ) 

     [4] => Array 
      (
       [0] => 
      ) 

    ) 

The expected result should be:

Array 
    (
     [0] => Array 
      (
       [0] => Ch 1 a and 
       [1] => Sect 2b and 
       [2] => Pg3 
      ) 

     [1] => Array 
      (
       [0] => Ch 
       [1] => Sect 
       [2] => Pg 
      ) 

     [2] => Array 
      (
       [0] => 1 
       [1] => 2 
       [2] => 3 
      ) 

     [3] => Array 
      (
       [0] => a 
       [1] => b 
       [2] => 
      ) 

     [4] => Array 
      (
       [0] => and 
       [1] => and 
       [2] => 
      ) 

    ) 

Not sure you really want to do this with _one_ regex. Using several looks better. – fge

@fge How could I use several regexes while still keeping everything in the right order? If you have an example, it would be much appreciated. Thanks. – user1307016

Not in PHP, I barely know it... – fge
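
For illustration only, here is a minimal two-pass sketch of how several regexes might be combined while keeping the order. It is not from the thread: `parseRefs` is a hypothetical helper, the keyword list (ch, sect, pg) is taken from the examples, and it assumes PHP 7 or later. The first pass splits the string on standalone "and"/"or"; the second pass reads the keyword, number, and optional single letter from the start of each piece. Since `preg_split` returns the pieces in order, the order is preserved.

```php
<?php
// Hypothetical helper (not from the thread): parse references in two passes.
function parseRefs(string $str): array
{
    // Pass 1: split on standalone "and"/"or" and keep the delimiters.
    $parts = preg_split('/\s+(and|or)\s+/i', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $refs = [];
    // Pieces sit at even indexes; the conjunction that follows a piece
    // (if any) sits at the next odd index.
    for ($i = 0; $i < count($parts); $i += 2) {
        // Pass 2: keyword, number and an optional single letter at the
        // start of the piece. Anything after that (the title) is ignored.
        if (preg_match('/^\s*(ch|sect|pg)\s*(\d+)\s*([a-z]\b)?/i', $parts[$i], $m)) {
            $refs[] = [
                'word'   => $m[1],
                'number' => $m[2],
                'letter' => $m[3] ?? '',
                'cond'   => $parts[$i + 1] ?? '',
            ];
        }
    }
    return $refs;
}

print_r(parseRefs('Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3'));
```

One limitation of this sketch: a title that itself contains a standalone "and" or "or" gets split as well, and the piece that does not start with a keyword is simply skipped.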

Answers

0

This is the closest I could get:

$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3'; 
preg_match_all ('/((Ch|Sect|Pg)\s?(\d+)\s?(\w?))(.*?(and|or))?/i', $str, $matches); 


Array 
(
    [0] => Array 
     (
      [0] => Ch 1 a unwantedtitle and 
      [1] => Sect 2b unwanted title and 
      [2] => Pg3 
     ) 

    [1] => Array 
     (
      [0] => Ch 1 a 
      [1] => Sect 2b 
      [2] => Pg3 
     ) 

    [2] => Array 
     (
      [0] => Ch 
      [1] => Sect 
      [2] => Pg 
     ) 

    [3] => Array 
     (
      [0] => 1 
      [1] => 2 
      [2] => 3 
     ) 

    [4] => Array 
     (
      [0] => a 
      [1] => b 
      [2] => 
     ) 

    [5] => Array 
     (
      [0] => unwantedtitle and 
      [1] => unwanted title and 
      [2] => 
     ) 

    [6] => Array 
     (
      [0] => and 
      [1] => and 
      [2] => 
     ) 

) 
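
If the extra groups at indexes 1 and 5 are unwanted, one possible variation (a sketch, not part of the answer above) is to drop the outer group and make the helper group non-capturing with `(?:...)`, so that indexes 1 to 4 line up with the expected result. This sketch also assumes the letter is a single character followed by a word boundary, and that "and"/"or" must be whole words. `$matches[0]` still contains the title text, because a match is always one contiguous substring.

```php
<?php
$str = 'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3';

// (?:...) groups without capturing, so only the keyword, number, letter
// and conjunction receive an index. [a-z]\b accepts a single letter only,
// so the first letter of a title word is not picked up.
$pattern = '/\b(Ch|Sect|Pg)\s?(\d+)\s?([a-z]\b)?(?:.*?\b(and|or)\b)?/i';

preg_match_all($pattern, $str, $matches);
print_r($matches);
```

Note that the lazy `.*?` skips everything up to the next "and"/"or", so a reference that is not preceded by a conjunction would be swallowed by the previous match.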
0

Here is how I would do it.

$arr = array(
    'Ch1 and Sect2b', 
    'Ch 1 a unwantedtitle and Sect 2b unwanted title and Pg3', 
    'Ch 4 x unwantedtitle and Sect 5y unwanted title and' . 
     ' Sect6 z and Ch7 or Ch8a', 
    'Assume this is ch1a and ch 2 or ch seCt 5c.' . 
     ' Then SECT or chA pg22a and pg 13 andor' 
); 

foreach ($arr as $a) { 
    var_dump($a); 
    preg_match_all(
    '~ 
     \b(?P<word>ch|sect|(pg)) 
     \s*(?P<number>\d+) 
     (?(2)\b| 
      \s* 
      (?P<letter>(?!(?<=\s)(?:and|or)\b)[a-z]+)? 
      \s* 
      (?:(?<=\s)(?P<cond>and|or)\b)? 
     ) 
    ~xi' 
    ,$a,$m); 
    foreach ($m as $k => $v) { 
     if (is_numeric($k) && $k !== 0) unset($m[$k]); 
     // this is for 'beautifying' the result array 
     // note that $m[0] will still return whole matches 
    } 
    print_r($m); 
} 

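I had to put pg in a capturing group because I needed to write an explicit condition for it: it may be followed by a number (with or without a space in between), but it must not be followed by any letter, on the assumption that a page indicator never carries a letter as in "pg23a".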
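
To make the conditional easier to see in isolation, here is a reduced sketch. It uses a simplified pattern rather than the full one above: group 1 is set only when "pg" matched, and `(?(1)yes|no)` then picks the branch. With "pg", the number must end at a word boundary; otherwise one optional letter is allowed.

```php
<?php
// Group 1 participates only when "pg" matched; (?(1)yes|no) branches on it.
// yes-branch: the number must be followed by a word boundary (no letter).
// no-branch:  optional whitespace and at most one letter, then a boundary.
$pattern = '~\b(?:ch|sect|(pg))\s*(\d+)(?(1)\b|\s*[a-z]?\b)~i';

foreach (['pg 13', 'pg23a', 'ch1a', 'sect 5 c'] as $s) {
    printf("%-8s %s\n", $s, preg_match($pattern, $s) ? 'match' : 'no match');
}
```

Only "pg23a" is rejected here, since the letter directly after the page number leaves no word boundary for the yes-branch.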

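That is why I chose to name each group and to "beautify" the result with the inner foreach loop in the code. Otherwise, if you choose to use numeric indexes (rather than named ones), you would need to skip every $m[2].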
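
As a side note, if only the named groups are needed (without `$m[0]`), the clean-up could also be done in one call. This is a sketch with a simplified pattern, and it assumes PHP 5.6 or later for `ARRAY_FILTER_USE_KEY`:

```php
<?php
$str = 'Ch 1 a and Sect 2b';
preg_match_all('~(?P<word>ch|sect)\s*(?P<number>\d+)~i', $str, $m);

// $m holds every named group twice: once under its name and once under
// its numeric index. Keep only the entries whose key is a string.
$named = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);

print_r($named);
```

Unlike the loop in the answer, this also drops `$m[0]`, so the whole matches are no longer available afterwards.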

To show an example, here is the output for the last item in $arr.

Array 
(
    [0] => Array 
     (
      [0] => ch1a and 
      [1] => ch 2 or 
      [2] => seCt 5c 
      [3] => pg 13 
     ) 

    [word] => Array 
     (
      [0] => ch 
      [1] => ch 
      [2] => seCt 
      [3] => pg 
     ) 

    [number] => Array 
     (
      [0] => 1 
      [1] => 2 
      [2] => 5 
      [3] => 13 
     ) 

    [letter] => Array 
     (
      [0] => a 
      [1] => 
      [2] => c 
      [3] => 
     ) 

    [cond] => Array 
     (
      [0] => and 
      [1] => or 
      [2] => 
      [3] => 
     ) 

)