Split using regex to get key-value pairs

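I am parsing some text, but I cannot get the empty element when the space after a key is missing (a missing space is valid input).
Edit: I have added colons to the free text in the test strings.
Edit: OK, this is an arbitrary text format for writing key-value pairs. Discarding element [0], the remaining elements of the array form a sequence of keys and values. Multi-line values are accepted.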

This is the test case text:

:part1 only one \s removed:OK 
:part2 :text :with 
new lines 
on it 
:noSpaceAfterThis 
:thisShoudBeAStandAlongText but: here there are more text 
:part4 :even more text

This is what I want:

Array 
(
    [0] => 
    [1] => part1 
    [2] => only one \s removed:OK 
    [3] => part2 
    [4] => :text :with 
new lines 
on it 
    [5] => noSpaceAfterThis 
    [6] => 
    [7] => thisShoudBeAStandAlongText 
    [8] => but: here there are more text 
    [9] => part4 
    [10] => :even more text 
)

This is what I get:

Array 
(
    [0] => 
    [1] => part1 
    [2] => only one \s removed:OK 
    [3] => part2 
    [4] => :text :with 
new lines 
on it 
    [5] => noSpaceAfterThis 
    [6] => :thisShoudBeAStandAlongText but: here there are more text 
    [7] => part4 
    [8] => :even more text 
)

This is my test code:

<?php 
$text = ' 
:part1 only one \s removed:OK 
:part2 :text :with 
new lines 
on it 
:noSpaceAfterThis 
:thisShoudBeAStandAlongText but: here there are more text 
:part4 :even more text'; 

echo '<pre>'; 
// my effort so far: 
$ret = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 
print_r($ret); 

// nor this one: 
$ret = preg_split('|\r?\n:([\w\d]+)\r?\s?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 
print_r($ret); 

// for debugging, an extra capturing group 
$ret = preg_split('|\r?\n:([\w\d]+)(\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 
var_dump($ret);

Source

2013-07-15 Luis Siquot

It would help others to know what the rules are, i.e. why should it match the way you want? :)

@Jack you are right, I hope you agree with the edit I made.

Another way, with preg_match_all:

$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!\n):)+?(?= *+(?>\n:|$))~'; 
preg_match_all($pattern, $text, $matches); 
echo '<pre>' . print_r($matches[0], true);

Pattern details:

# capture the first word at the beginning of a line, preceded by a colon #
(?<=^:|\n:)  # lookbehind: preceded by the beginning of the string
             # and a colon, or by a newline and a colon
\S++         # everything that is not whitespace (possessive)

# capture all the content until the next line with : in first position #
(?<=\s)      # lookbehind: preceded by a whitespace character
(?:          # open a non-capturing group
    [^:]+?      # any character that is not a colon, one or more times (lazy)
    |           # OR
    (?<!^|\n):  # negative lookbehind: a colon not preceded by a newline
                # or the beginning of the string
)+?          # close the non-capturing group,
             # repeat one or more times (lazy)
(?= *+(?>\n:|$)) # lookahead: followed by spaces (zero or more) and a newline
                 # with a colon in first position, or the end of the string

The advantage here is that empty results are avoided.
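To see what "no empty results" means in practice, here is a minimal sketch that runs the pattern above on the question's test text. The way $text is rebuilt with implode, and the helper variables $hasEmpty, $pos and $next, are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Test text from the question, rebuilt line by line so the line breaks are explicit.
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!\n):)+?(?= *+(?>\n:|$))~';
preg_match_all($pattern, $text, $matches);

// A key without a value contributes only the key itself: no empty string
// is emitted for it, so keys and values do not strictly alternate here.
$hasEmpty = in_array('', $matches[0], true);
$pos      = array_search('noSpaceAfterThis', $matches[0], true);
$next     = $matches[0][$pos + 1];

var_dump($hasEmpty, $next);
```

If the empty value after noSpaceAfterThis has to appear as its own element, as in the output the question asks for, this flat list is probably not the most convenient shape, since a key without a value is followed directly by the next key.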

Or with preg_split:

$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

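Explanation:

The goal is to split the text in two situations:

at a newline, when the first character of the next line is :
at the first space of a line that begins with :

So, at the beginning of a line, the two split points sit on either side of the :word. The : and the space after the word must be removed, but the word itself must be kept. That is why I use PREG_SPLIT_DELIM_CAPTURE to keep the word.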
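Why the character after the key needs care can be seen by comparing the question's first attempt with a variant that only accepts a literal space. This is a rough sketch rather than part of the original answer: the variant pattern and the names $loose and $strict are assumptions for illustration. In PCRE, \s also matches a newline, so the optional group in the question's pattern can swallow the line break after a key that has no value, and the following :word is then no longer preceded by a newline.

```php
<?php
// Test text from the question, rebuilt line by line so the line breaks are explicit.
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// First attempt from the question: the optional \s can consume the newline
// that follows "noSpaceAfterThis", hiding the next delimiter.
$loose = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Illustrative variant (an assumption): only a literal space may follow the key,
// so the newline stays available for the next delimiter.
$strict = preg_split('|\r?\n:(\w+) ?|', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

printf("%d elements with \\s, %d with a literal space\n", count($loose), count($strict));
```

With the literal space, the element after noSpaceAfterThis should be the empty string the question asks for, instead of the whole following line.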

Pattern details:

(?:        # non-capturing group (everything inside is removed)
    \s*\n  # trim the trailing whitespace of the preceding line and the newline
    |      # OR
    ^      # the beginning of the string
)          # end of the non-capturing group
:          # remove the first character when it is a :
(\S++)     # keep the first word, thanks to DELIM_CAPTURE
(?: )?     # remove the first space if present
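The split result can then be folded into an associative array, following the question's rule of discarding element [0] and reading the rest as key, value, key, value. The sketch below assumes the last optional group of the pattern holds a single literal space, as its comment ("remove the first space if present") suggests; the $pairs array and the pairing loop are illustrative additions, not part of the original answer.

```php
<?php
// Test text from the question, rebuilt line by line so the line breaks are explicit.
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// Assumption: the optional group contains one literal space.
$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Drop element [0] (the text before the first key), then pair up
// key, value, key, value... into an associative array.
$pairs = [];
foreach (array_chunk(array_slice($res, 1), 2) as $chunk) {
    // A trailing key without a value gets an empty string.
    $pairs[$chunk[0]] = $chunk[1] ?? '';
}

print_r($pairs);
```

Note that a repeated key would overwrite the earlier value in $pairs; if duplicates matter, keeping the flat list from preg_split may be the safer choice.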

Source

2013-07-15 01:26:53

@LuisSiquot: Oops! Sorry, a typo. Thanks.

Sorry. I edited my question: the splitting colon must be at the beginning of a line.

@LuisSiquot: Not a problem, please try my edit.
