2013-07-15

I'm parsing some text, but I can't get the (empty) piece when the space after the key is missing (which is allowed).
Edit (string arrays): I have added colons to the free text.
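Edit: OK, this is a free-form text format in which key-value pairs can be written. After discarding element [0], the remaining elements of the array form a sequence of keys and values. Multi-line values are allowed. The split uses a regular expression to get the key-value pairs.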
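
As a minimal sketch of the format described in the edit, assuming the split array has the shape of the desired output below (element [0] is discarded, then keys and values alternate), the pieces can be folded into an associative array. The helper name pairsFromSplit() is hypothetical and not part of the question's code.

```php
<?php
// Hypothetical helper (not from the question): folds the array returned by
// preg_split(..., PREG_SPLIT_DELIM_CAPTURE) into a key => value map.
// Assumes element [0] is the text before the first key (discarded) and that
// the remaining elements alternate key, value, key, value, ...
function pairsFromSplit(array $parts): array
{
    $pairs = [];
    // Skip element [0], then take the rest two at a time.
    foreach (array_chunk(array_slice($parts, 1), 2) as $chunk) {
        // A key at the very end with no value element gets an empty string.
        $pairs[$chunk[0]] = $chunk[1] ?? '';
    }
    return $pairs;
}

// Sample input with the same shape as the desired output in the question.
$parts = ['', 'part1', 'only one \s removed:OK', 'noSpaceAfterThis', '', 'part4', ':even more text'];
print_r(pairsFromSplit($parts));
```

Since PHP array keys are unique, a repeated key overwrites the earlier value in this sketch; keeping duplicates would need a list of [key, value] pairs instead.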

This is the test-case text:

:part1 only one \s removed:OK 
:part2 :text :with 
new lines 
on it 
:noSpaceAfterThis 
:thisShoudBeAStandAlongText but: here there are more text 
:part4 :even more text 

This is what I want:

Array 
(
    [0] => 
    [1] => part1 
    [2] => only one \s removed:OK 
    [3] => part2 
    [4] => :text :with 
new lines 
on it 
    [5] => noSpaceAfterThis 
    [6] => 
    [7] => thisShoudBeAStandAlongText 
    [8] => but: here there are more text 
    [9] => part4 
    [10] => :even more text 
) 

This is what I get:

Array 
(
    [0] => 
    [1] => part1 
    [2] => only one \s removed:OK 
    [3] => part2 
    [4] => :text :with 
new lines 
on it 
    [5] => noSpaceAfterThis 
    [6] => :thisShoudBeAStandAlongText but: here there are more text 
    [7] => part4 
    [8] => :even more text 
) 

This is my test code:

<?php 
$text = ' 
:part1 only one \s removed:OK 
:part2 :text :with 
new lines 
on it 
:noSpaceAfterThis 
:thisShoudBeAStandAlongText but: here there are more text 
:part4 :even more text'; 

echo '<pre>'; 
// my effort so far: 
$ret = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 
print_r($ret); 

// nor this one: 
$ret = preg_split('|\r?\n:([\w\d]+)\r?\s?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 
print_r($ret); 

// for debuging, an extra capturing group 
$ret = preg_split('|\r?\n:([\w\d]+)(\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 
var_dump($ret); 
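
A likely reason for the merged element [6], shown as a sketch on a shortened input: \s matches any whitespace, including \n, so after noSpaceAfterThis the optional (?:\r?\s)? consumes the newline that the next \r?\n: delimiter needs. The second call, which removes only an optional literal space, is an illustrative variant and not code from the question.

```php
<?php
// Shortened input (an assumption, not the full test text): a key with no
// space after it, followed by a line whose key does have a space after it.
$text = "\n:noSpaceAfterThis\n:standAlone but: more";

// The question's first pattern. (?:\r?\s)? is meant to drop one space, but \s
// also matches "\n", so it consumes the newline the next "\n:" delimiter needs.
// ([\w\d] is the same as \w, since \w already includes digits.)
$bad = preg_split('|\r?\n:([\w\d]+)(?:\r?\s)?|i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($bad); // ":standAlone but: more" is left as one element

// Illustrative variant: remove at most one literal space after the key.
$good = preg_split('|\r?\n:(\w+) ?|', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($good); // "noSpaceAfterThis" is now followed by an empty element
```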

It would help others to know what the rules are, i.e. why should it match the way you want? :)

@Jack you are right; I hope you agree with the edits I made.

Answer

Another approach, with preg_match_all:

$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!\n):)+?(?= *+(?>\n:|$))~'; 
preg_match_all($pattern, $text, $matches); 
echo '<pre>' . print_r($matches[0], true); 

Pattern details:

# capture the first word of a line when it is preceded by a colon # 
(?<=^:|\n:)  # lookbehind: preceded by the beginning of the string 
        # and a colon, or by a newline and a colon 
\S++    # one or more non-whitespace characters (possessive) 

# capture all the content up to the next line that has : in first position # 
(?<=\s)   # lookbehind: preceded by a whitespace character 
(?:    # open a non-capturing group 
    [^:]+?   # any character except a colon, one or more times (lazy) 
    |    # OR 
    (?<!\n):  # negative lookbehind: a colon not preceded by a newline 
)+?    # close the non-capturing group, 
        # repeat one or more times (lazy) 
(?= *+(?>\n:|$)) # lookahead: followed by zero or more spaces, then either 
        # a newline with a colon in first position, or the end of the string 

The advantage here is that it avoids invalid results.
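
The following sketch runs the preg_match_all pattern above, unchanged, on a rebuilt copy of the test text. The rebuild is an assumption: it uses \n line endings and no trailing spaces on the lines.

```php
<?php
// Test text from the question, rebuilt with "\n" line endings and without
// trailing spaces (an assumption about the original input).
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// The preg_match_all pattern from the answer, unchanged.
$pattern = '~(?<=^:|\n:)\S++|(?<=\s)(?:[^:]+?|(?<!\n):)+?(?= *+(?>\n:|$))~';
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Under these assumptions the result has nine elements: noSpaceAfterThis is followed directly by thisShoudBeAStandAlongText, with no empty value between them, so the matches do not strictly alternate between keys and values.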

Or with preg_split:

$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE); 

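Explanation:

The goal is to split the text in two places:

  • on the newline, when the first character of the next line is :
  • on the first space, when the line begins with :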

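So, at the beginning of a line, the two split points surround this :word. The : and the space after the word must be removed, but the word itself must be kept. That is why I use PREG_SPLIT_DELIM_CAPTURE: it keeps the word.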
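
A small sketch of what PREG_SPLIT_DELIM_CAPTURE changes, on a hypothetical one-line input: without the flag, the captured word is dropped along with the rest of the delimiter; with it, the word is returned as its own element between the pieces.

```php
<?php
// Hypothetical input: one key line, with a space between key and value.
$line = "\n:key value";
// Delimiter: newline, colon, the key (captured), and an optional space.
$pattern = '~\n:(\S++) ?~';

// Without the flag, the captured key is discarded along with the delimiter.
$without = preg_split($pattern, $line);
// With PREG_SPLIT_DELIM_CAPTURE, the captured key is returned as its own element.
$with = preg_split($pattern, $line, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($without); // '' and 'value'
print_r($with);    // '', 'key' and 'value'
```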

Pattern details:

(?:   # non-capturing group (everything it matches is removed) 
    \s*\n  # trim the trailing spaces of the previous line, and the newline 
    |   # OR 
    ^  # the beginning of the string 
)    # end of the non-capturing group 
:    # remove the first character when it is a : 
(\S++)  # keep the first word with PREG_SPLIT_DELIM_CAPTURE 
(?: )?  # remove the first space if present 
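
A sketch that runs the split pattern on a rebuilt copy of the test text (assuming \n line endings and no trailing spaces), with the last group written as (?: )?, an optional literal space, as the pattern details describe.

```php
<?php
// Test text from the question, rebuilt with "\n" line endings and without
// trailing spaces (an assumption about the original input).
$text = implode("\n", [
    '',
    ':part1 only one \s removed:OK',
    ':part2 :text :with',
    'new lines',
    'on it',
    ':noSpaceAfterThis',
    ':thisShoudBeAStandAlongText but: here there are more text',
    ':part4 :even more text',
]);

// Split pattern with the last group written as "(?: )?": an optional literal space.
$res = preg_split('~(?:\s*\n|^):(\S++)(?: )?~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($res);
```

Under these assumptions the result has the same eleven elements as the desired output in the question, including the empty element after noSpaceAfterThis; element [0] is the empty piece before the first key.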

@LuisSotot: Oops! Sorry, typo. Thanks

Sorry. I edited my question: the colon used for splitting must be at the beginning of a line.

@LuisSiquot: Not a problem, try my edit.