How can I find all spaces except those between quotes?

I need to split a string on spaces, but phrases inside quotes should be kept unsplit. For example:

word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5

After preg_split, this should result in the following array:

array(
[0] => 'word1', 
[1] => 'word2', 
[2] => 'this is a phrase', 
[3] => 'word3', 
[4] => 'word4', 
[5] => 'this is a second phrase', 
[6] => 'word5' 
)

How should I write my regular expression to do this?

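P.S. There is a related question, but I don't think it applies to my case. The accepted answer there provides a regular expression that finds words rather than whitespace.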

Source

2009-11-12 altern

That related question looks like exactly what you want to do, judging by the example you gave. Did you try the accepted answer? What happened? – richsage 2009-11-12 12:47:55

Yes, I tried it. I am using PHP, not .NET, so I cannot use inline filtering of the regex results. And, as I said, \w+|"[\w\s]*" does not work for me. – altern 2009-11-12 12:50:29

With the help of user MizardX from the #regex IRC channel (irc.freenode.net), a solution was found. It even supports single quotes.

$str= 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5 word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5'; 

$regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/'; 

$arr = preg_split($regexp, $str); 

print_r($arr);

The result is:

Array (
    [0] => word1 
    [1] => word2 
    [2] => 'this is a phrase' 
    [3] => word3 
    [4] => word4 
    [5] => "this is a second phrase" 
    [6] => word5 
    [7] => word1 
    [8] => word2 
    [9] => "this is a phrase" 
    [10] => word3 
    [11] => word4 
    [12] => "this is a second phrase" 
    [13] => word5 
)

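P.S. The only drawback is that this regular expression works only with PCRE 7.

It turns out that I do not have PCRE 7 support on the production server; only PCRE 6 is installed there. A regular expression that works there (with \G and \K removed), although it is not as flexible as the PCRE 7 one above, is: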

/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/

For the given input the result is the same as above.

Source

2009-11-12 13:04:29 altern
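The PCRE 6 fallback pattern above matches the tokens themselves rather than the whitespace between them, so it cannot simply be dropped into preg_split. A minimal sketch of one way to apply it, assuming preg_match_all is used instead (the variable name $tokens and the shortened input are illustrative, not from the thread):

```php
<?php
// Sketch only: collect tokens with the \G/\K-free pattern from the answer.
$str = 'word1 word2 \'this is a phrase\' word3 "this is a second phrase" word5';

// Each match is a run of quoted phrases and/or non-space, non-quote characters.
$regexp = '/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/';

// Index 0 of $matches holds the full matches, i.e. the tokens.
preg_match_all($regexp, $str, $matches);
$tokens = $matches[0];

print_r($tokens);
```

As with the preg_split version, quoted phrases keep their surrounding quote characters.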

What do \G and \K stand for? – Amarghosh 2009-11-12 13:38:05

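'\G' anchors the match to the spot where the previous match ended (roughly speaking), or to the beginning of the input if there was no previous match. '\K' I had to look up: it means "pretend the match really started here"; although the regex matches a token and the whitespace following it, it behaves as if it matched only the whitespace. It is a poor man's lookbehind, except that it appears to be superior to lookbehind in most cases. Why isn't that feature more common, I wonder? http://www.pcre.org/pcre.txt – 2009-11-12 15:15:08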

Thanks Alan. I couldn't find either of them on regex.info... and regular expressions are hard to Google. – Amarghosh 2009-11-12 16:41:16
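To make the two escapes discussed in these comments concrete, here is a small sketch on toy inputs (the strings and variable names are illustrative, not from the thread). It assumes a PCRE build that supports \K, which is the case for current PHP releases:

```php
<?php
// \K discards everything matched so far from the reported match,
// so only the "bar" that directly follows "foo" is replaced.
$withK = preg_replace('/foo\Kbar/', 'X', 'foobar foo bar');

// \G forces each match to start exactly where the previous one ended,
// so matching stops at the first non-digit.
preg_match_all('/\G\d/', '123a45', $m);
$anchored = $m[0];

// Without \G the engine skips ahead and finds the later digits too.
preg_match_all('/\d/', '123a45', $m);
$free = $m[0];

var_dump($withK, $anchored, $free);
```

In the splitting pattern from the answer, \G keeps the scan from restarting inside a quoted phrase, and \K makes preg_split treat only the trailing whitespace as the delimiter.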

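Assuming your quotes are well formed, i.e. they occur in pairs, you can explode on the quote character and loop over every second field. For example: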

$str = "word1 word2 \"this is a phrase\" word3 word4 \"this is a second phrase\" word5 word6 \"lastword\""; 
print $str ."\n"; 
$s = explode('"',$str); 
for($i=1;$i<count($s);$i+=2){ 
    if (strpos($s[$i] ," ")!==FALSE) { 
        print "Spaces found: $s[$i]\n"; 
    } 
}

Output

$ php test.php 
Spaces found: this is a phrase 
Spaces found: this is a second phrase

No need for a complex regular expression.

Source

2009-11-12 12:57:41 ghostdog74

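Of course I can do this without regular expressions, but that is not my case. – altern 2009-11-12 13:01:47

Using the regular expression from the other question you linked, isn't this fairly easy?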

<?php 

$string = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5'; 

preg_match_all('/(\w+|"[\w\s]*")+/' , $string , $matches); 

print_r($matches[1]); 

?>

Output:

Array 
(
    [0] => word1 
    [1] => word2 
    [2] => "this is a phrase" 
    [3] => word3 
    [4] => word4 
    [5] => "this is a second phrase" 
    [6] => word5 
)

Source

2009-11-12 13:02:05 edds

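What if it should also find special characters (for example the ampersand)? And the ampersand is not the only one to be handled. Moreover, different symbols should be handled differently. For example, if braces are encountered, I need to include them in the search results. – altern 2009-11-12 13:09:38

@altern, well, I'm sure 'edds' won't mind if you adapt the example to your needs... – 2009-11-12 13:16:00

Does anyone want to benchmark tokenizing against regex? My guess is that the explode() function is a bit too heavy to give any speed benefit. Nevertheless, here is another approach:

(Edited because I forgot the other case, for storing the quoted strings)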

$str = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5'; 

// initialize storage array 
$arr = array(); 
// initialize count 
$count = 0; 
// split on quote 
$tok = strtok($str, '"'); 
while ($tok !== false) { 
    // even operations not in quotes 
    $arr = ($count % 2 == 0) ? 
           array_merge($arr, explode(' ', trim($tok))) : 
           array_merge($arr, array(trim($tok))); 
    $tok = strtok('"'); 
    ++$count; 
} 

// output results 
var_dump($arr);

Source

2009-11-12 13:03:39

$test = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5'; 
preg_match_all('/([^"\s]+)|("([^"]+)")/', $test, $matches);

Source

2009-11-12 13:11:16 Amarghosh
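The last answer stops at the preg_match_all call and does not show how to read the result. A minimal sketch of one way to turn it into the array asked for in the question, with the quotes stripped. The PREG_SET_ORDER flag, the loop, and the $tokens variable are assumptions added here, not part of the answer:

```php
<?php
$test = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';

// PREG_SET_ORDER yields one array per match; trailing groups that did not
// participate in the match are omitted from that array.
preg_match_all('/([^"\s]+)|("([^"]+)")/', $test, $matches, PREG_SET_ORDER);

$tokens = array();
foreach ($matches as $m) {
    // Group 3 (the text inside the quotes) exists only for quoted phrases;
    // otherwise fall back to group 1 (a bare word).
    $tokens[] = isset($m[3]) ? $m[3] : $m[1];
}

print_r($tokens);
```

Unlike the other patterns in this thread, this one handles double quotes only.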
