2011-01-24
10

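Regex to split a string while preserving quotes

I need to split a string like the first one below, using spaces as the delimiter, but any space inside quotes should be preserved, so that it yields the four pieces listed after it.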

research library "not available" author:"Bernard Shaw" 

research 
library 
"not available" 
author:"Bernard Shaw" 

I am trying to do this in C#, and I have this regex from another SO post: @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""", which splits the string into

research 
library 
"not available" 
author 
"Bernard Shaw" 

Unfortunately, this does not meet my exact requirement.

I'm looking for any regex that will do the trick.

Any help is appreciated.

Answers

25

As long as there can be no escaped quotes inside the quoted strings, the following should work:

splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"); 

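This regex splits on a space character only if it is both preceded and followed by an even number of quotes.

Here is the regex without all the escaped quotes, explained: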

(?<=  # Assert that it's possible to match this before the current position (positive lookbehind): 
^  # The start of the string 
[^"]* # Any number of non-quote characters 
(?:  # Match the following group... 
    "[^"]* # a quote, followed by any number of non-quote characters 
    "[^"]* # the same 
)*  # ...zero or more times (so 0, 2, 4, ... quotes will match) 
)   # End of lookbehind assertion. 
[ ]  # Match a space 
(?=  # Assert that it's possible to match this after the current position (positive lookahead): 
(?:  # Match the following group... 
    [^"]*" # see above 
    [^"]*" # see above 
)*  # ...zero or more times. 
[^"]* # Match any number of non-quote characters 
$  # Match the end of the string 
)   # End of lookahead assertion 
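The commented pattern above can be run nearly as written if it is compiled with `RegexOptions.IgnorePatternWhitespace`. The following is a minimal sketch, not part of the original answer: the class name `QuoteAwareSplit` and its `Split` helper are hypothetical, and the input is assumed to have no trailing space (a trailing space would be followed by zero quotes and so would produce an empty final element).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, introduced only for this illustration.
public static class QuoteAwareSplit
{
    // Verbatim string: "" is a literal quote character.
    // Whitespace and #-comments are ignored thanks to IgnorePatternWhitespace;
    // the space inside [ ] is still literal because it sits in a character class.
    const string Pattern = @"
        (?<=                        # lookbehind, even number of quotes before
            ^[^""]*
            (?: ""[^""]* ""[^""]* )*
        )
        [ ]                         # the space to split on
        (?=                         # lookahead, even number of quotes after
            (?: [^""]*"" [^""]*"" )*
            [^""]*$
        )";

    public static string[] Split(string input) =>
        Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string token in Split(input))
            Console.WriteLine(token);
    }
}
```

For the sample string from the question this should print the four tokens, with `"not available"` and `author:"Bernard Shaw"` kept intact.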
+0

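How would I split on periods, question marks, exclamation marks and so on instead of spaces? I'm trying to read out each sentence one by one, except for the content inside quotation marks. For example: Walked. **Turned back.** But why? **And said: "Hello world, damn the string splitting thing!" without shame.** – ErTR 2016-01-26 00:25:21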

+1

@ErtürkÖztürk: That deserves its own StackOverflow question; it is too big to answer in a comment. – 2016-01-26 07:12:10

3

Here you go:

C#:

Regex.Matches(subject, @"([^\s]*""[^""]+""[^\s]*)|\w+") 

Regex:

([^\s]*\"[^\"]+\"[^\s]*)|\w+ 
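As a rough sketch of how this matching approach could be used (the class name `QuoteAwareMatch` and its `Tokenize` helper are hypothetical, not from the answer), each match is collected as one token instead of splitting on separators:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, introduced only for this illustration.
public static class QuoteAwareMatch
{
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        // First alternative: a quoted section plus any non-space text attached
        // to it on either side; second alternative: a plain run of word characters.
        foreach (Match m in Regex.Matches(input, @"([^\s]*""[^""]+""[^\s]*)|\w+"))
            tokens.Add(m.Value);
        return tokens;
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string token in Tokenize(input))
            Console.WriteLine(token);
    }
}
```

Note that unquoted tokens are matched with `\w+`, so punctuation in them is dropped (an unquoted `foo-bar` would come back as `foo` and `bar`), and an empty quoted string `""` is not matched because of the `+`.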