2011-11-17 87 views
11

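Ruby regex to extract words

I'm currently trying to come up with a regular expression that splits a string into words, where a word is defined as a sequence of characters surrounded by whitespace, or enclosed between double quotes. I'm using String#scan.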

For example, the string:

' hello "my name" is "Tom"'

should match the words:

hello
my name
is
Tom

I managed to match the words enclosed in double quotes using:

/"([^\"]*)"/

However, I can't figure out how to also handle the whitespace-delimited case, to get 'hello', 'is', and 'Tom' without messing up 'my name'.
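
For reference, a minimal sketch of what the quote-only pattern returns with String#scan on the sample string (the variable names `text` and `quoted` are only illustrative):

```ruby
text = ' hello "my name" is "Tom"'

# With one capture group, scan returns an array of one-element arrays,
# so flatten gives the captured strings directly.
quoted = text.scan(/"([^"]*)"/).flatten

p quoted
```

Only the quoted sections come back; the bare words hello and is are skipped.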

Any help would be appreciated!

Answers

23
result = ' hello "my name" is "Tom"'.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/) 

will work for you. It prints

=> ["", "hello", "\"my name\"", "is", "\"Tom\""] 

Just ignore the empty string.

Explanation

" 
\s        # Match a single whitespace character (space, tab, or line break)
  +       # One or more times, as many as possible, giving back as needed (greedy)
(?=       # Assert that the regex below can be matched, starting at this position (positive lookahead)
  (?:     # Group the following without capturing it
    [^"]  # Match any character that is NOT a double quote
      *   # Zero or more times, as many as possible, giving back as needed (greedy)
    "     # Match a double quote literally
    [^"]  # Match any character that is NOT a double quote
      *   # Zero or more times, as many as possible, giving back as needed (greedy)
    "     # Match a double quote literally
  )*      # Repeat the group zero or more times, as many as possible (greedy)
  [^"]    # Match any character that is NOT a double quote
    *     # Zero or more times, as many as possible, giving back as needed (greedy)
  $       # Assert position at the end of a line (at the end of the string or before a line break character)
)
" 

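In other words, the lookahead only succeeds when an even number of double quotes follows the whitespace, which means the whitespace is outside any quoted section. A rough sketch that makes this visible by counting the quotes after each whitespace run (it assumes the quotes in the input are balanced; `decisions` is just an illustrative name):

```ruby
text = ' hello "my name" is "Tom"'

decisions = []
text.scan(/\s+/) do
  m = Regexp.last_match              # MatchData for the current whitespace run
  quotes = m.post_match.count('"')   # double quotes remaining after it
  decisions << [m.begin(0), quotes, quotes.even?]
end

decisions.each do |offset, quotes, split|
  puts "offset #{offset}: #{quotes} quote(s) follow -> #{split ? 'split here' : 'inside quotes, keep'}"
end
```

The only whitespace run followed by an odd number of quotes is the one inside "my name", so that is the only place where the string is not split.
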
You can use reject like this to avoid the empty string

result = ' hello "my name" is "Tom"' 
      .split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/).reject {|s| s.empty?} 

which prints

=> ["hello", "\"my name\"", "is", "\"Tom\""] 
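
Note that the quoted pieces still carry their double quotes. If the bare text is wanted, one possible follow-up step, offered as a sketch rather than part of the original answer, is to strip a single surrounding pair after splitting:

```ruby
text = ' hello "my name" is "Tom"'

# Split on whitespace that is followed by an even number of double quotes,
# i.e. whitespace that is not inside a quoted section.
pieces = text.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/)

# Drop the empty leading piece, then remove one pair of enclosing quotes.
words = pieces.reject(&:empty?).map { |s| s.sub(/\A"(.*)"\z/m, '\1') }

p words
```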
+0

+1. Good answer! – Swanand

+0

Great dissection of the regex. Very helpful. –

+0

This doesn't remove the special characters. – 2011-11-17 05:49:03

4
text = ' hello "my name" is "Tom"' 

text.scan(/\s*("([^"]+)"|\w+)\s*/).each {|match| puts match[1] || match[0]} 

Produces:

hello 
my name 
is 
Tom 

Explanation:

0 or more spaces, followed by

either some words within double quotes, or

a single word,

followed by 0 or more spaces
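
A variant of the same scan idea, sketched here under two assumptions: that a bare word may contain any non-whitespace character (matching the question's definition, unlike \w+), and that quotes always open at the start of a word. The helper name `split_words` is hypothetical.

```ruby
# Hypothetical helper: returns quoted sections (without their quotes)
# and bare whitespace-delimited words, in order of appearance.
def split_words(text)
  # Each match fills exactly one capture group:
  # group 1 for quoted text, group 2 for a bare word.
  text.scan(/"([^"]*)"|(\S+)/).map { |quoted, bare| quoted || bare }
end

p split_words(' hello "my name" is "Tom"')
```

Because String#scan skips over anything the pattern does not match, the surrounding whitespace does not need to appear in the pattern at all.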

+0

What the OP is asking for isn't possible without lookahead. – Swanand

+1

Not sure why you would think that... –

+0

What I originally meant was a raw solution that splits using the regex alone. Anything that involves post-processing isn't what I had in mind. – Swanand

1

You can try this regex:

/\b(\w+)\b/ 

It uses \b to find word boundaries. The site http://rubular.com/ is also helpful.
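
A quick check of this pattern against the question's sample string, as a sketch (the variable names are illustrative):

```ruby
text = ' hello "my name" is "Tom"'

# \w+ never matches a space or a double quote, so a quoted phrase
# is returned as separate words.
words = text.scan(/\b(\w+)\b/).flatten

p words  # => ["hello", "my", "name", "is", "Tom"]
```

The quoted phrase "my name" comes back as two words, so the pattern does not treat a quoted section as a single match.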

+3

This doesn't work; it doesn't attempt to capture the text between quotes as a single match –