Ruby regex: extracting words

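I'm currently struggling to come up with a regular expression that splits a string into words, where a word is defined as a sequence of characters surrounded by whitespace or enclosed in double quotes. I'm using String#scan.

For example, the string:

' hello "my name" is "Tom"'

should match the words:

hello 
my name 
is 
Tom

I managed to match the words enclosed in double quotes using:

/"([^\"]*)"/

but I can't figure out how to incorporate the whitespace-delimited case to get 'hello', 'is' and 'Tom' without messing up 'my name'.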

Any help would be greatly appreciated!

Source

2011-11-17 Shabu

result = ' hello "my name" is "Tom"'.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/)

will work for you. It prints:

=> ["", "hello", "\"my name\"", "is", "\"Tom\""]

Just ignore the empty string.

Explanation

" 
\\s      # Match a single character that is a "whitespace character" (spaces, tabs, and line breaks) 
    +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
(?=      # Assert that the regex below can be matched, starting at this position (positive lookahead) 
    (?:      # Match the regular expression below 
        [^\"]    # Match any character that is NOT a "\"" 
        *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
        \"       # Match the character "\"" literally 
        [^\"]    # Match any character that is NOT a "\"" 
        *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
        \"       # Match the character "\"" literally 
    )*       # Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
    [^\"]    # Match any character that is NOT a "\"" 
    *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
    \$       # Assert position at the end of a line (at the end of the string or before a line break character) 
) 
"
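The same breakdown can also be written directly in Ruby with the extended (/x) flag, which lets the pattern carry its own comments. This is only a sketch: the constant name QUOTE_AWARE_SPACE is made up for illustration, and it assumes the double quotes in the input are balanced.

```ruby
# QUOTE_AWARE_SPACE is a hypothetical name chosen for this sketch.
QUOTE_AWARE_SPACE = /
  \s+                        # one or more whitespace characters
  (?=                        # split here only if the rest of the line ...
    (?: [^"]* " [^"]* " )*   # ... is zero or more complete quoted pairs
    [^"]* $                  # ... followed by quote-free text up to the line end
  )
/x

parts = ' hello "my name" is "Tom"'.split(QUOTE_AWARE_SPACE)
p parts  # => ["", "hello", "\"my name\"", "is", "\"Tom\""]
```

A space inside "my name" is followed by an odd number of quotes, so the lookahead fails there and no split happens at that position.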

You can use reject like this to drop the empty strings:

result = ' hello "my name" is "Tom"' 
      .split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/).reject {|s| s.empty?}

which prints

=> ["hello", "\"my name\"", "is", "\"Tom\""]
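If the surrounding quote characters are not wanted in the result, one possible follow-up step is to delete them after splitting. A minimal sketch, assuming double quotes only ever act as phrase delimiters and never appear inside a word:

```ruby
text = ' hello "my name" is "Tom"'

words = text
  .split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/)  # quote-aware split on whitespace
  .reject(&:empty?)                          # drop the leading empty string
  .map { |s| s.delete('"') }                 # assumption: quotes are delimiters only

p words  # => ["hello", "my name", "is", "Tom"]
```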

Source

2011-11-17 05:27:58

+1. Nice answer! – Swanand

Great dissection of the regex. Very helpful. –

This doesn't remove the special characters. – 2011-11-17 05:49:03

text = ' hello "my name" is "Tom"' 

text.scan(/\s*("([^"]+)"|\w+)\s*/).each {|match| puts match[1] || match[0]}

Produces:

hello 
my name 
is 
Tom

Explanation:

0 or more spaces, followed by
either some words inside double quotes or a single word,
followed by 0 or more spaces
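To make the role of the two capture groups concrete, here is a small sketch of what String#scan returns for this pattern before the `match[1] || match[0]` step. The variable names pairs, outer and inner are chosen only for illustration.

```ruby
text = ' hello "my name" is "Tom"'

# With capture groups, scan returns one [group1, group2] pair per match.
# group2 (the text inside the quotes) is nil for a bare word.
pairs = text.scan(/\s*("([^"]+)"|\w+)\s*/)
p pairs
# => [["hello", nil], ["\"my name\"", "my name"], ["is", nil], ["\"Tom\"", "Tom"]]

# Prefer the inner (unquoted) text when it exists, else the bare word.
words = pairs.map { |outer, inner| inner || outer }
p words  # => ["hello", "my name", "is", "Tom"]
```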

Source

2011-11-17 05:36:49

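What the OP is asking for isn't possible without lookahead. – Swanand

Not sure why you would think that... –

What I originally meant was a pure solution that splits using only a regex. Anything that needs post-processing is not what I had in mind. – Swanand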

You can try this regex:

/\b(\w+)\b/

It uses \b to find word boundaries. The site http://rubular.com/ is also helpful.

Source
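A quick way to check what this pattern returns on the example string from the question is to run it through String#scan. This is just an illustrative check, not part of the answer itself:

```ruby
text = ' hello "my name" is "Tom"'

# scan returns one single-element array per match because of the capture group,
# so flatten turns the result into a plain list of words.
words = text.scan(/\b(\w+)\b/).flatten
p words  # => ["hello", "my", "name", "is", "Tom"]
```

Note that "my" and "name" come back as two separate entries, since \w+ knows nothing about the surrounding quotes.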

2012-07-30 13:44:45 demoslam

This doesn't work; it makes no attempt to capture the text between quotes as a single match –
