2015-12-26 181 views
4

Split an array of strings into an array of arrays of strings

I am looking for a way to split this array of strings:

["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this", 
"text", "?", "Without", "any", "errors", "!"] 

into groups that each end with a punctuation mark:

[ 
    ["this", "is", "a", "test", "."], 
    ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"], 
    ["Without", "any", "errors", "!"] 
] 

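Is there a simple way to do this? Is the most efficient approach to iterate over the array, adding each element to a temporary array, and appending that temporary array to a container array whenever a punctuation mark is found?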
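The loop described above could be sketched roughly as follows. This is only an illustration: the method name `group_by_terminator` and the constant `PUNCTUATION` are hypothetical names, and the set of terminators is assumed to be `.`, `?` and `!`.

```ruby
# Assumed set of sentence terminators (not specified beyond the example data).
PUNCTUATION = %w[. ? !].freeze

# Hypothetical helper: collects tokens into a temporary array and moves that
# array into the container each time a terminator is seen.
def group_by_terminator(tokens)
  groups = []
  current = []
  tokens.each do |token|
    current << token
    next unless PUNCTUATION.include?(token)
    groups << current
    current = []
  end
  # Keep any trailing tokens that were not closed by a terminator.
  groups << current unless current.empty?
  groups
end

tokens = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
          "parse", "this", "text", "?", "Without", "any", "errors", "!"]
groups = group_by_terminator(tokens)
```

This makes a single pass over the array, so it is linear in the number of tokens.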

I was thinking of using slice or map, but I can't figure out whether that is feasible.

Answers

2

@ndn gave the best answer to this question, but I will suggest another approach that can be applied to other problems.

Arrays like this are usually obtained by splitting a string on whitespace or punctuation. For example:

s = "this is a test. I wonder if I can parse this text? Without any errors!" 
s.scan /\w+|[.?!]/ 
    #=> ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", 
    # "parse", "this", "text", "?", "Without", "any", "errors", "!"] 

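If that is the case, you may find it more convenient to work with the string directly in some other way. Here, for example, you could first use String#split with a regular expression to break the string s into sentences: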

r1 = /
    (?<=[.?!]) # positive lookbehind: the preceding character must be one of . ? !
    \s*        # match >= 0 whitespace characters so split discards them
    /x         # extended/free-spacing regex definition mode

a = s.split(r1) 
    #=> ["this is a test.", "I wonder if I can parse this text?", 
    # "Without any errors!"] 

and then split each sentence apart:

r2 = /
    \s+  # match >= 1 whitespace characters 
    |   # or 
    (?=[.?!]) # use a positive lookahead to match a zero-width string 
       # followed by one of the punctuation characters 
    /x 

b = a.map { |sentence| sentence.split(r2) } # block variable renamed so it does not shadow s
    #=> [["this", "is", "a", "test", "."], 
    # ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"], 
    # ["Without", "any", "errors", "!"]] 
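For comparison, when only the token array is available rather than the original string, the same grouping can be sketched with Enumerable#slice_after (available since Ruby 2.2). This is an illustrative alternative under the assumption that `.`, `?` and `!` are the only terminators; the variable names `words`, `terminators` and `sentences` are made up for the example.

```ruby
words = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
         "parse", "this", "text", "?", "Without", "any", "errors", "!"]

# Assumed set of terminators.
terminators = [".", "?", "!"]

# slice_after ends the current slice after every element for which the
# block returns true; to_a turns the resulting enumerator into nested arrays.
sentences = words.slice_after { |word| terminators.include?(word) }.to_a
```

Any tokens after the last terminator end up in a final, unterminated slice.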
+0

Unfortunately, this solution seems to have lost the punctuation in the output. –

+0

Thanks, @Wand. I misread the question. I have made an edit. –