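Regular expression to split a string, but also capture the delimiters?
I have a book in a .txt file and I am trying to split it into individual words. For this purpose, a word is any run of the characters A-Z, a-z, or '.
So far I have this: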
String[] words = bookStr.split("[^a-zA-Z']+");
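This splits out the words just fine. However, I also want to capture all of the delimiters and how many times each one occurs. Can this be done with the Pattern class, or do I actually need to loop over the whole string to collect that data?
Example: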
String bookStr = "I just can't figure this out.\nI wonder why LOST ended?";
String[] words = bookStr.split("[^a-zA-Z']+");
// Using the regex I already have, I have gathered the words I want.
// ["I", "just", "can't", "figure", "this", "out", "I", "wonder", "why", "LOST", "ended"]
// Is there any way to gather these as well using the Pattern class or with split()?
// [" ", " ", " ", " ", " ", ".", "\n", " ", " ", " ", " ", "?"]
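Do you want the whitespace *and* the words in one array, or do you literally want a separate array of the whitespace sequences? – Bohemian
I want an array that contains both the content between the delimiters and the delimiters themselves. So perhaps just a single array holding the contents of both arrays above. – user3032301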
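For what it's worth, here is a minimal sketch of one way to get both in a single pass, using Matcher.find() instead of split(). The class and method names (DelimiterTokens, tokenize, countDelimiters) are made up for illustration. It assumes each non-word character should count as its own delimiter, as in the expected output above where "." and "\n" are listed separately; the split() pattern with `+` would instead treat ".\n" as one delimiter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class DelimiterTokens {
    // A word is a run of letters/apostrophes.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z']+");
    // A token is either a word or a single non-word character (one delimiter).
    private static final Pattern TOKEN = Pattern.compile("[a-zA-Z']+|[^a-zA-Z']");

    // Returns words and delimiters together, in the order they appear.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Counts how often each delimiter occurs, keeping first-seen order.
    static Map<String, Integer> countDelimiters(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            if (!WORD.matcher(token).matches()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String bookStr = "I just can't figure this out.\nI wonder why LOST ended?";
        List<String> tokens = tokenize(bookStr);
        System.out.println(tokens.size() + " tokens");
        System.out.println(countDelimiters(tokens));
    }
}
```

Because every character of the input ends up in exactly one token, joining the list back together reproduces the original string. If runs of delimiters should be kept together, the second alternative in TOKEN could be changed to `[^a-zA-Z']+`.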