Break a long string into lines with proper word wrap

String original = "This is a sentence.Rajesh want to test the application for the word split."; 
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile(".{1,10}(?:\\s|$)", Pattern.DOTALL); 
Matcher regexMatcher = regex.matcher(original); 
while (regexMatcher.find()) { 
    matchList.add(regexMatcher.group()); 
} 
System.out.println("Match List "+matchList);

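I need to parse text into an array of lines, where each line is no longer than 10 characters and no line ends in the middle of a word.

I used the logic above, but in my case it parses to the nearest whitespace after 10 characters whenever a word would be broken at the end of a line.

For example, the actual sentence is "This is a sentence.Rajesh want to test the application for the word split.", but after the logic runs the result is as follows:

Match List [This is a , nce.Rajesh , want to , test the , pplication , for the , word , split.]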
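One way to see where the missing characters go is to look at the region each match covers. `Matcher.find()` moves on to a later start position whenever the pattern cannot match at the current one, so any text it steps over never reaches the list. The sketch below collects those skipped fragments; the class and method names are illustrative and not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not from the original post): lists the text that
// Matcher.find() steps over between consecutive matches.
class SkippedTextSketch {
    static List<String> skipped(String text, String regex) {
        List<String> gaps = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        int previousEnd = 0;
        while (m.find()) {
            // Anything between the previous match and this one was skipped.
            if (m.start() > previousEnd) {
                gaps.add(text.substring(previousEnd, m.start()));
            }
            previousEnd = m.end();
        }
        // Text after the last match, if any, was skipped as well.
        if (previousEnd < text.length()) {
            gaps.add(text.substring(previousEnd));
        }
        return gaps;
    }

    public static void main(String[] args) {
        String original =
            "This is a sentence.Rajesh want to test the application for the word split.";
        // prints [sente, a]
        System.out.println(skipped(original, ".{1,10}(?:\\s|$)"));
    }
}
```

The two fragments are the starts of "sentence.Rajesh" and "application", the only words longer than 10 characters: the pattern has no way to match at their first letters, so the search resumes further along.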


2012-05-22 Raja

Assuming you want this in Groovy? Apart from the tag, you don't mention Groovy... –

You mean the 10th character should not be part of a word? What if it is a space? – JHS

What happens if a word itself is longer than 10 characters? Should it be split in the middle? For example, should "quickbrownfoxjumpsoverthelazydog" become {"quickbrown", "foxjumpsov", "erthelazyd", "og"}? – dasblinkenlight

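I avoided regex because it doesn't pull its weight here. This code word-wraps, and if a word is longer than 10 characters, it breaks it. It also handles excess whitespace.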

import static java.lang.Character.isWhitespace;

import java.util.ArrayList;
import java.util.List;

public class WordWrap {
    public static void main(String[] args) {
        final String original =
            "This is a sentence.Rajesh want to test the application for the word split.";
        final StringBuilder b = new StringBuilder(original.trim());
        final List<String> matchList = new ArrayList<String>();
        while (true) {
            // Drop whitespace left at the front by the previous split.
            b.delete(0, indexOfFirstNonWsChar(b));
            if (b.length() == 0) break;
            final int splitAt = lastIndexOfWsBeforeIndex(b, 10);
            matchList.add(b.substring(0, splitAt).trim());
            b.delete(0, splitAt);
        }
        System.out.println("Match List " + matchList);
    }

    // Returns the split point: just past the last whitespace within the first
    // i characters, or i itself if there is none (the word is then broken).
    static int lastIndexOfWsBeforeIndex(CharSequence s, int i) {
        if (s.length() <= i) return s.length();
        for (int j = i; j > 0; j--) if (isWhitespace(s.charAt(j - 1))) return j;
        return i;
    }

    // Returns the index of the first non-whitespace character, or s.length() if none.
    static int indexOfFirstNonWsChar(CharSequence s) {
        for (int i = 0; i < s.length(); i++) if (!isWhitespace(s.charAt(i))) return i;
        return s.length();
    }
}

Prints:

Match List [This is a, sentence.R, ajesh, want to, test the, applicatio, n for the, word, split.]


2012-05-22 13:13:18

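My requirement is that I need to limit the number of characters in one line to 100 or fewer, and if the word at the end of the 100 characters would be broken, we need to move that word to the next line – Raja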

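This question was tagged Groovy at some point. Assuming a Groovy answer is still valid and you aren't worried about preserving multiple whitespace characters (like "  "):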

def splitIntoLines(text, maxLineSize) { 
    def words = text.split(/\s+/) 
    def lines = [''] 
    words.each { word -> 
     def lastLine = (lines[-1] + ' ' + word).trim() 
     if (lastLine.size() <= maxLineSize) 
      // Change last line. 
      lines[-1] = lastLine 
     else 
      // Add word as new line. 
      lines << word 
    } 
    lines 
} 

// Tests... 
def original = "This is a sentence. Rajesh want to test the application for the word split." 

assert splitIntoLines(original, 10) == [ 
    "This is a", 
    "sentence.", 
    "Rajesh", 
    "want to", 
    "test the", 
    "application", 
    "for the", 
    "word", 
    "split." 
] 
assert splitIntoLines(original, 20) == [ 
    "This is a sentence.", 
    "Rajesh want to test", 
    "the application for", 
    "the word split." 
] 
assert splitIntoLines(original, original.size()) == [original]
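
Since the question itself is in Java, the same greedy idea as the Groovy function above can be sketched there too. This is a minimal sketch, not a drop-in port: the class name is illustrative, and it assumes blank input should give an empty list. A word is never broken; one that is longer than the limit gets a line of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Java sketch of the greedy approach (illustrative names). Words are kept
// whole; a word longer than maxLineSize ends up alone on an over-long line.
class GreedyWrapSketch {
    static List<String> splitIntoLines(String text, int maxLineSize) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // only happens for blank input
            if (current.length() == 0) {
                current.append(word);
            } else if (current.length() + 1 + word.length() <= maxLineSize) {
                // The word still fits after a single separating space.
                current.append(' ').append(word);
            } else {
                // Start a new line with this word.
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            }
        }
        if (current.length() > 0) lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        String original =
            "This is a sentence. Rajesh want to test the application for the word split.";
        // prints [This is a sentence., Rajesh want to test, the application for, the word split.]
        System.out.println(splitIntoLines(original, 20));
    }
}
```

For the 100-character requirement mentioned in the comment above, the call would be splitIntoLines(text, 100).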


2012-05-22 20:22:21 epidemian

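OK, I've managed to get the following working, with a maximum line length of 10, and it also correctly splits words that are longer than 10 characters!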

String original = "This is a sentence. Rajesh want to test the applications for the word split handling."; 
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(.{1,10}(?:\\s|$))|(.{0,10})", Pattern.DOTALL); 
Matcher regexMatcher = regex.matcher(original); 
while (regexMatcher.find()) { 
    matchList.add(regexMatcher.group()); 
} 
System.out.println("Match List "+matchList);

This is the result:

This is a
sentence.
Rajesh
want to
test the
applicatio
ns for the
word split
handling.
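
Two details of this pattern are easy to miss. Each match from the first alternative includes the whitespace character that ended it, and the second alternative can match the empty string once the input is used up, so the printed list carries trailing spaces and a final empty element. A minimal sketch that trims each match and skips empty ones follows; the class and method names are illustrative, and the regex is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative wrapper around the answer's regex: trims every match and
// drops the empty match that can occur at the end of the input.
class RegexWrapSketch {
    private static final Pattern LINE =
        Pattern.compile("(.{1,10}(?:\\s|$))|(.{0,10})", Pattern.DOTALL);

    static List<String> wrap(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            String line = m.group().trim();
            if (!line.isEmpty()) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        String original =
            "This is a sentence. Rajesh want to test the applications for the word split handling.";
        // prints [This is a, sentence., Rajesh, want to, test the, applicatio, ns for the, word split, handling.]
        System.out.println(wrap(original));
    }
}
```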


2013-05-30 01:40:46 Rafe

If you want to include newlines, then: "(.{1,10}(?:\\s\\n|$))|(.{0,10})" – Rafe

This is a nice use of regex! But it is hard to add a '-' between the parts of a broken word... – Valen

Sorry, I don't understand? – Rafe
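
On the comment about adding a '-' where a word is broken: one possible reading is that every piece of a split word except the last should end with a hyphen. That is awkward to express in the regex, so the sketch below uses a plain loop instead. It is not the regex approach from the answer, and the class and method names are illustrative. It assumes a width of at least 2, so that each piece can hold one character plus the hyphen.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: greedy word wrap that hyphenates words longer than
// the line width. Not the regex approach from the answer above.
class HyphenWrapSketch {
    static List<String> wrap(String text, int width) {
        if (width < 2) throw new IllegalArgumentException("width must be at least 2");
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // only happens for blank input
            if (current.length() > 0 && current.length() + 1 + word.length() <= width) {
                current.append(' ').append(word);
                continue;
            }
            // The word does not fit on the current line: flush that line first.
            if (current.length() > 0) {
                lines.add(current.toString());
                current.setLength(0);
            }
            // Emit pieces of width-1 characters plus '-' until the rest fits.
            while (word.length() > width) {
                lines.add(word.substring(0, width - 1) + "-");
                word = word.substring(width - 1);
            }
            current.append(word);
        }
        if (current.length() > 0) lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        // prints [test the, applicati-, ons for]
        System.out.println(wrap("test the applications for", 10));
    }
}
```

Unlike the regex, this version always starts a long word on a fresh line rather than filling the remainder of the previous one.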
