Trying to UTF-8 encode Files.write(..) but getting an OutOfMemoryError

I'm trying to write my text file with UTF-8 encoding. When I write it like this, it works:

protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        // getBytes() without an argument encodes with the platform default charset.
        Files.write(path, sb.toString().getBytes(), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

But when I use the encoding "UTF8" or "UTF-8", I get java.lang.OutOfMemoryError: Java heap space. Why is this, and how can I fix it? (My memory setting is already 2 GB.)


2014-09-24 Gynnad

So how big is your StringBuilder? – 2014-09-24 14:34:59

Looking at the implementation of getBytes, I found:

byte[] encode(char[] ca, int off, int len) {
    int en = scale(len, ce.maxBytesPerChar());
    byte[] ba = new byte[en];
    // ...
}

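The line int en = scale(len, ce.maxBytesPerChar()); allocates the result array for the worst case: len × maxBytesPerChar() bytes up front. For the JDK's UTF-8 encoder, maxBytesPerChar() is 3.0, so the array is three times the number of chars, on top of the StringBuilder itself and the String copy made by toString().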
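To put a number on that, here is a small sketch that asks the JDK's UTF-8 encoder for its maxBytesPerChar() and multiplies it out. It assumes JDK 8-style sizing as in the excerpt above (newer JDKs encode differently in detail), and the 200-million-char size is a made-up example, not the asker's actual data.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeBufferEstimate {
    // Bytes a getBytes-style encoder would allocate up front for `chars` UTF-16 chars.
    static long worstCaseUtf8Bytes(long chars) {
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        return (long) (chars * (double) utf8.maxBytesPerChar());
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 maxBytesPerChar: " + StandardCharsets.UTF_8.newEncoder().maxBytesPerChar());
        // Hypothetical size: a StringBuilder holding 200 million chars.
        long chars = 200_000_000L;
        long bytes = worstCaseUtf8Bytes(chars);
        System.out.println("worst-case byte[] for " + chars + " chars: " + bytes / (1024 * 1024) + " MiB");
    }
}
```

The StringBuilder and the String from toString() each hold their own copy of the chars as well, so the peak is the sum of all three, not just this array.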

Debug the code and find out exactly where the OutOfMemoryError occurs.


2014-09-25 11:02:21

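Thanks, you're right. The OutOfMemoryError happened while reading the file. (Someone first read the file, then appended some strings and saved the result as a new file. Now I simply append the strings to the file, so I skip the reading part.) – Gynnad 2014-09-26 11:57:44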
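A minimal sketch of that fix: append text to the end of the file without reading its existing content first. The class and method names (AppendUtf8, appendText) are hypothetical; the file is opened with CREATE and APPEND and the text is encoded as UTF-8 through a buffered writer, so no full-size byte array is built.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendUtf8 {
    // Appends text to the end of the file (creating it if needed) without reading the existing content.
    static void appendText(Path file, CharSequence text) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.append(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append", ".impex");
        appendText(file, "first line\n");
        appendText(file, "second line \u00E4\u00F6\u00FC\n");
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```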

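UTF-8 uses more than one byte for many Unicode characters. Your previous code used the default encoding, which on Windows is usually a limited single-byte encoding.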
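To see the difference, this sketch compares the encoded length of a few sample characters (chosen purely for illustration) in UTF-8 and in ISO-8859-1, a single-byte charset. What Charset.defaultCharset() prints depends on your OS and JDK; since JDK 18 the default is UTF-8 on all platforms (JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetByteLengths {
    // Number of bytes the string occupies once encoded with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // 'a', e-acute, euro sign, and U+1F600 (a surrogate pair in Java).
        String[] samples = {"a", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("%s: UTF-8 = %d bytes, ISO-8859-1 = %d bytes%n",
                    s, encodedLength(s, StandardCharsets.UTF_8), encodedLength(s, StandardCharsets.ISO_8859_1));
        }
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Characters that ISO-8859-1 cannot represent are replaced with '?', which is why a single-byte default "works" but silently loses data.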

You could try:

sb.trimToSize();

As a StringBuilder grows through appends, it always reserves some extra space; trimming that might help in your case.
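A small sketch of what trimToSize() does: build a StringBuilder by repeated appends (the line content is hypothetical) and compare capacity() before and after. It only releases spare capacity; the later toString() copy is still made.

```java
public class TrimDemo {
    // Builds a StringBuilder by repeated appends (hypothetical content).
    static StringBuilder build(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("line ").append(i).append('\n');
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = build(100_000);
        System.out.println("length: " + sb.length() + ", capacity before trimToSize: " + sb.capacity());
        // May shrink the internal array to the current length; the Javadoc only promises "may".
        sb.trimToSize();
        System.out.println("capacity after trimToSize: " + sb.capacity());
    }
}
```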

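The following may run into the same out-of-memory problem, but it avoids encoding the whole text into one large byte array (the writer still converts the CharSequence to a String internally, then encodes it in buffer-sized pieces), so you might try it first.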

 Files.write(path, Collections.singletonList(sb), StandardCharsets.UTF_8);

A last resort is to split sb:

int length = sb.length();
final int CHUNK_SIZE = 1000;
// Number of chunks, rounded up.
int size = (length + CHUNK_SIZE - 1) / CHUNK_SIZE;
List<CharSequence> chseqs = new ArrayList<>(size);
for (int i = 0; i < length; ) {
    int n = Math.min(CHUNK_SIZE, length - i);
    if (n == CHUNK_SIZE) {
        // Don't end a chunk on the first (high) half of a surrogate pair.
        char ch = sb.charAt(i + n - 1);
        if (Character.isHighSurrogate(ch)) {
            --n;
        }
    }
    chseqs.add(sb.subSequence(i, i + n));
    i += n;
}
// Caution: Files.write(Path, Iterable, Charset) writes a line separator after
// every element, so this inserts a line break after each chunk.
Files.write(path, chseqs, StandardCharsets.UTF_8);

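One last thing, since most people will probably think it anyway: try not to use a StringBuilder for such large text at all. Write through a Writer directly, or better still, set it up as a pipeline.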
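A minimal sketch of that streaming approach, assuming the data can be produced row by row: each row goes straight into a UTF-8 BufferedWriter instead of into a StringBuilder, so memory use stays at roughly one buffer regardless of file size. The writeRows method and its row format are hypothetical stand-ins for your real data source.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingWriter {
    // Hypothetical producer: writes `rows` generated lines straight to the file
    // instead of collecting them in a StringBuilder first.
    static void writeRows(Path file, int rows) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int i = 0; i < rows; i++) {
                w.write("row;" + i + ";\u00E4\u00F6\u00FC"); // sample content with non-ASCII chars
                w.newLine();
            }
        } // closing the writer flushes the remaining buffered chars
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("streaming", ".impex");
        writeRows(file, 1_000_000);
        System.out.println(file + ": " + Files.size(file) + " bytes");
        Files.delete(file);
    }
}
```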


2014-09-24 16:10:36

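Java strings use UTF-16, so if you use the chunking approach you have to take UTF-16 surrogates into account. Some Unicode code points are stored as surrogate pairs and some are not, and you don't want a pair to straddle a chunk boundary, with the high surrogate in one chunk and the low surrogate in the next. If that happens, each surrogate is encoded to UTF-8 separately, corrupting that code point. – 2014-09-25 01:23:54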

@RemyLebeau Thanks for the excellent explanation. Admittedly I was lazy, and the code change (as you can see) is actually trivial. – 2014-09-25 09:41:45
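The corruption described in that comment is easy to reproduce. This sketch encodes U+1F600 once as a whole and once split between its two surrogates, encoding each half separately the way a naive chunker would; String.getBytes replaces each lone surrogate with '?'. The class and method names are made up for the demo.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogateSplit {
    // Encodes the whole string in one go.
    static byte[] encodeWhole(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes s[0, cut) and s[cut, end) independently and concatenates the results.
    static byte[] encodeSplit(String s, int cut) {
        byte[] head = s.substring(0, cut).getBytes(StandardCharsets.UTF_8);
        byte[] tail = s.substring(cut).getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(head, head.length + tail.length);
        System.arraycopy(tail, 0, out, head.length, tail.length);
        return out;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a high + low surrogate
        System.out.println(Arrays.toString(encodeWhole(emoji)));    // prints [-16, -97, -104, -128] (F0 9F 98 80)
        System.out.println(Arrays.toString(encodeSplit(emoji, 1))); // prints [63, 63]: each half became '?'
    }
}
```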

Use the right tool for the job. If you want to write characters, don't use a method that writes bytes.

To write the contents of a StringBuilder sb to a Path path, use

Files.write(path, Collections.singleton(sb), StandardCharsets.UTF_8);

The underlying implementation should take care of converting the characters to bytes piece by piece.
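A self-contained version of that call, wrapped in a hypothetical writeUtf8 helper, with a round trip through a temp file. It also shows a side effect of this Files.write overload: every element of the Iterable is followed by System.lineSeparator(), so the file ends with a line break that sb did not contain.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class WriteAsLines {
    // Writes sb through a UTF-8 writer; this overload ends every element with a line separator.
    static void writeUtf8(Path path, StringBuilder sb) throws IOException {
        Files.write(path, Collections.singleton(sb), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".txt");
        writeUtf8(file, new StringBuilder("h\u00E9llo"));
        String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(back.equals("h\u00E9llo" + System.lineSeparator())); // prints true
        Files.delete(file);
    }
}
```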

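If it doesn't, or if you can't live with the fact that this method appends a line break at the end of the file, you may want the following snippet: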

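final int chunkSize = 8000;
// Files.newBufferedWriter(Path, OpenOption...) encodes as UTF-8.
try (Writer w = Files.newBufferedWriter(path)) {
    for (int s = 0, e; s < sb.length(); s = e) {
        e = Math.min(s + chunkSize, sb.length());
        // Only one chunk-sized piece is converted at a time; the writer's encoder
        // carries a trailing high surrogate over to the next chunk.
        w.append(sb.subSequence(s, e));
    }
}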

Note that Files.newBufferedWriter defaults to UTF-8, and that this alternative does not insert line breaks between the chunks.


2014-09-25 10:26:16 Holger
