Java: Reading a file in two parts - partly as a String, partly as byte[]

I have a file that is split into two parts by "\n\n": the first part is a fairly short string, and the second part is a byte array that can be quite long.

I am trying to read the file as follows:

byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {

    final InputStreamReader isr = new InputStreamReader(fis);
    final BufferedReader reader = new BufferedReader(isr);

    String line;
    // reading until \n\n
    while (!(line = reader.readLine()).trim().isEmpty()) {
        // processing the line
    }

    // copying the rest of the byte array
    result = IOUtils.toByteArray(reader);
    reader.close();
}

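Although the resulting array has the expected size, its contents are corrupted. If I try to call toByteArray directly on fis or isr instead, the result is empty.

How can I read the rest of the file correctly and efficiently?

Thanks!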

2013-02-27 Vojtěch

Thanks for all the comments - I ended up implementing it this way:

try (final FileInputStream fis = new FileInputStream(file)) {

    ByteBuffer buffer = ByteBuffer.allocate(64);

    boolean wasLast = false;
    String headerValue = null, headerKey = null;
    byte[] result = null;

    while (true) {
        int next = fis.read();
        if (next == -1) {
            // end of file reached before \n\n; stop instead of looping forever
            break;
        }
        byte current = (byte) next;
        if (current == '\n') {
            if (wasLast) {
                // this is \n\n
                break;
            } else {
                // just a new line in header
                wasLast = true;
                headerValue = new String(buffer.array(), 0, buffer.position());
                buffer.clear();
            }
        } else if (current == '\t') {
            // headerKey\theaderValue\n
            headerKey = new String(buffer.array(), 0, buffer.position());
            buffer.clear();
        } else {
            buffer.put(current);
            wasLast = false;
        }
    }
    // reading the rest
    result = IOUtils.toByteArray(fis);
}

2013-02-27 07:04:04

What if you also put `wasLast = false;` inside the `if (current == '\t')` block, just in case an empty key-value pair produces `...\n\t\n...`? :) – 2013-03-01 17:10:36

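The contents are corrupted because the data passes through a Reader: the bytes are decoded into text characters using the default character encoding, and IOUtils.toByteArray(reader) then encodes those characters back into bytes. Binary values that do not form valid text in that encoding usually do not survive this. Reading directly from fis or isr most likely comes back empty because the BufferedReader has already read ahead and pulled the rest of the stream into its internal buffer.

Depending on exactly how the character set is implemented, there is a slight chance that this works: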

result = IOUtils.toByteArray(reader, "ISO-8859-1");

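ISO-8859-1 uses exactly one byte per character. Not all character values are defined, but many implementations pass them through anyway, so you may be lucky. Note that this can only work if the InputStreamReader is also created with ISO-8859-1, so that decoding and re-encoding use the same charset.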
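To make the difference concrete, here is a small sketch that pushes every possible byte value through a decode/encode round trip, which is effectively what a Reader-based copy does. The class and method names (`CharsetRoundTrip`, `roundTrip`, `allByteValues`) are made up for this illustration. In Java's own ISO-8859-1 charset each byte 0x00 to 0xFF maps to the code point of the same value, so the round trip is lossless there, while UTF-8 replaces invalid sequences and changes the data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {

    // Decode the bytes into a String, then encode the String back into bytes.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1 intact: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8 intact: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));
    }
}
```

Even when the round trip is lossless, it still costs a decode and an encode pass over the whole binary part, which is one more reason to prefer reading the bytes directly.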

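A cleaner solution, however, is to read the string part as binary data first and then convert it to text via new String(bytes), rather than reading the binary data as a string and converting it back.

This may mean that you need to implement your own version of BufferedReader for performance reasons.

An obvious Google search will lead you, for example, to the source code of the standard BufferedReader here:

http://www.docjar.com/html/api/java/io/BufferedReader.java.html

It is a bit long, but the concept is not too hard to understand, so hopefully it is useful as a reference.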
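As a rough sketch of the "binary first" idea, the following reads bytes through a BufferedInputStream until it sees "\n\n", decodes only those header bytes, and takes the rest unchanged. It swaps a plain BufferedInputStream in for a hand-written BufferedReader variant. The `HeaderBodyReader` and `Parts` names are hypothetical, the header is assumed to be UTF-8, and `InputStream.readAllBytes()` requires Java 9 or later (on older versions, `IOUtils.toByteArray(in)` from the question could take its place).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class HeaderBodyReader {

    // Simple holder for the two parts (hypothetical type for this sketch).
    static final class Parts {
        final String header;
        final byte[] body;

        Parts(String header, byte[] body) {
            this.header = header;
            this.body = body;
        }
    }

    static Parts read(InputStream raw) throws IOException {
        // Buffering keeps the byte-by-byte scan from hitting the file for every read().
        InputStream in = new BufferedInputStream(raw);
        ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();

        boolean separatorFound = false;
        int previous = -1;
        int current;
        while ((current = in.read()) != -1) {
            if (current == '\n' && previous == '\n') {
                separatorFound = true;
                break;
            }
            headerBytes.write(current);
            previous = current;
        }

        byte[] h = headerBytes.toByteArray();
        // The first '\n' of the separator was already written; drop it again.
        int headerLength = separatorFound ? h.length - 1 : h.length;
        // Assumption: the header text is UTF-8 encoded.
        String header = new String(h, 0, headerLength, StandardCharsets.UTF_8);

        // Read the rest from the SAME buffered stream, since it may have read ahead.
        byte[] body = in.readAllBytes();
        return new Parts(header, body);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Parts parts = read(in);
            System.out.println(parts.header);
            System.out.println(parts.body.length + " body bytes");
        }
    }
}
```

The important detail is that the body is read from the buffered stream rather than from the underlying file stream; otherwise the bytes already sitting in the buffer would be lost, which is the same effect that makes fis appear empty in the question.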

2013-02-27 05:28:37

That is exactly what I found out myself a few minutes ago :-) – 2013-02-27 07:01:04

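Alternatively, you can read the whole file into a byte array, find the position of \n\n, and split the array into the line and the bytes: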

byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
    if (a[i] == '\n' && a[i + 1] == '\n') {
        line = new String(a, 0, i);
        // skip both separator bytes so the body does not start with '\n'
        int len = a.length - i - 2;
        result = new byte[len];
        System.arraycopy(a, i + 2, result, 0, len);
        break;
    }
}

2013-02-27 05:55:31

I think the array copy would be rather expensive. – 2013-02-27 07:04:37
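If the extra copy is a concern, one possible variation is to leave the bytes where they are and hand out a view of the body instead. The sketch below wraps the tail of the array in a ByteBuffer; `SplitWithoutCopy`, `indexOfSeparator`, and `bodyView` are names invented for this example, and whether a ByteBuffer is acceptable in place of a byte[] depends on the code that consumes the body.

```java
import java.nio.ByteBuffer;

class SplitWithoutCopy {

    // Index of the first "\n\n" in the array, or -1 if there is none.
    static int indexOfSeparator(byte[] a) {
        for (int i = 0; i + 1 < a.length; i++) {
            if (a[i] == '\n' && a[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // A view of everything after the separator; shares storage with the array, no copy.
    // Without a separator the whole array is treated as the body, as in the answer above.
    static ByteBuffer bodyView(byte[] a) {
        int separator = indexOfSeparator(a);
        int start = (separator < 0) ? 0 : separator + 2;
        return ByteBuffer.wrap(a, start, a.length - start).slice();
    }

    public static void main(String[] args) {
        byte[] file = {'h', 'i', '\n', '\n', 1, 2, 3};
        ByteBuffer body = bodyView(file);
        System.out.println("body bytes: " + body.remaining());
    }
}
```

Because the buffer shares the array, later changes to the array are visible through the view; a read-only wrapper via `asReadOnlyBuffer()` is an option if the body should not be modified through it.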
