2017-05-31 71 views
8

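ZLib decompression fails on large byte array

While experimenting with ZLib compression, I ran into a strange problem. Decompressing a zlib-compressed byte array of random data fails reproducibly if the source array is at least 32752 bytes long. Here is a small program that reproduces the problem; you can see it in action on IDEOne. The compression and decompression methods are standard code picked from tutorials.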

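import java.io.ByteArrayOutputStream; 
import java.io.IOException; 
import java.util.Arrays; 
import java.util.Random; 
import java.util.concurrent.CompletableFuture; 
import java.util.concurrent.ExecutionException; 
import java.util.concurrent.ThreadLocalRandom; 
import java.util.concurrent.TimeUnit; 
import java.util.concurrent.TimeoutException; 
import java.util.zip.DataFormatException; 
import java.util.zip.Deflater; 
import java.util.zip.Inflater; 

public class ZlibMain { 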

    private static byte[] compress(final byte[] data) { 
     final Deflater deflater = new Deflater(); 
     deflater.setInput(data); 

     deflater.finish(); 
     final byte[] bytesCompressed = new byte[Short.MAX_VALUE]; 
     final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed); 
     final byte[] returnValues = new byte[numberOfBytesAfterCompression]; 
     System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression); 
     return returnValues; 

    } 

    private static byte[] decompress(final byte[] data) { 
     final Inflater inflater = new Inflater(); 
     inflater.setInput(data); 
     try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) { 
      final byte[] buffer = new byte[Math.max(1024, data.length/10)]; 
      while (!inflater.finished()) { 
       final int count = inflater.inflate(buffer); 
       outputStream.write(buffer, 0, count); 
      } 
      outputStream.close(); 
      final byte[] output = outputStream.toByteArray(); 
      return output; 
     } catch (DataFormatException | IOException e) { 
      throw new RuntimeException(e); 
     } 
    } 

    public static void main(final String[] args) { 
     roundTrip(100); 
     roundTrip(1000); 
     roundTrip(10000); 
     roundTrip(20000); 
     roundTrip(30000); 
     roundTrip(32000); 
     for (int i = 32700; i < 33000; i++) { 
      if(!roundTrip(i))break; 
     } 
    } 

    private static boolean roundTrip(final int i) { 
     System.out.printf("Starting round trip with size %d: ", i); 
     final byte[] data = new byte[i]; 
     for (int j = 0; j < data.length; j++) { 
      data[j]= (byte) j; 
     } 
     shuffleArray(data); 

     final byte[] compressed = compress(data); 
     try { 
      final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed)) 
                 .get(2, TimeUnit.SECONDS); 
      System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching"); 
      return true; 
     } catch (InterruptedException | ExecutionException | TimeoutException e) { 
      System.out.println("Failure!"); 
      return false; 
     } 
    } 

    // Implementing Fisher–Yates shuffle 
    // source: https://stackoverflow.com/a/1520212/342852 
    static void shuffleArray(byte[] ar) { 
     Random rnd = ThreadLocalRandom.current(); 
     for (int i = ar.length - 1; i > 0; i--) { 
      int index = rnd.nextInt(i + 1); 
      // Simple swap 
      byte a = ar[index]; 
      ar[index] = ar[i]; 
      ar[i] = a; 
     } 
    } 
} 

Is this a known bug in zlib? Or is there an error in my compress/decompress routines?

Answers

4

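It is an error in the logic of the compress/decompress methods. I am not that deep into the implementation, but by debugging I found the following:

When the 32752-byte buffer is compressed, deflater.deflate() returns 32767, which is exactly the size of the output buffer you allocated in this line: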

final byte[] bytesCompressed = new byte[Short.MAX_VALUE]; 

If you increase the buffer size, for example to

final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE]; 

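you will see that the 32752-byte input is actually deflated to 32768 bytes. In your code, the compressed data is therefore truncated: it does not contain everything that should be in there.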
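The truncation can be observed directly by checking deflater.finished() after a single deflate() call. The following is a minimal sketch, not the code from the question: the class name DeflateTruncationDemo, the helper deflateOnce, the 40,000-byte input size, and the fixed random seed are all assumptions chosen for illustration.

```java
import java.util.Random;
import java.util.zip.Deflater;

class DeflateTruncationDemo {

    /** Outcome of one deflate() call into a fixed-size buffer. */
    static final class Result {
        final int bytesWritten;
        final boolean finished;

        Result(int bytesWritten, boolean finished) {
            this.bytesWritten = bytesWritten;
            this.finished = finished;
        }
    }

    /** Calls deflate() exactly once, like the compress() method in the question. */
    static Result deflateOnce(byte[] input, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[bufferSize];
            int written = deflater.deflate(buffer);
            // finished() is false when the buffer filled up before the
            // end of the compressed stream was written.
            return new Result(written, deflater.finished());
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        // Hypothetical test data: random bytes are essentially incompressible.
        byte[] input = new byte[40_000];
        new Random(42).nextBytes(input);

        Result small = deflateOnce(input, Short.MAX_VALUE);
        Result large = deflateOnce(input, 4 * Short.MAX_VALUE);
        System.out.printf("small buffer: wrote %d, finished=%b%n",
                small.bytesWritten, small.finished);
        System.out.printf("large buffer: wrote %d, finished=%b%n",
                large.bytesWritten, large.finished);
    }
}
```

If finished() is false after the call, the returned byte count only tells you how much fit into the buffer, not how large the compressed stream really is.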

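When you then try to decompress, inflater.inflate() returns zero, which indicates that more input data is needed. But because you only check inflater.finished(), you end up in an endless loop.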
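One way to avoid the endless loop is to treat a zero return from inflate() as a signal to inspect needsInput() and needsDictionary(). This is only a sketch under assumptions: the class name GuardedInflate and the choice of IllegalStateException for truncated or corrupt input are mine, not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class GuardedInflate {

    /** Inflates zlib data and fails fast instead of spinning on truncated input. */
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, data.length));
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0) {
                    // No progress: all input is consumed but the stream has not ended.
                    if (inflater.needsInput()) {
                        throw new IllegalStateException("Compressed data is truncated");
                    }
                    if (inflater.needsDictionary()) {
                        throw new IllegalStateException("A preset dictionary is required");
                    }
                }
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("Compressed data is corrupt", e);
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        // Deflate a small input; 64 bytes is more than enough for 17 input bytes.
        byte[] input = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] compressed = new byte[64];
        int length = deflater.deflate(compressed);
        deflater.end();

        byte[] complete = Arrays.copyOf(compressed, length);
        System.out.println(new String(decompress(complete), StandardCharsets.UTF_8));

        // Drop the last bytes to simulate the truncation from the question.
        try {
            decompress(Arrays.copyOf(compressed, length - 4));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Throwing on a zero-progress call turns the hang into an immediate, diagnosable failure, which would have pointed at the truncated compress() output right away.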

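So you could increase the buffer size used for compression, but that probably just moves the problem to larger inputs. It is better to rewrite the compress/decompress logic so that it processes the data in chunks.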
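If you would rather not write the chunk loop yourself, the JDK's DeflaterOutputStream and InflaterInputStream do the chunking internally. This is a swapped-in alternative to hand-rolled Deflater/Inflater loops, sketched under assumptions: the class name StreamZlib and the use of UncheckedIOException are illustrative choices.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class StreamZlib {

    /** Compresses data of any size; the stream handles the buffering. */
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above finished the deflater and flushed all output.
        return sink.toByteArray();
    }

    /** Decompresses zlib data; truncated input surfaces as an exception. */
    static byte[] decompress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // 32752 bytes is the size at which the code in the question started failing.
        byte[] input = new byte[32752];
        new Random(1).nextBytes(input);
        byte[] restored = decompress(compress(input));
        System.out.println("round trip matches: " + Arrays.equals(input, restored));
    }
}
```

The trade-off is a little stream overhead in exchange for not having to reason about buffer sizes or zero-length reads at all.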

+0

Thanks. As I wrote, it was not my code, and I have since replaced it. But thank you for the insight into what was wrong with it.

+0

It was a nice problem; I like hunting bugs like this ;-)

+0

Very nice investigation! – nobeh

4

Apparently, the compress() method was faulty. This one works:

public static byte[] compress(final byte[] data) { 
    final Deflater deflater = new Deflater(); 
    try (final ByteArrayOutputStream outputStream = 
            new ByteArrayOutputStream(data.length)) { 

     deflater.setInput(data); 
     deflater.finish(); 
     final byte[] buffer = new byte[1024]; 
     // Keep draining the deflater until it reports that all output has been produced. 
     while (!deflater.finished()) { 
      final int count = deflater.deflate(buffer); 
      outputStream.write(buffer, 0, count); 
     } 

     return outputStream.toByteArray(); 
    } catch (IOException e) { 
     throw new IllegalStateException(e); 
    } finally { 
     // Release the native zlib resources held by the Deflater. 
     deflater.end(); 
    } 
} 
+2

You also need to check whether inflater.inflate() returns 0.