2016-11-23

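Hadoop Zlib and JDK Gzip comparative performance

I am benchmarking some single-threaded compression codecs, and the Zlib performance I see is far higher than I would expect from a single thread. I used org.apache.hadoop.io.compress.zlib.ZlibCompressor as the Zlib compressor implementation, and java.util.zip.Deflater as the Gzip implementation to compare against.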

Is the ZLib compressor (wrapper) provided by Hadoop somehow multithreaded, perhaps through the JNI interface?

Zlib:

import org.apache.hadoop.io.compress.zlib.*;

// DEFAULT_BUFFER_SIZE is a constant defined by the benchmark itself
protected final ZlibCompressor zlibCompressor = new ZlibCompressor(
    ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
    ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY,
    ZlibCompressor.CompressionHeader.DEFAULT_HEADER,
    DEFAULT_BUFFER_SIZE);
protected final ZlibDecompressor zlibDecompressor = new ZlibDecompressor(
    ZlibDecompressor.CompressionHeader.DEFAULT_HEADER,
    DEFAULT_BUFFER_SIZE);

// compress: hand over the whole block, mark end of input, then compress into compressBuffer
zlibCompressor.setInput(uncompressed, 0, uncompressed.length);
zlibCompressor.finish();
int n = zlibCompressor.compress(compressBuffer, 0, compressBuffer.length);

// decompress
zlibCompressor.reset();
zlibDecompressor.setInput(compressed, 0, compressed.length);
int n = zlibDecompressor.decompress(uncompressBuffer, 0, uncompressBuffer.length);

Gzip:

import java.io.*;
import java.util.zip.*;

// COMPRESSION_LEVEL and NO_WRAP are constants defined by the benchmark itself
protected final Deflater deflater = new Deflater(COMPRESSION_LEVEL, NO_WRAP);
protected final Inflater inflater = new Inflater(NO_WRAP);

// compress
int n = compressBlockUsingStream(uncompressed, compressBuffer);

// decompress
inflater.reset();
int n = uncompressBlockUsingStream(new InflaterInputStream(new ByteArrayInputStream(compressed), inflater), uncompressBuffer);

Gzip helper functions:

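protected int compressBlockUsingStream(byte[] uncompressed, byte[] compressBuffer) throws IOException
{
    // java.io.ByteArrayOutputStream takes an initial capacity, not a backing array
    ByteArrayOutputStream out = new ByteArrayOutputStream(compressBuffer.length);
    compressToStream(uncompressed, out); // helper not shown in the question
    // copy the result back; assumes compressBuffer is large enough
    System.arraycopy(out.toByteArray(), 0, compressBuffer, 0, out.size());
    return out.size();
}

protected int uncompressBlockUsingStream(InputStream in, byte[] uncompressBuffer) throws IOException
{
    ByteArrayOutputStream out = new ByteArrayOutputStream(uncompressBuffer.length);
    byte[] buffer = new byte[4096];
    int count;
    // read() returns -1 at end of stream
    while ((count = in.read(buffer)) >= 0) {
        out.write(buffer, 0, count);
    }
    in.close();
    out.close();
    // copy the result back; assumes uncompressBuffer is large enough
    System.arraycopy(out.toByteArray(), 0, uncompressBuffer, 0, out.size());
    return out.size();
}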

Throughput:

Zlib / block - 143.902 MBps

Gzip / JDK / stream - 22.573 Mbps

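Does anyone have an idea why Zlib is so much faster (is the native code using all cores)? The code is expected to run single-threaded. Can anyone reproduce similar results?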
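One way to test the multithreading suspicion directly is to compare wall-clock time with the CPU time of the calling thread. The sketch below does this with the JDK's Deflater through its block-style API (setInput / finish / deflate). The class name SingleThreadCheck, the sample data and the buffer sizes are made up for illustration, and swapping the Hadoop compressor into the same harness is an assumption I have not tested. If the calling thread's CPU time is close to the wall time, the work is happening on that thread; if it is far lower, something else is doing the work.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;
import java.util.zip.Deflater;

final class SingleThreadCheck {
    /** Compresses one block with the block-style Deflater API; returns the compressed size. */
    static int compressBlock(Deflater deflater, byte[] input, byte[] output) {
        deflater.reset();
        deflater.setInput(input, 0, input.length);
        deflater.finish();
        int total = 0;
        while (!deflater.finished() && total < output.length) {
            total += deflater.deflate(output, total, output.length - total);
        }
        return total;
    }

    /** Returns {wallNanos, cpuNanosOfCallingThread} for the given number of rounds. */
    static long[] measure(byte[] input, int rounds) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        byte[] output = new byte[input.length * 2 + 64]; // generous, illustrative bound
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        for (int i = 0; i < rounds; i++) {
            compressBlock(deflater, input, output);
        }
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        deflater.end();
        return new long[] { wall, cpu };
    }

    /** Compressible sample data: random picks from a 16-symbol alphabet. */
    static byte[] sample(int size) {
        Random random = new Random(42);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) ('a' + random.nextInt(16));
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sample(1 << 20);
        measure(data, 20); // warm-up so the JIT does not dominate the timing
        long[] t = measure(data, 50);
        double mbPerSec = (50.0 * data.length / (1 << 20)) / (t[0] / 1e9);
        System.out.printf("wall %.1f ms, thread cpu %.1f ms, %.1f MB/s%n",
                t[0] / 1e6, t[1] / 1e6, mbPerSec);
    }
}
```

Note that this also measures the JDK path through the block API instead of through streams, so it removes the extra copying in the stream helpers from the comparison.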

Answers


java.util.zip uses zlib.

Are you sure you are using the same compression level in both versions? Is COMPRESSION_LEVEL equal to ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION?
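To make the level question concrete, here is a minimal sketch using only the JDK. Deflater.DEFAULT_COMPRESSION is -1, which zlib treats as level 6, so passing -1 and passing 6 should give the same output size. The class name LevelCheck and the sample data are illustrative, and I am assuming (not verifying here) that the Hadoop DEFAULT_COMPRESSION constant maps to the same zlib default.

```java
import java.util.Random;
import java.util.zip.Deflater;

final class LevelCheck {
    /** Compresses input as raw deflate at the given level; returns the output size. */
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level, true);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] chunk = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(chunk);
            }
            return total;
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    /** Compressible sample data: random picks from a 16-symbol alphabet. */
    static byte[] sample(int size) {
        Random random = new Random(42);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) ('a' + random.nextInt(16));
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sample(1 << 20);
        System.out.println("Deflater.DEFAULT_COMPRESSION = " + Deflater.DEFAULT_COMPRESSION);
        for (int level : new int[] { Deflater.DEFAULT_COMPRESSION, 1, 6, 9 }) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("level %2d: %d bytes in %.1f ms%n", level, size, ms);
        }
    }
}
```

Printing size and time per level on your own data shows how much of a throughput gap could come from the level alone.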


OK, do you know what value 'DEFAULT_COMPRESSION' has in the range [0 - 9]? I would like to set it explicitly. – nikk


Can't you print it? –


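OK, if you look at the code of Hadoop's ZlibCompressor on GitHub, you will see that it is set to an integer value of -15. What does -15 mean? As far as I know, compression levels range from 0 to 9. – nikk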
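On the -15: in zlib itself, a value like -15 is a windowBits argument rather than a compression level, and a negative windowBits asks for a raw deflate stream without the zlib header and checksum. I have not checked which Hadoop constant the comment refers to, so the link to Hadoop is an assumption. As far as I know, the JDK exposes the same choice through the nowrap flag of Deflater, and the sketch below (class name HeaderCheck is made up) shows the visible difference: the wrapped stream carries a 2-byte header and a 4-byte Adler-32 trailer that the raw stream lacks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class HeaderCheck {
    /** Compresses input at the default level, with or without the zlib wrapper. */
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int count = deflater.deflate(chunk);
                out.write(chunk, 0, count);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] wrapped = deflate(input, false); // zlib format: header + deflate data + checksum
        byte[] raw = deflate(input, true);      // raw deflate data only
        System.out.println("wrapped length: " + wrapped.length);
        System.out.println("raw length:     " + raw.length);
        System.out.printf("first wrapped byte: 0x%02x%n", wrapped[0] & 0xff);
    }
}
```

Either way, the header choice only changes a few bytes per stream, so it would not by itself explain a large throughput difference.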