2013-05-01 101 views
4

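Java NIO vs. non-NIO performance

I have spent a considerable amount of time trying to optimize a file hashing algorithm to squeeze out every last drop of performance.

See my previous SO threads: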

Get File Hash Performance/Optimization

FileChannel ByteBuffer and Hashing Files

Determining Appropriate Buffer Size

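It was recommended several times that I use Java NIO to gain native performance (by keeping the buffers in the system instead of bringing them into the JVM). However, my NIO code runs considerably slower in benchmarks (hashing the same files over and over with each algorithm, to negate any OS/drive "magic" that could skew the results).

I now have two methods that do the same thing: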

This one runs faster almost every time:

/** 
* Gets Hash of file. 
* 
* @param file String path + filename of file to get hash. 
* @param hashAlgo Hash algorithm to use. <br/> 
*  Supported algorithms are: <br/> 
*  MD2, MD5 <br/> 
*  SHA-1 <br/> 
*  SHA-256, SHA-384, SHA-512 
* @param BUFFER Buffer size in bytes. Recommended to stay in<br/> 
*   powers of 2 such as 1024, 2048, <br/> 
*   4096, 8192, 16384, 32768, 65536, etc. 
* @return String value of hash. (Variable length dependent on hash algorithm used) 
* @throws IOException If file is invalid. 
* @throws HasherException If no supported or valid hash algorithm was found. 
*/ 
public String getHash(String file, String hashAlgo, int BUFFER) throws IOException, HasherException { 
    StringBuffer hexString = null; 
    try { 
     MessageDigest md = MessageDigest.getInstance(validateHashType(hashAlgo)); 
     FileInputStream fis = new FileInputStream(file); 

     byte[] dataBytes = new byte[BUFFER]; 

     int nread = 0; 
     while ((nread = fis.read(dataBytes)) != -1) { 
      md.update(dataBytes, 0, nread); 
     } 
     fis.close(); 
     byte[] mdbytes = md.digest(); 

     hexString = new StringBuffer(); 
     for (int i = 0; i < mdbytes.length; i++) { 
      int v = 0xFF & mdbytes[i]; 
      if (v < 0x10) { 
       hexString.append('0'); // Integer.toHexString drops the leading zero 
      } 
      hexString.append(Integer.toHexString(v)); 
     } 

     return hexString.toString(); 

    } catch (NoSuchAlgorithmException | HasherException e) { 
     throw new HasherException("Unsupported Hash Algorithm.", e); 
    } 
} 

My Java NIO method that runs considerably slower most of the time:

/** 
* Gets Hash of file using java.nio File Channels and ByteBuffer 
* <br/>for native system calls where possible. This may improve <br/> 
* performance in some circumstances. 
* 
* @param fileStr String path + filename of file to get hash. 
* @param hashAlgo Hash algorithm to use. <br/> 
*  Supported algorithms are: <br/> 
*  MD2, MD5 <br/> 
*  SHA-1 <br/> 
*  SHA-256, SHA-384, SHA-512 
* @param BUFFER Buffer size in bytes. Recommended to stay in<br/> 
*   powers of 2 such as 1024, 2048, <br/> 
*   4096, 8192, 16384, 32768, 65536, etc. 
* @return String value of hash. (Variable length dependent on hash algorithm used) 
* @throws IOException If file is invalid. 
* @throws HasherException If no supported or valid hash algorithm was found. 
*/ 
public String getHashNIO(String fileStr, String hashAlgo, int BUFFER) throws IOException, HasherException { 

    File file = new File(fileStr); 

    MessageDigest md = null; 
    FileInputStream fis = null; 
    FileChannel fc = null; 
    ByteBuffer bbf = null; 
    StringBuilder hexString = null; 

    try { 
     md = MessageDigest.getInstance(hashAlgo); 
     fis = new FileInputStream(file); 
     fc = fis.getChannel(); 
     bbf = ByteBuffer.allocateDirect(BUFFER); // allocation in bytes - 1024, 2048, 4096, 8192 

     int b; 

     b = fc.read(bbf); 

     while ((b != -1) && (b != 0)) { 
      bbf.flip(); 

      byte[] bytes = new byte[b]; 
      bbf.get(bytes); 

      md.update(bytes, 0, b); 

      bbf.clear(); 
      b = fc.read(bbf); 
     } 

     fis.close(); 

     byte[] mdbytes = md.digest(); 

     hexString = new StringBuilder(); 

     for (int i = 0; i < mdbytes.length; i++) { 
      int v = 0xFF & mdbytes[i]; 
      if (v < 0x10) { 
       hexString.append('0'); // Integer.toHexString drops the leading zero 
      } 
      hexString.append(Integer.toHexString(v)); 
     } 

     return hexString.toString(); 

    } catch (NoSuchAlgorithmException e) { 
     throw new HasherException("Unsupported Hash Algorithm.", e); 
    } 
} 

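My thinking is that Java NIO tries to use native system calls and the like to keep processing and storage (buffers) in the system and out of the JVM. In theory, this saves the program from constantly shuffling data back and forth between the JVM and the system, so it should be faster... but perhaps my MessageDigest forces the JVM to bring the buffer in, negating any performance gain from the native buffers and system calls? Am I correct in this logic, or am I way off?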

Please help me understand why Java NIO is not better in this scenario.

+1

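NIO is good at concurrency. If you are not doing anything concurrently, it is just overhead in code and processing. I don't know what causes the performance difference, but it could be this. – akostadinov 2013-05-01 15:39:41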

+2

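Reading from the channel into a `ByteBuffer` and then copying again into a byte[] may be what hurts the performance of the NIO method. The NIO magic (apart from the non-blocking part) mostly pays off when the JVM can avoid transferring data from system space (OS, disk, ...) into user space. Since a hashing algorithm has to read every byte of the file, there is obviously no shortcut available. If the performance of the old-IO method is insufficient, consider using a profiler or testing library implementations (e.g. [guava](http://code.google.com/p/guava-libraries/wiki/HashingExplained)) for better performance. – Pyranja 2013-05-01 15:58:16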
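The extra copy described in this comment can be removed by handing the buffer to the digest directly. The following is a minimal sketch, not the asker's code: the class name `DirectBufferHasher` and its `hash` and `toHex` methods are hypothetical, and it assumes Java 7 or later. Note that `MessageDigest.update(ByteBuffer)` may still copy internally for direct buffers, depending on the provider, so any gain should be measured rather than assumed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, named for illustration only.
final class DirectBufferHasher {

    static String hash(Path file, String algorithm, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        MessageDigest md = MessageDigest.getInstance(algorithm);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);

        // try-with-resources closes the channel even if a read fails
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();     // switch the buffer from filling to draining
                md.update(buffer); // consumes all remaining bytes, no byte[] in our code
                buffer.clear();    // make the whole buffer available for the next read
            }
        }
        return toHex(md.digest());
    }

    // Two lowercase hex digits per byte, so leading zeros are preserved.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

A call such as `DirectBufferHasher.hash(Paths.get("some-file"), "SHA-256", 65536)` would return the digest as a hex string; the path here is a placeholder.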

+0

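@Pyranja Hmm, interesting info. I checked out the Guava library, and unfortunately I don't expect it to yield any better performance than my methods above, since they all rely on the default `java.security.MessageDigest` implementation to do the actual hashing... and if the file is too large to reasonably fit in a buffer (e.g. a 10GB file), then it has to be streamed through the buffer, and we have to do a lot of I/O to stream it through until we have hashed all the bits... hmm.. – SnakeDoc 2013-05-01 16:29:46

Answer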

6

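Two things would probably make your NIO approach better:

  1. Try using a memory-mapped file instead of reading the data into heap memory.
  2. Pass the data to the digest using a ByteBuffer rather than a byte[] array.

The first should avoid copying data between the file cache and the application heap, while the second should avoid copying data between the buffer and a byte array. Without these optimizations, you are probably doing at least as much copying as the naive non-NIO approach.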
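The two suggestions above can be combined roughly as follows. This is a sketch under stated assumptions, not a benchmarked implementation: the class name `MappedFileHasher`, its methods, and the 64 MiB window size are all hypothetical choices. Mapping in windows is used because a single `FileChannel.map` call is limited to `Integer.MAX_VALUE` bytes, which matters for files like the 10GB example mentioned in the comments.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, named for illustration only.
final class MappedFileHasher {

    // Window size is an arbitrary assumption; tune it by measuring.
    private static final long WINDOW = 64L * 1024 * 1024;

    static String hash(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);

        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            // Walk the file one read-only mapped window at a time.
            for (long pos = 0; pos < size; pos += WINDOW) {
                long len = Math.min(WINDOW, size - pos);
                MappedByteBuffer window =
                        channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                md.update(window); // pass the ByteBuffer itself, not a byte[]
            }
        }
        return toHex(md.digest());
    }

    // Two lowercase hex digits per byte, so leading zeros are preserved.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

Whether this beats the plain `FileInputStream` loop depends on the JVM, the digest provider, and the OS page cache, so it is worth running both through the same benchmark before drawing conclusions.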