Inlined code is slower than a function call / static function in Java

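I have been running some tests to see how inlining function code (explicitly writing the function's algorithm in the calling code itself) affects performance. I wrote a simple byte-array-to-integer conversion, then wrapped it in a function, called it statically from another class, and called it statically from the class itself. The code is as follows: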

public class FunctionCallSpeed { 
    public static final int numIter = 50000000; 

    public static void main (String [] args) { 
     byte [] n = new byte[4]; 

     long start; 

     System.out.println("Function from Static Class ================="); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      StaticClass.toInt(n); 
     } 
     System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 

     System.out.println("Function from Class ========================"); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      toInt(n); 
     } 
     System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 

     int actual = 0; 

     int len = n.length; 

     System.out.println("Inline Function ============================"); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      for (int j = 0; j < len; j++) { 
       actual += n[len - 1 - j] << 8 * j; 
      } 
     } 
     System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 
    } 

    public static int toInt(byte [] num) { 
     int actual = 0; 

     int len = num.length; 

     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 

     return actual; 
    } 
}

The results are as follows:

Function from Static Class ================= 
Elapsed time: 0.096559931s 
Function from Class ======================== 
Elapsed time: 0.015741711s 
Inline Function ============================ 
Elapsed time: 0.837626286s

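Is something strange going on with the bytecode? I have looked at the bytecode myself, but I am not very familiar with it and could not make heads or tails of it.

Edit

I added assert statements to check the output and randomized the bytes being read, and the benchmark now behaves the way I expected. Thanks to Tomasz Nurkiewicz, who pointed me to the micro-benchmark article. The resulting code is: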

public class FunctionCallSpeed { 
public static final int numIter = 50000000; 

public static void main (String [] args) { 
    byte [] n; 

    long start, end; 
    int checker, calc; 

    end = 0; 
    System.out.println("Function from Object ================="); 
    for (int i = 0; i < numIter; i++) { 
     checker = (int)(Math.random() * 65535); 
     n = toByte(checker); 
     start = System.nanoTime(); 
     calc = StaticClass.toInt(n); 
     end += System.nanoTime() - start; 
     assert calc == checker; 
    } 
    System.out.println("Elapsed time: " + (double)end/1000000000 + "s"); 
    end = 0; 
    System.out.println("Function from Class =================="); 
    start = System.nanoTime(); 
    for (int i = 0; i < numIter; i++) { 
     checker = (int)(Math.random() * 65535); 
     n = toByte(checker); 
     start = System.nanoTime(); 
     calc = toInt(n); 
     end += System.nanoTime() - start; 
     assert calc == checker; 
    } 
    System.out.println("Elapsed time: " + (double)end/1000000000 + "s"); 


    int len = 4; 
    end = 0; 
    System.out.println("Inline Function ======================"); 
    start = System.nanoTime(); 
    for (int i = 0; i < numIter; i++) { 
     calc = 0; 
     checker = (int)(Math.random() * 65535); 
     n = toByte(checker); 
     start = System.nanoTime(); 
     for (int j = 0; j < len; j++) { 
      calc += n[len - 1 - j] << 8 * j; 
     } 
     end += System.nanoTime() - start; 
     assert calc == checker; 
    } 
    System.out.println("Elapsed time: " + (double)(System.nanoTime() - start)/1000000000 + "s"); 
} 

public static byte [] toByte(int val) { 
    byte [] n = new byte[4]; 

    for (int i = 0; i < 4; i++) { 
     n[i] = (byte)((val >> 8 * i) & 0xFF); 
    } 
    return n; 
} 

public static int toInt(byte [] num) { 
    int actual = 0; 

    int len = num.length; 

    for (int i = 0; i < len; i++) { 
     actual += num[len - 1 - i] << 8 * i; 
    } 

    return actual; 
} 
}

Results:

Function from Static Class ================= 
Elapsed time: 9.276437031s 
Function from Class ======================== 
Elapsed time: 9.225660708s 
Inline Function ============================ 
Elapsed time: 5.9512E-5s

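Source

2012-08-30 ddukki

[How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/questions/504103) –

@TomaszNurkiewicz, thanks for the link. I think I have fixed my benchmark, at least for the case I wanted to check. – ddukki

I ported your test case to caliper: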

import com.google.caliper.SimpleBenchmark; 

public class ToInt extends SimpleBenchmark { 

    private byte[] n; 
    private int total; 

    @Override 
    protected void setUp() throws Exception { 
     n = new byte[4]; 
    } 

    public int timeStaticClass(int reps) { 
     for (int i = 0; i < reps; i++) { 
      total += StaticClass.toInt(n); 
     } 
     return total; 
    } 

    public int timeFromClass(int reps) { 
     for (int i = 0; i < reps; i++) { 
      total += toInt(n); 
     } 
     return total; 
    } 

    public int timeInline(int reps) { 
     for (int i = 0; i < reps; i++) { 
      int actual = 0; 
      int len = n.length; 
      for (int i1 = 0; i1 < len; i1++) { 
       actual += n[len - 1 - i1] << 8 * i1; 
      } 
      total += actual; 
     } 
     return total; 
    } 

    public static int toInt(byte[] num) { 
     int actual = 0; 
     int len = num.length; 
     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 
     return actual; 
    } 
} 

class StaticClass { 
    public static int toInt(byte[] num) { 
     int actual = 0; 

     int len = num.length; 

     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 

     return actual; 
    } 

}

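And indeed the inlined version appears to be the slowest, while the two static versions are almost identical (as expected):

[Image: caliper results]

The reason is hard to pin down. Two factors I can think of:

- The JVM reasons best about micro-optimizations when code blocks are small and simple. When the function is inlined by hand, the code as a whole becomes more complex and the JVM gives up. With the smaller toInt() function, the JIT is smarter.
- Cache locality - somehow the JVM performs better with two small chunks of code (the loop and the method) than with one bigger one.

Source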

2012-08-30 17:21:06

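It is always hard to make guarantees about what the JIT is doing, but if I had to guess, it noticed that the function's return value is never used and optimized much of the work away.

If you actually use the function's return value, I bet the timings will change.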
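A minimal sketch of that suggestion, assuming a hypothetical class name `UseReturnValue`, a hypothetical helper `sumOfCalls`, and a made-up input array (none of these come from the question). Each result is accumulated into a sum that is printed afterwards, so the calls have an observable effect:

```java
class UseReturnValue {
    // Same conversion as in the question: big-endian bytes to int.
    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Hypothetical helper: sums every result so the calls are not dead code.
    public static long sumOfCalls(byte[] n, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += toInt(n);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] n = {0, 0, 0, 1}; // made-up input for illustration
        long start = System.nanoTime();
        long sum = sumOfCalls(n, 50_000_000);
        long elapsed = System.nanoTime() - start;
        // Printing the sum makes the computed value observable.
        System.out.println("sum=" + sum + " elapsed=" + elapsed / 1e9 + "s");
    }
}
```

Because the input never changes here, the JIT may still hoist or fold part of the computation. The sketch only addresses the unused-result problem, not every way a micro-benchmark can mislead.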

Source

2012-08-30 16:43:06 corsiKa

Yes, that solved the problem, thanks. – ddukki

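You have a number of issues, but the main one is that you are testing a single iteration of one piece of optimized code. That is sure to give you mixed results. I suggest making the test run for 2 seconds and ignoring the first 10,000 iterations or so.

If the result of the loop is not kept, the entire loop can be discarded after some random interval.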
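The advice above (discard a warm-up phase, measure for a fixed wall-clock budget, keep the results) can be sketched roughly as follows. The class name `WarmupBenchmark`, the `sink` field, the batch size, and the varying input byte are all assumptions made for illustration, not part of the original answer:

```java
class WarmupBenchmark {
    // Hypothetical sink: storing results here keeps the loop from being discarded.
    public static int sink;

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Runs `iterations` calls and returns the elapsed time in nanoseconds.
    public static long timeBatch(byte[] n, int iterations) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            n[3] = (byte) i;   // vary the input a little between calls
            acc += toInt(n);
        }
        sink = acc;            // keep the result
        return System.nanoTime() - start;
    }

    // Discards a warm-up batch, then measures for about `runNanos`;
    // returns the average nanoseconds per call.
    public static double measure(int warmupIterations, long runNanos) {
        byte[] n = new byte[4];
        timeBatch(n, warmupIterations);   // warm-up timing is ignored
        final int batch = 10_000;
        long total = 0;
        long calls = 0;
        long begin = System.nanoTime();
        do {
            total += timeBatch(n, batch);
            calls += batch;
        } while (System.nanoTime() - begin < runNanos);
        return (double) total / calls;
    }

    public static void main(String[] args) {
        // 10,000 warm-up iterations, then roughly 2 seconds of measurement.
        System.out.println(measure(10_000, 2_000_000_000L) + " ns per call");
    }
}
```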

Break each test into a separate method:

public class FunctionCallSpeed { 
    public static final int numIter = 50000000; 
    private static int dontOptimiseAway; 

    public static void main(String[] args) { 
     byte[] n = new byte[4]; 

     for (int i = 0; i < 10; i++) { 
      test1(n); 
      test2(n); 
      test3(n); 
      System.out.println(); 
     } 
    } 

    private static void test1(byte[] n) { 
     System.out.print("from Static Class: "); 
     long start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      dontOptimiseAway = FunctionCallSpeed.toInt(n); 
     } 
     System.out.print((System.nanoTime() - start)/numIter + "ns "); 
    } 

    private static void test2(byte[] n) { 
     long start; 
     System.out.print("from Class: "); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      dontOptimiseAway = toInt(n); 
     } 
     System.out.print((System.nanoTime() - start)/numIter + "ns "); 
    } 

    private static void test3(byte[] n) { 
     long start; 
     int actual = 0; 

     int len = n.length; 

     System.out.print("Inlined: "); 
     start = System.nanoTime(); 
     for (int i = 0; i < numIter; i++) { 
      for (int j = 0; j < len; j++) { 
       actual += n[len - 1 - j] << 8 * j; 
      } 
      dontOptimiseAway = actual; 
     } 
     System.out.print((System.nanoTime() - start)/numIter + "ns "); 
    } 

    public static int toInt(byte[] num) { 
     int actual = 0; 

     int len = num.length; 

     for (int i = 0; i < len; i++) { 
      actual += num[len - 1 - i] << 8 * i; 
     } 

     return actual; 
    } 
}

prints

from Class: 7ns Inlined: 11ns from Static Class: 9ns 
from Class: 6ns Inlined: 8ns from Static Class: 8ns 
from Class: 6ns Inlined: 9ns from Static Class: 6ns

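This suggests that the inner loop is slightly more efficient when it is optimized separately.

However, if I use an optimized byte-to-int conversion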

public static int toInt(byte[] num) { 
    return num[0] + (num[1] << 8) + (num[2] << 16) + (num[3] << 24); 
}

all the tests report

from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns 
from Static Class: 0ns from Class: 0ns Inlined: 0ns

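as the JIT has realised the test does not do anything useful. ;)

Source

2012-08-30 16:43:33

Yes, it was being optimized away. Thanks! – ddukki

Your test is flawed. The second test benefits from the first test having already run. You need to run each test case in its own JVM invocation.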
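One way to follow this, sketched under assumptions: the program takes the name of a single variant on the command line and runs only that one, so each measurement gets a fresh JVM. The class name `OneTestPerJvm`, the variant names `call` and `inline`, and the `sink` field are hypothetical:

```java
class OneTestPerJvm {
    // Hypothetical sink so the measured work is not optimized away.
    public static int sink;

    public static int toInt(byte[] num) {
        int actual = 0;
        int len = num.length;
        for (int i = 0; i < len; i++) {
            actual += num[len - 1 - i] << 8 * i;
        }
        return actual;
    }

    // Runs exactly one variant and returns the elapsed nanoseconds.
    public static long run(String variant, int iterations) {
        byte[] n = new byte[4];
        int len = n.length;
        int acc = 0;
        long start = System.nanoTime();
        switch (variant) {
            case "call":
                for (int i = 0; i < iterations; i++) {
                    n[3] = (byte) i;
                    acc += toInt(n);
                }
                break;
            case "inline":
                for (int i = 0; i < iterations; i++) {
                    n[3] = (byte) i;
                    int actual = 0;
                    for (int j = 0; j < len; j++) {
                        actual += n[len - 1 - j] << 8 * j;
                    }
                    acc += actual;
                }
                break;
            default:
                throw new IllegalArgumentException("unknown variant: " + variant);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        String variant = args.length > 0 ? args[0] : "call";
        System.out.println(variant + ": " + run(variant, 50_000_000) / 1e9 + "s");
    }
}
```

Running `java OneTestPerJvm call` and then `java OneTestPerJvm inline` as two separate commands means neither measurement inherits JIT state from the other.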

Source

2012-08-30 17:07:51
