插入排序比shell排序快得多

我正在閱讀有關Sedgewick「算法」中排序的章節。一路上，我寫了3個基本的排序算法：選擇，插入和shell排序。該書說，儘管所有三者都具有二次最壞情況的複雜性，但shell的排序應該比插入排序隨機數據快得多。在這本書中，他們獲得了600倍的性能提升。插入排序比shell排序快得多

，但我得到以下乘數我的筆記本電腦（幾乎不與數組大小的增加改變）：

選擇：5.5X
插入：1個
殼：1.8倍！

困擾我的問題是 - 爲什麼shell排序幾乎比插入排序慢兩倍？

我想，我的shellort實現有些問題。但我幾乎從書上抄了一遍：

class ShellSort extends Sort { 
    //precalculate sequence: 1, 4, 13, 40, 121, 364, 1093, ... 
    //(3^20 - 1)/2 is enough for any array I sort 
    private static final int[] SEQUENCE = new int[20]; 
    static { 
     for(int i = 1; i <= SEQUENCE.length; i++) 
      SEQUENCE[i - 1] = (int)(Math.pow(3, i) - 1)/2; 
    } 

    public void sort(int[] a) { 
     int length = a.length; 

     int seqLen = SEQUENCE.length; 
     int nth; 
     int j; 

     for(int seqi = seqLen - 1; seqi >= 0; seqi--) { 
      if(SEQUENCE[seqi] < length/3) { 
       nth = SEQUENCE[seqi]; 
       for(int n = 0; n < length; n+=nth) { 
        j = n; 
        while(j > 0 && a[j] < a[j - nth]) { 
         exch(a, j, j-nth); 
         j -= nth; 
        } 
       } 
      } 
     } 
    } 
}

這些代碼的休息，希望他們的機器上運行測試誰（與JVM熱起來加倍數組大小測試有對結果沒有顯著的影響，所以這個簡單的測試已經足夠N>〜200 000）。

主：

int N = 500_000; 
Random rand = new Random(); 
int[] a = new int[N]; 
for(int i = 0; i < N; i++) 
    a[i] = rand.nextInt(); 

//insertion sort 
int[] aCopy = Arrays.copyOf(a, a.length); 
long start = System.nanoTime(); 
new InsertionSort().sort(aCopy); 
System.out.println("insert:\t" + (System.nanoTime() - start)); 

//shell sort 
aCopy = Arrays.copyOf(a, a.length); 
start = System.nanoTime(); 
new ShellSort().sort(aCopy); 
System.out.println("shell:\t" + (System.nanoTime() - start));

插入排序和排序類：

class InsertionSort extends Sort { 
    public void sort(int[] a) { 
     int length = a.length; 
     int j; 
     int x; 
     for(int i = 1; i < length; i++) { 
      j = i; 
      x = a[i]; 
      while(j > 0 && x < a[j-1]) { 
       a[j] = a[--j]; 
      } 
      a[j] = x; 
     } 
    } 
} 
abstract class Sort { 
    abstract public void sort(int[] a); 

    protected static final void exch(int[] a, int i, int j) { 
     int t = a[i]; 
     a[i] = a[j]; 
     a[j] = t; 
    } 
}

來源

2014-02-27 Oroboros102

這將是一個好主意，遵循這裏的指導 - [如何在Java中編寫正確的微基準？]（http://stackoverflow.com/questions/504103/how-do-i-write- a-correct-micro-benchmark-in-java） - 在將太多的權重放入結果之前。 – Dukeling

在證明它們是** [具有統計顯着性]（http://en.wikipedia.org/wiki/Statistical_significance）**之前，千萬不要試圖解釋實證結果。 – amit

我在jvm「加熱」的基礎上運行了所有代碼，並獲得了相同的結果。原因是 - 內部循環很快得到了JIT，並且沒有引入重要的GC開銷。 – Oroboros102

由於最後一步是1，並且您的兩個內部循環執行基本插入排序時，您的實現已損壞並僅輸出已排序的數組步驟是1. 當步驟大於1時，實現中的兩個內部循環除了對數組執行任何操作之外，對數組進行分步排序，因此您實現的功能是在外循環的所有迭代中對數組進行混洗，然後插入 - 在外循環的最後一次迭代中排序。當然，這需要更長的時間，然後只需插入一次即可。

重用您的序列正確的外殼類型的實現應該是這樣的：

適當的初始指數爲兩個內部循環

正確：

public void sort(int[] a) { 
    int length = a.length; 

    int stepIndex = 0; 
    while (stepIndex < SEQUENCE.length - 1 && SEQUENCE[ stepIndex ] < length/3) { 
     stepIndex++; 
    } 

    while (stepIndex >= 0) { 
     int step = SEQUENCE[ stepIndex-- ]; 
     for (int i = step; i < length; i++) { // DIFF: i++ instead of i+=step 
      for (int j = i; j >= step && a[ j ] < a[ j - step ]; j -= step) { 
       exch(a, j, j - step); 
      } 
     } 
    } 
}

兩個主要的這個實施之間和你的區別中間循環的索引增量（+1代替代碼中的+步）

此外，請檢查http://algs4.cs.princeton.edu/21elementary/Shell.java.html以獲得良好的實施和良好的步驟順序。

來源

2014-02-27 08:15:24

哈哈，我開始計算緩存未命中，而不是理解算法。 – Oroboros102

從快速瀏覽，你可以看到，希爾排序看上去更慢由具有多個圈。蠻力，你可以把system.out.println放在最裏面的循環中，看看有多少次比較。爲希爾排序

的

3路（INT色漆= seqLen - 1;色漆> = 0; seqi--）
對（INT N = 0; N <長度; N + =第n個）
而（j> 0 & &一個[j]的一個< [J - 第n]）插入

2個循環

for（int i = 1;我的長度爲<;我++）
而（j> 0 & & X <一個[J-1]）

來源

2014-02-27 07:24:09 jcalloway

Shell排序有〜0.8插入排序循環。但仍然較慢，而不是600x（如在書中）。 – Oroboros102

自我修復：「仍然較慢，但不是600x更快」 – Oroboros102

@Oroboros102所以再看看你的代碼。我可能沒有時間構造函數調用。不知道靜態初始化器何時運行。也可以嘗試在插入排序之前對shellort進行計時。看看你是否仍然有相同的效果。 – jcalloway

相信原因將被緩存。 Shell排序有很多（有點）隨機訪問，所以更多的緩存未命中。我相信這會給新硬件帶來更糟糕的表現。插入排序幾乎總是在內存的相同區域工作，所以它的性能更好

來源

2014-02-27 07:29:57 taytay

你的假設看起來更像真理。我如何驗證它？ – Oroboros102

http://stackoverflow.com/questions/10082517/simplest-tool-to-measure-c-program-cache-hit-miss-and-cpu-time-in-linux 看起來像'perf stat ./a。出來'應該工作。但我不確定它是否會像Java應用程序一樣有效 – taytay

看起來緩存未命中不是問題。插入排序：161M參考和230K失誤，外殼分別爲167M和300K。 – Oroboros102

插入排序比shell排序快得多

回答

相關問題