Performance penalty for unaligned data

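As a CS student, I am trying to understand the fundamentals of how computers work. When I stumbled upon this website, I wanted to test these performance penalties myself. I understand what the author is saying and why the penalty happens (or should happen).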

Anyway, here is the code I use to call the functions he wrote:

int main(void) 
{ 
    int i = 0; 
    uint8_t alignment = 0; 
    uint8_t size = 1024 * 1024 * 10; // 10MiB 
    uint8_t* block = malloc(size); 

    for(alignment = 0; alignment <= 17; alignment++) 
    { 
     start_t = clock(); 
     for(i = 0; i < 100000; i++) 
      Munge8(block + alignment, size); 

     end_t = clock(); 
     printf("%i\n", end_t - start_t); 
    } 
    // Repeat, but next time with Munge16, Munge32, Munge64 
}

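I don't know whether my CPU and RAM are simply extremely fast, but the output for all 4 functions (Munge8, Munge16, Munge32 and Munge64) is always 3 or 4 (random, no pattern).

Is this possible? 100000 repetitions should be far more work than that, or am I wrong? I am using Windows 7 Enterprise x64 on an Intel Core i7-4600U CPU @ 2.10GHz. All compiler optimizations are turned off, i.e. /Od.

None of the related questions on SO explain why my solution does not work.

What am I doing wrong? Any help is greatly appreciated.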

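EDIT: First of all, thank you very much for your help. After changing the type of size from uint8_t to uint32_t, I split the statement in the inner loops of all the test functions that caused undefined behavior into two separate lines: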

while(data32 != data32End) 
{ 
    data32++; 
    *data32 = -(*data32); 
}

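Now I get relatively stable outputs of 25/26, 12/13, 6 and 3 ticks, each computed as the average of 100 repetitions. Is this a plausible result? Does it mean that my architecture handles unaligned accesses just as fast (or as slowly) as aligned ones? Am I measuring time imprecisely? Or is there an accuracy problem when dividing by 10? My new code: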

int main(void) 
{ 
    int i = 0; 
    uint8_t alignment = 0; 
    uint64_t size = 1024 * 1024 * 10; // 10MiB 
    uint8_t* block = malloc(size); 


    printf("%i\n\n", CLOCKS_PER_SEC); // yields 1000, just for comparison how fast my machine 'ticks' 
    for(alignment = 0; alignment <= 17; alignment++) 
    { 
     start_t = clock(); 
     for(i = 0; i < 100; i++) 
      singleByte(block + alignment, size); 

     end_t = clock(); 
     printf("%i\n", (end_t - start_t)/100); 
    } 
    // Again, repeat with all different functions 
}

Of course, general criticism is appreciated as well. :)

2014-10-28 Ophidian

+11

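'uint8_t size = 1024 * 1024 * 10; // 10MiB': the range of uint8_t is 0 - 255. – BLUEPIXY 2014-10-28 13:33:20

Looking at the linked article, the functions all contain undefined behavior, in lines such as '*data8++ = -*data8;'. Also, the way you call them makes them access memory past the end of the allocated block. – interjay 2014-10-28 13:40:02

@interjay Why would '*data8++ = -*data8' be undefined behavior? – 2014-10-28 13:52:18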
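A minimal sketch of the two points raised in these comments. In '*data8++ = -*data8;' the increment of data8 on the left side is unsequenced relative to the read of data8 on the right side, which C leaves undefined; putting the store and the increment in separate statements removes the ambiguity. The names munge8_defined, alloc_padded and MAX_OFFSET are hypothetical and do not come from the linked article; the padding assumes the offsets 0 to 17 used in the question.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_OFFSET 17  /* largest alignment offset the question's loop tries */

/* Hypothetical stand-in for the article's Munge8: negates every byte.
   The store and the pointer increment are separate statements, so
   their order is well defined. */
static void munge8_defined(void *data, size_t size)
{
    uint8_t *p = data;
    uint8_t *end = p + size;

    while (p != end) {
        *p = (uint8_t)-*p;  /* negate the current byte first */
        p++;                /* then advance to the next byte */
    }
}

/* Allocates size + MAX_OFFSET zeroed bytes, so that the range
   (block + offset, size) stays inside the allocation for every
   offset from 0 to MAX_OFFSET. Returns NULL on failure. */
static uint8_t *alloc_padded(size_t size)
{
    return calloc(size + MAX_OFFSET, 1);
}
```

With a block obtained from alloc_padded(size), a call such as munge8_defined(block + alignment, size) no longer touches memory past the end of the allocation.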

This fails because of integer overflow:

uint8_t size = 1024 * 1024 * 10; // 10MiB

It should be:

const size_t size = 1024 * 1024 * 10; // 10MiB

I have no idea why you would ever use an 8-bit quantity to hold something that large.
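To make the effect concrete, here is a small sketch of what the assignment stores. Converting a value to an unsigned 8-bit type reduces it modulo 256, and 1024 * 1024 * 10 = 10485760 is an exact multiple of 256, so size ends up as 0 and every Munge call has nothing to do. That would be consistent with the constant 3 or 4 ticks reported in the question. The helper names store_in_u8 and ten_mib are hypothetical, used only for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: the value that ends up in a uint8_t when a
   larger value is assigned to it (reduction modulo 256). */
static uint8_t store_in_u8(unsigned long value)
{
    return (uint8_t)value;
}

/* The buffer size from the question, held in a type that can
   actually represent it. */
static size_t ten_mib(void)
{
    return (size_t)1024 * 1024 * 10;
}
```

Here store_in_u8(1024UL * 1024 * 10) yields 0, while ten_mib() yields the intended 10485760.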

Look into how to enable all warnings for your compiler.

2014-10-28 15:17:33 unwind

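BLUEPIXY already mentioned this in the comments on my question. The edit is entirely about this change (it happened while you were writing your answer :)). – Ophidian 2014-10-29 11:09:56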

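It seems there is a problem with your clock function. CLOCKS_PER_SEC is too low for your processor, even if CPU throttling is active (you should get around 2100000 if frequency scaling is off). Using cycle.h, how many cycles do you get on average per measurement?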
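Independently of which timer is used, part of the imprecision in the question may come from the integer division in (end_t - start_t)/100. With the CLOCKS_PER_SEC value of 1000 reported in the question, one tick is one millisecond, so dividing a small tick count by the repetition count in integer arithmetic discards the fractional part. A minimal sketch that does the averaging in floating point instead; the helper name avg_ms_per_rep is hypothetical:

```c
#include <time.h>

/* Hypothetical helper: average milliseconds per repetition, computed
   in floating point so that fractions of a tick are not truncated. */
static double avg_ms_per_rep(clock_t start, clock_t end, int reps)
{
    double total_ms = (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
    return total_ms / reps;
}
```

In the question's loop this could replace the integer expression, for example printf("%.3f\n", avg_ms_per_rep(start_t, end_t, 100)); which also avoids passing a clock_t to the %i conversion.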

2014-10-28 16:09:28 jyvet