爲什麼strcmp比我的函數快得多？

我寫了一個函數，Str::Compare，那基本上是用另一種方式重寫的strcmp。雖然比較這兩個函數，在一個循環中重複500'000'000次，執行速度太快，大約快了x750倍。爲什麼strcmp比我的函數快得多？

此代碼被編譯成C庫-Os參數活性：

int Str::Compare(char* String_1, char* String_2) 
{ 
    char TempChar_1, TempChar_2; 

    do 
    { 
     TempChar_1 = *String_1++; 
     TempChar_2 = *String_2++; 
    } while(TempChar_1 && TempChar_1 == TempChar_2); 

    return TempChar_1 - TempChar_2; 
}

該功能的執行時間是3.058s，而僅strcmp0.004s。

爲什麼會發生這種情況？

而且我這是怎麼實現的基準循環：

int main() 
{ 
    char Xx[] = {"huehuehuehuehuehuehuehuehuehuehuehuehuehue"}, 
      Yy[] = {"huehuehuehuehuehuehuehuehuehuehuehuehuehue"}; 
    for(int i = 0; i < 500000000; ++i) 
     Str::Compare(Xx, Yy); 
}

編輯：在測試一些代碼，我寫了優化改進是顯着Str::Compare速度。如果之前strcmp是x750倍現在只有x250。這是新代碼：

int Str::Compare(char* String_1, char* String_2) 
{ 
    char TempChar_1, TempChar_2, TempChar_3; 

    while(TempChar_1 && !TempChar_3) 
    { 
      TempChar_1 = *String_1++; 
      TempChar_2 = *String_2++; 
      TempChar_3 = TempChar_1^TempChar_2; 
    } 

    return TempChar_1 - TempChar_2; 
}

新的執行時間是0.994s。

來源

2013-12-22 EnryFan

爲什麼不只是'while（* str1 ++ == * str2 ++）;' – texasbruce

因爲那個代碼是這個函數最沒有效率的實現。 – EnryFan

'1'編輯後的版本'Str :: Compare'不好 - 'TempChar_3'被初始化之前使用，'2'由VS2010產生的原始函數的代碼幾乎和'strcmp'一樣快。 '3'很難通過簡單的實現來大規模優化功能（甚至是固有的！）。 –

我很好奇，並建立一個測試程序：

#include <string.h> 

compare(char* String_1, char* String_2) 
{ 
    char TempChar_1, 
     TempChar_2; 

    do 
    { 
     TempChar_1 = *String_1++; 
     TempChar_2 = *String_2++; 
    } while(TempChar_1 && TempChar_1 == TempChar_2); 

    return TempChar_1 - TempChar_2; 
} 


int main(){ 
    int i=strcmp("foo","bar"); 
    int j=compare("foo","bar"); 

    return i; 
}

我編譯它使用gcc 4.7.3導致以下彙編程序與gcc -S -Os test.c到彙編：

.file "test.c" 
    .text 
    .globl compare 
    .type compare, @function 
compare: 
.LFB24: 
    .cfi_startproc 
    xorl %edx, %edx 
.L2: 
    movsbl (%rdi,%rdx), %eax 
    movsbl (%rsi,%rdx), %ecx 
    incq %rdx 
    cmpb %cl, %al 
    jne .L4 
    testb %al, %al 
    jne .L2 
.L4: 
    subl %ecx, %eax 
    ret 
    .cfi_endproc 
.LFE24: 
    .size compare, .-compare 
    .section .rodata.str1.1,"aMS",@progbits,1 
.LC0: 
    .string "bar" 
.LC1: 
    .string "foo" 
    .section .text.startup,"ax",@progbits 
    .globl main 
    .type main, @function 
main: 
.LFB25: 
    .cfi_startproc 
    movl $.LC0, %esi 
    movl $.LC1, %edi 
    call compare 
    movl $1, %eax 
    ret 
    .cfi_endproc 
.LFE25: 
    .size main, .-main 
    .ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3" 
    .section .note.GNU-stack,"",@progbits

我不是這在x86彙編中很好，但據我看來，對strcmp的調用被刪除，並簡單地用常量表達式（movl $1, %eax）替換。因此，如果您爲測試使用常量表達式，gcc可能會將strcmp優化爲常量。

來源

2013-12-22 23:31:48 lwi

Gcc也可以用常量替換對strlen的調用 – James

[這裏是G ++ 4.8.1在Coliru中將所有東西都縮小爲常量]（http://coliru.stacked-crooked.com/a/8bec7ba91e5fce00） – Casey

'strcmp'被替換爲即使在我的程序中，我也沒有使用常量。爲了更好地理解爲什麼，我會努力工作。 – EnryFan

當比較性能時，我發現最好將測試函數和測試驅動程序放在不同的編譯單元中。把你的測試函數放在不同的編譯單元中，然後把它們編譯成你想要的任何優化級別，但是編譯測試驅動程序時未經優化。否則，你會遇到你在這裏看到的那種問題。

問題是strcmp比較了兩個const C風格的字符串。如果通過strcmp(string_a, string_b)循環500,000,000次，優化編譯器將足夠聰明以減少該循環以優化該循環，然後可能足夠智能以優化剩餘的一個呼叫到strcmp。

您的比較函數需要兩個非常量字符串。就編譯器而言，你的函數可能有副作用。編譯器不知道，所以它不能優化循環。它必須生成代碼來執行比較500,000,000次。

來源

2013-12-22 23:38:28

指針參數的const限定通常不會影響編譯器推斷副作用的能力。只要原始數組對象是非常量，就可以丟棄常量。此外，該函數可能會調用其他引起字符串修改或別名的其他內容。 – Potatoswatter

我相信大多數標準庫都是用匯編語言編寫的。這可能是你看到標準庫比你的更快的原因。

來源

2016-10-21 16:12:57

爲什麼strcmp比我的函數快得多？

回答

相關問題