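I've been trying to improve the performance of strcmp under certain conditions. Unfortunately, I can't even get a plain vanilla implementation of strcmp to perform as well as the library implementation. Why is this version of strcmp slower?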
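I saw a similar question, but the answer there was that the compiler was optimizing away comparisons of string literals. My test doesn't use string literals.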
Here is the implementation (comparisons.cpp):
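int strcmp_custom(const char* a, const char* b) {
    while (*b == *a) {
        if (*a == '\0') return 0;
        a++;
        b++;
    }
    // Compare as unsigned char and return a - b, matching the sign convention of strcmp.
    return (unsigned char)*a - (unsigned char)*b;
}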
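Before comparing speed, it can be worth checking that a custom function agrees with the library. The C standard only specifies the sign of strcmp's result, computed as if the bytes were unsigned char, so the minimal sketch below compares signs only. It carries its own copy of strcmp_custom written with that convention; sign_of and agrees_with_strcmp are hypothetical helpers, not part of any library.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Copy of the custom comparison, returning a - b on unsigned char values
// (the sign convention the C standard specifies for strcmp).
int strcmp_custom(const char* a, const char* b) {
    while (*a == *b) {
        if (*a == '\0') return 0;
        ++a;
        ++b;
    }
    return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}

// Hypothetical helper: map any int to -1, 0, or +1.
int sign_of(int x) { return (x > 0) - (x < 0); }

// Hypothetical helper: true if strcmp_custom and std::strcmp agree in sign
// for every ordered pair drawn from `samples`.
bool agrees_with_strcmp(const std::vector<std::string>& samples) {
    for (const auto& s : samples)
        for (const auto& t : samples)
            if (sign_of(strcmp_custom(s.c_str(), t.c_str())) !=
                sign_of(std::strcmp(s.c_str(), t.c_str())))
                return false;
    return true;
}
```

Including a byte above 0x7F in the samples (for example "\xff") catches implementations that compare plain, possibly signed, char.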
And here is the test driver (driver.cpp):
#include "comparisons.h"
#include <array>
#include <chrono>
#include <cstdlib>   // rand, srand
#include <cstring>   // strcmp
#include <iostream>

void init_string(char* str, int nChars) {
    // 10% of strings will be equal, and 90% of strings will have one char different.
    // This way, many strings will share long prefixes so strcmp has to exercise a bit.
    // Using random strings still shows the custom implementation as slower (just less so).
    str[nChars - 1] = '\0';
    for (int i = 0; i < nChars - 1; i++)
        str[i] = (i % 94) + 32;
    if (rand() % 10 != 0)
        str[rand() % (nChars - 1)] = 'x';
}

int main(int argc, char** argv) {
    srand(1234);
    // Pre-generate some strings to compare.
    const int kSampleSize = 100;
    std::array<char[1024], kSampleSize> strings;
    // Note: nChars is kSampleSize (100), so only the first 100 bytes of each buffer are used.
    for (int i = 0; i < kSampleSize; i++)
        init_string(strings[i], kSampleSize);

    // count() reports ticks of high_resolution_clock.
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < kSampleSize; i++)
        for (int j = 0; j < kSampleSize; j++)
            strcmp(strings[i], strings[j]);
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "strcmp - " << (end - start).count() << std::endl;

    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < kSampleSize; i++)
        for (int j = 0; j < kSampleSize; j++)
            strcmp_custom(strings[i], strings[j]);
    end = std::chrono::high_resolution_clock::now();
    std::cout << "strcmp_custom - " << (end - start).count() << std::endl;
}
My makefile:
CC=clang++

test: driver.o comparisons.o
	$(CC) -o test driver.o comparisons.o

# Compile the test driver with optimizations off.
driver.o: driver.cpp comparisons.h
	$(CC) -c -o driver.o -std=c++11 -O0 driver.cpp

# Compile the code being tested separately with optimizations on.
comparisons.o: comparisons.cpp comparisons.h
	$(CC) -c -o comparisons.o -std=c++11 -O3 comparisons.cpp

clean:
	rm comparisons.o driver.o test
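Following the suggestion in this answer, I compiled my comparison function as a separate compilation unit with optimizations on, and compiled the driver with optimizations off, but the custom version is still several times slower (roughly 5x in my testing; about 3.3x in the run below):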
strcmp - 154519
strcmp_custom - 506282
I also tried copying the FreeBSD implementation, but got similar results.
Am I overlooking something in my performance measurement, or is the standard library implementation doing something more clever?
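On the measurement side, the driver times each loop only once, discards every result, and prints raw clock ticks. Below is a minimal sketch of a sturdier harness, assuming C++11: the hypothetical time_pairwise helper repeats the all-pairs loop, keeps the fastest run, converts to nanoseconds explicitly with duration_cast, and feeds the results into a volatile sink so the calls are not dead code even if the driver is later compiled with optimizations.

```cpp
#include <chrono>
#include <vector>

// Hypothetical sink: each write to a volatile object is an observable side
// effect, so the accumulated comparison results are actually used.
static volatile unsigned g_sink = 0;

// Hypothetical helper: run cmp over every ordered pair of `strings`,
// `repeats` times, and return the fastest run in nanoseconds
// (or -1 if repeats <= 0).
template <typename Cmp>
long long time_pairwise(const std::vector<const char*>& strings, Cmp cmp,
                        int repeats = 5) {
    long long best = -1;
    for (int r = 0; r < repeats; ++r) {
        unsigned acc = 0;  // unsigned, so wraparound is well defined
        auto start = std::chrono::steady_clock::now();
        for (const char* a : strings)
            for (const char* b : strings)
                acc += static_cast<unsigned>(cmp(a, b));
        auto end = std::chrono::steady_clock::now();
        g_sink = acc;
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           end - start).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

For example, calling time_pairwise(ptrs, [](const char* a, const char* b) { return std::strcmp(a, b); }), where ptrs is a hypothetical vector of pointers to the driver's buffers, and then the same call wrapping strcmp_custom gives directly comparable nanosecond figures. The lambda wrapper avoids taking the address of a standard library function, which recent C++ standards do not guarantee is permitted.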
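I think the standard library implements this function in assembly, probably using some assembly-level tricks (though I can't give any specific examples). – ForceBru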
If the library implementation performed worse than a vanilla implementation, I would consider it broken. – harold
For example: https://www.strchr.com/strcmp_and_strlen_using_sse_4.2 –
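To make the comments above concrete: one family of tricks library implementations use is comparing several bytes per iteration instead of one. The linked article does this with SSE 4.2 instructions; the minimal sketch below instead uses a simpler, portable word-at-a-time variant with the classic "has zero byte" bit trick, assuming 8-byte words. It is not the glibc or FreeBSD code, and strcmp_words and its buf_size parameter are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if any byte of w is zero (classic "has zero byte" bit trick).
static bool has_zero_byte(std::uint64_t w) {
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

// Hypothetical word-at-a-time strcmp. Assumes a and b each point to a buffer
// of at least buf_size bytes that contains a NUL terminator, so every 8-byte
// read below buf_size stays in bounds.
int strcmp_words(const char* a, const char* b, std::size_t buf_size) {
    std::size_t i = 0;
    // Fast path: skip 8 bytes at a time while the words match and hold no NUL.
    while (i + 8 <= buf_size) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a + i, 8);  // memcpy avoids alignment/aliasing issues
        std::memcpy(&wb, b + i, 8);
        if (wa != wb || has_zero_byte(wa)) break;
        i += 8;
    }
    // Slow path: locate the exact differing byte or terminator.
    while (a[i] == b[i] && a[i] != '\0') ++i;
    return static_cast<unsigned char>(a[i]) - static_cast<unsigned char>(b[i]);
}
```

Real library versions avoid an explicit bound by aligning pointers so that a whole-word read can never cross into an unmapped page. That relies on platform guarantees beyond standard C++, which is why this sketch takes buf_size instead.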