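Why is division slower than bit shifting in C++?

I wrote two pieces of code: one divides a random number by two, and the other right-shifts the same random number by one bit. As I understand it, both should produce the same result. However, when I time the two programs, the shift version consistently comes out faster. Why is that?

Shift code: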

double iterations = atoi(argv[1]) * 1000;
int result = 0;
cout << "Doing " << iterations << " iterations." << endl;
srand(31459);
for(int i=0;i<iterations;i++){
    if(i % 2 == 0){
        result = result + (rand()>>1);
    }else{
        result = result - (rand()>>1);
    }
}

Division code:

double iterations = atoi(argv[1]) * 1000;
int result = 0;
cout << "Doing " << iterations << " iterations." << endl;
srand(31459);
for(int i=0;i<iterations;i++){
    if(i % 2 == 0){
        result = result + (rand()/2);
    }else{
        result = result - (rand()/2);
    }
}

Timing and results:

$ time ./divide 1000000; time ./shift 1000000 
Doing 1e+09 iterations. 

real 0m12.291s 
user 0m12.260s 
sys  0m0.021s 
Doing 1e+09 iterations. 

real 0m12.091s 
user 0m12.056s 
sys  0m0.019s 

$ time ./shift 1000000; time ./divide 1000000 
Doing 1e+09 iterations. 

real 0m12.083s 
user 0m12.028s 
sys  0m0.035s 
Doing 1e+09 iterations. 

real 0m12.198s 
user 0m12.158s 
sys  0m0.028s

Additional information:

I did not use any optimizations when compiling. I am running this on a virtualized install of Fedora 20, with kernel 3.12.10-300.fc20.x86_64.

2014-02-27 Avery

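Use optimizations. There is little point in profiling unoptimized code. Also, the differences are very small; I would compare over 100 or so runs. – juanchopanza

@juanchopanza I do use optimizations for production code. However, I would still like to know why this difference occurs. Also, I have run this comparison many times, with many different input sizes, and found similar differences. – Avery

Then you should write some simple code that performs the division and the shift, and look at the assembly generated with and without optimizations. – juanchopanza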
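A minimal sketch of the kind of test file this comment suggests. The file name `halve.cpp` and the function names are hypothetical and do not come from the thread. Compiling it to assembly at different optimization levels, for example with `g++ -O0 -S halve.cpp -o halve_O0.s` and `g++ -O3 -S halve.cpp -o halve_O3.s`, lets you compare what the compiler emits for each function.

```cpp
// halve.cpp (hypothetical file name): one small function per operation,
// so each appears as a separate, easy-to-find symbol in the assembly.
#include <cassert>

int halve_signed_by_division(int x) { return x / 2; }
int halve_signed_by_shift(int x) { return x >> 1; }
unsigned halve_unsigned_by_division(unsigned x) { return x / 2; }
unsigned halve_unsigned_by_shift(unsigned x) { return x >> 1; }

// Optional runtime check: for non-negative input all four variants agree.
void sanity_check() {
    for (int x = 0; x < 1000; ++x) {
        const unsigned ux = static_cast<unsigned>(x);
        assert(halve_signed_by_division(x) == halve_signed_by_shift(x));
        assert(halve_unsigned_by_division(ux) == halve_unsigned_by_shift(ux));
        assert(static_cast<unsigned>(halve_signed_by_division(x)) ==
               halve_unsigned_by_division(ux));
    }
}
```

The functions are deliberately not marked `inline` or `constexpr`, so the compiler has to emit a body for each of them even though nothing in the file calls them.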

It actually isn't slower. I ran your benchmark using nonius, like this:

#define NONIUS_RUNNER 
#include "Nonius.h++" 

#include <type_traits> 
#include <random> 
#include <vector> 

NONIUS_BENCHMARK("Divide", [](nonius::chronometer meter) 
{ 
    std::random_device rd; 
    std::uniform_int_distribution<int> dist(0, 9); 

    std::vector<int> storage(meter.runs()); 
    meter.measure([&](int i) { storage[i] = storage[i] % 2 == 0 ? storage[i] - (dist(rd) >> 1) : storage[i] + (dist(rd) >> 1); }); 
}) 

NONIUS_BENCHMARK("std::string destruction", [](nonius::chronometer meter) 
{ 
    std::random_device rd; 
    std::uniform_int_distribution<int> dist(0, 9); 

    std::vector<int> storage(meter.runs()); 
    meter.measure([&](int i) { storage[i] = storage[i] % 2 == 0 ? storage[i] - (dist(rd)/2) : storage[i] + (dist(rd)/2); }); 
})

And these are the results: [image not preserved]

As you can see, the two are neck and neck.

(You can find the HTML output here.)

P.S.: It seems I forgot to rename the second test. My bad.

2014-02-27 06:55:20 Borgleader

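So I just got really biased data. Huh. Well, thanks for the actual benchmark (and for the better tool for the future). – Avery

It isn't slower in C++ as such; it's slower on the architecture you're running on. It's almost always slower, because the hardware behind a bit shift is trivial, while division is a bit of a nightmare. In base 10, which is easier for you: 78358582354 >> 3 or 78358582354/85? Instructions generally take the same time to execute regardless of their input, and in your case it is the compiler's job to convert /2 into >>1; the CPU just does what it is told.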
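To make the "/2 becomes >>1" rewrite concrete, here is a small sketch showing that, for unsigned values, shifting right by `k` bits gives the same result as dividing by 2^k. The function names are hypothetical, and the check uses the number from the answer above with power-of-two divisors only (a divisor such as 85 has no single-shift equivalent).

```cpp
#include <cassert>
#include <cstdint>

// Divide by 2^k using the division operator. Requires k < 64.
std::uint64_t divide_by_power_of_two(std::uint64_t x, unsigned k) {
    return x / (std::uint64_t{1} << k);
}

// The same computation expressed as a right shift. Requires k < 64.
std::uint64_t shift_right(std::uint64_t x, unsigned k) {
    return x >> k;
}

// Checks that both forms agree for the number used in the answer.
void check_equivalence() {
    const std::uint64_t x = 78358582354ULL;
    for (unsigned k = 0; k < 40; ++k) {
        assert(divide_by_power_of_two(x, k) == shift_right(x, k));
    }
}
```

This equivalence is what allows a compiler to replace an unsigned division by a constant power of two with a shift. Signed values need extra care because negative numbers round differently.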

2014-02-27 06:48:33

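The difference in your results looks like it is within the noise, so you can't be sure the two really differ. In general, though, division cannot be done in a single operation while a bit shift can, so a bit shift should be faster.

However, since you have the literal 2 in your code, I would guess that the compiler generates the same code even without optimizations.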

2014-02-27 06:50:13 UldisK

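But it is clearly not generating the same code. If I run this comparison another 50 times, I still get a difference. The difference may not be significant, but it is there. – Avery

@Avery You have not given anyone enough data to verify those claims. To me, your numbers look consistent with "equal". – juanchopanza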
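One way to gather the kind of data asked for here is to time each variant many times and compare the mean against the run-to-run spread. The following is a minimal sketch under that assumption; `Summary`, `summarize`, and `time_runs` are hypothetical helper names, not part of any library used in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;    // average time per run, in seconds
    double stddev;  // sample standard deviation, in seconds
};

// Mean and sample standard deviation of a non-empty set of timings.
Summary summarize(const std::vector<double>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());

    double squared_error = 0.0;
    for (double s : samples) squared_error += (s - mean) * (s - mean);
    const double stddev =
        samples.size() > 1
            ? std::sqrt(squared_error / static_cast<double>(samples.size() - 1))
            : 0.0;
    return Summary{mean, stddev};
}

// Runs `work` `runs` times and returns the wall-clock seconds of each run.
template <typename F>
std::vector<double> time_runs(F&& work, std::size_t runs) {
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}
```

If the two means differ by less than the standard deviations, the measurements do not support the claim that one variant is faster. A dedicated benchmarking library is a more rigorous way to do the same thing.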

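Note that rand returns an int, and dividing an int (which is signed by default) by 2 is not the same as shifting it right by 1. You can easily inspect the generated asm and see the difference, or simply compare the sizes of the resulting binaries: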

> g++ -O3 boo.cpp -c -o boo # divide 
> g++ -O3 foo.cpp -c -o foo # shift 
> ls -la foo boo 
... 4016 ... boo # divide 
... 3984 ... foo # shift

Now add a static_cast patch:

if (i % 2 == 0) { 
    result = result + (static_cast<unsigned>(rand())/2); 
} 
else { 
    result = result - (static_cast<unsigned>(rand())/2); 
}

and check the sizes again:

> g++ -O3 boo.cpp -c -o boo # divide 
> g++ -O3 foo.cpp -c -o foo # shift 
> ls -la foo boo 
... 3984 ... boo # divide 
... 3984 ... foo # shift

To be sure, you can verify that the asm generated in both binaries is identical.
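The reason a signed `/2` and `>>1` are not interchangeable can be shown with a small sketch. It assumes that right-shifting a negative int is an arithmetic shift, which is guaranteed since C++20 and implementation-defined (but usual) before that. The function names are hypothetical, and `halve_by_adjusted_shift` only illustrates the kind of fix-up a compiler has to add; it is not necessarily the exact instruction sequence GCC emits.

```cpp
#include <cassert>
#include <climits>

// Signed division truncates toward zero: -7 / 2 is -3.
int halve_by_division(int x) { return x / 2; }

// An arithmetic right shift rounds toward negative infinity: -7 >> 1 is -4
// (assumed behaviour; guaranteed only since C++20).
int halve_by_shift(int x) { return x >> 1; }

// Shift-based version that matches x / 2 for every int:
// add 1 to negative inputs before shifting.
int halve_by_adjusted_shift(int x) {
    const int sign_bit_index = static_cast<int>(sizeof(int) * CHAR_BIT) - 1;
    const int is_negative =
        static_cast<int>(static_cast<unsigned>(x) >> sign_bit_index);  // 1 or 0
    return (x + is_negative) >> 1;
}

// With the static_cast from the answer, division and shift always agree.
unsigned halve_via_unsigned_cast(int x) { return static_cast<unsigned>(x) / 2; }

void check_examples() {
    assert(halve_by_division(7) == 3 && halve_by_shift(7) == 3);
    assert(halve_by_division(-7) == -3);
    assert(halve_by_shift(-7) == -4);
    assert(halve_by_adjusted_shift(-7) == -3);
    assert(halve_by_adjusted_shift(INT_MIN) == INT_MIN / 2);
    assert(halve_via_unsigned_cast(7) == (static_cast<unsigned>(7) >> 1));
}
```

Because rand never returns a negative value, the cast to unsigned does not change the result in the question's loop; it only tells the compiler that the fix-up for negative inputs is unnecessary.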

2014-02-27 07:28:00
