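Strange performance of fmaf

I am seeing lower performance when I use the fmaf function instead of * and +. I am testing on two Linux machines, with g++ 4.4.3 and g++ 4.6.3.

On both machines, the following code runs faster when the myOut vector is filled without using fmaf.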
On the server with g++ 4.6.3 and an Intel(R) Xeon(R) CPU E5-2650 @ 2.00GHz:
$ ./a.out fmaf
Time: 1.55008 seconds.
$ ./a.out muladd
Time: 0.403018 seconds.
On the server with g++ 4.4.3 and an Intel(R) Xeon(R) CPU X5650 @ 2.67GHz:
$ ./a.out fmaf
Time: 0.547544 seconds.
$ ./a.out muladd
Time: 0.34955 seconds.
Shouldn't the fmaf version be faster, in addition to being more precise because it avoids an extra rounding step?
#include <stddef.h>
#include <iostream>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cout << "missing parameter: 'muladd' or 'fmaf'"
                  << std::endl;
        exit(-1);
    }

    struct timeval start, stop, result;
    const size_t mySize = 1e6*100;

    // Note: the arrays are never initialized, so both loops read
    // indeterminate values.
    float* myA = new float[mySize];
    float* myB = new float[mySize];
    float* myC = new float[mySize];
    float* myOut = new float[mySize];

    gettimeofday(&start, NULL);
    if (!strcmp(argv[1], "muladd")) {
        // Separate multiply and add.
        for (size_t i = 0; i < mySize; ++i) {
            myOut[i] = myA[i]*myB[i]+myC[i];
        }
    } else if (!strcmp(argv[1], "fmaf")) {
        // Fused multiply-add through the math library.
        for (size_t i = 0; i < mySize; ++i) {
            myOut[i] = fmaf(myA[i], myB[i], myC[i]);
        }
    } else {
        std::cout << "specify 'muladd' or 'fmaf'" << std::endl;
        exit(-1);
    }
    gettimeofday(&stop, NULL);

    // Elapsed wall-clock time in seconds.
    timersub(&stop, &start, &result);
    std::cout << "Time: " << result.tv_sec + result.tv_usec/1000.0/1000.0
              << " seconds." << std::endl;

    delete []myA;
    delete []myB;
    delete []myC;
    delete []myOut;
}
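"Function call overhead plus the actual multiply and add instructions": implementing fmaf for a processor that does not provide the instruction takes more than a multiply and an add. Here is how libc does it, with changes to the FPU rounding mode: http://www.sourceware.org/ml/libc-alpha/2010-10/msg00007.html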