我一直在嘗試使用SSE內在函數,我似乎遇到了一個我無法弄清楚的奇怪的錯誤。我正在計算兩個浮點數組的內部產物,每次4個元素。SSE內在函數運算錯誤
爲了測試我已經將兩個數組的每個元素設置爲1,所以產品應該是==大小。
它運行正常,但每當我運行大小>〜68000000的代碼時,使用sse intrinsics的代碼開始計算錯誤的內積。它似乎陷入了一定的數額,永遠不會超過這個數字。下面是一個例子來看:
joe:~$./test_sse 70000000
sequential inner product: 70000000.000000
sse inner product: 67108864.000000
sequential time: 0.417932
sse time: 0.274255
編譯:
gcc -fopenmp test_sse.c -o test_sse -std=c99
這個錯誤似乎是我測試過它的計算機的少數之間是一致的。下面是代碼,也許有人也許能幫助我弄清楚是怎麼回事:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>
#include <math.h>
#include <assert.h>
#include <xmmintrin.h>
double inner_product_sequential(float * a, float * b, unsigned int size) {
double sum = 0;
for(unsigned int i = 0; i < size; i++) {
sum += a[i] * b[i];
}
return sum;
}
double inner_product_sse(float * a, float * b, unsigned int size) {
assert(size % 4 == 0);
__m128 X, Y, Z;
Z = _mm_set1_ps(0.0f);
float arr[4] __attribute__((aligned(sizeof(float) * 4)));
for(unsigned int i = 0; i < size; i += 4) {
X = _mm_load_ps(a+i);
Y = _mm_load_ps(b+i);
X = _mm_mul_ps(X, Y);
Z = _mm_add_ps(X, Z);
}
_mm_store_ps(arr, Z);
return arr[0] + arr[1] + arr[2] + arr[3];
}
int main(int argc, char ** argv) {
if(argc < 2) {
fprintf(stderr, "usage: ./test_sse <size>\n");
exit(EXIT_FAILURE);
}
unsigned int size = atoi(argv[1]);
srand(time(0));
float *a = (float *) _mm_malloc(size * sizeof(float), sizeof(float) * 4);
float *b = (float *) _mm_malloc(size * sizeof(float), sizeof(float) * 4);
for(int i = 0; i < size; i++) {
a[i] = b[i] = 1;
}
double start, time_seq, time_sse;
start = omp_get_wtime();
double inner_seq = inner_product_sequential(a, b, size);
time_seq = omp_get_wtime() - start;
start = omp_get_wtime();
double inner_sse = inner_product_sse(a, b, size);
time_sse = omp_get_wtime() - start;
printf("sequential inner product: %f\n", inner_seq);
printf("sse inner product: %f\n", inner_sse);
printf("sequential time: %f\n", time_seq);
printf("sse time: %f\n", time_sse);
_mm_free(a);
_mm_free(b);
}
你有沒有考慮過你用完精度的可能性? – Mysticial 2014-09-26 04:22:41
您是否注意到67108864是2^26?你認爲這只是一個巧合嗎? – rici 2014-09-26 04:34:45
您是以32位或64位模式編譯? – 2014-09-26 09:52:22