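Very high uncertainty for timing after cache flush

I am using the following code to test the effect of flushing the cache at run time, after initialization, on a daxpy routine (the full code, including the fill() and wall_time() routines, is here: http://codepad.org/QuLT3cbD - it is 150 lines):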
#define KB 1024

int main()
{
    int cache_size = 32*KB;
    double alpha = 42.5;
    int operand_size = cache_size/(sizeof(double)*2);
    double* X = new double[operand_size];
    double* Y = new double[operand_size];

    //95% confidence interval
    double max_risk = 0.05;
    //Interval half width
    double w;
    int n_iterations = 100;
    students_t dist(n_iterations-1);
    double T = boost::math::quantile(complement(dist,max_risk/2));

    accumulator_set<double, stats<tag::mean,tag::variance> > unflushed_acc;
    for(int i = 0; i < n_iterations; ++i)
    {
        fill(X,operand_size);
        fill(Y,operand_size);
        double seconds = wall_time();
        daxpy(alpha,X,Y,operand_size);
        seconds = wall_time() - seconds;
        unflushed_acc(seconds);
    }
    w = T*sqrt(variance(unflushed_acc))/sqrt(count(unflushed_acc));
    printf("Without flush: time=%g +/- %g ns\n",mean(unflushed_acc)*1e9,w*1e9);

    //Using clflush instruction
    //We need to put the operands back in cache
    accumulator_set<double, stats<tag::mean,tag::variance> > clflush_acc;
    for(int i = 0; i < n_iterations; ++i)
    {
        fill(X,operand_size);
        fill(Y,operand_size);
        flush_array(X,operand_size);
        flush_array(Y,operand_size);
        double seconds = wall_time();
        daxpy(alpha,X,Y,operand_size);
        seconds = wall_time() - seconds;
        clflush_acc(seconds);
    }
    w = T*sqrt(variance(clflush_acc))/sqrt(count(clflush_acc));
    printf("With clflush: time=%g +/- %g ns\n",mean(clflush_acc)*1e9,w*1e9);

    return 0;
}
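When I run this code, it reports the following times and their uncertainties (at the 95% confidence level):

Without flush: time=3103.75 +/- 0.524506 ns
With clflush: time=4651.72 +/- 201.25 ns

Why does flushing the operands X and Y from the cache with clflush increase the measurement noise by more than 100x?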
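To put the two intervals on a common scale, the reported half-widths can be converted back into sample standard deviations by inverting w = T*s/sqrt(n). The following is a minimal sketch and not part of the original program: the helper names are hypothetical, and the Student-t quantile for 99 degrees of freedom is hard-coded as roughly 1.984 instead of being computed with Boost.

```cpp
#include <cmath>
#include <cstdio>

// Approximate two-sided 95% Student-t quantile for 99 degrees of freedom.
// Assumption: hard-coded here; the question's code obtains it from Boost.
const double kT99 = 1.984;

// Invert w = T*s/sqrt(n) to recover the sample standard deviation s.
double implied_stddev(double half_width, int n, double t)
{
    return half_width * std::sqrt(static_cast<double>(n)) / t;
}

// Half-width expressed as a fraction of the mean.
double relative_half_width(double mean, double half_width)
{
    return half_width / mean;
}

// Print the derived quantities for one reported measurement.
void report(const char* label, double mean_ns, double half_width_ns, int n)
{
    std::printf("%s: stddev ~ %.4g ns, half-width = %.3g%% of mean\n",
                label,
                implied_stddev(half_width_ns, n, kT99),
                100.0 * relative_half_width(mean_ns, half_width_ns));
}
```

Calling `report("Without flush", 3103.75, 0.524506, 100)` and `report("With clflush", 4651.72, 201.25, 100)` gives an implied standard deviation of about 2.6 ns for the unflushed runs against roughly 1000 ns for the flushed runs. The flushed half-width is about 4.3% of its mean.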
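So the variation in the number of cache-line misses is due to variation in how effective the prefetching is? Sometimes it manages to prefetch more data than at other times, and this gets smoothed out by increasing the number of iterations?
Yes, I have added further explanation to my answer (it was too long for a comment). – amdn
OK. That's good: the black-box fix is then simply to increase the number of iterations until the uncertainty converges to 1% of the measured time. Thanks!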
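The black-box approach described in the last comment could be sketched as follows: keep sampling until the 95% half-width drops below 1% of the running mean. This is an illustration under stated assumptions, not the poster's actual code. `RunningStats` and `sample_until_converged` are hypothetical names, the timed operation is passed in as a callable, and the Student-t quantile is replaced by the normal value 1.96, which is only a reasonable approximation once many samples have been collected.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>

// Running mean and variance via Welford's algorithm (hypothetical helper).
struct RunningStats {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean

    void add(double x) {
        ++n;
        double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }

    // Unbiased sample variance; 0 until there are at least two samples.
    double variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }

    // Approximate 95% half-width. Assumption: 1.96 stands in for the t quantile.
    double half_width() const {
        if (n < 2) return std::numeric_limits<double>::infinity();
        return 1.96 * std::sqrt(variance() / static_cast<double>(n));
    }
};

// Call measure() repeatedly until the half-width is within rel_tol of the
// mean, subject to a minimum and a maximum number of samples.
RunningStats sample_until_converged(const std::function<double()>& measure,
                                    double rel_tol = 0.01,
                                    std::size_t min_samples = 30,
                                    std::size_t max_samples = 1000000)
{
    RunningStats s;
    while (s.n < max_samples) {
        s.add(measure());
        if (s.n >= min_samples &&
            s.half_width() <= rel_tol * std::fabs(s.mean)) {
            break;
        }
    }
    return s;
}
```

In the benchmark from the question, `measure` would wrap the fill, flush, and timed daxpy steps and return the elapsed seconds. Since the half-width shrinks as 1/sqrt(n), going from 201.25 ns to 1% of 4651.72 ns (about 46.5 ns) would take on the order of 1,900 samples, assuming the spread of the flushed timings stays similar.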