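I noticed that a struct wrapping a single float is significantly slower than using a float directly, with roughly half the performance. Why does adding an extra field to the struct greatly improve its performance?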
using System;
using System.Diagnostics;

struct Vector1 {
    public float X;

    public Vector1(float x) {
        X = x;
    }

    public static Vector1 operator +(Vector1 a, Vector1 b) {
        a.X = a.X + b.X;
        return a;
    }
}
However, after adding an additional "extra" field, some magic seems to happen and performance once again becomes more reasonable:
struct Vector1Magic {
    public float X;
    private bool magic;

    public Vector1Magic(float x) {
        X = x;
        magic = true;
    }

    public static Vector1Magic operator +(Vector1Magic a, Vector1Magic b) {
        a.X = a.X + b.X;
        return a;
    }
}
The code I used to benchmark these is as follows:
class Program {
    static void Main(string[] args) {
        int iterationCount = 1000000000;

        var sw = new Stopwatch();
        sw.Start();
        var total = 0.0f;
        for (int i = 0; i < iterationCount; i++) {
            var v = (float) i;
            total = total + v;
        }
        sw.Stop();
        Console.WriteLine("Float time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("total = {0}", total);

        sw.Reset();
        sw.Start();
        var totalV = new Vector1(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var v = new Vector1(i);
            totalV += v;
        }
        sw.Stop();
        Console.WriteLine("Vector1 time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("totalV = {0}", totalV);

        sw.Reset();
        sw.Start();
        var totalVm = new Vector1Magic(0.0f);
        for (int i = 0; i < iterationCount; i++) {
            var vm = new Vector1Magic(i);
            totalVm += vm;
        }
        sw.Stop();
        Console.WriteLine("Vector1Magic time was {0} for {1} iterations.", sw.Elapsed, iterationCount);
        Console.WriteLine("totalVm = {0}", totalVm);

        Console.Read();
    }
}
With the benchmark results:
Float time was 00:00:02.2444910 for 1000000000 iterations.
Vector1 time was 00:00:04.4490656 for 1000000000 iterations.
Vector1Magic time was 00:00:02.2262701 for 1000000000 iterations.
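Compilation/environment settings:
OS: Windows 10 64-bit
Toolchain: VS2017
Framework: .NET 4.6.2
Target: Any CPU, Prefer 32-bit
If 64-bit is set as the target, our results are more predictable, but significantly worse than what we see with Vector1Magic on the 32-bit target: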
Float time was 00:00:00.6800014 for 1000000000 iterations.
Vector1 time was 00:00:04.4572642 for 1000000000 iterations.
Vector1Magic time was 00:00:05.7806399 for 1000000000 iterations.
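Since the timings differ so much between the 32-bit and 64-bit targets, it may be worth confirming which mode the benchmark process actually runs in before comparing numbers. The following is a minimal sketch, not part of the original benchmark; the class name `BitnessCheck` and the method `Describe` are hypothetical names chosen for illustration.

```csharp
using System;

static class BitnessCheck
{
    // Describes the running process so benchmark output can be labeled correctly.
    public static string Describe()
    {
        // Both values reflect the running process, not the project settings.
        // With "Any CPU" plus "Prefer 32-bit", a 32-bit process is expected
        // even on a 64-bit operating system.
        string bits = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        return string.Format(
            "{0} process, pointer size {1} bytes, 64-bit OS: {2}",
            bits, IntPtr.Size, Environment.Is64BitOperatingSystem);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Printing this line at the start of each run makes it harder to mix up results from the two targets.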
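For real wizards, I've included a dump of the IL here: https://pastebin.com/sz2QLGEx
Further investigation indicates that this seems to be specific to the Windows runtime, as the Mono compiler produces the same IL.
On the Mono runtime, both struct variants are roughly 2x slower than the raw float. This is quite different from the performance we see on .NET.
What's going on here?
*Note: this question originally included a flawed benchmark process (thanks to Max Payne for pointing this out), and has been updated to reflect the timings more accurately.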
My immediate guess is that this is due to struct packing, with the larger struct now having better memory alignment.
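One way to start checking the packing guess is to compare the sizes of the two structs. The sketch below is an assumption-laden illustration: it uses `Marshal.SizeOf<T>()`, which reports the marshaled (unmanaged) size and is only a proxy for the layout the JIT actually works with. The structs are trimmed copies of the ones in the question (operators omitted), and `LayoutCheck` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

struct Vector1
{
    public float X;
}

struct Vector1Magic
{
    public float X;
    private bool magic;

    public Vector1Magic(float x)
    {
        X = x;
        magic = true;
    }
}

static class LayoutCheck
{
    static void Main()
    {
        // Marshal.SizeOf gives the unmanaged size in bytes. It does not prove
        // anything about how the JIT enregisters or copies the struct; it only
        // shows whether the extra field changes the overall footprint.
        Console.WriteLine("Vector1:      {0} bytes", Marshal.SizeOf<Vector1>());
        Console.WriteLine("Vector1Magic: {0} bytes", Marshal.SizeOf<Vector1Magic>());
    }
}
```

A size difference alone would not explain the timings, but it helps separate a layout effect from a JIT code generation effect.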
You should add warm-up iterations to rule out possible interference from the JIT or other one-time processing. – PetSerAl
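A warm-up pass along these lines could look like the sketch below. It is not the asker's code: `WarmupBenchmark`, `SumFloats`, and `TimeWithWarmup` are hypothetical names, and the idea of moving each loop into its own static method (so one short untimed call triggers JIT compilation before the timed call) is an assumption about how to apply the suggestion.

```csharp
using System;
using System.Diagnostics;

static class WarmupBenchmark
{
    // The float loop from the question, moved into its own method so it can
    // be called once untimed and once timed.
    public static float SumFloats(int iterationCount)
    {
        var total = 0.0f;
        for (int i = 0; i < iterationCount; i++)
        {
            total = total + (float)i;
        }
        return total;
    }

    // Calls the workload once with a small count (untimed), then times a
    // second call with the real iteration count.
    public static TimeSpan TimeWithWarmup(
        Func<int, float> workload, int warmupIterations, int iterationCount, out float result)
    {
        workload(warmupIterations); // untimed: absorbs JIT and other one-time costs

        var sw = Stopwatch.StartNew();
        result = workload(iterationCount);
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        float total;
        TimeSpan elapsed = TimeWithWarmup(SumFloats, 1000, 1000000, out total);
        Console.WriteLine("Float time was {0} (total = {1})", elapsed, total);
    }
}
```

The same helper could time the `Vector1` and `Vector1Magic` loops if each is wrapped in a method with the same shape. Note that moving a loop into a separate method may itself change what the JIT generates, so results should be compared against the original inline version.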
If I switch to 64-bit, performance gets worse for your "magic" vector. – Adrian