Why do newer versions of the Cg compiler produce shaders with more instructions?
I have a shader that looks like this:
void main(in float2 pos : TEXCOORD0,
          in uniform sampler2D data : TEXUNIT0,
          in uniform sampler2D palette : TEXUNIT1,
          in uniform float c,
          in uniform float th0,
          in uniform float th1,
          in uniform float th2,
          in uniform float4 BackGroundColor,
          out float4 color : COLOR
         )
{
    const float4 dataValue = tex2D(data, pos);
    const float vValue = dataValue.x;
    const float tValue = dataValue.y;
    color = BackGroundColor;
    if (tValue <= th2)
    {
        if (tValue < th1)
        {
            const float vRealValue = abs(vValue - 0.5);
            if (vRealValue > th0)
            {
                // determine value and color
                const float power = (c > 0.0) ? vValue : (1.0 - vValue);
                color = tex2D(palette, float2(power, 0.0));
            }
        }
        else
        {
            color = float4(0.0, tValue, 0.0, 1.0);
        }
    }
}
I compile it like this:
cgc -profile arbfp1 -strict -O3 -q sh.cg -o sh.asm
Different versions of the Cg compiler produce different output.
cgc version 2.2.0006 compiles the shader to this 18-instruction assembly program:
!!ARBfp1.0
PARAM c[6] = { program.local[0..4], { 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R0.z, -R0.x, c[5].y;
CMP R0.z, -c[0].x, R0.x, R0;
MOV R0.w, c[5].x;
TEX R1, R0.zwzw, texture[1], 2D;
SLT R0.z, R0.y, c[2].x;
ADD R0.x, R0, -c[5].z;
ABS R0.w, R0.x;
SGE R0.x, c[3], R0.y;
MUL R2.x, R0, R0.z;
SLT R0.w, c[1].x, R0;
ABS R2.y, R0.z;
MUL R0.z, R2.x, R0.w;
CMP R0.w, -R2.y, c[5].x, c[5].y;
CMP R1, -R0.z, R1, c[4];
MUL R2.x, R0, R0.w;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 18 instructions, 3 R-regs
cgc version 3.0.0016 compiles the same shader to this 23-instruction assembly program:
!!ARBfp1.0
PARAM c[6] = { program.local[0..4], { 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R1.y, R0.x, -c[5].z;
MOV R1.z, c[0].x;
ABS R1.y, R1;
SLT R1.z, c[5].x, R1;
SLT R1.x, R0.y, c[2];
SGE R0.z, c[3].x, R0.y;
MUL R0.w, R0.z, R1.x;
SLT R1.y, c[1].x, R1;
MUL R0.w, R0, R1.y;
ABS R1.z, R1;
CMP R1.y, -R1.z, c[5].x, c[5];
MUL R1.y, R0.w, R1;
ADD R1.z, -R0.x, c[5].y;
CMP R1.z, -R1.y, R1, R0.x;
ABS R0.x, R1;
CMP R0.x, -R0, c[5], c[5].y;
MOV R1.w, c[5].x;
TEX R1, R1.zwzw, texture[1], 2D;
CMP R1, -R0.w, R1, c[4];
MUL R2.x, R0.z, R0;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 23 instructions, 3 R-regs
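Oddly, with Cg 3.0 the optimization level does not seem to affect anything.
Can someone explain what is going on? Why doesn't the optimization work, and why is the shader longer when compiled with Cg 3.0?
Note that I removed the comments from the compiled shaders.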
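The actual assembly code might give some clues; those 18/23 instructions shouldn't be too long to post here.
@Christian OK, I posted the code, but SO decided to make it look terrible. Could someone edit it and fix the formatting?
How about reporting this to nVidia, since the second version seems worse?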