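Optimizing a loop in SPARC assembly
I am trying to optimize this code the right way. By "right" I mean that I imagine there is a general approach to performing these optimizations, such that someone else reading the code could recognize them and undo them.
C sample, for readability: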
int a = 1; // mapped to %l0
int b = 5; // mapped to %l1
int c = 0; // mapped to %l2
int d; // mapped to %l3
while(a < b) {
c += a * b;
++a;
}
d = c * b;
SPARC assembly version:
mov %g0, %l2
cmp %l0, %l1
bge END_LOOP
nop
LOOP:
mov %l0, %o0
call .mul
mov %l1, %o1 ! Fill delay slot with second argument
add %l2, %o0, %l2
inc %l0
cmp %l0, %l1
bl LOOP
nop
END_LOOP:
mov %l2, %o0
call .mul
mov %l1, %o1 ! Fill delay slot with second argument
mov %o0, %l3
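As a cross-check on the structure of the listing above: the assembly tests the condition once before LOOP and again at the bottom of the loop, which is the while loop rewritten as a guarded do-while. Below is a minimal C sketch of that equivalence. The function names compute_while and compute_inverted are made up for illustration and are not part of the original code.

```c
/* Original form: the condition is tested at the top of every iteration. */
int compute_while(int a, int b) {
    int c = 0;
    while (a < b) {
        c += a * b;
        ++a;
    }
    return c * b;   /* d */
}

/* Inverted form, mirroring the assembly: one guard test up front
 * (cmp / bge END_LOOP), then the test at the bottom (cmp / bl LOOP). */
int compute_inverted(int a, int b) {
    int c = 0;
    if (a < b) {
        do {
            c += a * b;
            ++a;
        } while (a < b);
    }
    return c * b;   /* d */
}
```

With a = 1 and b = 5, both forms give c = 5 + 10 + 15 + 20 = 50 and d = 250.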
I was able to optimize the first part (I am not sure whether I did it correctly), but I do not know how to optimize the second part.
mov %g0, %l2
cmp %l0, %l1
bge,a END_LOOP ! Annulled branch: delay slot executes only if the branch is taken
mov %l2, %o0 ! Copy of the instruction that used to sit at the target
LOOP:
mov %l0, %o0
call .mul
mov %l1, %o1 ! Fill delay slot with second argument
add %l2, %o0, %l2
inc %l0
cmp %l0, %l1
bl LOOP
nop
mov %l2, %o0 ! Moved above the target label, so only the fall-through path runs it
END_LOOP:
call .mul
mov %l1, %o1 ! Fill delay slot with second argument
mov %o0, %l3
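One way to sanity-check the rewritten control flow is to model it in C with gotos. This is only a sketch: it assumes .mul behaves like an ordinary signed multiply of %o0 and %o1 with the result in %o0, and the function name model_annulled is hypothetical. The variables are named after the registers they stand for.

```c
/* C model of the listing with the annulled branch (bge,a). */
int model_annulled(int l0, int l1) {
    int l2, l3, o0, o1;

    l2 = 0;                  /* mov %g0, %l2 */
    if (l0 >= l1) {          /* cmp %l0, %l1 ; bge,a END_LOOP */
        o0 = l2;             /* delay slot: runs only because the branch is taken */
        goto END_LOOP;
    }
LOOP:
    o0 = l0;                 /* mov %l0, %o0 */
    o1 = l1;                 /* mov %l1, %o1 (delay slot of the call) */
    o0 = o0 * o1;            /* call .mul (assumed: %o0 = %o0 * %o1) */
    l2 = l2 + o0;            /* add %l2, %o0, %l2 */
    l0 = l0 + 1;             /* inc %l0 */
    if (l0 < l1)             /* cmp %l0, %l1 ; bl LOOP ; nop */
        goto LOOP;
    o0 = l2;                 /* mov %l2, %o0 on the fall-through path */
END_LOOP:
    o1 = l1;                 /* mov %l1, %o1 (delay slot of the call) */
    o0 = o0 * o1;            /* call .mul */
    l3 = o0;                 /* mov %o0, %l3 */
    return l3;
}
```

Both paths reach END_LOOP with %o0 holding %l2, so for a = 1 and b = 5 the model returns 250, and it returns 0 whenever the loop is skipped.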
Any help on how to perform these optimizations would be greatly appreciated.
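If you are targeting SPARC V8 or later, you can use the 'SMUL' instruction instead of calling the '.mul' system library routine. – Michael
I probably should have mentioned that. This is an older 32-bit SPARC machine. – logbaseinfinity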