2011-10-26
1

Fastest 8x16 bit matrix transpose on MIPS?

I have an 8x16 bit matrix stored as UINT8 matrix[16].

I want to transpose the matrix and store the result as UINT16 matrix[8].

This is in a time-critical part of my code, so I need it to run as fast as possible. Is there a clever way to implement this on a MIPS processor?

Answers

0

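I don't think the MIPS instruction set has any special instructions that would help with this, so you might as well write it in C. If you have access to the processor RTL ....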
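As a rough illustration of the "write it in C" route, here is a minimal baseline sketch. The function name `transpose_8x16_baseline` and the bit order (bit r of input byte c becomes bit c of output row r) are assumptions, since the question does not say which end of each byte is column 0. It is a correctness reference, not a tuned implementation.

```c
#include <stdint.h>

/* Hypothetical helper, not from the original post.
 * Assumed convention: transposed[r] bit c == matrix[c] bit r. */
void transpose_8x16_baseline(const uint8_t matrix[16], uint16_t transposed[8])
{
    for (int r = 0; r < 8; r++)
        transposed[r] = 0;

    for (int c = 0; c < 16; c++) {
        uint8_t byte = matrix[c];
        for (int r = 0; r < 8; r++) {
            /* Move bit r of input byte c to bit c of output row r. */
            transposed[r] |= (uint16_t)(((byte >> r) & 1u) << c);
        }
    }
}
```

A compiler will usually unroll both loops at higher optimization levels, so it is worth measuring this against any hand-written assembly before committing to either.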

0

Maybe something like this:

    # Load all 16 input bytes into registers $10..$25
    lbu $10, matrix 
    lbu $11, matrix+1 
    lbu $12, matrix+2 
    lbu $13, matrix+3 
    lbu $14, matrix+4 
    lbu $15, matrix+5 
    lbu $16, matrix+6 
    lbu $17, matrix+7 
    lbu $18, matrix+8 
    lbu $19, matrix+9 
    lbu $20, matrix+10 
    lbu $21, matrix+11 
    lbu $22, matrix+12 
    lbu $23, matrix+13 
    lbu $24, matrix+14 
    lbu $25, matrix+15 

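    # Note: $26/$27 are k0/k1, reserved for the kernel by convention;
    # this assumes no interrupt handler clobbers them while the loop runs.
    addiu $2, $0, 8          # $2 = bit index / output row, counts 7 down to 0
    addiu $9, $0, 256        # $9 = bit mask, becomes 0x80 on the first pass
loop: 
    addiu $2, $2, -1         # next bit index
    srl $9, $9, 1            # mask selecting that bit
    addu $27, $0, $0         # $27 = accumulator for the current output row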

    and $26, $10, $9 
    srlv $26, $26, $2 
    or $27, $27, $26 

    and $26, $11, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $12, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $13, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $14, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $15, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $16, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $17, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $18, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $19, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $20, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $21, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $22, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $23, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $24, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

    and $26, $25, $9 
    srlv $26, $26, $2 
    sll $27, $27, 1 
    or $27, $27, $26 

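    sll $3, $2, 1            # byte offset = row index * 2
    sh $27, transposed($3)   # store the finished output row
    bgtz $2, loop            # loop until row 0 has been stored
    nop                      # branch delay slot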


.data 0x2000 
matrix: 
.byte 0x80 
.byte 0x80 
.byte 0x40 
.byte 0x40 
.byte 0x20 
.byte 0x20 
.byte 0x10 
.byte 0x10 
.byte 0x08 
.byte 0x08 
.byte 0x04 
.byte 0x04 
.byte 0x02 
.byte 0x02 
.byte 0x01 
.byte 0x01 

.data 0x3000 
transposed: 
.half 0 
.half 0 
.half 0 
.half 0 
.half 0 
.half 0 
.half 0 
.half 0 

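It reads the input matrix into registers, then runs the loop 8 times (once for each row of the transposed matrix). On each pass it masks out one bit from each of the 16 input bytes and shifts those bits into a 16-bit result.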
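To make the bit ordering of that loop easier to check, here is a small C model of it. This is a sketch for verification only, assuming the loop is meant to run exactly 8 passes; the function name `transpose_8x16_model` is hypothetical. The comments map each statement to the registers and instructions in the assembly.

```c
#include <stdint.h>

/* Hypothetical C model of the assembly loop:
 * one output row per pass, 16 bit extractions per row. */
void transpose_8x16_model(const uint8_t matrix[16], uint16_t transposed[8])
{
    for (int row = 7; row >= 0; row--) {              /* $2: 7 down to 0 */
        unsigned mask = 1u << row;                    /* $9 */
        unsigned acc = 0;                             /* $27 */
        for (int i = 0; i < 16; i++) {
            unsigned bit = (matrix[i] & mask) >> row; /* and + srlv */
            acc = (acc << 1) | bit;                   /* sll + or */
        }
        transposed[row] = (uint16_t)acc;              /* sh */
    }
}
```

Because the accumulator is shifted left before each new bit is added, the first input byte ends up in bit 15 of every output row. With the sample data from the `.data` section, `transposed[7]` comes out as 0xC000 and `transposed[0]` as 0x0003.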