Register shortage using SSE intrinsics

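In the post "SSE load/store memory transactions" I asked about the difference between explicit register-memory transactions and intermediate pointers. In practice, intermediate pointers showed slightly higher performance; however, it is not clear what an intermediate pointer is in hardware terms. If a pointer is created, does that mean some registers are occupied as well, or are registers only involved during an SSE operation (e.g. _mm_mul_ps)?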

Let us consider an example:

#include <xmmintrin.h> // SSE: __m128, _mm_mul_ps, _mm_malloc, _mm_free

struct sse_simple
{
    // Initializers are listed in member declaration order.
    // The original used an undefined `cast_sz` as the element count;
    // InputLength is assumed here.
    sse_simple(unsigned int InputLength):
        input1((float*)_mm_malloc(InputLength*sizeof(float), 16)),
        input2((float*)_mm_malloc(InputLength*sizeof(float), 16)),
        output((float*)_mm_malloc(InputLength*sizeof(float), 16)),
        inp1_sse(reinterpret_cast<__m128*>(input1)),
        inp2_sse(reinterpret_cast<__m128*>(input2)),
        output_sse(reinterpret_cast<__m128*>(output)),
        Len(InputLength/4) // number of __m128 elements (4 floats each)
    {}

    ~sse_simple()
    {
        _mm_free(input1);
        _mm_free(input2);
        _mm_free(output);
    }

    void func()
    {
        // Element-wise product, four floats per iteration
        for(unsigned int i=0; i<Len; ++i)
            output_sse[i] = _mm_mul_ps(inp1_sse[i], inp2_sse[i]);
    }

    float *input1;
    float *input2;
    float *output;

    // Views of the same 16-byte aligned buffers as SSE vectors
    __m128 *inp1_sse;
    __m128 *inp2_sse;
    __m128 *output_sse;

    unsigned int Len;
};

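In the example above, the intermediate pointers inp1_sse, inp2_sse and output_sse are created once, in the constructor. If I replicate a large number of sse_simple objects (for instance 50 000 or more), will this lead to a shortage of registers?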


2013-07-17 gorill

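First of all, registers are small memories located close to the computing units, which means access to them is very fast. The compiler uses them as much as it can to speed up computation, and falls back to memory when it cannot. Since registers can hold only a small amount of data, they are usually used only for temporary values during a computation. Most of the time, everything ends up stored in memory except temporaries such as loop indices. So a shortage of registers only slows the computation down.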
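A minimal sketch of what this means for the number of objects, assuming a hypothetical struct pointer_members that mirrors only the pointer members of sse_simple (the struct and the helper pointer_storage_bytes are illustrative names, not part of the question). Pointer members are ordinary data stored in memory with each object, so creating more objects increases memory use rather than register use:

```cpp
#include <xmmintrin.h>
#include <cstddef>

// Hypothetical stand-in for the six pointer members of sse_simple.
struct pointer_members
{
    float  *input1, *input2, *output;
    __m128 *inp1_sse, *inp2_sse, *output_sse;
};

// Bytes of ordinary memory occupied by the pointer members of `count` objects.
// Nothing here reserves a CPU register; a pointer is loaded into a register
// only while code that uses it is executing.
constexpr std::size_t pointer_storage_bytes(std::size_t count)
{
    return count * sizeof(pointer_members);
}
```

On a typical 64-bit platform with 8-byte pointers this is 48 bytes per object, so 50 000 objects hold about 2.4 MB of pointers in memory.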

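During a computation, pointers are held in general-purpose registers (GPRs), whether they point to a float, a vector or anything else, whereas __m128 vectors are held in dedicated registers (the XMM registers).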

So in your example the three arrays are stored in memory, and the line

output_sse[i] = _mm_mul_ps(inp1_sse[i], inp2_sse[i]);

is compiled to:

movaps -0x30(%rbp),%xmm0 # load inp1_sse[i] into register %xmm0
movaps -0x20(%rbp),%xmm1 # load inp2_sse[i] into register %xmm1
mulps %xmm1,%xmm0        # multiply; the result is left in %xmm0
movaps %xmm0,(%rdx)      # store the result to memory

As you can see, the memory accesses go through addresses held in the registers %rbp and %rdx.
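To relate this to the original question of intermediate pointers versus explicit loads and stores, here is a hedged sketch of both forms side by side. The function names mul_via_pointers, mul_via_load_store and aligned16 are hypothetical; the intrinsics _mm_load_ps, _mm_store_ps and _mm_mul_ps are the standard SSE ones. Both versions assume 16-byte aligned buffers and only process complete groups of four floats:

```cpp
#include <xmmintrin.h>
#include <cassert>
#include <cstdint>

// True if p is 16-byte aligned, as required by aligned SSE loads and stores.
static bool aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Form 1: intermediate __m128* pointers, as in the question.
void mul_via_pointers(const float* a, const float* b, float* out, unsigned n)
{
    assert(aligned16(a) && aligned16(b) && aligned16(out));
    const __m128* pa = reinterpret_cast<const __m128*>(a);
    const __m128* pb = reinterpret_cast<const __m128*>(b);
    __m128* po = reinterpret_cast<__m128*>(out);
    for (unsigned i = 0; i < n / 4; ++i)
        po[i] = _mm_mul_ps(pa[i], pb[i]);
}

// Form 2: explicit aligned loads and stores.
void mul_via_load_store(const float* a, const float* b, float* out, unsigned n)
{
    assert(aligned16(a) && aligned16(b) && aligned16(out));
    for (unsigned i = 0; i + 4 <= n; i += 4) {
        __m128 x = _mm_load_ps(a + i);           // memory -> XMM register
        __m128 y = _mm_load_ps(b + i);           // memory -> XMM register
        _mm_store_ps(out + i, _mm_mul_ps(x, y)); // XMM register -> memory
    }
}
```

In both forms the data has to travel from memory into an XMM register before the multiplication, and the addresses sit in general-purpose registers only while the loop runs. Compilers commonly generate similar aligned loads and stores for the two versions when optimizing, but that is worth confirming in the assembly produced by your own compiler and flags.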


2013-07-17 10:55:59 hivert
