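Fast 32-bit array -> 24-bit array conversion in SSE3? (RGB32 -> RGB24)

This question is related to a previously answered question: Fast 24-bit array -> 32-bit array conversion? In one of the answers, interjay posted SSE3 code for converting RGB24 -> RGB32, but I also need the reverse conversion (RGB32 -> RGB24). I gave it a shot (see below), and my code works, but it is more complicated than interjay's code and noticeably slower. I couldn't see how to exactly reverse the instructions: _mm_alignr_epi8 doesn't seem to help in this case, but I'm not very familiar with SSE3. Is the asymmetry unavoidable, or is there a faster substitute for the shifts and ORs?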
RGB32 -> RGB24:
__m128i *src = ...
__m128i *dst = ...
// Per register: pack R,G,B of the 4 pixels into the low 12 bytes, zero the top 4 bytes.
__m128i mask = _mm_setr_epi8(0,1,2,4, 5,6,8,9, 10,12,13,14, -1,-1,-1,-1);
// Each iteration converts 16 pixels: 64 bytes in, 48 bytes out.
for (UINT i = 0; i < Pixels; i += 16) {
    __m128i sa = _mm_shuffle_epi8(_mm_load_si128(src), mask);
    __m128i sb = _mm_shuffle_epi8(_mm_load_si128(src + 1), mask);
    __m128i sc = _mm_shuffle_epi8(_mm_load_si128(src + 2), mask);
    __m128i sd = _mm_shuffle_epi8(_mm_load_si128(src + 3), mask);
    // Stitch the four 12-byte results into three full 16-byte registers.
    _mm_store_si128(dst, _mm_or_si128(sa, _mm_slli_si128(sb, 12)));
    _mm_store_si128(dst + 1, _mm_or_si128(_mm_srli_si128(sb, 4), _mm_slli_si128(sc, 8)));
    _mm_store_si128(dst + 2, _mm_or_si128(_mm_srli_si128(sc, 8), _mm_slli_si128(sd, 4)));
    src += 4;
    dst += 3;
}
RGB24 -> RGB32 (courtesy of interjay):
__m128i *src = ...
__m128i *dst = ...
// Expand 12 packed bytes into 4 pixels, writing 0 into every fourth byte.
__m128i mask = _mm_setr_epi8(0,1,2,-1, 3,4,5,-1, 6,7,8,-1, 9,10,11,-1);
// Each iteration converts 16 pixels: 48 bytes in, 64 bytes out.
for (UINT i = 0; i < Pixels; i += 16) {
    __m128i sa = _mm_load_si128(src);
    __m128i sb = _mm_load_si128(src + 1);
    __m128i sc = _mm_load_si128(src + 2);
    __m128i val = _mm_shuffle_epi8(sa, mask);
    _mm_store_si128(dst, val);
    // _mm_alignr_epi8 brings the next 12 input bytes into the low lanes.
    val = _mm_shuffle_epi8(_mm_alignr_epi8(sb, sa, 12), mask);
    _mm_store_si128(dst + 1, val);
    val = _mm_shuffle_epi8(_mm_alignr_epi8(sc, sb, 8), mask);
    _mm_store_si128(dst + 2, val);
    val = _mm_shuffle_epi8(_mm_alignr_epi8(sc, sc, 4), mask);
    _mm_store_si128(dst + 3, val);
    src += 3;
    dst += 4;
}
So, no SSE4.1 allowed? – harold 2012-04-02 22:05:16
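You can simply use 6 masks on the 4 input registers to turn them into the 3 output registers. You can't get around the three 'or's, because 'pshufb' sets each byte either to 0 or to the value indexed by the mask. – hirschhornsalz 2012-04-02 22:14:48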
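The six-mask idea from the comment above can be sketched roughly as follows. This is only one reading of the suggestion, not code posted in the thread: the function name rgb32_to_rgb24_six_masks, the mask names, and the byte-pointer interface are all made up for illustration, and it uses unaligned loads and stores (an assumption, to keep the sketch easy to call) where the question uses aligned ones. Each shuffle places its bytes directly at their final position, so the byte shifts disappear and only the three ORs remain. It has not been benchmarked, so whether it is actually faster is an open question. Note that _mm_shuffle_epi8 needs SSSE3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Hypothetical helper: converts `pixels` RGB32 pixels (4 bytes each) to
   RGB24 (3 bytes each). Assumes pixels is a multiple of 16. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("ssse3")))  /* assumption: GCC/Clang, lets this build without -mssse3 */
#endif
void rgb32_to_rgb24_six_masks(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    assert(pixels % 16 == 0);

    /* Inputs a,b,c,d hold 4 pixels each. A mask byte of -1 yields 0. */
    const __m128i a_to_0 = _mm_setr_epi8(0,1,2,4, 5,6,8,9, 10,12,13,14, -1,-1,-1,-1);
    const __m128i b_to_0 = _mm_setr_epi8(-1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, 0,1,2,4);
    const __m128i b_to_1 = _mm_setr_epi8(5,6,8,9, 10,12,13,14, -1,-1,-1,-1, -1,-1,-1,-1);
    const __m128i c_to_1 = _mm_setr_epi8(-1,-1,-1,-1, -1,-1,-1,-1, 0,1,2,4, 5,6,8,9);
    const __m128i c_to_2 = _mm_setr_epi8(10,12,13,14, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1);
    const __m128i d_to_2 = _mm_setr_epi8(-1,-1,-1,-1, 0,1,2,4, 5,6,8,9, 10,12,13,14);

    for (size_t i = 0; i < pixels; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(src + 32));
        __m128i d = _mm_loadu_si128((const __m128i *)(src + 48));

        /* Six shuffles, three ORs, no byte shifts. */
        _mm_storeu_si128((__m128i *)(dst),
            _mm_or_si128(_mm_shuffle_epi8(a, a_to_0), _mm_shuffle_epi8(b, b_to_0)));
        _mm_storeu_si128((__m128i *)(dst + 16),
            _mm_or_si128(_mm_shuffle_epi8(b, b_to_1), _mm_shuffle_epi8(c, c_to_1)));
        _mm_storeu_si128((__m128i *)(dst + 32),
            _mm_or_si128(_mm_shuffle_epi8(c, c_to_2), _mm_shuffle_epi8(d, d_to_2)));

        src += 64;  /* 16 pixels * 4 bytes */
        dst += 48;  /* 16 pixels * 3 bytes */
    }
}
```

Compared with the loop in the question, this trades the six _mm_slli_si128/_mm_srli_si128 shifts for two extra shuffles and five extra mask constants held in registers. If the buffers are known to be 16-byte aligned, the aligned load/store variants from the question could be swapped back in.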