Fast pixel count on a binary image - ARM NEON intrinsics - iOS Dev

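Can someone tell me a fast function to count the number of white pixels in a binary image? I need it for iOS app development. I am working directly on the memory of the image, defined as

bool *imageData = (bool *) malloc(noOfPixels * sizeof(bool));

and I am implementing the function as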

int whiteCount = 0;
for (int q = i; q < i + windowHeight; q++)
{
    for (int w = j; w < j + windowWidth; w++)
    {
        if (imageData[q * W + w] == 1)
            whiteCount++;
    }
}

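This is obviously the slowest possible function. I have heard that ARM NEON intrinsics on iOS can be used to perform several operations in one cycle. Maybe that is the way to go?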

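The problem is that I am not very familiar with NEON and do not have enough time to learn assembly language. So it would be great if anyone could post NEON intrinsics code for the problem above, or any other fast implementation in C/C++.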

The only NEON intrinsics code I have been able to find online is an RGB to grayscale conversion: http://computer-vision-talks.com/2011/02/a-very-fast-bgra-to-grayscale-conversion-on-iphone/


2012-01-16 shreyas253

I'll take a look at this, but what is sizeof(bool)? – 2012-01-16 22:29:15

Also, what are the possible values in imageData[]? Is it only 0 or 1, or can there be other non-zero values? – 2012-01-16 22:35:11

First, you can speed up the original code a little by factoring out the multiplication and getting rid of the branch:

int whiteCount = 0; 
for (int q = i; q < i + windowHeight; q++) 
{ 
    const bool * const row = &imageData[q * W]; 

    for (int w = j; w < j + windowWidth; w++) 
    { 
        whiteCount += row[w];
    } 
}

(This assumes that imageData[] is truly binary, i.e. each element can only ever be 0 or 1.)
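To make the branch-free scalar loop easy to try in isolation, here is a minimal sketch that wraps it in a function. The function name, the `stride` parameter (the full image width, W in the question) and the signature are assumptions for illustration, not part of the original answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the white (true) pixels inside the window whose top-left corner is
 * at row i, column j. `stride` is the full image width in pixels.
 * All names here are illustrative. */
int count_white_scalar(const bool *imageData, int stride,
                       int i, int j, int windowHeight, int windowWidth)
{
    assert(imageData != NULL && stride > 0);
    int whiteCount = 0;
    for (int q = i; q < i + windowHeight; q++) {
        /* Compute the row offset once per row, not once per pixel. */
        const bool *row = &imageData[(size_t)q * (size_t)stride];
        for (int w = j; w < j + windowWidth; w++) {
            whiteCount += row[w];   /* bool converts to 0 or 1, no branch */
        }
    }
    return whiteCount;
}
```

A call such as `count_white_scalar(imageData, W, i, j, windowHeight, windowWidth)` would replace the nested loops from the question.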

Here is a simple NEON implementation:

#include <arm_neon.h>

// ...

int q, w;
int whiteCount = 0;
uint32x4_t v_count = vdupq_n_u32(0);

for (q = i; q < i + windowHeight; q++)
{
    // vld1q_u8 expects const uint8_t *, hence the cast (needs sizeof(bool) == 1)
    const uint8_t * const row = (const uint8_t *)&imageData[q * W];

    uint16x8_t vrow_count = vdupq_n_u16(0);

    for (w = j; w <= j + windowWidth - 16; w += 16) // SIMD loop
    {
        uint8x16_t v = vld1q_u8(&row[w]);           // load 16 x 8 bit pixels
        vrow_count = vpadalq_u8(vrow_count, v);     // accumulate 16 bit row counts
    }
    for (; w < j + windowWidth; ++w)                // scalar clean up loop
    {
        whiteCount += row[w];
    }
    v_count = vpadalq_u16(v_count, vrow_count);     // update 32 bit image counts
}                                                   // from 16 bit row counts
// add 4 x 32 bit partial counts from SIMD loop to scalar total
whiteCount += vgetq_lane_u32(v_count, 0);
whiteCount += vgetq_lane_u32(v_count, 1);
whiteCount += vgetq_lane_u32(v_count, 2);
whiteCount += vgetq_lane_u32(v_count, 3);
// total is now in whiteCount

(This assumes that imageData[] is truly binary, i.e. 0 for black and 1 for white, and that sizeof(bool) == 1.)

Updated version for unsigned char pixel values (255 for white, 0 for black):

#include <arm_neon.h>

// ...

// imageData is assumed here to be declared as const uint8_t *
int q, w;
int whiteCount = 0;
const uint8x16_t v_mask = vdupq_n_u8(1);
uint32x4_t v_count = vdupq_n_u32(0);

for (q = i; q < i + windowHeight; q++)
{
    const uint8_t * const row = &imageData[q * W];

    uint16x8_t vrow_count = vdupq_n_u16(0);

    for (w = j; w <= j + windowWidth - 16; w += 16) // SIMD loop
    {
        uint8x16_t v = vld1q_u8(&row[w]);           // load 16 x 8 bit pixels
        v = vandq_u8(v, v_mask);                    // mask out all but LS bit
        vrow_count = vpadalq_u8(vrow_count, v);     // accumulate 16 bit row counts
    }
    for (; w < j + windowWidth; ++w)                // scalar clean up loop
    {
        whiteCount += (row[w] == 255);
    }
    v_count = vpadalq_u16(v_count, vrow_count);     // update 32 bit image counts
}                                                   // from 16 bit row counts
// add 4 x 32 bit partial counts from SIMD loop to scalar total
whiteCount += vgetq_lane_u32(v_count, 0);
whiteCount += vgetq_lane_u32(v_count, 1);
whiteCount += vgetq_lane_u32(v_count, 2);
whiteCount += vgetq_lane_u32(v_count, 3);
// total is now in whiteCount

(This assumes that imageData[] has values of 255 for white and 0 for black, and that imageWidth <= 2^19.)
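The width limit comes from the 16 bit row accumulators: each vpadalq_u8 adds two adjacent 8 bit pixels into every 16 bit lane, so a lane can grow by at most 2 per 16-pixel iteration once pixels are reduced to 0 or 1. The helper below is only a sketch of that arithmetic (its name and signature are made up for illustration). By this reckoning an all-white row is safe for up to 524272 pixels through the SIMD loop, just under 2^19 = 524288, and it is the width of the window being scanned that matters rather than the full image width.

```c
#include <assert.h>
#include <stdint.h>

/* Largest number of pixels one row may feed through the SIMD loop before a
 * 16 bit lane of vrow_count could wrap, assuming every pixel contributes at
 * most maxPixelValue (1 after masking). Illustrative helper only. */
long max_simd_pixels_per_row(unsigned maxPixelValue)
{
    assert(maxPixelValue >= 1 && maxPixelValue <= 255);
    /* Two 8 bit pixels are added into each 16 bit lane per iteration. */
    long perIteration = 2L * (long)maxPixelValue;
    long safeIterations = UINT16_MAX / perIteration;
    return safeIterations * 16;     /* 16 pixels consumed per iteration */
}
```

With unmasked 255-valued pixels the same calculation gives only 2048 pixels, which is one reason to mask down to the least significant bit before accumulating.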

Note that all of the above code is untested and may need some further work.
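For anyone who wants something to compile and check, the following is a hedged, self-contained sketch of the same idea as a function. The function name, the `stride` parameter and the COUNT_WHITE_USE_NEON macro are assumptions made for this example. It uses the NEON path only when the compiler reports NEON support and otherwise falls back to the scalar loop, so the result can be compared on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define COUNT_WHITE_USE_NEON 1
#else
#define COUNT_WHITE_USE_NEON 0
#endif

/* Count pixels whose least significant bit is set (255 = white, 0 = black)
 * inside a window. Assumes windowWidth is small enough that the 16 bit row
 * counters cannot wrap. Names and signature are illustrative. */
unsigned count_white_window(const uint8_t *imageData, int stride,
                            int i, int j, int windowHeight, int windowWidth)
{
    assert(imageData != NULL && stride > 0);
    unsigned whiteCount = 0;
#if COUNT_WHITE_USE_NEON
    const uint8x16_t v_mask = vdupq_n_u8(1);
    uint32x4_t v_count = vdupq_n_u32(0);
#endif
    for (int q = i; q < i + windowHeight; q++) {
        const uint8_t *row = &imageData[(size_t)q * (size_t)stride];
        int w = j;
#if COUNT_WHITE_USE_NEON
        uint16x8_t vrow_count = vdupq_n_u16(0);
        for (; w <= j + windowWidth - 16; w += 16) {
            uint8x16_t v = vld1q_u8(&row[w]);        /* load 16 pixels */
            v = vandq_u8(v, v_mask);                 /* 255 becomes 1 */
            vrow_count = vpadalq_u8(vrow_count, v);  /* 16 bit row counts */
        }
        v_count = vpadalq_u16(v_count, vrow_count);  /* 32 bit totals */
#endif
        for (; w < j + windowWidth; w++) {           /* tail, or full fallback */
            whiteCount += row[w] & 1u;
        }
    }
#if COUNT_WHITE_USE_NEON
    whiteCount += vgetq_lane_u32(v_count, 0);
    whiteCount += vgetq_lane_u32(v_count, 1);
    whiteCount += vgetq_lane_u32(v_count, 2);
    whiteCount += vgetq_lane_u32(v_count, 3);
#endif
    return whiteCount;
}
```

Comparing its result against a plain nested loop on a few test windows is a cheap way to catch indexing mistakes before measuring speed.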


2012-01-16 22:43:06

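Sorry - there were a few typos (now fixed) - I should have mentioned that this is untested code, so it may need some further work - I was just trying to put across the general idea. – 2012-01-17 09:40:26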

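Sorry to bother you again, but I get an error on the line uint8x16_t v = vld1q_u8(&row[j]); saying: cannot initialize a variable of type const uint8_t * (aka unsigned char *) with a value of type const bool *. Any idea what the problem might be? – shreyas253 2012-01-17 09:58:42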

I am using bool * for the image data because I thought it would be faster, since each value only needs 1 bit of memory – shreyas253 2012-01-17 10:00:05
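On the storage point: in C a bool object still occupies at least one byte (sizeof(bool) is typically 1), so a bool array uses one byte per pixel, not one bit. If one bit per pixel is really wanted, the pixels have to be packed by hand. The sketch below shows one possible way; pack_bits and count_packed are hypothetical helpers invented for this example, and the bit-clearing loop could be swapped for a compiler builtin such as __builtin_popcount on GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one-byte-per-pixel data (0 = black, non-zero = white) into 32 bit
 * words, one bit per pixel. `packed` must hold (n + 31) / 32 words.
 * Hypothetical helper for illustration. */
void pack_bits(const uint8_t *pixels, size_t n, uint32_t *packed)
{
    assert(pixels != NULL && packed != NULL);
    for (size_t k = 0; k < (n + 31) / 32; k++) {
        packed[k] = 0;
    }
    for (size_t k = 0; k < n; k++) {
        if (pixels[k]) {
            packed[k / 32] |= (uint32_t)1 << (k % 32);
        }
    }
}

/* Count the set bits (white pixels) in `words` packed 32 bit words. */
unsigned count_packed(const uint32_t *packed, size_t words)
{
    assert(packed != NULL);
    unsigned total = 0;
    for (size_t k = 0; k < words; k++) {
        uint32_t x = packed[k];
        while (x) {
            x &= x - 1;     /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}
```

Packing costs time up front, so this only pays off if the same image is counted many times or memory is the real constraint.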

http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

Section 6.55.3.6

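The vectorized instructions will do the comparisons and place the results in a structure, but you would still need to go through each element of that structure and determine whether it is zero.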
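One possible way around checking each element, sketched here under the assumption that white is stored as 255: a NEON compare yields 0xFF in every matching lane, which can be shifted down to 1 and summed with pairwise widening adds. The function name count_equal_16 is made up for this example, and a scalar fallback is included so the sketch also builds where NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Count how many of the 16 bytes at p are equal to `value`.
 * Illustrative helper, not from the original answers. */
unsigned count_equal_16(const uint8_t *p, uint8_t value)
{
    assert(p != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t v    = vld1q_u8(p);
    uint8x16_t mask = vceqq_u8(v, vdupq_n_u8(value)); /* 0xFF where equal */
    uint8x16_t ones = vshrq_n_u8(mask, 7);            /* 0xFF becomes 1 */
    uint16x8_t s16  = vpaddlq_u8(ones);               /* pairwise widen-add */
    uint32x4_t s32  = vpaddlq_u16(s16);
    uint64x2_t s64  = vpaddlq_u32(s32);
    return (unsigned)(vgetq_lane_u64(s64, 0) + vgetq_lane_u64(s64, 1));
#else
    unsigned n = 0;
    for (int k = 0; k < 16; k++) {
        n += (p[k] == value);
    }
    return n;
#endif
}
```

Unlike masking with 1, the compare also behaves sensibly if the data contains values other than 0 and 255.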

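How fast does that loop currently run, and how fast do you need it to run? Also keep in mind that NEON works in the same registers as the floating point unit, so using NEON here may force an FPU context switch.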


2012-01-16 22:38:27 nmjohn
