2012-10-20

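OpenCL kernel is not vectorized

I want to build a kernel that performs a parallel string search. For this I intend to use a finite state machine. The FSM's transition table is passed in the kernel argument states. Code: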

__kernel void Find (__constant char *text,
                    const int offset,
                    const int tlenght,
                    __constant char *characters,
                    const int clength,
                    const int maxlength,
                    __constant int *states,
                    const int statesdim) {

    private char c;
    private int state;
    private const int id = get_global_id(0);

    if (id < (tlenght - maxlength)) {

        private int cIndex, sd, s, k;

        for (int i = 0; i < maxlength; i++) {

            c = text[i + offset];

            cIndex = -1;

            for (int j = 0; j < clength; j++) {
                if (characters[j] == c) {
                    cIndex = j;
                }
            }

            if (cIndex == -1) {
                state = 0;
                break;
            } else {
                s = states[state + cIndex * statesdim];
            }

            if (state <= 0) break;
        }
    }
}

When I compile this kernel with iocgui, I get the following result:

Using default instruction set architecture. 
Intel OpenCL CPU device was found! 
Device name: Pentium(R) Dual-Core CPU  T4400 @ 2.20GHz 
Device version: OpenCL 1.1 (Build 31360.31426) 
Device vendor: Intel(R) Corporation 
Device profile: FULL_PROFILE 
Build started 
Kernel <Find> was successfully vectorized 
Done. 
Build succeeded! 

When I change the line where the new state is determined to:

state = states[state+cIndex*statesdim]; 

the result is:

Using default instruction set architecture. 
Intel OpenCL CPU device was found! 
Device name: Pentium(R) Dual-Core CPU  T4400 @ 2.20GHz 
Device version: OpenCL 1.1 (Build 31360.31426) 
Device vendor: Intel(R) Corporation 
Device profile: FULL_PROFILE 
Build started 
Kernel <Find> was not vectorized 
Done. 
Build succeeded! 

Answer


The statement

X = states[state+cIndex*statesdim]; 

cannot be vectorized, because the index does not necessarily evaluate to accesses of consecutive bytes across threads.
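To make the access-pattern point concrete, here is a minimal plain-C sketch (not OpenCL, and not part of the original post). It computes the table index each work-item would load, using the same expression as the kernel, and checks whether neighbouring work-items read neighbouring elements. The helper names table_index and is_contiguous are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Index that one work-item would load from the transition table,
 * using the same expression as the kernel in the question. */
static int table_index(int state, int cIndex, int statesdim)
{
    return state + cIndex * statesdim;
}

/* Returns 1 if neighbouring work-items read neighbouring elements
 * (index[i+1] == index[i] + 1), which is the pattern that can become
 * one wide load. Returns 0 if the reads are scattered and a gather
 * would be needed instead. */
static int is_contiguous(const int *state, const int *cIndex,
                         size_t n, int statesdim)
{
    assert(n > 0);
    for (size_t i = 0; i + 1 < n; i++) {
        int a = table_index(state[i], cIndex[i], statesdim);
        int b = table_index(state[i + 1], cIndex[i + 1], statesdim);
        if (b != a + 1)
            return 0;
    }
    return 1;
}
```

With per-work-item states {0, 3, 1, 0}, character indices {1, 0, 2, 1} and statesdim 8, the loads land on elements 8, 3, 17 and 8, so is_contiguous returns 0. Since state and cIndex both depend on the text each work-item has read, the compiler cannot assume the contiguous case.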

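Note that in your first kernel, the target variable s is never written back to global memory. The compiler may therefore optimize the code and remove the s = states[state+cIndex*statesdim]; statement entirely. So it looks as if your statement has been vectorized, but in fact it has not.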
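As a rough illustration of the dead-code point, the following plain-C sketch emulates the kernel with a loop over work-item ids and stores each result in an output array, so the table lookup has an observable effect and cannot be dropped. Several things here are assumptions rather than part of the original post: the out buffer, the function names run_fsm and find_all, reading the text from position id instead of offset, and the convention that state 1 is the start state, 0 means "no match" and a negative value means "pattern found".

```c
#include <assert.h>

/* Plain-C stand-in for one work-item of the kernel. Assumptions (not from
 * the original post): state 1 is the start state, 0 means "no match",
 * a negative value means "pattern found". */
static int run_fsm(const char *text, int start, int maxlength,
                   const char *characters, int clength,
                   const int *states, int statesdim)
{
    int state = 1;
    for (int i = 0; i < maxlength; i++) {
        char c = text[start + i];
        int cIndex = -1;
        for (int j = 0; j < clength; j++)
            if (characters[j] == c)
                cIndex = j;
        if (cIndex == -1)
            return 0;                  /* character not in the alphabet */
        /* Data-dependent load: the index depends on the current state. */
        state = states[state + cIndex * statesdim];
        if (state <= 0)
            break;
    }
    return state;
}

/* Emulates the NDRange: every "work-item" id stores its result in out[id],
 * so the table lookup is observable and cannot be optimized away. */
static void find_all(const char *text, int tlength, int maxlength,
                     const char *characters, int clength,
                     const int *states, int statesdim, int *out)
{
    assert(maxlength > 0 && statesdim > 0);
    for (int id = 0; id < tlength - maxlength; id++)
        out[id] = run_fsm(text, id, maxlength, characters, clength,
                          states, statesdim);
}
```

In an actual kernel the equivalent change would be an extra __global int * argument that each work-item writes to. Once the result is stored, the lookup is live in both versions of the kernel, so the "not vectorized" message would be expected for both.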