OpenCL: type casting on GPU

I store data in a char array and need to read float and int variables from it. This code works fine on the CPU:

global float *p; 
p = (global float*)get_pointer_to_the_field(char_array, index); 
*p += 10;

But on the GPU I get error -5: CL_OUT_OF_RESOURCES. Reading the value itself works, but doing anything with it (adding 10 in this case) causes the error. How can I fix it?

Update:

This works on the GPU:

float f = *p; 
f += 10;

However, I still cannot write the value back to the array.

Here is the kernel:

global void write_value(global char *data, int tuple_pos, global char *field_value, 
        int which_field, global int offsets[], global int *num_of_attributes) { 

    int tuple_size = offsets[*num_of_attributes]; 
    global char *offset = data + tuple_pos * tuple_size; 
    offset += offsets[which_field]; 

    memcpy(offset, field_value, (offsets[which_field+1] - offsets[which_field])); 
} 

global char *read_value(global char *data, int tuple_pos, 
        int which_field, global int offsets[], global int *num_of_attributes) { 
    int tuple_size = offsets[*num_of_attributes]; 
    global char *offset = data + tuple_pos * tuple_size; 
    offset += offsets[which_field]; 
    return offset; 
} 

kernel void update_single_value(global char* input_data, global int* pos, global int offsets[], 
          global int *num_of_attributes, global char* types) { 
    int g_id = get_global_id(1); 
    int attr_id = get_global_id(0); 
    int index = pos[g_id]; 

    if (types[attr_id] == 'f') { // if float
        global float *p;
        p = (global float*)read_value(input_data, index, attr_id, offsets, num_of_attributes);
        float f = *p;
        f += 10;
        //*p += 10; // not working on GPU
    }
    else if (types[attr_id] == 'i') { // if int
        global int *p;
        p = (global int*)read_value(input_data, index, attr_id, offsets, num_of_attributes);
        int i = *p;
        i += 10;
        //*p += 10;
    }
    else { // if char
        write_value(input_data, index, read_value(input_data, index, attr_id, offsets, num_of_attributes), attr_id, offsets, num_of_attributes);
    }
}

It updates a tuple of a table: int and float values are increased by 10, and char fields are simply replaced with the same content.


2017-06-17 vgeclair

If you still have this problem, you should post more complete source code. – pmdj

Added the full kernel code. – vgeclair

How is this data aligned? Are the float and int items aligned to 4-byte boundaries? If not, that could be the source of the problem. – pmdj

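It turned out that the problem occurred because the int and float values in the char array were not 4-byte aligned. When I compute the address like this: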

offset = data + tuple_pos*4; // or 8, 16 etc

everything works fine. However, the following causes the error:

offset = data + tuple_pos*3; // or any other number not divisible by 4

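This means I should either change the whole design and store the values some other way, or add "empty" padding bytes to the char array so that the int and float values are 4-byte aligned (which is not a great solution).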
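For the padding option, a minimal sketch of how the offsets table could be built so that every int and float field lands on a 4-byte boundary. It is written as plain host-side C, and `align_up` and `build_padded_offsets` are hypothetical helper names, not part of the kernel above. It assumes the same conventions as the kernel: `types[k]` is 'f', 'i' or something else for char data, and `offsets[num_fields]` holds the tuple size.

```c
#include <assert.h>

/* Round `offset` up to the next multiple of `alignment` (a power of two). */
int align_up(int offset, int alignment) {
    assert(offset >= 0 && alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/*
 * Fill `offsets` (num_fields + 1 entries) from per-field types and sizes.
 * Fields of type 'f' or 'i' are moved up to a 4-byte boundary; the padding
 * bytes in front of them are simply left unused.
 */
void build_padded_offsets(const char *types, const int *sizes,
                          int num_fields, int *offsets) {
    int pos = 0;
    for (int k = 0; k < num_fields; ++k) {
        if (types[k] == 'f' || types[k] == 'i')
            pos = align_up(pos, 4);
        offsets[k] = pos;
        pos += sizes[k];
    }
    /* Pad the tuple size too, so the next tuple also starts 4-byte aligned. */
    offsets[num_fields] = align_up(pos, 4);
}
```

With a table built this way, `tuple_pos * tuple_size + offsets[which_field]` is a multiple of 4 for every int and float field, so the casts stay aligned as long as the buffer itself starts on a 4-byte boundary. One side effect to keep in mind: a field size derived as `offsets[k+1] - offsets[k]` now includes any padding that follows the field.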


2017-08-21 12:57:44 vgeclair

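Have you enabled the byte_addressable_store extension? As far as I know, byte-wise writes to global memory are not well defined in OpenCL unless it is enabled. (You need to check whether your implementation supports the extension.)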
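To check for support, the host can query the device's extension string (`clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`), which is a space-separated list of names. Below is a small sketch of the string check only, with no OpenCL calls so it can be compiled anywhere; `has_extension` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `name` appears as a whole space-separated token in
 * `extensions`, else 0. A plain strstr() is not enough, because one
 * extension name can be a prefix of another.
 *
 * In the kernel source the extension is then enabled with:
 *   #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
 */
int has_extension(const char *extensions, const char *name) {
    assert(extensions != NULL && name != NULL);
    size_t name_len = strlen(name);
    if (name_len == 0)
        return 0;
    for (const char *p = strstr(extensions, name); p != NULL;
         p = strstr(p + 1, name)) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}
```

Typical use would be to pass the string returned for `CL_DEVICE_EXTENSIONS` as the first argument and `"cl_khr_byte_addressable_store"` as the second.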

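You might also want to consider using the "correct" types in the kernel arguments; this may help the compiler generate more efficient code. If the types can vary dynamically, you could try using a union type (or a union field within a struct type), although I haven't tested this with OpenCL myself.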
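A rough sketch of what the union idea could look like, written as plain C rather than a tested kernel. `field_t` and `add_ten` are hypothetical names, and the fixed 4-byte char payload is an assumption made only for illustration; in a kernel the pointer would additionally carry the `global` qualifier.

```c
/* Hypothetical fixed-size field: a type tag plus a 4-byte payload. */
typedef struct {
    char type;          /* 'f' = float, 'i' = int, anything else = chars */
    union {
        float f;
        int   i;
        char  c[4];
    } value;
} field_t;

/* Same update as in the question: numeric fields get 10 added. */
void add_ten(field_t *field) {
    if (field->type == 'f')
        field->value.f += 10.0f;
    else if (field->type == 'i')
        field->value.i += 10;
    /* char payloads are left unchanged */
}
```

Because the union contains a float and an int, the compiler aligns the payload itself, so no manual offset arithmetic is needed. The trade-off is wasted space for short char fields, and the host and device must agree on the struct layout.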


2017-06-25 16:48:56 pmdj

Sorry, I didn't notice your reply! I just tried adding cl_khr_byte_addressable_store, but the problem persists. – vgeclair
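That result fits the alignment explanation: enabling byte-addressable stores does not make a misaligned `float*` or `int*` dereference valid. One way to keep the packed layout might be to never cast the char pointer at all and move the value byte by byte instead. The sketch below is plain C with hypothetical helper names; it has not been run on a GPU in this thread. In a kernel the pointers would be `global uchar *`, and the byte-wise stores are exactly what the byte-addressable store feature covers.

```c
/* Read a float stored at an arbitrary (possibly unaligned) byte address. */
float load_float_bytes(const unsigned char *src) {
    float value;
    unsigned char *dst = (unsigned char *)&value;
    for (unsigned k = 0; k < sizeof(float); ++k)
        dst[k] = src[k];
    return value;
}

/* Write a float to an arbitrary (possibly unaligned) byte address. */
void store_float_bytes(unsigned char *dst, float value) {
    const unsigned char *src = (const unsigned char *)&value;
    for (unsigned k = 0; k < sizeof(float); ++k)
        dst[k] = src[k];
}

/* The update from the question, without any float* dereference. */
void add_ten_unaligned(unsigned char *field) {
    store_float_bytes(field, load_float_bytes(field) + 10.0f);
}
```

The same pattern would apply to int fields. It costs four byte accesses instead of one word access, so padding the layout is likely the faster option where memory allows.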
