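OpenACC with C++: FATAL ERROR: variable is partially present on the device
I am trying to port some C++ applications to the GPU using OpenACC. As one would expect, the C++ code has a lot of encapsulation and abstraction. Memory is allocated in a vector-like class, and that class is then reused by many other classes throughout the application. I'm having trouble inserting the OpenACC pragmas into the code correctly. Here is a simplified example of the code I'm working on: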
#define DATASIZE 16

class Data {
    float *arr;
public:
    Data() { arr = new float[DATASIZE]; }
    ~Data() { delete [] arr; }
    float &get(int i) { return arr[i]; }
};

class DataKeeper {
    Data a, b, c;
public:
    void init() {
        for (int i = 0; i < DATASIZE; ++i)
            a.get(i) = 0.0;
    }
};

int main() {
    DataKeeper DK;
    DK.init();
}
I inserted the OpenACC pragmas needed to send the data and the code to the device, and ended up with this:
#define DATASIZE 16

class Data {
    float *arr;
public:
    Data() {
        arr = new float[DATASIZE];
        #pragma acc enter data copyin(this)
        #pragma acc enter data create(arr[:DATASIZE])
    }
    ~Data() {
        #pragma acc exit data delete(arr)
        #pragma acc exit data delete(this)
        delete [] arr;
    }
    float &get(int i) { return arr[i]; }
};

class DataKeeper {
    Data a, b, c;
public:
    DataKeeper() {
        #pragma acc enter data copyin(this)
    }
    ~DataKeeper() {
        #pragma acc exit data delete(this)
    }
    void init() {
        #pragma acc parallel loop
        for (int i = 0; i < DATASIZE; ++i) {
            a.get(i) = 0.0;
        }
    }
};

int main() {
    DataKeeper DK;
    DK.init();
}
But after compiling and running it, I get the following error:
$ pgc++ test.cc -acc -g
$ ./a.out
_T24395416_101 lives at 0x7fff49e03070 size 24 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 3.5, threadid=1
host:0x1ae6eb0 device:0xc05ca0200 size:64 presentcount:0+1 line:11 name:(null)
host:0x1f33620 device:0xc05ca0600 size:64 presentcount:0+1 line:11 name:(null)
host:0x1f33d10 device:0xc05ca0a00 size:64 presentcount:0+1 line:11 name:(null)
host:0x7fff49e03070 device:0xc05ca0000 size:8 presentcount:0+1 line:11 name:_T24395600_98
host:0x7fff49e03078 device:0xc05ca0400 size:8 presentcount:0+1 line:11 name:_T24395600_98
host:0x7fff49e03080 device:0xc05ca0800 size:8 presentcount:0+1 line:11 name:_T24395600_98
allocated block device:0xc05ca0000 size:512 thread:1
allocated block device:0xc05ca0200 size:512 thread:1
allocated block device:0xc05ca0400 size:512 thread:1
allocated block device:0xc05ca0600 size:512 thread:1
allocated block device:0xc05ca0800 size:512 thread:1
allocated block device:0xc05ca0a00 size:512 thread:1
FATAL ERROR: variable in data clause is partially present on the device: name=_T24395416_101
file:/home/bozhenovn/tst/test.cc _ZN10DataKeeperC1Ev line:27
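One way to read the present table (my interpretation of the dump, not something the runtime states): the object being copied in at 0x7fff49e03070 with size 24 is the DataKeeper, and the three 8-byte entries at 0x7fff49e03070, 0x7fff49e03078 and 0x7fff49e03080 are its members a, b and c, each already copied in by its own Data constructor. C++ runs member constructors before the body of the enclosing constructor, so by the time DataKeeper() executes `enter data copyin(this)`, part of that 24-byte range is already present, which matches the "partially present" error raised in `_ZN10DataKeeperC1Ev` (the DataKeeper constructor). The host-only sketch below replaces the pragmas with a hypothetical `trace` log and a hypothetical `overlaps` helper to check both the construction order and the address overlap.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

#define DATASIZE 16

// Hypothetical log: records the order in which the constructors run, i.e. the
// order in which the question's "enter data copyin(this)" directives would fire.
static std::vector<std::string> trace;

class Data {
    float *arr;
public:
    // In the question, this constructor copies its own `this` to the device.
    Data() : arr(new float[DATASIZE]) { trace.push_back("Data"); }
    ~Data() { delete[] arr; }
    Data(const Data &) = delete;             // owning raw pointer: no copies
    Data &operator=(const Data &) = delete;
};

class DataKeeper {
public:
    Data a, b, c;  // public here only so the addresses can be inspected
    // In the question, this body copies in all 24 bytes of `this`,
    // but a, b and c have already been constructed by the time it runs.
    DataKeeper() { trace.push_back("DataKeeper"); }
};

// True if the byte ranges [p, p + n) and [q, q + m) share at least one byte.
bool overlaps(const void *p, std::size_t n, const void *q, std::size_t m) {
    auto x = reinterpret_cast<std::uintptr_t>(p);
    auto y = reinterpret_cast<std::uintptr_t>(q);
    return x < y + m && y < x + n;
}
```

On a typical 64-bit host, `sizeof(Data)` is 8 (one pointer) and `sizeof(DataKeeper)` is 24, which lines up with the `size:8` entries and the `size 24` object in the dump.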
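I don't know what is wrong with the code. I would appreciate any ideas on how to fix it, or suggestions on how to investigate the problem further. Thanks!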
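One possible workaround, sketched under my own assumptions and not necessarily either of the approaches mentioned in the comments: keep only the float payloads on the device and let the compute loop work through a local pointer, so that no class object's `this` ever has to be present and nothing can overlap. The accessors `data()`, `update_host()` and `first()` are hypothetical helpers that are not in the question, and the `exit data delete` is written with an explicit subarray so it mirrors the `create` exactly.

```cpp
#define DATASIZE 16

class Data {
    float *arr;
public:
    Data() : arr(new float[DATASIZE]) {
        // Only the payload is placed on the device; `this` is not copied in.
        #pragma acc enter data create(arr[0:DATASIZE])
    }
    ~Data() {
        #pragma acc exit data delete(arr[0:DATASIZE])
        delete[] arr;
    }
    Data(const Data &) = delete;             // a copy would free arr twice
    Data &operator=(const Data &) = delete;

    float &get(int i) { return arr[i]; }
    float *data() { return arr; }            // hypothetical accessor
    void update_host() {                     // hypothetical: device -> host
        #pragma acc update host(arr[0:DATASIZE])
    }
};

class DataKeeper {
    Data a, b, c;
public:
    // No "enter data copyin(this)" here, so the members' ranges never collide.
    void init() {
        float *p = a.data();  // local copy of the pointer: the loop never uses `this`
        #pragma acc parallel loop present(p[0:DATASIZE])
        for (int i = 0; i < DATASIZE; ++i)
            p[i] = 0.0f;
        a.update_host();
    }
    float first() { return a.get(0); }       // hypothetical, for checking
};
```

The trade-off of this layout is that device code can no longer call member functions through `this`; kernels have to be written against raw pointers obtained on the host.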
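Thank you very much! I chose the second approach and it seems to work. By the way, could you suggest anything about debugging OpenACC applications? There doesn't seem to be a separate chapter on debugging in the book, is there? – Nikolai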
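I'm not aware of any articles on debugging OpenACC. I should write one. However, the techniques I use vary a lot depending on the application and the nature of the bug, so it's hard to write a good article on debugging in general. –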
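In general, I would suggest first debugging the code without OpenACC enabled at all. I've had users send me code claiming OpenACC was broken when in fact the code failed at "-O0 -g". Next, I'd move to "-ta=multicore", since you don't need to worry about data movement there and can focus on parallelization issues. Debuggers that handle multithreaded applications, such as PGDBG, also work at this stage. Finally, move to the GPU and pay attention to the compiler's feedback messages (-Minfo=accel). –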
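The staged approach in the last comment can be supported by a small harness that runs the same loop with and without directives and counts disagreements. This is a hedged sketch, not something from the thread: `scale`, `scale_reference`, `count_mismatches` and `openacc_enabled` are hypothetical names. The only non-hypothetical piece is the `_OPENACC` macro, which OpenACC-enabled compilers predefine, so each run can report which kind of build it came from.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// True when this file was compiled with OpenACC enabled; in a plain
// "-O0 -g" build the macro is absent and the pragmas below are ignored.
bool openacc_enabled() {
#ifdef _OPENACC
    return true;
#else
    return false;
#endif
}

// Hypothetical kernel under test: serial in a plain build, CPU threads with
// -ta=multicore, GPU code when built for the accelerator.
void scale(const float *in, float *out, int n, float factor) {
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i)
        out[i] = factor * in[i];
}

// Serial reference with no directives: the baseline every stage must match.
void scale_reference(const float *in, float *out, int n, float factor) {
    for (int i = 0; i < n; ++i)
        out[i] = factor * in[i];
}

// Number of positions where the two results differ by more than tol.
std::size_t count_mismatches(const std::vector<float> &got,
                             const std::vector<float> &ref, float tol) {
    std::size_t bad = 0;
    for (std::size_t i = 0; i < got.size() && i < ref.size(); ++i)
        if (std::fabs(got[i] - ref[i]) > tol)
            ++bad;
    return bad;
}
```

Building the same file once with "-O0 -g", once with "-ta=multicore" and once for the GPU (watching the -Minfo=accel output) narrows things down: if mismatches first appear at a given stage, the bug most likely lies in what that stage adds, parallelization for multicore or data movement for the GPU.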