I have set KMP_AFFINITY to scatter, but the execution time increased a lot! (OpenMP.) How do I retrieve the ID of the core a thread is running on?
That is why I suspect OpenMP is spawning all its threads on a single core.
So I need something that returns which core the current thread is running on.
This is the directive I use before the for loop:
int procs = omp_get_num_procs();
#pragma omp parallel for num_threads(procs)\
shared (c, u, v, w, k, j, i, nx, ny) \
reduction(+: a, b, c, d, e, f, g, h, i)
And these are the environment variables I export:
export OMP_NUM_THREADS=5
export KMP_AFFINITY=verbose,scatter
I have also pasted the verbose affinity output, in case it helps:
OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid instr info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #159: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #160: KMP_AFFINITY: OS proc to physical thread map ([] => level not in map):
OMP: Info #168: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 4 maps to package 0 core 1 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 1 maps to package 1 core 0 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 5 maps to package 1 core 1 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 3 maps to package 1 core 2 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 7 maps to package 1 core 3 [thread 0]
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7}
Thanks in advance!
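For the "which core is my thread on?" part of the question: on Linux with glibc, `sched_getcpu()` returns the logical CPU the calling thread is currently executing on. A minimal sketch (assuming Linux/glibc; the serial fallback for `omp_get_thread_num` is only there so it also compiles without OpenMP):

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() */
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }  /* serial fallback */
#endif

/* Print, from inside a parallel region, which logical CPU each
   OpenMP thread is currently running on. With KMP_AFFINITY=verbose,
   these numbers should match the "bound to OS proc set" lines. */
void report_thread_placement(void)
{
    #pragma omp parallel
    {
        printf("OpenMP thread %d running on logical CPU %d\n",
               omp_get_thread_num(), sched_getcpu());
    }
}
```

Note that without pinning, a thread may migrate between calls, so the value is only a snapshot; with KMP_AFFINITY binding in effect it should stay stable.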
Variables are shared by default. You do not have any `private` clause, so many variables you believe are private may in fact be shared. Data races and false sharing can degrade performance dramatically and make it look as if all threads were running on a single core. –
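To illustrate the point above, a hedged sketch (a hypothetical loop, not the code from the question): with `default(none)` the compiler refuses to guess, so the sharing of every variable must be stated explicitly and accidental sharing is caught at compile time.

```c
/* Hypothetical example: default(none) forces every variable used in the
   region to appear in an explicit data-sharing clause, so nothing is
   shared by accident. */
double sum_squares(const double *x, int n)
{
    double sum = 0.0;
    #pragma omp parallel for default(none) shared(x, n) reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * x[i];   /* the loop index i is private automatically */
    return sum;
}
```

A variable may appear in `shared` or in `reduction`, but not both; listing the same name in both clauses, as in the directive above, is an error the compiler should reject.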
The verbose listing you show does not seem to match the run you describe: it shows eight OpenMP threads, each bound to a separate logical CPU, whereas you claim to be using five threads. (So it certainly *is* using all the hardware.) You also do not say what the baseline is, only that scatter is slower than... something... On your machine it is quite possible that four threads all within one socket get better data sharing than four threads spread across two sockets. –
P.S. If you do not believe the runtime's output about what it is doing, and assuming you are on Linux, just run xosview and watch the load on each logical CPU while your code runs. –