DAPL errors on an Azure RDMA-enabled SLES cluster

I set up two Azure A8 VMs running SLES-HPC 12 in an availability set (following this tutorial: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-cluster-rdma/).
When I run the Intel MPI pingpong test, I get DAPL errors:
~> /opt/intel/impi/5.0.3.048/bin64/mpirun -hosts 10.0.0.4,10.0.0.5 -ppn 1 -n 2 -env I_MPI_FABRICS=shm:dapl -env I_MPI_DYNAMIC_CONNECTION=0 -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 /opt/intel/impi/5.0.3.048/bin64/IMB-MPI1 pingpong
sshvm1:d28:bef0eb40: 12930 us(12930 us): dapl_rdma_accept: ERR -1 Input/output error
sshvm1:d28:bef0eb40: 12946 us(16 us): DAPL ERR accept Input/output error
[1:10.0.0.5][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:622] error(0x40000): ofa-v2-ib0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c at line 622: 0
internal ABORT - process 0
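One cheap sanity check is that both VMs list the provider passed in I_MPI_DAPL_PROVIDER in their DAPL configuration. Below is a minimal sketch, assuming the DAPL 2.x dat.conf layout (one provider per line, name in the first field) at the common default path /etc/dat.conf; the helper name dapl_provider_configured is hypothetical. It only rules out a missing or misspelled provider entry. Since the log above shows ofa-v2-ib0 getting as far as the accept step, it is not expected to explain this error on its own.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Intel MPI or DAPL):
# succeed if PROVIDER has an entry in a DAPL 2.x dat.conf file.
# Assumes the usual layout where each provider line starts with its name, e.g.
#   ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
# Exit status: 0 = found, 1 = not found, 2 = config file missing/unreadable.
dapl_provider_configured() {
    dpc_provider=$1
    dpc_conf=${2:-/etc/dat.conf}   # common default path; may differ per install
    [ -r "$dpc_conf" ] || return 2
    # Compare the first whitespace-separated field exactly, so comment lines
    # ("# ...") and similarly named providers (ofa-v2-ib0 vs ofa-v2-ib1) do not match.
    awk -v p="$dpc_provider" '$1 == p { found = 1 } END { exit !found }' "$dpc_conf"
}
```

For example, "dapl_provider_configured ofa-v2-ib0 && echo found" should succeed when run on both 10.0.0.4 and 10.0.0.5.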
I get similar errors when running one of the OSU MPI micro-benchmarks (compiled with the Intel MPI compilers).
What is causing these errors, and how can I fix them so that these micro-benchmarks run? Thanks for your help!
I have also already verified that running "mpiexec -machinefile machinefile -n 2 hostname" works.
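For reference, the machinefile in that check is just a list of hosts. Below is a minimal sketch that writes one, assuming Intel MPI's plain format of one host name or IP per line (a "host:n" suffix can set a per-host process count, not used here). The helper name write_machinefile is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write an Intel MPI machinefile with one host per line,
# roughly equivalent to passing "-hosts host1,host2" to mpirun.
# Usage: write_machinefile OUTFILE HOST [HOST ...]
write_machinefile() {
    wm_out=$1
    shift
    [ "$#" -gt 0 ] || return 1       # need at least one host
    printf '%s\n' "$@" > "$wm_out"   # one host (IP or name) per line
}
```

Running "write_machinefile machinefile 10.0.0.4 10.0.0.5" and then "mpiexec -machinefile machinefile -n 2 hostname" should print each VM's hostname. Note that this only shows that process launch across the VMs works. It does not exercise the DAPL/RDMA path that the benchmarks use.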
Excellent, the pingpong test now looks good with the driver. I also appreciate the ARM instructions! Thanks! – kramimus