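Host key verification failed using mpi4py
I'm building an MPI application with mpi4py (1.3.1) and openmpi (1.8.6-1) on Arch Linux ARM (more specifically, on a Raspberry Pi cluster). My program runs successfully on 3 nodes (4 processes), but when I try to add a new node, this is what happens: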
Host key verification failed.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
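Interestingly, the SSH keys are all fine, because I'm using the same nodes: I can remove any entry from the hostfile, add the new node, and it works. So I'm fairly sure the problem is not a misconfigured SSH setup; it only happens when I use 5 processes.
Could this be a bug in one of the libraries?
Here is my hostfile: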
192.168.1.26 slots=2
192.168.1.188 slots=1
#192.168.1.202 slots=1 If uncommented and run with -np 5, it will raise the error
192.168.1.100 slots=1
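To make the slot arithmetic explicit, here is a small Python sketch that reads a hostfile in the simple format above and sums the `slots=` values. The helpers `parse_hostfile` and `total_slots` are hypothetical names; the sketch only understands `host slots=N` lines and `#` comments, and it assumes a host without `slots=` counts as one slot, which may not match Open MPI's own default. With the 192.168.1.202 line commented out, the three hosts provide 4 slots; uncommenting it gives 5, so `-np 5` fills every slot and uses all four machines.

```python
def parse_hostfile(text):
    """Return (host, slots) pairs from simple Open MPI-style hostfile text.

    Assumption: one host per line, an optional 'slots=N' token, '#' starts
    a comment. Other hostfile keywords are ignored in this sketch.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        tokens = line.split()
        host = tokens[0]
        slots = 1  # assumption: a host without slots= counts as one slot
        for tok in tokens[1:]:
            if tok.startswith("slots="):
                slots = int(tok.split("=", 1)[1])
        entries.append((host, slots))
    return entries


def total_slots(entries):
    """Sum the slots over all parsed hosts."""
    return sum(slots for _, slots in entries)


HOSTFILE = """\
192.168.1.26 slots=2
192.168.1.188 slots=1
#192.168.1.202 slots=1
192.168.1.100 slots=1
"""

print(total_slots(parse_hostfile(HOSTFILE)))  # 4
# Uncommenting the fourth host raises the total to 5 slots.
print(total_slots(parse_hostfile(HOSTFILE.replace("#192.168.1.202", "192.168.1.202"))))  # 5
```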
Thanks in advance!
Have you tried a bare ssh connection? The error sounds like this one: http://stackoverflow.com/questions/19018385/host-key-verification-failed – Jakuje
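Yes. ssh works fine to every host. MPI also works fine as long as I use no more than 3 nodes, and it doesn't matter which ones. I've been trying to find a possible error in one specific node's configuration :/ – martinarroyo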
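The bare-ssh test suggested in the comments can be scripted so that every host is checked non-interactively. This Python sketch runs `ssh -o BatchMode=yes <host> true` for each address; with `BatchMode=yes`, ssh fails instead of prompting for a password or a host key confirmation, so an unknown host key shows up as a failure rather than an interactive prompt. The function names `ssh_check_command` and `check_hosts` and the 5-second `ConnectTimeout` are illustrative choices, not part of the original setup. The check only covers connections from the machine it runs on, so running it on each node gives a fuller picture.

```python
import subprocess


def ssh_check_command(host):
    # BatchMode=yes disables password prompts and host key confirmation,
    # so an unknown or changed host key makes ssh fail instead of asking.
    # ConnectTimeout=5 is an arbitrary illustrative value.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def check_hosts(hosts, run=subprocess.run):
    """Return {host: (ok, stderr)} for a non-interactive ssh to each host.

    `run` is injectable so the logic can be exercised without real ssh.
    """
    results = {}
    for host in hosts:
        proc = run(ssh_check_command(host), capture_output=True, text=True)
        results[host] = (proc.returncode == 0, proc.stderr.strip())
    return results


if __name__ == "__main__":
    hosts = ["192.168.1.26", "192.168.1.188", "192.168.1.202", "192.168.1.100"]
    for host, (ok, err) in check_hosts(hosts).items():
        print(f"{host}: {'ok' if ok else 'FAILED'} {err}")
```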