我有兩臺機器都安裝了MS MPI 7.1,一臺名爲SERVER,另一臺名爲COMPUTE。 這些機器在一個簡單的Windows工作組(無DA)中設置在局域網上,並且都有一個具有相同名稱和密碼的帳戶。MS MPI權限錯誤
兩者都運行MSMPILaunchSvc服務。 兩種機器可以在本地執行MPI作業,通過在上機器本身的終端與hostname
命令
SERVER> mpiexec -hosts 1 SERVER 1 hostname
SERVER
or
COMPUTE> mpiexec -hosts 1 COMPUTE 1 hostname
COMPUTE
測試驗證。
我已經禁用了兩臺機器上的防火牆,以使事情更輕鬆。
我的問題是我不能讓MPI到遠程主機上運行的服務器作業:
1:MSMPILaunchSvc服務器 - >計算與MSMPILaunchSvc
SERVER> mpiexec -hosts 1 COMPUTE 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 1722
Aborting: mpiexec on SERVER is unable to connect to the smpd service on COMPUTE:8677
Other MPI error, error stack:
connect failed - The RPC server is unavailable. (errno 1722)
什麼是更令人沮喪這裏僅在有時候我會提示輸入密碼。它建議SERVER \ Maarten作爲COMPUTE的用戶,我已經在SERVER上登錄帳戶,並且不應該存在於COMPUTE(應該是COMPUTE \ Maarten然後?)。然而它也失敗:
SERVER>mpiexec -hosts 1 COMPUTE 1 hostname.exe -pwd
Enter Password for SERVER\Maarten:
Save Credentials[y|n]? n
ERROR: Failed to connect to SMPD Manager Instance error 1726
Aborting: mpiexec on SERVER is unable to connect to the
smpd manager on COMPUTE:50915 error 1726
2:用MSMPILaunchSvc COMPUTE - >服務器MSMPILaunchSvc
COMPUTE> mpiexec -hosts 1 SERVER 1 hostname -pwd
ERROR: Failed RpcCliCreateContext error 5
Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied. (errno 5)
3:計算與MSMPILaunchSvc - >服務器SMPD守護程序
Aborting: mpiexec on COMPUTE is unable to connect to the smpd service on SERVER:8677
Other MPI error, error stack:
connect failed - Access is denied. (errno 5)
4:帶有MSMPILaunchSvc的服務器 - >帶有smpd守護程序的計算機
ERROR: Failed to connect to SMPD Manager Instance error 1726
Aborting: mpiexec on SERVER is unable to connect to the smpd manager on
COMPUTE:51022 error 1726
更新:
兩個節點上SMPD守護嘗試我得到這個錯誤:
[-1:9796] Authentication completed. Successfully obtained Context for Client.
[-1:9796] version check complete, using PMP version 3.
[-1:9796] create manager process (using smpd daemon credentials)
[-1:9796] smpd reading the port string from the manager
[-1:9848] Launching smpd manager instance.
[-1:9848] created set for manager listener, 376
[-1:9848] smpd manager listening on port 51149
[-1:9796] closing the pipe to the manager
[-1:9848] Authentication completed. Successfully obtained Context for Client.
[-1:9848] Authorization completed.
[-1:9848] version check complete, using PMP version 3.
[-1:9848] Received session header from parent id=1, parent=0, level=0
[01:9848] Connecting back to parent using host SERVER and endpoint 17979
[01:9848] Previous attempt failed with error 5, trying to authenticate without Kerberos
[01:9848] Failed to connect back to parent error 5.
[01:9848] ERROR: Failed to connect back to parent 'ncacn_ip_tcp:SERVER:17979' error 5
[01:9848] smpd manager successfully stopped listening.
[01:9848] SMPD exiting with error code 4294967293.
,並在主機上:
[-1:12264] Launching SMPD service.
[-1:12264] smpd listening on port 8677
[-1:12264] Authentication completed. Successfully obtained Context for Client.
[-1:12264] version check complete, using PMP version 3.
[-1:12264] create manager process (using smpd daemon credentials)
[-1:12264] smpd reading the port string from the manager
[-1:16668] Launching smpd manager instance.
[-1:16668] created set for manager listener, 364
[-1:16668] smpd manager listening on port 18033
[-1:12264] closing the pipe to the manager
[-1:16668] Authentication completed. Successfully obtained Context for Client.
[-1:16668] Authorization completed.
[-1:16668] version check complete, using PMP version 3.
[-1:16668] Received session header from parent id=1, parent=0, level=0
[01:16668] Connecting back to parent using host SERVER and endpoint 18031
[01:16668] Authentication completed. Successfully obtained Context for Client.
[01:16668] Authorization completed.
[01:16668] handling command SMPD_CONNECT src=0
[01:16668] now connecting to COMPUTE
[01:16668] 1 -> 2 : returning SMPD_CONTEXT_LEFT_CHILD
[01:16668] using spn msmpi/COMPUTE to contact server
[01:16668] SERVER posting a re-connect to COMPUTE:51161 in left child context.
[01:16668] ERROR: Failed to connect to SMPD Manager Instance error 1726
[01:16668] sending abort command to parent context.
[01:16668] posting command SMPD_ABORT to parent, src=1, dest=0.
[01:16668] ERROR: smpd running on SERVER is unable to connect to smpd service on COMPUTE:8677
[01:16668] Handling cmd=SMPD_ABORT result
[01:16668] cmd=SMPD_ABORT result will be handled locally
[01:16668] parent terminated unexpectedly - initiating cleaning up.
[01:16668] no child processes to kill - exiting with error code -1