mpirun: unrecognized argument mca

I have a C++ solver that I need to run in parallel using the following command:
nohup mpirun -np 16 ./my_exec > log.txt &
This command runs my_exec independently on 16 of the processors available on my node. It used to work perfectly.
Last week the HPC department performed an OS upgrade, and now, when I launch the same command, I get two warning messages (one per processor). The first one is:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              tamnun
  Registerable memory:     32768 MiB
  Total memory:            98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
Then I get output from my code telling me that it thinks I launched only one MPI process (Nprocs = 1 instead of 16):
# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there
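(For context, the Nprocs value in that output presumably comes from MPI_Comm_size. A minimal stand-alone check, a hypothetical check_mpi.cpp compiled with the same mpic++ as the solver, reproduces the symptom: with a matching launcher it reports 16 processes, while a mismatched launcher starts each process as an independent singleton reporting size 1.)

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // the value reported as Nprocs
    // Under "mpirun -np 16" with a matching launcher this prints ranks
    // 0..15 of 16; with a mismatched launcher, every copy prints "rank 0 of 1".
    std::printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}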
Finally, the second warning message is:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          tamnun (PID 17446)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
After searching around online, I tried setting the MCA parameter mpi_warn_on_fork to 0, as the warning message suggests, with the following command:
nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &
which produced the following error message:
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
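(Note that the HYDU_* function names and the ./ui/mpich/ paths in this error come from the Hydra process manager used by MPICH and Intel MPI, not from Open MPI, whose launcher is what accepts --mca. A quick way to see which launcher a shell actually resolves:)

which mpirun        # path of the launcher being picked up
mpirun --version    # Open MPI prints something like "mpirun (Open MPI) x.y.z"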
I am running RedHat 6.7 (Santiago). I have contacted the HPC department, but since I am at a university it may take a day or two to get a response. Any help or guidance would be greatly appreciated.
EDIT (answer):
It turns out I was compiling my code with Open MPI's mpic++ but running the executable with Intel's mpirun, hence the errors (after the OS upgrade, Intel's mpirun had been set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
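(A sketch of the fix; the install prefix below is hypothetical, so adjust it to wherever Open MPI actually lives on your system, e.g. the directory that `which mpic++` reports:)

which mpic++                           # mpic++ was already the Open MPI wrapper
export PATH=/path/to/openmpi/bin:$PATH # prepend so Open MPI's mpirun wins
which mpirun                           # should now resolve to Open MPI's launcher
mpirun --version                       # should report "mpirun (Open MPI) ..."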
The code now runs as expected, but I still get the first warning message above (it no longer suggests I use the MCA parameter mpi_warn_on_fork). I think (but am not sure) that this is an issue I need to resolve with the HPC department.
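(For that remaining OpenFabrics warning, the FAQ item linked in the message points at the per-process locked-memory limit and, on Mellanox mlx4 hardware, the MTT kernel module parameters. The mlx4 names below are an assumption about the cluster's hardware, and the changes need root, which is why this is one for the admins:)

ulimit -l                                         # locked-memory limit; ideally "unlimited"
cat /sys/module/mlx4_core/parameters/log_num_mtt  # current MTT setting, if mlx4 is loaded
# Registerable memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page_size;
# the driver defaults (log_num_mtt=20, log_mtts_per_seg=3, 4 KiB pages) give exactly
# the 32768 MiB reported above. Raising log_num_mtt, e.g. via a line like the one
# below in /etc/modprobe.d/mlx4_core.conf plus a driver reload, lets all physical
# memory be registered:
options mlx4_core log_num_mtt=24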
You have a typo in the command: it is mpi_warn_on_fork (not what you wrote). – Marco
Ha, right, I used the correct command; the typo is only in the posted question. – solalito