2015-12-12 63 views

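A consistent assert for Python code running in parallel?

It seems to me that in Python code running in parallel, an assert that fails on at least one processor should abort all processors, so that: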

1) the error message is clearly visible (with the stack trace)

2) the remaining processors do not keep waiting forever.

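However, this is not what the standard assert does.

This question has already been asked in "python script running with mpirun not stopping if assert on processor 0 fails", but I am not satisfied with the answer. It suggests using the comm.Abort() function, but that only addresses point 2) above.

So I was wondering: is there a standard "assert" function for parallel code (e.g. with mpi4py), or should I write my own assert for this purpose?

Thanks!

Edit - here is my attempt (inside a class, but it could be outside), which can surely be improved: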

import sys
import traceback

import mpi4py.MPI as mpi


class My_code():

    def __init__(self, some_parameter=None):

        self.current_com = mpi.COMM_WORLD
        self.rank = self.current_com.rank
        self.nb_procs = self.current_com.size

        self.my_assert(some_parameter is not None)
        self.parameter = some_parameter
        print("Ok, parameter set to " + repr(self.parameter))

    # some class functions here...

    def my_assert(self, assertion):
        """
        this is a try for an assert function that kills
        every process in a parallel run
        """
        if not assertion:
            # mimic the output of a failed assert, minus this frame
            print('Traceback (most recent call last):')
            for line in traceback.format_stack()[:-1]:
                print(line.strip())
            print('AssertionError')
            # flush before the processes are killed, or the message may be lost
            sys.stdout.flush()
            if self.nb_procs == 1:
                sys.exit(1)
            else:
                self.current_com.Abort(1)
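
As the edit says, the check does not have to live inside a class. Below is a minimal sketch of the same idea as a free function. The name `parallel_assert` and its `abort` and `stream` parameters are hypothetical and not part of mpi4py; under MPI, `abort` would presumably be `mpi.COMM_WORLD.Abort`. It is passed in as an argument here so the function can be tried without an MPI launcher.

```python
import sys
import traceback


def parallel_assert(condition, abort, message="", stream=sys.stderr):
    """Print an assert-style traceback and call abort(1) if condition is false.

    `abort` is an assumption of this sketch: any callable taking an error
    code, e.g. mpi.COMM_WORLD.Abort, which stops every rank.
    """
    if condition:
        return
    stream.write("Traceback (most recent call last):\n")
    # drop the last stack entry, which is this function itself
    for entry in traceback.format_stack()[:-1]:
        stream.write(entry)
    if message:
        stream.write("AssertionError: %s\n" % message)
    else:
        stream.write("AssertionError\n")
    # flush so the message is written out before the processes are killed
    stream.flush()
    abort(1)
```

With mpi4py the call would look something like `parallel_assert(some_parameter is not None, mpi.COMM_WORLD.Abort, "parameter missing")`.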

You might consider https://groups.google.com/forum/#!topic/mpi4py/me2TFzHmmsQ, which comes up when searching for 'mpi4py stop on exception'.


Thanks for the link! I am writing an answer based on it.

Answer


I think the following snippet answers the question. It is derived from the discussion pointed out by Dan D.

import mpi4py.MPI as mpi
import sys


# put this somewhere, but before calling the asserts
sys_excepthook = sys.excepthook

def mpi_excepthook(exc_type, exc_value, exc_traceback):
    # print the usual traceback first, then abort every process
    sys_excepthook(exc_type, exc_value, exc_traceback)
    sys.stderr.flush()
    if mpi.COMM_WORLD.size > 1:
        mpi.COMM_WORLD.Abort(1)

sys.excepthook = mpi_excepthook

# example:
if mpi.COMM_WORLD.rank == 0:
    # with sys.excepthook redefined as above, this will kill every processor
    # otherwise this would only kill processor 0
    assert 1 == 0

# assume here we have a lot of print messages
for i in range(50):
    print("rank = %d" % mpi.COMM_WORLD.rank)

# with standard asserts the code would be stuck here
# and the error message from the failed assert above would hardly be visible
mpi.COMM_WORLD.Barrier()
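
The hook above can also be wrapped in a small helper that chains onto whatever hook is already installed. This is only a sketch under stated assumptions: `install_abort_hook` and its `abort` argument are hypothetical names, not part of mpi4py, and `abort` is taken as a parameter so the helper can be exercised without MPI. With mpi4py one would presumably pass `mpi.COMM_WORLD.Abort`.

```python
import sys


def install_abort_hook(abort, errorcode=1):
    """Chain sys.excepthook so an uncaught exception also calls abort(errorcode).

    `abort` is an assumption of this sketch: any callable taking an error
    code, e.g. mpi.COMM_WORLD.Abort when running under mpi4py.
    Returns the previously installed hook so it can be restored later.
    """
    previous_hook = sys.excepthook

    def abort_hook(exc_type, exc_value, exc_traceback):
        # report the error the same way the previous hook would have
        previous_hook(exc_type, exc_value, exc_traceback)
        # flush so the traceback is not lost when the processes are killed
        sys.stdout.flush()
        sys.stderr.flush()
        abort(errorcode)

    sys.excepthook = abort_hook
    return previous_hook
```

It would be called once near the top of the script, before any assert can fire, for example `install_abort_hook(mpi.COMM_WORLD.Abort)`. Unlike the snippet above, this sketch does not check the communicator size, so it calls `abort` even in a single-process run.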