2013-05-31 106 views

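SIGILL after fork in Sage/Python

I am doing some calculations with Sage, and I am playing around with fork. I have a very simple test case, which is basically this (see _fork_test_func() below for the actual matrix calculation):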

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print("parent, child: %i" % pid)
        os.waitpid(pid, 0)
    else:
        print("child")
        try:
            pass  # some dummy matrix calculation
        finally:
            # always leave the child without running the parent's cleanup
            os._exit(0)

And I get:

------------------------------------------------------------------------ 
Unhandled SIGILL: An illegal instruction occurred in Sage. 
This probably occurred because a *compiled* component of Sage has a bug 
in it and is not properly wrapped with sig_on(), sig_off(). You might 
want to run Sage under gdb with 'sage -gdb' to debug this. 
Sage will now terminate. 
------------------------------------------------------------------------ 

With this (incomplete) backtrace:

Crashed Thread: 0 Dispatch queue: com.apple.root.default-priority 

Exception Type: EXC_BAD_INSTRUCTION (SIGILL) 
Exception Codes: 0x0000000000000001, 0x0000000000000000 

Application Specific Information: 
BUG IN LIBDISPATCH: flawed group/semaphore logic 

Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority 
0 libsystem_kernel.dylib   0x00007fff8c6d1d46 __kill + 10 
1 libcsage.dylib     0x0000000101717f33 sigdie + 124 
2 libcsage.dylib     0x0000000101717719 sage_signal_handler + 364 
3 libsystem_c.dylib    0x00007fff86b1094a _sigtramp + 26 
4 libdispatch.dylib    0x00007fff89a66c74 _dispatch_thread_semaphore_signal + 27 
5 libdispatch.dylib    0x00007fff89a66f3e _dispatch_apply2 + 143 
6 libdispatch.dylib    0x00007fff89a66e30 dispatch_apply_f + 440 
7 libBLAS.dylib     0x00007fff906ca435 APL_dtrsm + 1963 
8 libBLAS.dylib     0x00007fff906702b6 cblas_dtrsm + 882 
9 matrix_modn_dense_double.so  0x0000000108612615 void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 2853 
10 matrix_modn_dense_double.so  0x0000000108611daa void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 698 
11 matrix_modn_dense_double.so  0x0000000108612ccf void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::operator()<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long) + 831 
12 ???        0x00007f99e481a028 0 + 140298940424232 

Thread 1: 
0 libsystem_kernel.dylib   0x00007fff8c6d26d6 __workq_kernreturn + 10 
1 libsystem_c.dylib    0x00007fff86b24f4c _pthread_workq_return + 25 
2 libsystem_c.dylib    0x00007fff86b24d13 _pthread_wqthread + 412 
3 libsystem_c.dylib    0x00007fff86b0f1d1 start_wqthread + 13 

Thread 2: 
0 libsystem_kernel.dylib   0x00007fff8c6d26d6 __workq_kernreturn + 10 
1 libsystem_c.dylib    0x00007fff86b24f4c _pthread_workq_return + 25 
2 libsystem_c.dylib    0x00007fff86b24d13 _pthread_wqthread + 412 
3 libsystem_c.dylib    0x00007fff86b0f1d1 start_wqthread + 13 

Thread 0 crashed with X86 Thread State (64-bit): 
    rax: 0x0000000000000000 rbx: 0x00007fff5ec8e418 rcx: 0x00007fff5ec8df28 rdx: 0x0000000000000000 
    rdi: 0x000000000000b8f7 rsi: 0x0000000000000004 rbp: 0x00007fff5ec8df40 rsp: 0x00007fff5ec8df28 
    r8: 0x00007fff5ec8e418 r9: 0x0000000000000000 r10: 0x000000000000000a r11: 0x0000000000000202 
    r12: 0x00007f99ea500de0 r13: 0x0000000000000003 r14: 0x00007fff5ec8e860 r15: 0x00007fff906ca447 
    rip: 0x00007fff8c6d1d46 rfl: 0x0000000000000202 cr2: 0x00007fff74a29848 
Logical CPU: 0 

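Is there anything special I need to do around a fork? I looked at Sage's fork decorator, and it seems to do basically the same thing.

The crash also happens with Sage's own fork decorator. Another test case: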

def fork_test2():
    def test():
        pass  # do some stuff
    from sage.parallel.decorate import fork
    test_ = fork(test, verbose=True)
    test_()

Even this simple test case:

def _fork_test_func():
    # matrix, QQ and randrange are globals in a Sage session
    while True:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100)])
        m.right_kernel()

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print("parent, child: %i" % pid)
        os.waitpid(pid, 0)
    else:
        print("child")
        try:
            _fork_test_func()
        finally:
            os._exit(0)

results in a slightly different crash:

python(48672) malloc: *** error for object 0x11185f000: pointer being freed already on death-row 
*** set a breakpoint in malloc_error_break to debug 

With the backtrace:

Crashed Thread: 1 Dispatch queue: com.apple.root.default-priority 

Exception Type: EXC_CRASH (SIGABRT) 
Exception Codes: 0x0000000000000000, 0x0000000000000000 

Application Specific Information: 
*** error for object 0x11185f000: pointer being freed already on death-row 


Thread 0:: Dispatch queue: com.apple.main-thread 
0 matrix2.so      0x0000000107fa403f __pyx_pw_4sage_6matrix_7matrix2_6Matrix_71right_kernel_matrix + 27551 
1 ???        0x000000000000000d 0 + 13 

Thread 1 Crashed:: Dispatch queue: com.apple.root.default-priority 
0 libsystem_kernel.dylib   0x00007fff8c6d239a __semwait_signal_nocancel + 10 
1 libsystem_c.dylib    0x00007fff86b17e1b nanosleep$NOCANCEL + 138 
2 libsystem_c.dylib    0x00007fff86b7b9a8 usleep$NOCANCEL + 54 
3 libsystem_c.dylib    0x00007fff86b67eca __abort + 203 
4 libsystem_c.dylib    0x00007fff86b67dff abort + 192 
5 libsystem_c.dylib    0x00007fff86b43905 szone_error + 580 
6 libsystem_c.dylib    0x00007fff86b43f7d free_large + 229 
7 libsystem_c.dylib    0x00007fff86b3b8f8 free + 199 
8 libBLAS.dylib     0x00007fff906b0431 __APL_dgemm_block_invoke_0 + 132 
9 libdispatch.dylib    0x00007fff89a65f01 _dispatch_call_block_and_release + 15 
10 libdispatch.dylib    0x00007fff89a620b6 _dispatch_client_callout + 8 
11 libdispatch.dylib    0x00007fff89a631fa _dispatch_worker_thread2 + 304 
12 libsystem_c.dylib    0x00007fff86b24d0b _pthread_wqthread + 404 
13 libsystem_c.dylib    0x00007fff86b0f1d1 start_wqthread + 13 

The same also happens with this:

def fork_test2(): 
    from sage.parallel.decorate import fork 
    test_ = fork(_fork_test_func, verbose=True) 
    test_() 

- but only if you have already run some other matrix calculation beforehand.


This test case also reproduces the crash in a fresh Sage session:

def _fork_test_func(iterator=None):
    if not iterator:
        import itertools
        iterator = itertools.count()
    for i in iterator:
        m = matrix(QQ, 100, [randrange(-100,100) for _ in range(100*100)])
        m.right_kernel()

def fork_test():
    # do some matrix calculations in the parent before forking
    _fork_test_func(range(10))
    import os
    pid = os.fork()
    if pid != 0:
        print("parent, child: %i" % pid)
        os.waitpid(pid, 0)
    else:
        print("child")
        try:
            _fork_test_func()
        finally:
            os._exit(0)

I downloaded the Mac OS X 64-bit binary of Sage 5.8.

(Note that I have also asked this on ask.sagemath.org.)

Answer

1

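Both crash reports indicate that a multithreaded process was fork()ed. That severely limits the set of operations that can be safely performed in the child: by the standard, you are basically only allowed to call execve() and its relatives, along with a few other functions from the list of async-signal-safe functions.

This is also documented in the CAVEATS section of the fork(2) man page:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

Since many APIs in the Mac OS X frameworks cause a process to become multithreaded, if you want the forked child to be fully usable, you must restrict what the parent process does before the fork to APIs that are documented not to make a process multithreaded (basically only the POSIX APIs).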
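As a rough illustration of that rule (a sketch of my own, not something taken from the man page): the child below does nothing after fork() except exec a fresh interpreter. The helper name run_in_fresh_process is made up for this example. Note that CPython still executes some interpreter code between fork and exec, so this only approximates the "async-signal-safe operations only" requirement.

```python
import os
import sys

def run_in_fresh_process(code):
    """Hypothetical helper: fork, then immediately exec a new interpreter."""
    pid = os.fork()
    if pid == 0:
        # Child: do nothing except replace the process image.
        try:
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            # Only reached if exec itself failed.
            os._exit(127)
    # Parent: wait for the child and report its exit code.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was killed by a signal
```

After the exec, the child is a brand-new single-threaded process, so whatever thread state the parent's libraries had at fork time no longer matters.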
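If the parent cannot avoid those APIs, one possible workaround (an assumption on my part, not something I have tested with Sage) is to not use the forked child for real work at all, and instead run the work in a freshly started interpreter. A minimal sketch using the standard subprocess module; run_isolated is a hypothetical name, and inside Sage you would presumably start Sage's own interpreter rather than sys.executable:

```python
import subprocess
import sys

def run_isolated(code):
    """Hypothetical helper: run `code` in a brand-new interpreter.

    Returns whatever the new process wrote to stdout, as text.
    Raises subprocess.CalledProcessError if it exits with a non-zero status.
    """
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode("utf-8")
```

The cost is that nothing from the parent's memory is shared with the new process, so inputs and results have to be passed explicitly (here, via the code string and stdout).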

+0

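Hm, OK, but Sage and Python should be able to support `fork`. If they cannot, there is a bug somewhere that needs fixing. I am trying to find where it goes wrong, and I am looking for a workaround or a fix. Also, Sage and Python should not be using any MacOSX API. – Albert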

+1

Both backtraces show a frame in libBLAS, which is definitely not a library that is safe to use after the fork of a multithreaded process. – das

+0

So what is needed to make it safe? (Note that I really want to make this possible; Sage should support it.) – Albert