2016-10-05 60 views

Context

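I need to run a multiprocessing.Process inside a multiprocessing.pool.ThreadPool. It seems odd at first, but it is the only way I have found to deal with segfaults, which can happen because I am using a C++ shared library. If a segfault occurs, the process is killed, and I can check process.exitcode and handle it.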
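To make the exit-code check concrete, here is a minimal sketch of the isolation idea, not the code from this question. The names `crash_with_segv`, `run_isolated` and `describe_exit` are hypothetical, the crash is simulated by sending SIGSEGV to the child itself, and the "fork" start method is requested explicitly, so this assumes a POSIX system. On POSIX, a child killed by signal N reports `exitcode == -N`.

```python
import multiprocessing
import os
import signal


def crash_with_segv():
    # Hypothetical stand-in for native code that segfaults:
    # the child sends SIGSEGV to itself.
    os.kill(os.getpid(), signal.SIGSEGV)


def run_isolated(target, args=()):
    """Run `target` in a child process and return the child's exit code."""
    # Assumption: POSIX, where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    process = ctx.Process(target=target, args=args)
    process.start()
    process.join()
    return process.exitcode


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a short log message."""
    if exitcode == 0:
        return "ok"
    if exitcode < 0:
        # A negative exit code means the child was killed by that signal.
        return "killed by " + signal.Signals(-exitcode).name
    return "failed with exit code %d" % exitcode


if __name__ == "__main__":
    print(describe_exit(run_isolated(crash_with_segv)))
```

The parent survives the crash and only sees a negative exit code, which is what makes the process-per-task layout attractive here.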

Problem

After a while, a deadlock occurs when I try to join the process.

Here is a simplified version of my code:

import sys, time, multiprocessing
from multiprocessing.pool import ThreadPool

def main():
    # Launch 8 workers
    pool = ThreadPool(8)
    it = pool.imap(run, range(500))
    while True:
        try:
            it.next()
        except StopIteration:
            break

def run(value):
    # Each worker launches its own Process
    process = multiprocessing.Process(target=run_and_might_segfault, args=(value,))
    process.start()

    while process.is_alive():
        sys.stdout.write('.')
        sys.stdout.flush()
        time.sleep(0.1)

    # Will never join after a while, because of a mystery deadlock
    process.join()

    # Deal with process.exitcode to log errors

def run_and_might_segfault(value):
    # Load a shared library and do stuff (could throw a C++ exception, segfault, ...)
    print(value)

if __name__ == '__main__':
    main()

Here is a possible output:

➜ ~ python m.py 
..0 
1 
........8 
.9 
.......10 
......11 
........12 
13 
........14 
........16 
........................................................................................ 

As you can see, after a few iterations process.is_alive() stays True forever, and the process never joins.

If I press CTRL-C, I get this stack trace:

Traceback (most recent call last): 
    File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 680, in next 
    item = self._items.popleft() 
IndexError: pop from an empty deque 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "m.py", line 30, in <module> 
    main() 
    File "m.py", line 9, in main 
    it.next() 
    File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 684, in next
    self._cond.wait(timeout)
    File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire() 
KeyboardInterrupt 

Error in atexit._run_exitfuncs: 
Traceback (most recent call last): 
    File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag) 
KeyboardInterrupt 

PS: I am using Python 3.5.2 on macOS.

Any help is appreciated, thanks.

Edit

I tried with Python 2.7, and it works fine. Could this be a Python 3.5-only issue?

Answer


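The problem also reproduces on the latest CPython build: Python 3.7.0a0 (default:4e2cce65e522, Oct 13 2016, 21:55:44).

If you attach gdb to one of the stuck processes, you will see that it is trying to acquire a lock in the sys.stdout.flush() call: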

(gdb) py-list 
263    import traceback 
264    sys.stderr.write('Process %s:\n' % self.name) 
265    traceback.print_exc() 
266   finally: 
267    util.info('process exiting with exitcode %d' % exitcode) 
>268    sys.stdout.flush() 
269    sys.stderr.flush() 
270 
271   return exitcode 

The Python-level backtrace looks like this:

(gdb) py-bt 
Traceback (most recent call first): 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/process.py", line 268, in _bootstrap 
    sys.stdout.flush() 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/popen_fork.py", line 74, in _launch 
    code = process_obj._bootstrap() 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/popen_fork.py", line 20, in __init__ 
    self._launch(process_obj) 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/context.py", line 277, in _Popen 
    return Popen(process_obj) 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/context.py", line 223, in _Popen 
    return _default_context.get_context().Process._Popen(process_obj) 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/process.py", line 105, in start 
    self._popen = self._Popen(self) 
    File "deadlock.py", line 17, in run 
    process.start() 
    File "/home/rpodolyaka/src/cpython/Lib/multiprocessing/pool.py", line 119, in worker 
    result = (True, func(*args, **kwds)) 
    File "/home/rpodolyaka/src/cpython/Lib/threading.py", line 864, in run 
    self._target(*self._args, **self._kwargs) 
    File "/home/rpodolyaka/src/cpython/Lib/threading.py", line 916, in _bootstrap_inner 
    self.run() 
    File "/home/rpodolyaka/src/cpython/Lib/threading.py", line 884, in _bootstrap 
    self._bootstrap_inner() 

At the interpreter level, it looks like this:

(gdb) frame 6 

(gdb) list 
287  return 0; 
288 } 
289 relax_locking = (_Py_Finalizing != NULL); 
290 Py_BEGIN_ALLOW_THREADS 
291 if (!relax_locking) 
292  st = PyThread_acquire_lock(self->lock, 1); 
293 else { 
294  /* When finalizing, we don't want a deadlock to happen with daemon 
295   * threads abruptly shut down while they owned the lock. 
296   * Therefore, only wait for a grace period (1 s.). ... */ 

(gdb) p /x self->lock 
$1 = 0xd25ce0 

(gdb) p /x self->owner 
$2 = 0x7f9bb2128700 

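Note that, from the point of view of this particular child process, the lock is still owned by one of the threads of the parent process (LWP 1105):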

(gdb) info threads 
    Id Target Id   Frame 
* 1 Thread 0x7f9bb5559440 (LWP 1102) "python" 0x00007f9bb5157577 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, 
    futex_word=0xe4d340) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 
    2 Thread 0x7f9bb312a700 (LWP 1103) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    3 Thread 0x7f9bb2929700 (LWP 1104) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    4 Thread 0x7f9bb2128700 (LWP 1105) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    5 Thread 0x7f9bb1927700 (LWP 1106) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    6 Thread 0x7f9bb1126700 (LWP 1107) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    7 Thread 0x7f9bb0925700 (LWP 1108) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    8 Thread 0x7f9b9bfff700 (LWP 1109) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    9 Thread 0x7f9b9b7fe700 (LWP 1110) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    10 Thread 0x7f9b9affd700 (LWP 1111) "python" 0x00007f9bb4780253 in select() at ../sysdeps/unix/syscall-template.S:84 
    11 Thread 0x7f9b9a7fc700 (LWP 1112) "python" 0x00007f9bb5157577 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, 
    futex_word=0x7f9b80001ed0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 
    12 Thread 0x7f9b99ffb700 (LWP 1113) "python" 0x00007f9bb5157577 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, 
    futex_word=0x7f9b84001bb0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 

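So this is indeed a deadlock. It happens because, in the original process, you write to and flush sys.stdout from multiple threads concurrently while also creating child processes. By the nature of the fork(2) system call, a child inherits the parent's memory, including any locks that are held at that moment. Here, fork() must have been called while another thread held the sys.stdout lock. Even if the parent process eventually releases the lock, the children will never see that, because each of them has its own copy-on-write memory space.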
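The inherited-lock effect can be reproduced without multiprocessing at all. The following is a small sketch under stated assumptions: it uses a plain threading.Lock as a stand-in for the internal sys.stdout lock, it calls os.fork() directly (so it is POSIX-only), and the names `child_can_acquire_after_fork` and `lock_state_seen_by_child` are hypothetical.

```python
import os
import threading


def child_can_acquire_after_fork(lock, timeout=0.5):
    """Fork, then report whether the child managed to acquire `lock`."""
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, but the lock's
        # state was copied exactly as it was at the moment of fork().
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def lock_state_seen_by_child():
    """Fork once while another thread holds a lock, and once after release."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def hold_lock():
        with lock:
            holding.set()   # tell the main thread the lock is now held
            release.wait()  # keep holding it until told to let go

    holder = threading.Thread(target=hold_lock)
    holder.start()
    holding.wait()
    while_held = child_can_acquire_after_fork(lock)

    release.set()
    holder.join()
    after_release = child_can_acquire_after_fork(lock)
    return while_held, after_release


if __name__ == "__main__":
    print(lock_state_seen_by_child())
```

In the first fork the thread that owns the lock does not exist in the child, so nothing there can ever release it. The timeout is only present so that the demonstration terminates; sys.stdout.flush() in the stuck child waits without one.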

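Therefore you need to be very careful when mixing multithreading with multiprocessing, and you must make sure that every lock is properly released before fork() if it is going to be used in the child processes.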

The situation is very similar to the one described in http://bugs.python.org/issue6721.

Also, if you remove the interaction with sys.stdout from your snippet, it works correctly.
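If the progress output has to stay, one way to follow the advice about releasing locks before fork() is to serialize process creation and stdout access behind a single application-level lock. This is only a sketch under assumptions, not a verified fix for every case: `_io_lock`, `_work`, `run_guarded` and `run_all` are hypothetical names, `_work` is a placeholder for the native call, and the "fork" start method is requested explicitly (POSIX-only). It covers only writes that go through `_io_lock`; any other code that touches sys.stdout from a thread would reintroduce the race.

```python
import multiprocessing
import sys
import threading
import time
from multiprocessing.pool import ThreadPool

# Assumption: POSIX, where the "fork" start method is available.
_fork_ctx = multiprocessing.get_context("fork")

# Hypothetical application-level lock guarding both stdout and fork().
_io_lock = threading.Lock()


def _work(value):
    # Placeholder for the call into the native library.
    pass


def run_guarded(value):
    process = _fork_ctx.Process(target=_work, args=(value,))
    with _io_lock:
        # No other thread can be inside sys.stdout while we fork,
        # so the child does not inherit a held stdout lock.
        process.start()

    while process.is_alive():
        with _io_lock:
            sys.stdout.write('.')
            sys.stdout.flush()
        time.sleep(0.01)

    process.join()
    return process.exitcode


def run_all(count, workers=8):
    with ThreadPool(workers) as pool:
        return pool.map(run_guarded, range(count))


if __name__ == "__main__":
    print(run_all(50))
```

Another option, where it fits the workload, is `multiprocessing.get_context("spawn")`, which starts each child as a fresh interpreter instead of forking the multithreaded parent, at the cost of slower startup and picklable targets.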