2011-03-18 111 views
1

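Python 2.6: process-local storage while using multiprocessing.Pool

I'm trying to build a Python script that uses a pool of worker processes (multiprocessing.Pool) to work through a large data set.

I want each process to have its own unique object that can be reused across the many tasks that process executes.

Pseudocode: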

def work(data):
    # connection should be unique per process
    connection.put(data)
    print 'work done with connection:', connection

if __name__ == '__main__':
    pPool = Pool(4)  # pool of 4 processes
    datas = range(1, 1001)
    for process in pPool:
        # this is the part I'm asking about: how do I really do this?
        process.connection = Connection(conargs)
    for data in datas:
        pPool.apply_async(work, (data,))

Answers

1

I think something like this should work (untested):

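from multiprocessing import Pool

def init(*args):
    # Runs once in each worker process, so every worker gets its own connection
    global connection
    connection = Connection(*args)

pPool = Pool(initializer=init, initargs=conargs)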

Thanks, that's the key. – 2011-03-18 19:00:53

Can you mark it as the answer? – 2011-03-18 19:39:11
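
For anyone who wants to try the initializer idea end to end, here is a minimal sketch. The Connection class is a hypothetical stand-in (the question never shows the real one), and run() is just a helper name made up for this example. Each worker builds its connection once in init() and then reuses it for every task, which you can confirm by comparing the PID stored in the label with the PID that ran the task.

```python
import multiprocessing as mp
import os


class Connection(object):
    """Hypothetical stand-in for the real connection class."""

    def __init__(self, name):
        # Tag the connection with the PID of the process that created it.
        self.label = '%s@%d' % (name, os.getpid())
        self.sent = []

    def put(self, data):
        self.sent.append(data)


connection = None  # replaced once per worker process by init()


def init(*args):
    # The pool calls this once in each worker when the worker starts.
    global connection
    connection = Connection(*args)


def work(data):
    connection.put(data)
    # Report which process ran the task and which connection it used.
    return os.getpid(), connection.label


def run(n_workers=4, n_items=100):
    pool = mp.Pool(n_workers, initializer=init, initargs=('conn',))
    try:
        return pool.map(work, range(n_items))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    for pid, label in sorted(set(run())):
        print('process %d used %s' % (pid, label))
```

Note that initargs has to be a tuple (or another iterable) of arguments, so a single argument is written as ('conn',).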

-1

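You want an object that resides in shared memory, right?

Python has some support for this in the standard library, but it is fairly limited. As far as I remember, only integers and a few other basic types can be stored.

Try POSH (Python Object Sharing): http://poshmodule.sourceforge.net/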
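
To make the standard-library part concrete: the support mentioned above is presumably multiprocessing.Value and multiprocessing.Array, which hold ctypes values such as C ints and doubles. A rough sketch follows; add_many and run are names invented for this example. Keep in mind this shares one object between all processes, which is a different thing from the per-process object the question asks for.

```python
import multiprocessing as mp


def add_many(counter, totals, slot, n):
    for _ in range(n):
        # get_lock() guards the read-modify-write on the shared integer.
        with counter.get_lock():
            counter.value += 1
    totals[slot] = n  # each worker writes only to its own slot


def run(n_procs=4, n=1000):
    counter = mp.Value('i', 0)       # one shared C int
    totals = mp.Array('i', n_procs)  # shared array of C ints, zero-filled
    procs = [mp.Process(target=add_many, args=(counter, totals, i, n))
             for i in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value, totals[:]


if __name__ == '__main__':
    total, per_slot = run()
    print('counter = %d, per-process totals = %s' % (total, per_slot))
```

Without the lock around counter.value += 1, increments from different processes can overwrite each other and the final count may come out too low.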

1

It may be easiest to create the mp.Process objects directly (without using mp.Pool):

import multiprocessing as mp
import time

class Connection(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

def work(inqueue, conn):
    # Each process keeps the one connection it was started with
    name = mp.current_process().name
    while True:
        data = inqueue.get()
        time.sleep(.5)
        print('{n}: work done with connection {c} on data {d}'.format(
            n=name, c=conn, d=data))
        inqueue.task_done()

if __name__ == '__main__':
    N = 4
    procs = []
    inqueue = mp.JoinableQueue()
    for i in range(N):
        conn = Connection(name='Conn-' + str(i))
        proc = mp.Process(target=work, name='Proc-' + str(i), args=(inqueue, conn))
        proc.daemon = True  # workers loop forever, so let them die with the parent
        proc.start()
        procs.append(proc)

    datas = range(1, 11)
    for data in datas:
        inqueue.put(data)
    inqueue.join()  # block until every item has been marked task_done()

This produces:

Proc-0: work done with connection Conn-0 on data 1 
Proc-1: work done with connection Conn-1 on data 2 
Proc-3: work done with connection Conn-3 on data 3 
Proc-2: work done with connection Conn-2 on data 4 
Proc-0: work done with connection Conn-0 on data 5 
Proc-1: work done with connection Conn-1 on data 6 
Proc-3: work done with connection Conn-3 on data 7 
Proc-2: work done with connection Conn-2 on data 8 
Proc-0: work done with connection Conn-0 on data 9 
Proc-1: work done with connection Conn-1 on data 10 

Note that each Proc number is always paired with the same Conn number.
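
If you would rather check that pairing in code than by reading the output, one possible variation is sketched below. It is not the code from the answer: it swaps the daemon workers and JoinableQueue for a plain Queue with a None sentinel per worker, and adds a second queue (outqueue, a name made up here) so the parent can collect which process used which connection. The Connection class is the same hypothetical stand-in as above.

```python
import multiprocessing as mp


class Connection(object):
    """Hypothetical stand-in for the real connection class."""

    def __init__(self, name):
        self.name = name


def work(inqueue, outqueue, conn):
    name = mp.current_process().name
    while True:
        data = inqueue.get()
        if data is None:  # sentinel: no more work for this process
            break
        outqueue.put((name, conn.name, data))


def run(n_procs=4, n_items=10):
    inqueue, outqueue = mp.Queue(), mp.Queue()
    procs = []
    for i in range(n_procs):
        conn = Connection('Conn-%d' % i)
        proc = mp.Process(target=work, name='Proc-%d' % i,
                          args=(inqueue, outqueue, conn))
        proc.start()
        procs.append(proc)
    for data in range(1, n_items + 1):
        inqueue.put(data)
    for _ in procs:
        inqueue.put(None)  # one sentinel per worker
    # Collect every result before joining so no worker is left blocked.
    results = [outqueue.get() for _ in range(n_items)]
    for proc in procs:
        proc.join()
    return results


if __name__ == '__main__':
    for proc_name, conn_name, data in sorted(run(), key=lambda r: r[2]):
        print('%s used %s for data %d' % (proc_name, conn_name, data))
```

Because the workers exit when they see the sentinel, they can be joined normally and do not need to be daemon processes.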