dispy sample program hangs

TL;DR: I can't get the most basic dispy sample code to run properly. Why not?

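Details:

I'm trying to get into distributed processing in Python, and thought the dispy library sounded interesting because of its comprehensive feature set.

However, I've been trying to follow its basic canonical program example, and I'm getting nowhere.

I've installed dispy (python -m pip install dispy)
I went to another machine on the same subnet and ran python dispynode.py. It seems to work, since I get the following output: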

2016-06-14 10:33:38 dispynode - dispynode version 4.6.14
2016-06-14 10:33:38 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:33:38 dispynode - serving 8 cpus at 10.0.48.54:51348

Enter "quit" or "exit" to terminate dispynode, "stop" to stop
service, "start" to restart service, "cpus" to change CPUs used,
anything else to get status:
Back on my client machine, I ran the sample code downloaded from http://dispy.sourceforge.net/_downloads/sample.py, copied here:

# function 'compute' is distributed and executed with arguments 
# supplied with 'cluster.submit' below 
def compute(n): 
    import time, socket 
    time.sleep(n) 
    host = socket.gethostname() 
    return (host, n) 

if __name__ == '__main__': 
    # executed on client only; variables created below, including modules imported, 
    # are not available in job computations 
    import dispy, random 
    # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client) 
    cluster = dispy.JobCluster(compute) 
    # run 'compute' with 20 random numbers on available CPUs 
    jobs = [] 
    for i in range(20):
        job = cluster.submit(random.randint(5,20))
        job.id = i # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait() # waits until all jobs finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status() # shows which nodes executed how many jobs etc.

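When I run this (python sample.py), it just hangs. Debugging through pdb, I see that it ultimately hangs at dispy/__init__.py(117)__call__(). That line reads self.finish.wait(). finish is just a Python threading object, as wait() then goes into lib/python3.5/threading.py(531)wait(). It hangs as soon as it reaches the wait.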
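To illustrate why a call like self.finish.wait() can block forever, here is a minimal sketch using only the standard library. It assumes that `finish` behaves like a `threading.Event` (an assumption based on the traceback above, not something checked against the dispy source); `wait_for_result` is a hypothetical helper, not part of dispy.

```python
import threading

# Stand-in for the job's completion flag; assumed to behave like threading.Event.
finish = threading.Event()


def wait_for_result(event, timeout):
    """Return True if the event was set within `timeout` seconds, else False."""
    return event.wait(timeout)


# Nothing sets the event here, so a bounded wait times out instead of hanging.
timed_out = not wait_for_result(finish, timeout=0.2)
print("timed out:", timed_out)

# Once something sets the event (here a timer thread), the wait returns True.
worker = threading.Timer(0.1, finish.set)
worker.start()
completed = wait_for_result(finish, timeout=2.0)
worker.join()
print("completed:", completed)
```

If no node ever reports a finished job back to the client, nothing sets the event, and an unbounded wait() never returns. That would match the hang described above.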

I've tried running dispynode on the client machine and got the same results. I've tried many variants of passing nodes when creating the cluster, e.g.:

cluster = dispy.JobCluster(compute, nodes=['localhost']) 
cluster = dispy.JobCluster(compute, nodes=['*']) 
cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote node running the client>])

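I've tried running with the cluster.wait() line uncommented, and got the same results.

When I added logging (cluster = dispy.JobCluster(compute, loglevel = 10)), I got the following output on the client side: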

2016-06-14 10:27:01 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:27:01 dispy - dispy client at :51347
2016-06-14 10:27:01 dispy - Storing fault recovery information in "_dispy_20160614102701"
2016-06-14 10:27:01 dispy - Pending jobs: 0
2016-06-14 10:27:01 dispy - Pending jobs: 1
2016-06-14 10:27:01 dispy - Pending jobs: 2
2016-06-14 10:27:01 dispy - Pending jobs: 3
2016-06-14 10:27:01 dispy - Pending jobs: 4
2016-06-14 10:27:01 dispy - Pending jobs: 5
2016-06-14 10:27:01 dispy - Pending jobs: 6
2016-06-14 10:27:01 dispy - Pending jobs: 7
2016-06-14 10:27:01 dispy - Pending jobs: 8
2016-06-14 10:27:01 dispy - Pending jobs: 9
2016-06-14 10:27:01 dispy - Pending jobs: 10

This doesn't seem unexpected, but it doesn't help me figure out why the jobs aren't running.

For what it's worth, here's _dispy_20160614102701.bak:

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

Similarly, _dispy_20160614102701.dir:

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

I'm out of guesses, unless I'm using an unstable version.

2016-06-14 Scott Mermelstein

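I have this kind of problem too. Is there a solution for it? – avstenit

I haven't found one. In fact, I gave up, so I didn't even put a bounty on it. I also tried [scoop](https://github.com/soravux/scoop), which on the surface fit my needs perfectly, but it has a very strange [arbitrary limit on the maximum number of processors I can effectively add](https://groups.google.com/forum/#!topic/scoop-users/WlmqPzlsdec). I gave up and decided to use basic popen over ssh and write my own scheduler. –

@ThomasGuenet You've proposed an edit that I'm going to reject. The edit is inappropriate because you're changing what I actually said. I really did run 'python dispy.py', not 'dispy.py'. There is a difference in how they run, because your way runs it as a module. That difference may be the reason the program hangs. So your edit is inappropriate, but it could be a good answer. Write it up as an answer, showing how running 'dispy.py' instead of 'python dispy.py' solves the problem. If you demonstrate it convincingly, you will have answered the question. –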

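If you're just running sample.py on the client machine, change the cluster creation in your main block to the following:

cluster = dispy.JobCluster(compute, nodes=['nodeip_1', 'nodeip_2', ....., 'nodeip_n'])

Then run it in your IDE or from the shell.

I hope that helps.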

2016-06-14 18:55:43 user6466166

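Thanks for the answer. I had previously tried 'nodes = ['nodename']', but it didn't work. Following your suggestion, I tried 'nodes = ['nodeip']', and it still hangs. For some reason, it isn't communicating with the client. –

If your cluster is on the same local network, try starting the dispynode script on the nodes this way: python dispynode.py -i pcname (or IP address). Then run the script as I described in my earlier comment above. – user6466166

Using either of those gives me 'OSError: [Errno 99] Cannot assign requested address' (at line 252 of dispynode.py: self.tcp_sock.bind((ip_addr, node_port))). –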
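For context on that error: binding a socket to an address that is not assigned to any local interface generally fails with "Cannot assign requested address" (errno 99, EADDRNOTAVAIL, on Linux). A minimal sketch that reproduces this outside dispy; `try_bind` is a hypothetical helper, and 192.0.2.1 is a documentation-only address (RFC 5737) that is assumed not to be configured on the machine:

```python
import errno
import socket


def try_bind(ip_addr, port=0):
    """Try to bind a TCP socket to (ip_addr, port).

    Returns None on success, or the errno of the OSError on failure.
    Port 0 lets the OS pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip_addr, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


# Loopback is normally a local address, so this bind should succeed.
loopback_result = try_bind("127.0.0.1")

# Assumed not to be a local address; the bind is expected to fail.
foreign_result = try_bind("192.0.2.1")

print("127.0.0.1:", loopback_result)
print("192.0.2.1:", foreign_result,
      "(EADDRNOTAVAIL)" if foreign_result == errno.EADDRNOTAVAIL else "")
```

If the value passed to -i produces the same errno here, the address (or whatever the host name resolves to) is probably not one of that node's own interface addresses.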

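Before executing python sample.py, dispynode.py should still be running on localhost or on another machine (note that the other machine should be on the same network if you don't want to specify complex options).

I ran into the same problem and solved it this way:

Open a terminal and execute: $ dispynode.py (do not terminate it)
Open a second terminal and execute: $ python sample.py

Don't forget that the compute function just waits for a certain time, so the output will not appear immediately; allow at least 20 seconds after executing sample.py.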
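Before the second step, it may help to confirm from the client that the node's TCP port is reachable at all. The following is a minimal sketch using only the standard library; `port_open` is a hypothetical helper, not part of dispy. It only tests a plain TCP connection, so it says nothing about any other traffic dispy may need between client and node.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-contained demonstration against a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

reachable_while_listening = port_open("127.0.0.1", demo_port)
listener.close()
reachable_after_close = port_open("127.0.0.1", demo_port, timeout=0.5)

print("while listening:", reachable_while_listening)
print("after close:", reachable_after_close)
```

On the client from the question, the equivalent check would be port_open("10.0.48.54", 51348), using the address and port that dispynode printed at startup.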

2017-01-06 09:43:39 ThomasGuenet

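Well, it was worth a try, but it doesn't seem to matter whether I use 'python dispynode.py' or 'dispynode.py'. I get the same results with my client: it hangs on the wait() condition. I tried without setting nodes on the cluster, and with nodes set to both ['hostname'] and ['host IP']. In every case, I get the same results as I did with 'python dispynode.py'. –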

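When first setting up and using dispy on a network, I found that I had to specify the client's IP address when creating the job cluster, as shown below: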

cluster = dispy.JobCluster(compute, ip_addr=your_ip_address_here)

See if that helps.
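If it isn't obvious which address to put in place of your_ip_address_here, one common trick is to ask the OS which local address it would use for outgoing traffic. A minimal sketch, standard library only; `local_ip` and the probe address are illustrative assumptions, and on a machine with several interfaces the result may not be the one the nodes can reach:

```python
import ipaddress
import socket


def local_ip(probe_addr="10.255.255.255"):
    """Guess the IPv4 address of the interface used for outgoing traffic.

    Connecting a UDP socket sends no packets; it only makes the OS choose
    a route, after which getsockname() reports the local address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_addr, 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


client_ip = local_ip()
ipaddress.ip_address(client_ip)  # raises ValueError if not a valid address
print(client_ip)
```

The result could then be passed as dispy.JobCluster(compute, ip_addr=client_ip). If it prints a loopback address, remote nodes will not be able to reach the client at that address, so pick the LAN address by hand instead.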

2017-07-12 18:54:57 Dave

Thank you very much! This was exactly my case. – dismine
