dispy sample program hangs

TL;DR: I can't get the most basic dispy sample code to run correctly. Why not?
Details:

I'm trying to get into distributed processing with Python, and the dispy library sounded interesting because of its comprehensive feature set. However, I've been trying to follow its basic canonical program example, and I'm getting nowhere.
- I installed dispy (`python -m pip install dispy`).
- I went to another machine on the same subnet and ran `python dispynode.py`. It seems to work, because I get the following output:

        2016-06-14 10:33:38 dispynode - dispynode version 4.6.14
        2016-06-14 10:33:38 asyncoro - version 4.1 with epoll I/O notifier
        2016-06-14 10:33:38 dispynode - serving 8 cpus at 10.0.48.54:51348
        Enter "quit" or "exit" to terminate dispynode, "stop" to stop
        service, "start" to restart service, "cpus" to change CPUs used,
        anything else to get status:

- Back on my client machine, I ran the sample code downloaded from http://dispy.sourceforge.net/_downloads/sample.py, copied here:
    # function 'compute' is distributed and executed with arguments
    # supplied with 'cluster.submit' below
    def compute(n):
        import time, socket
        time.sleep(n)
        host = socket.gethostname()
        return (host, n)

    if __name__ == '__main__':
        # executed on client only; variables created below, including modules imported,
        # are not available in job computations
        import dispy, random
        # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client)
        cluster = dispy.JobCluster(compute)
        # run 'compute' with 20 random numbers on available CPUs
        jobs = []
        for i in range(20):
            job = cluster.submit(random.randint(5,20))
            job.id = i  # associate an ID to identify jobs (if needed later)
            jobs.append(job)
        # cluster.wait()  # waits until all jobs finish
        for job in jobs:
            host, n = job()  # waits for job to finish and returns results
            print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
            # other fields of 'job' that may be useful:
            # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
        cluster.print_status()  # shows which nodes executed how many jobs etc.
When I run this (`python sample.py`), it just hangs. Stepping through with pdb, I found that it ultimately hangs at `dispy/__init__.py(117)__call__()`. That line reads `self.finish.wait()`. `finish` is just a Python threading event, and its `wait()` then goes into `lib/python3.5/threading.py(531)wait()`. It hangs as soon as it starts waiting.
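The wait the job blocks on is a standard-library synchronization primitive, and the hang is simply an event that nothing ever sets. A minimal stdlib sketch of the same mechanism (the names here are illustrative, not dispy's internals):

```python
import threading

finish = threading.Event()  # stands in for the event a dispy job waits on

# Nothing ever calls finish.set() here, so a bare finish.wait() would
# block forever -- exactly the hang seen in sample.py. Passing a timeout
# turns the silent hang into an observable failure.
completed = finish.wait(timeout=0.5)
print(completed)  # False: the event was never set
```

In the same way, calling `job()` blocks until the node reports a result back to the client; if that reply never arrives, the client waits forever with no error.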
I tried running dispynode on the client machine itself and got the same result. I have tried many variants of passing nodes into the cluster creation, e.g.:

    cluster = dispy.JobCluster(compute, nodes=['localhost'])
    cluster = dispy.JobCluster(compute, nodes=['*'])
    cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote node running dispynode>])

I tried running with the `cluster.wait()` line uncommented, and got the same result.
When I turn on logging (`cluster = dispy.JobCluster(compute, loglevel=10)`), I get the following output on the client:
    2016-06-14 10:27:01 asyncoro - version 4.1 with epoll I/O notifier
    2016-06-14 10:27:01 dispy - dispy client at :51347
    2016-06-14 10:27:01 dispy - Storing fault recovery information in "_dispy_20160614102701"
    2016-06-14 10:27:01 dispy - Pending jobs: 0
    2016-06-14 10:27:01 dispy - Pending jobs: 1
    2016-06-14 10:27:01 dispy - Pending jobs: 2
    2016-06-14 10:27:01 dispy - Pending jobs: 3
    2016-06-14 10:27:01 dispy - Pending jobs: 4
    2016-06-14 10:27:01 dispy - Pending jobs: 5
    2016-06-14 10:27:01 dispy - Pending jobs: 6
    2016-06-14 10:27:01 dispy - Pending jobs: 7
    2016-06-14 10:27:01 dispy - Pending jobs: 8
    2016-06-14 10:27:01 dispy - Pending jobs: 9
    2016-06-14 10:27:01 dispy - Pending jobs: 10
This doesn't seem unexpected, but it doesn't help me figure out why the jobs aren't running.
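When the log explains nothing, one basic sanity check is whether the client can reach the node's dispynode port at all (dispy also uses UDP broadcast for node discovery, which this does not cover). A hedged stdlib sketch; the node address and port in the comment come from the dispynode output above:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# In the real setup you would run this on the client against the node, e.g.:
#     port_open('10.0.48.54', 51348)
# Here we demo against a local listener so the sketch is self-contained.
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
_, demo_port = server.getsockname()
reachable = port_open('127.0.0.1', demo_port)
print(reachable)  # True: the listener is reachable
server.close()
```

If the check fails from the client but succeeds locally on the node, a firewall or subnet routing issue between the machines would produce exactly this kind of silent hang.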
For what it's worth, here is _dispy_20160614102701.bak:

    '_cluster', (0, 207)
    'compute_1465918021755', (512, 85)

and likewise _dispy_20160614102701.dir:

    '_cluster', (0, 207)
    'compute_1465918021755', (512, 85)
I'm left guessing, unless I'm somehow using an unstable version.
I'm having this kind of problem too. I wonder if there is a solution for it? – avstenit
I haven't found one. In fact, I gave up, so I never even gave it much more attention. I also tried [scoop](https://github.com/soravux/scoop), which on the surface fits my needs exactly, but it has a very strange [arbitrary limit on the maximum number of processors I can usefully add](https://groups.google.com/forum/#!topic/scoop-users/WlmqPzlsdec). I gave up on that too and decided to use basic popen over ssh and write my own scheduler. –
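The fallback described in the comment above (popen over ssh plus a home-grown scheduler) can be sketched with the standard library alone. The host names and commands below are placeholders, and the ssh invocation is only shown in a comment so the sketch runs anywhere:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(host, command):
    # For a real remote run this would be something like:
    #     subprocess.run(['ssh', host, command], capture_output=True, text=True)
    # Here we run the command locally so the example is self-contained.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return host, result.stdout.strip()

hosts = ['node1', 'node2', 'node3']  # placeholder host names
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    futures = [pool.submit(run_job, h, 'echo job-on-' + h) for h in hosts]
    results = dict(f.result() for f in futures)

print(results['node1'])  # job-on-node1
```

A thread pool is enough here because each worker thread just blocks on a subprocess; the heavy lifting happens in the remote (or child) processes, not in the Python threads.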
@ThomasGuenet You proposed an edit that I am going to reject. The edit is inappropriate because you are changing what I actually said: I really did run `python dispy.py`, not `dispy.py`. There is a difference in how they run, because your way runs it as a module. That difference may be the reason the program hangs. So your edit is inappropriate, but it might make a good answer. Write it up as an answer, showing how running `dispy.py` instead of `python dispy.py` solves the problem. If you demonstrate that convincingly, you will have answered the question. –