I created a console program using the code from http://fssnip.net/3K, and I have a few questions about the F# MailboxProcessor.
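I would like to avoid adding "System.Console.ReadLine() |> ignore" at the end just to wait for the threads to finish. Is it possible to tell when all the MailboxProcessors are done, so that the program can exit by itself?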
I changed the test URL "www.google.com" to an invalid URL and got the following output. Is it possible to avoid this "output race"?
http://www.google.co1m crawled by agent 1. AgAAAent gent 3 is done. gent 2 is done. 5 is done. gent 4 is done. Agent USupervisor RL collector is done. is done. 1 is done.
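Interleaving like this is typically what happens when several threads call printfn at once, since a single printfn call may write its formatted pieces to the console separately. A minimal sketch of one way around it, assuming it is acceptable to format the whole line first and write it under a lock (consoleLock and safePrintfn are hypothetical names, not part of the fssnip code):

```fsharp
// Hypothetical lock object shared by every caller that writes to the console.
let consoleLock = obj ()

// Hypothetical replacement for printfn: build the complete string first,
// then write it in one call while holding the lock.
let safePrintfn fmt =
    Printf.kprintf
        (fun (line: string) ->
            lock consoleLock (fun () -> System.Console.WriteLine line))
        fmt

// Usage from any agent or thread:
safePrintfn "Agent %d is done." 1
```

Because each line is written as a single unit, messages from different agents can still appear in any order, but they should no longer be mixed character by character.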
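[Edit]
The final output/crawl is still cut off after switching to Tomas's updated version at http://fssnip.net/65. Below is the program output after I changed "limit" to 5 and added some debugging messages. The last line shows a truncated URL. Is there a way to detect whether all the crawlers have finished executing?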
[Main] before crawl
[Crawl] before return result
http://news.google.com crawled by agent 1.
[supervisor] reached limit
http://www.gstatic.com/news/img/favicon.ico crawled by agent 5.
Agent 2 is done.
[supervisor] reached limit
Agent 5 is done.
http://www.google.com/imghp?hl=en&tab=ni crawled by agent 3.
[supervisor] reached limit
Agent 3 is done.
http://www.google.com/webhp?hl=en&tab=nw crawled by agent 4.
[supervisor] reached limit
Agent 4 is done.
http://news.google.com/n
I changed the main code to:
printfn "[Main] before crawl"
crawl "http://news.google.com" 5
|> Async.RunSynchronously
printfn "[Main] after crawl"
However, the last printfn "[Main] after crawl" is never executed unless I add a Console.ReadLine() at the end.
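For illustration, here is a minimal sketch of one way the main thread could wait for every agent without Console.ReadLine(), assuming each agent can be made to signal when it stops. It uses the .NET CountdownEvent; the WorkerMsg type, makeWorker, and workerCount are hypothetical and are not taken from the fssnip crawler:

```fsharp
open System.Threading

// Hypothetical message type: do some work, or stop.
type WorkerMsg =
    | Work of string
    | Stop

let workerCount = 3

// Each worker signals this once when it stops.
let allDone = new CountdownEvent(workerCount)

// Hypothetical worker agent; a real one would crawl the URL it receives.
let makeWorker (id: int) =
    MailboxProcessor<WorkerMsg>.Start(fun inbox ->
        let rec loop () = async {
            let! msg = inbox.Receive()
            match msg with
            | Work _ ->
                // placeholder for the real work
                return! loop ()
            | Stop ->
                // leave the loop and tell the main thread this agent is finished
                allDone.Signal() |> ignore }
        loop ())

let workers = [ for i in 1 .. workerCount -> makeWorker i ]
workers |> List.iter (fun w -> w.Post(Work "http://news.google.com"))
workers |> List.iter (fun w -> w.Post Stop)

// Blocks until every worker has signalled, then the program can exit.
allDone.Wait()
printfn "[Main] all agents finished"
```

The same idea can be expressed with PostAndReply on a supervisor agent; the countdown is used here only because it keeps the sketch short.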
[Edit 2]
The code runs fine under fsi. But it has the same problem when it is run with fsi --use:Program.fs --exec --quiet
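To me, using MailboxProcessor to crawl URLs seems overly complicated, and fetching the URL content is not even an asynchronous call. The problem can be solved easily with simple async computations. – Ankur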
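A rough sketch of what the plain async approach mentioned in this comment might look like, assuming the set of URLs is known up front. The fetch function is a hypothetical stand-in that only sleeps; a real crawler would call an asynchronous HTTP API there:

```fsharp
// Hypothetical stand-in for an asynchronous download.
let fetch (url: string) = async {
    do! Async.Sleep 100
    return url, url.Length }

let urls = [ "http://news.google.com"; "http://www.google.com" ]

// Async.Parallel runs the fetches concurrently; RunSynchronously returns
// only after all of them have completed, so no ReadLine is needed.
let results =
    urls
    |> List.map fetch
    |> Async.Parallel
    |> Async.RunSynchronously

for (url, size) in results do
    printfn "%s -> %d" url size
```

Because all printing happens on the main thread after the results are collected, this shape also sidesteps interleaved console output.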