2011-08-04 58 views
4

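Mnesia can't connect to another node

I'm setting up a RabbitMQ cluster and ran into a problem at one of the steps in the process. It comes straight out of the RabbitMQ clustering guide.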

[email protected]:~# rabbitmqctl status 
Status of node [email protected] ... 
[{pid,20410}, 
{running_applications,[{rabbit,"RabbitMQ","2.5.1"}, 
         {os_mon,"CPO CXC 138 46","2.2.4"}, 
         {sasl,"SASL CXC 138 11","2.1.8"}, 
         {mnesia,"MNESIA CXC 138 12","4.4.12"}, 
         {stdlib,"ERTS CXC 138 10","1.16.4"}, 
         {kernel,"ERTS CXC 138 10","2.13.4"}]}, 
{os,{unix,linux}}, 
{erlang_version,"Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [hipe] [kernel-poll:true]\n"}, 
{memory,[{total,25296704}, 
      {processes,9680280}, 
      {processes_used,9662720}, 
      {system,15616424}, 
      {atom,1099393}, 
      {atom_used,1082732}, 
      {binary,89768}, 
      {code,11606637}, 
      {ets,726848}]}] 
...done. 
[email protected]:~# rabbitmqctl cluster_status 
Cluster status of node [email protected] ... 
[{nodes,[{disc,[[email protected]]}]},{running_nodes,[[email protected]]}] 
...done. 
[email protected]:~# rabbitmqctl stop_app 
Stopping node [email protected] ... 
...done. 
[email protected]:~# rabbitmqctl reset 
Resetting node [email protected] ... 
...done. 
[email protected]:~# rabbitmqctl cluster [email protected] 
Clustering node [email protected] with [[email protected]] ... 
Error: {failed_to_cluster_with,[[email protected]], 
           "Mnesia could not connect to some nodes."} 

What are the possible reasons why one node can't connect to another?

Here is the guide I'm following: http://www.rabbitmq.com/clustering.html

Answers

5

I jumped into the #rabbitmq channel on freenode. Here is the discussion that followed:

14:29 shakakai: hey all, i'm having a little issue with clustering rabbitmq http://stackoverflow.com/questions/6948624/mnesia-cant-connect-to-another-node 
14:30 shakakai: has anyone run into that problem before? 
14:30 daysmen has left IRC (Read error: Connection reset by peer) 
14:30 antares_: shakakai: make sure that epmd is running on every node 
14:30 antares_: shakakai: and that port it uses (4369) is open in your firewall 
14:31 |Blaze|: shakakai: is your dns correct? Can you ping worker1 from celery and celery from worker1 
14:31 shakakai: |Blaze|: hmm...i'll check 
14:31 daysmen has joined ([email protected]) 
14:32 shakakai: |Blaze|: this is where I'm a little confused, the rabbitmq nodename is [email protected] but the fqdn to ping the box is "ping worker1.mydomain.com" 
14:33 |Blaze|: can you "ping worker1" 
14:34 shakakai: |Blaze|: no 
14:34 |Blaze|: k, you'll need to fix that 
14:34 hyperboreean has left IRC (Ping timeout: 250 seconds) 
14:37 shakakai: |Blaze|: gotcha, so I setup a hosts file and i should be good 
14:37 |Blaze|: yup 
14:37 |Blaze|: in both directions 
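
The firewall advice in the chat can be checked with a plain TCP connection attempt. This is a minimal sketch, not part of the original answer: the function name `port_is_open` and the 3-second timeout are assumptions, and only the port number 4369 comes from the chat. It covers epmd only; the nodes also talk to each other on a separate Erlang distribution port, which this does not test.

```python
import socket

EPMD_PORT = 4369  # epmd port named in the chat above


def port_is_open(host: str, port: int = EPMD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens the socket in one step
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False
```

Running `port_is_open("worker1")` from the other box, and the reverse, would show whether epmd is reachable in both directions.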

TL;DR

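Make sure you can ping each rabbit node's hostname from every box you are clustering. If you can't, set up a hosts file with an entry for each of the rabbit node names.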
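
As a rough way to automate that check, the sketch below takes node names, pulls out the part after the `@`, and reports which hosts the local machine cannot resolve. The helper names `host_part` and `unresolvable_hosts` are made up for illustration, and name resolution is only a stand-in for a real ping: it shows that DNS or the hosts file knows the name, not that packets get through.

```python
import socket


def host_part(node_name: str) -> str:
    """Return the host portion of an Erlang node name such as 'rabbit@worker1'."""
    _, sep, host = node_name.partition("@")
    if not sep or not host:
        raise ValueError(f"not a node name: {node_name!r}")
    return host


def unresolvable_hosts(node_names):
    """Return the host parts that this machine cannot resolve to an address."""
    missing = []
    for name in node_names:
        host = host_part(name)
        try:
            socket.gethostbyname(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(host)
    return missing
```

Calling `unresolvable_hosts(["rabbit@worker1", "rabbit@celery"])` on each box (node names assumed here from the hosts mentioned in the chat) should return an empty list once the hosts files are in place.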

+0

I don't think it's taboo to accept your own answer, especially since it's a good one. – scvalex

+0

Whoops, forgot about that :P – Shakakai

0

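There are a few things to check before you can get the cluster running properly:

0) Make sure your network is set up so that the servers can ping each other.
1) Run exactly the same version of RabbitMQ on every node.
2) Cookies: the .erlang.cookie file must contain exactly the same Erlang cookie on every server.

One useful trick is to run this command on one node to see whether it can reach another one from RabbitMQ:

rabbitmqctl eval 'net_adm:ping(rabbit@othernode).'

This should print pang if the node cannot be reached, or pong if it is OK. Note: don't forget the dot at the end of the eval expression.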
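
The ping trick can be wrapped in a small script when several nodes need checking. This is a sketch under assumptions: the helper names are hypothetical, and it assumes `rabbitmqctl eval` prints the result atom on a line of its own, which may differ between RabbitMQ versions. The node name is wrapped in single quotes inside the expression because Erlang needs quoting for atoms that contain characters such as hyphens.

```python
import subprocess


def ping_expression(node: str) -> str:
    """Build the Erlang expression, quoting the node atom and ending with a dot."""
    return f"net_adm:ping('{node}')."


def reply_is_pong(output: str) -> bool:
    """Return True if any line of the command output is exactly 'pong'."""
    return any(line.strip() == "pong" for line in output.splitlines())


def node_reachable(node: str) -> bool:
    """Run rabbitmqctl eval on the local node and interpret the reply."""
    result = subprocess.run(
        ["rabbitmqctl", "eval", ping_expression(node)],
        capture_output=True,
        text=True,
        check=False,  # a failed ping is reported through the output, not an exception
    )
    return reply_is_pong(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems, so the trailing dot and the quotes reach Erlang unchanged.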

I got it working after a few hours of unsuccessful attempts.

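3) Keep in mind that when you restart the nodes of a cluster, a node that was not the last one to be stopped can be a problem: it will not start until the node that was stopped last has been started again. When all of the above (0-2) is correct, 3 may be the root cause of your problem...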

Hope this helps, cheers, JB

-1

One thing I've read is that the Erlang cookie needs to be the same on all cluster nodes for them to communicate. I believe it lives in /var/lib/rabbitmq/.erlang.cookie
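
One way to compare cookies across servers without pasting the secret itself is to print a hash of the file on each box and compare the results by eye. The sketch below is illustrative only: `cookie_fingerprint` is a made-up helper, the default path is the one suggested in this answer and may differ on your system, and the file is hashed byte for byte, so even a stray trailing newline shows up as a mismatch.

```python
import hashlib
from pathlib import Path

# Location suggested in the answer above; an assumption, adjust for your install.
COOKIE_PATH = Path("/var/lib/rabbitmq/.erlang.cookie")


def cookie_fingerprint(path=COOKIE_PATH) -> str:
    """Return the SHA-256 hex digest of the raw bytes of the cookie file."""
    data = Path(path).read_bytes()
    return hashlib.sha256(data).hexdigest()
```

Run `print(cookie_fingerprint())` on every node as a user that is allowed to read the file; the nodes share a cookie only if every digest is identical.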
