2014-04-16 177 views
3

GlusterFS server won't start after reboot

I created a cluster and a replicated filesystem across 2 nodes on AWS EC2, using the following link as a guide:

http://www.gluster.org/category/aws-en/

  • I'm using 2 nodes on AWS EC2
  • I'm using Ubuntu 13.10 (Saucy)
  • I installed glusterfs-server from the PPA: the semiosis/ubuntu-glusterfs-3.4 repo

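It was very simple to install and configure, and it works great - until I reboot either node. Once everything was set up, I rebooted one node just to verify that everything comes back, but it never does. It only works right after installation and configuration, with no reboot in between. Once I reboot, glusterfs-server will not start and I have to recreate the entire instance.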

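I've pored over the logs in /var/log/glusterfs, run glusterd in foreground mode, and so on, and no answer jumps out at me. Errors are shown, but Google hasn't been much help. Here is the output from running glusterd in the foreground: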

[email protected]:/var/log/glusterfs# /usr/sbin/glusterd -N -p /var/run/glusterd.pid 
librdmacm: couldn't read ABI version. 
librdmacm: assuming: 4 
CMA: unable to get RDMA device list 

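The error log captures the struggle to start, which ultimately ends in a shutdown, but I haven't been able to pin down a cause or a solution: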

[2014-04-16 19:58:09.925937] E [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore 
[2014-04-16 19:58:09.925968] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again 
[2014-04-16 19:58:09.926003] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed 
[2014-04-16 19:58:09.926019] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed 
[2014-04-16 19:58:09.926392] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x3df) [0x7f801961d8df] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7f80196206e0] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x7f80196205f3]))) 0-: received signum (0), shutting down 
[2014-04-16 20:40:20.992287] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.3 (/usr/sbin/glusterd -N -p /var/run/glusterd.pid) 
[2014-04-16 20:40:20.996223] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory 
[2014-04-16 20:40:20.997685] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled 
[2014-04-16 20:40:20.997713] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread 
[2014-04-16 20:40:20.999231] W [rdma.c:4197:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device) 
[2014-04-16 20:40:20.999268] E [rdma.c:4485:init] 0-rdma.management: Failed to initialize IB Device 
[2014-04-16 20:40:20.999284] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed 
[2014-04-16 20:40:20.999435] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed 
[2014-04-16 20:40:23.858537] I [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2 
[2014-04-16 20:40:23.869829] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0 
[2014-04-16 20:40:23.869880] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1 
[2014-04-16 20:40:25.611295] E [glusterd-utils.c:4990:glusterd_friend_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known 
[2014-04-16 20:40:25.612154] E [glusterd-utils.c:284:glusterd_is_local_addr] 0-management: error in getaddrinfo: Name or service not known 
[2014-04-16 20:40:25.612190] E [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore 
[2014-04-16 20:40:25.612221] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again 
[2014-04-16 20:40:25.612239] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed 
[2014-04-16 20:40:25.612254] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed 
[2014-04-16 20:40:25.612628] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x3df) [0x7fef3d7c58df] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7fef3d7c86e0] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x7fef3d7c85f3]))) 0-: received signum (0), shutting down 
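When scanning a log like the one above, it can help to pull out only the error-level lines. The following is a minimal sketch, assuming the `[timestamp] LEVEL [file:line:function] message` layout seen in these entries. The function name `error_lines` and the default list of ignored source files are illustrative assumptions, not part of GlusterFS; the RDMA entries are skipped only on the guess that they are noise on a host without an InfiniBand device.

```python
import re

# Layout assumed from the log entries in the question:
# [timestamp] LEVEL [source-file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


def error_lines(log_text, ignore_sources=("rdma.c", "rpc-transport.c")):
    """Return (timestamp, function, message) for each E-level line.

    Lines whose source file is in ignore_sources are skipped. The default
    ignore list is an assumption made for this sketch, not a Gluster rule.
    """
    found = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None or match.group("level") != "E":
            continue
        if match.group("src") in ignore_sources:
            continue
        found.append(
            (match.group("ts"), match.group("func"), match.group("msg").strip())
        )
    return found
```

Run against the second startup attempt in the log, the first entries returned after the two "Unknown key" lines are the `getaddrinfo` failures, which point at name resolution rather than at the volume files.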

I found a matching thread on the gluster-users list, but it went unresolved:

http://www.gluster.org/pipermail/gluster-users/2013-October/037687.html

If anyone can offer any wisdom, it would be much appreciated.

Answers

0

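For future reference: I was not using fully qualified domain names for the peer connections. I used only the short hostnames and modified /etc/resolv.conf to search our DNS suffix. After a reboot, resolv.conf was rewritten by the DHCP client, which broke DNS resolution for the peers. Apparently, if the DNS names cannot be resolved, the service won't even start, which could arguably be considered a bug. I think the service should always start.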
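The failure described here matches the "error in getaddrinfo: Name or service not known" entries in the log, so one way to catch it before starting the service is to check that every peer name still resolves. Below is a minimal sketch using Python's standard `socket.getaddrinfo`. The function name `unresolvable_peers` and its `resolve` parameter are hypothetical helpers for illustration; the peer names to pass in are whatever names were used when the peers were probed.

```python
import socket


def unresolvable_peers(hostnames, resolve=socket.getaddrinfo):
    """Return the hostnames that fail name resolution.

    `resolve` defaults to socket.getaddrinfo. It is a parameter only so
    the check can be exercised without touching real DNS (an assumption
    of this sketch, not something glusterd exposes).
    """
    failed = []
    for name in hostnames:
        try:
            # Port is irrelevant here; only the name lookup matters.
            resolve(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

Calling `unresolvable_peers([...])` with the short hostnames after a reboot would be expected to return them if the DNS search suffix has been lost from resolv.conf, while fully qualified names should not depend on that suffix.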

1

Try stopping the volume:

gluster volume stop <volume name>

then restart it with the "force" option to rebuild the metadata on a per-brick basis:

gluster volume start <volume name> force 
+1

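You can't issue gluster commands if the service is down, as in the OP's case. If DNS changes, you can manually replace the DNS or IP information in Gluster's metadata, but I strongly recommend backing up the /var/lib/glusterfs directory first: https://www.gluster.org/pipermail/gluster-users/2015-June/022264.html – DevOops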
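As a sketch of the backup step suggested in the comment, the following archives a directory to a timestamped .tar.gz before any manual edits. The function name `backup_directory` is a hypothetical helper, and the directory to back up is passed in rather than hard-coded; note that the log in the question reports /var/lib/glusterd as the working directory.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(src, dest_dir):
    """Write a timestamped .tar.gz copy of `src` into `dest_dir`.

    Returns the path of the archive. Both arguments are supplied by the
    caller; nothing here is specific to GlusterFS.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps archive paths relative to the directory name.
        tar.add(str(src), arcname=src.name)
    return archive
```

Reading a system directory will typically require root privileges, and the service should be stopped while the files are copied and edited.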