2017-06-15

Kube-proxy fails to retrieve node info: invalid nodeIP

I have been trying to set up a Kubernetes cluster for a few months, but so far I have had no luck.

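I am trying to set it up on 4 bare-metal PCs running CoreOS. I just did a clean install of everything, but I ran into the same problem as before. I am following this tutorial. I think I have configured everything correctly, but I am not 100% sure. When I reboot any of the machines, the kubelet and flanneld services are running, but I see the following errors for them when I check the service status with systemctl status: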

kubelet error: Process: 1246 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)

flanneld error: Process: 1057 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=254)

If I restart both services, they work, or at least they look like they work: I get no errors.
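One plausible reading of the status=254 lines is that rkt rm simply had no pod to remove, for example because the UUID file did not exist yet after the reboot. systemd only ignores a failing ExecStartPre command when its path is prefixed with "-", so it may be worth checking how the units are written. A minimal sketch, assuming the unit text comes from systemctl cat; the function name unguarded_prestart is made up for illustration:

```shell
# Print every ExecStartPre= line whose command is NOT prefixed with "-".
# systemd treats a failing ExecStartPre as fatal unless the command path
# starts with "-". Reads unit file text on stdin.
# Limitation: only a leading "-" is recognised, not combined prefixes.
unguarded_prestart() {
  grep '^ExecStartPre=' | grep -v '^ExecStartPre=-' || true
}

# Usage on a node (kubelet.service is the unit name assumed here):
#   systemctl cat kubelet.service | unguarded_prestart
```

If the function prints nothing, every pre-start command is already marked as ignorable and the reported exit status is likely cosmetic.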

Everything else seems to work fine, so the only problem left (I think) is the kube-proxy service on all nodes.

If I run kubectl get pods, I see all pods running:

$ kubectl get pods 
NAME         READY  STATUS RESTARTS AGE 
kube-apiserver-kubernetes-4   1/1  Running 4   6m 
kube-controller-manager-kubernetes-4 1/1  Running 6   6m 
kube-proxy-kubernetes-1    1/1  Running 4   18h 
kube-proxy-kubernetes-2    1/1  Running 5   26m 
kube-proxy-kubernetes-3    1/1  Running 4   19m 
kube-proxy-kubernetes-4    1/1  Running 4   18h 
kube-scheduler-kubernetes-4   1/1  Running 6   18h 

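The answer to this question suggests checking whether kubectl get node returns the same names that the kubelet registered with. As far as I can tell from the logs, the nodes registered correctly, and this is the output of kubectl get node: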

$ kubectl get node 
NAME   STATUS      AGE  VERSION 
kubernetes-1 Ready       18h  v1.6.1+coreos.0 
kubernetes-2 Ready       36m  v1.6.1+coreos.0 
kubernetes-3 Ready       29m  v1.6.1+coreos.0 
kubernetes-4 Ready,SchedulingDisabled  18h  v1.6.1+coreos.0 
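The name comparison can be scripted. This is a minimal sketch, assuming the name kube-proxy looks up is the machine's hostname (which matches the /api/v1/nodes/kubernetes-4 request in the kube-proxy log); the function names are illustrative only:

```shell
# Print the NAME column of `kubectl get node` output read from stdin,
# skipping the header row and blank lines.
node_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Succeed only if the given name matches a registered node name exactly.
# $1: node name to look for; stdin: `kubectl get node` output
node_registered() {
  node_names | grep -Fxq -- "$1"
}

# Usage on a node:
#   kubectl get node | node_registered "$(hostname)" && echo "name matches"
```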

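The tutorial I used (linked above) suggests using --hostname-override, but with it I could not get the node info on the master node (kubernetes-4) when I tried to curl it locally. So I removed it, and now I can get the node info normally.

Someone suggested it might be a flannel problem and that I should check the flannel ports. Using netstat -lntu I get the following output: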

Active Internet connections (only servers) 
Proto Recv-Q Send-Q Local Address   Foreign Address   State  
tcp  0  0 127.0.0.1:10248   0.0.0.0:*    LISTEN  
tcp  0  0 127.0.0.1:10249   0.0.0.0:*    LISTEN  
tcp  0  0 127.0.0.1:2379   0.0.0.0:*    LISTEN  
tcp  0  0 MASTER_IP:2379   0.0.0.0:*    LISTEN  
tcp  0  0 MASTER_IP:2380   0.0.0.0:*    LISTEN  
tcp  0  0 127.0.0.1:8080   0.0.0.0:*    LISTEN  
tcp6  0  0 :::4194     :::*     LISTEN  
tcp6  0  0 :::10250    :::*     LISTEN  
tcp6  0  0 :::10251    :::*     LISTEN  
tcp6  0  0 :::10252    :::*     LISTEN  
tcp6  0  0 :::10255    :::*     LISTEN  
tcp6  0  0 :::22     :::*     LISTEN  
tcp6  0  0 :::443     :::*     LISTEN  
udp  0  0 0.0.0.0:8472   0.0.0.0:*      

So I assume the ports are fine?
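The same check can be done without reading the table by eye. A minimal sketch, assuming flannel runs its vxlan backend on the default UDP port 8472 and that workers reach the API server on TCP 443 (the port the worker kube-proxy log dials); port_bound is a hypothetical helper name:

```shell
# Succeed if the `netstat -lntu` output on stdin has a socket of the given
# protocol bound to the given port.
# $1: protocol prefix, "tcp" or "udp" (also matches tcp6/udp6); $2: port
port_bound() {
  awk -v proto="$1" -v port="$2" '
    index($1, proto) == 1 {
      n = split($4, parts, ":")      # Local Address, e.g. ":::443"
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on the master:
#   netstat -lntu | port_bound udp 8472 && echo "flannel vxlan port bound"
#   netstat -lntu | port_bound tcp 443  && echo "secure API port bound"
```

A bound port only shows the state at the moment netstat ran; it does not prove the API server was already listening when kube-proxy started.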

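Also, etcd2 works: etcdctl cluster-health shows that all nodes are healthy.

This is the part of the cloud-config that starts etcd2 on reboot. Besides that, I only store the SSH keys and the node username/password/groups in it: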

#cloud-config

coreos:
  etcd2:
    name: "kubernetes-4"
    initial-advertise-peer-urls: "http://NODE_IP:2380"
    listen-peer-urls: "http://NODE_IP:2380"
    listen-client-urls: "http://NODE_IP,http://127.0.0.1:2379"
    advertise-client-urls: "http://NODE_IP:2379"
    initial-cluster-token: "etcd-cluster-1"
    initial-cluster: "kubernetes-4=http://MASTER_IP:2380,kubernetes-1=http://WORKER_1_IP:2380,kubernetes-2=http://WORKER_2_IP:2380,kubernetes-3=http://WORKER_3_IP:2380"
    initial-cluster-state: "new"
  units:
    - name: etcd2.service
      command: start

This is the content of the /etc/flannel/options.env file:

FLANNELD_IFACE=NODE_IP 
FLANNELD_ETCD_ENDPOINTS=http://MASTER_IP:2379,http://WORKER_1_IP:2379,http://WORKER_2_IP:2379,http://WORKER_3_IP:2379 

The same endpoints are set in the kube-apiserver.yaml file.
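To confirm that every endpoint in options.env is reachable from a given node, the list can be split and probed one by one. A minimal sketch, assuming the running etcd version serves /health on its client port; the function name is illustrative:

```shell
# Print each endpoint of FLANNELD_ETCD_ENDPOINTS on its own line.
# Reads an options.env-style file on stdin.
flannel_etcd_endpoints() {
  sed -n 's/^FLANNELD_ETCD_ENDPOINTS=//p' | tr ',' '\n'
}

# Usage on a node: probe every configured etcd member.
#   flannel_etcd_endpoints < /etc/flannel/options.env |
#     while read -r ep; do
#       curl -fsS --max-time 3 "$ep/health" || echo "unreachable: $ep"
#     done
```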

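Any ideas or suggestions as to what the problem could be? Also, if there are any details you would like me to add, let me know and I will add them to the post.

Edit: I forgot to include the kube-proxy logs.

Master node kube-proxy log: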

$ kubectl logs kube-proxy-kubernetes-4 
I0615 07:47:45.250631  1 server.go:225] Using iptables Proxier. 
W0615 07:47:45.286923  1 server.go:469] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: dial tcp 127.0.0.1:8080: getsockopt: connection refused 
W0615 07:47:45.303576  1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP 
W0615 07:47:45.303593  1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic 
I0615 07:47:45.303646  1 server.go:249] Tearing down userspace rules. 
E0615 07:47:45.357276  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused 
E0615 07:47:45.357278  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused 

Worker node kube-proxy log:

$ kubectl logs kube-proxy-kubernetes-1 
I0615 07:47:33.667025  1 server.go:225] Using iptables Proxier. 
W0615 07:47:33.697387  1 server.go:469] Failed to retrieve node info: Get https://MASTER_IP/api/v1/nodes/kubernetes-1: dial tcp MASTER_IP:443: getsockopt: connection refused 
W0615 07:47:33.712718  1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP 
W0615 07:47:33.712734  1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic 
I0615 07:47:33.712773  1 server.go:249] Tearing down userspace rules. 
E0615 07:47:33.787122  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get https://MASTER_IP/api/v1/endpoints?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused 
E0615 07:47:33.787144  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get https://MASTER_IP/api/v1/services?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused 
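Both logs fail on the same thing: the API server address refuses the connection, and the "invalid nodeIP" warning follows from that failed lookup. A minimal sketch for pulling the refused targets out of a log so they can be tested directly; refused_endpoints is a hypothetical helper and the sed pattern assumes the log format shown above:

```shell
# List the distinct host:port targets that a kube-proxy log (stdin) reports
# as "connection refused".
refused_endpoints() {
  grep 'connection refused' |
    sed -n 's/.*dial tcp \([^ ]*\): getsockopt.*/\1/p' |
    sort -u
}

# Usage:
#   kubectl logs kube-proxy-kubernetes-1 | refused_endpoints
# Then, from the same node, try the reported target by hand, for example:
#   curl -k https://MASTER_IP/healthz
# Any HTTP response, even an authorization error, shows that the port now
# accepts connections.
```

Since all the errors above are stamped within a second of startup, it may also be that kube-proxy simply started before the API server was up; a later section of the same log would show whether the requests eventually succeed.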

Answer

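Did you try the scripts here? They are a condensed version of the tutorial you used, adapted for various platforms. These scripts work on bare metal with k8s v1.6.4. I have a tweaked script with better encryption.

kube-apiserver is not running, which explains the error dial tcp 127.0.0.1:8080: getsockopt: connection refused. When I debug kube-apiserver, this is what I do on the node: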

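  1. Remove /etc/kubernetes/manifests/kube-apiserver.yaml
  2. Run a hyperkube container manually. Depending on your configuration, you will have to mount additional volumes (i.e. -v) to expose files to the container. Update the image version to the one you use.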

    docker run --net=host -it -v /etc/kubernetes/ssl:/etc/kubernetes/ssl quay.io/coreos/hyperkube:v1.6.2_coreos.0

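  3. The above command starts a shell in the hyperkube container. Now start kube-apiserver with the flags from the kube-apiserver.yaml manifest. It should look similar to this example: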

    /hyperkube apiserver \
      --bind-address=0.0.0.0 \
      --etcd-cafile=/etc/kubernetes/ssl/apiserver/ca.pem \
      --etcd-certfile=/etc/kubernetes/ssl/apiserver/client.pem \
      --etcd-keyfile=/etc/kubernetes/ssl/apiserver/client-key.pem \
      --etcd-servers=https://10.246.40.20:2379,https://10.246.40.21:2379,https://10.246.40.22:2379 \
      ...
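Since step 2 depends on which host directories your manifest references, the docker run line can be assembled from a list of paths. A minimal sketch under that assumption: hyperkube_shell_cmd is a made-up helper, /srv/kubernetes is a hypothetical extra directory, and paths containing spaces are not handled:

```shell
# Print a `docker run` command for an interactive hyperkube shell.
# $1: image tag; remaining args: host directories, each mounted at the
# same path inside the container.
hyperkube_shell_cmd() {
  tag=$1
  shift
  cmd="docker run --net=host -it"
  for dir in "$@"; do
    cmd="$cmd -v $dir:$dir"
  done
  printf '%s quay.io/coreos/hyperkube:%s\n' "$cmd" "$tag"
}

# Example: the SSL directory from step 2 plus a second, hypothetical one.
hyperkube_shell_cmd v1.6.2_coreos.0 /etc/kubernetes/ssl /srv/kubernetes
```

The function only prints the command, so you can review it before running it on the node.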

In any case, I suggest you tear down the cluster and try the scripts first. It might just work OOTB.