2017-05-19 40 views
1

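Kubelet process has high CPU usage for a long time

I have a 3-node Kubernetes cluster with the Weave CNI plugin:

  • 1 master node (virtual machine)
  • 2 bare-metal worker nodes (4-core Xeon with hyperthreading, 8 logical cores)

The problem is that top shows kubelet using 60-100% CPU on the first worker. In journalctl -u kubelet I see a lot of messages (hundreds per minute):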

May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075243 3843 docker_sandbox.go:205] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640": Error response from daemon: {"message":"No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640"} 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075360 3843 remote_runtime.go:109] StopPodSandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075380 3843 kuberuntime_gc.go:138] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076549 3843 docker_sandbox.go:205] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf": Error response from daemon: {"message":"No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf"} 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076654 3843 remote_runtime.go:109] StopPodSandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076676 3843 kuberuntime_gc.go:138] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079585 3843 docker_sandbox.go:205] Failed to stop sandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772": Error response from daemon: {"message":"No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772"} 
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079805 3843 remote_runtime.go:109] StopPodSandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-r30cw_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772 
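To get a feel for how many distinct sandboxes kubelet keeps retrying, the log lines above could be summarized with a small script. This is only a sketch: the function name `count_failed_sandboxes` is made up for illustration, and it assumes the log format matches the "Failed to stop sandbox" lines shown above, with a 64-character hex ID in double quotes.

```python
import re
from collections import Counter

# Assumption: sandbox IDs appear as 64 hex characters in double quotes,
# exactly as in the "Failed to stop sandbox" lines quoted above.
SANDBOX_RE = re.compile(r'Failed to stop sandbox "([0-9a-f]{64})"')


def count_failed_sandboxes(lines):
    """Count how often each sandbox ID appears in 'Failed to stop sandbox' errors."""
    counts = Counter()
    for line in lines:
        match = SANDBOX_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

One possible use is to pipe `journalctl -u kubelet` into a script that calls `count_failed_sandboxes(sys.stdin)` and prints the keys. A small set of IDs repeating over and over would suggest kubelet is looping on the same stale sandboxes.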

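It started after a faulty Kubernetes cron task that failed during creation. I deleted all the pods with --force, but kubelet still tries to remove them. I also restarted kubelet on that worker, with no result. How can I tell kubelet to forget about them?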

Version info

Kubernetes v1.6.1 
Docker version 1.12.0, build 8eab29e 
Linux kube-worker1 4.4.0-72-generic #93-Ubuntu SMP 

Container manifest (metadata omitted)

job:
  apiVersion: batch/v1
  kind: Job
  spec:
    template:
      spec:
        containers:
        - name: cron-task
          image: docker.company.ru/image:v2.3.2
          command: ["rake", "db:refresh_views"]
          env:
          - name: RAILS_ENV
            value: namespace
          - name: CONFIG_PATH
            value: /config
          volumeMounts:
          - name: config
            mountPath: /config
        volumes:
        - name: config
          configMap:
            name: task-conf
        restartPolicy: Never

I also could not find any mention of this part of the pod name (2533948c46c1) in the cluster's etcd.

Answers

0

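I finally found the solution.
Kubelet stores information about all pods running on the node in

/var/lib/dockershim/sandbox 

When I ran ls in that folder, I found files for all the missing pods. After I deleted those files, the log messages disappeared and CPU usage returned to normal (even without restarting kubelet).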
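The cleanup described above could be scripted roughly as follows. This is a hedged sketch, not a verified procedure: it assumes that each file in the sandbox directory is named after the sandbox ID that appears in the kubelet log, and the helper names `find_stale_checkpoints` and `remove_stale_checkpoints` are invented for this example. It defaults to a dry run so nothing is deleted until the list has been reviewed.

```python
from pathlib import Path


def find_stale_checkpoints(sandbox_dir, stale_ids):
    """Return files in sandbox_dir whose name is one of the stale sandbox IDs.

    Assumption: files are named after the sandbox ID seen in the kubelet log.
    """
    directory = Path(sandbox_dir)
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.name in stale_ids
    )


def remove_stale_checkpoints(sandbox_dir, stale_ids, dry_run=True):
    """Delete matching files; with dry_run=True only report what would be removed."""
    matched = []
    for path in find_stale_checkpoints(sandbox_dir, stale_ids):
        if not dry_run:
            path.unlink()
        matched.append(path)
    return matched
```

For example, `remove_stale_checkpoints("/var/lib/dockershim/sandbox", ids)` lists the candidates, and passing `dry_run=False` removes them. Restricting deletion to IDs that Docker reports as "No such container" avoids touching the files of pods that are still running.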

0

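This seems related to the Kubernetes 1.6.x issue "Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI". The messages are not critical in any case, but they are certainly annoying when you are trying to find the actual problem. Try a newer version of Kubernetes to mitigate the issue.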

+0

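Thanks for the answer! But it looks like a different problem. I didn't specify the network type, so I assume hostNetwork = false. Am I right? The whole log contains 3 kinds of messages: 1) StopPodSandbox, 2) Failed to stop sandbox, 3) MountVolume.SetUp succeeded. Should I provide any other information? – user1802474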

0

I ran into the same problem. I profiled it and found that the cause was the kubelet PLEG mechanism; deleting the files in '/var/lib/dockershim/sandbox' did the magic.