2017-09-02

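Kubelet's cAdvisor metrics endpoint does not reliably return all metrics

I'm having an issue with cAdvisor where not all metrics are reliably returned when I query its metrics endpoint. Specifically, querying container_fs_limit_bytes{device=~"^/dev/.*$",id="/",kubernetes_io_hostname=~"^.*"} through Prometheus frequently shows results for only a small subset of the nodes in my Kubernetes cluster. A node drops out of the result when the metric has not been scraped for that node for more than 5 minutes (the series goes stale), but I don't understand why the metric is not present every time the endpoint is successfully queried.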
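To make the staleness behaviour described above concrete, here is a minimal Python sketch. It assumes an instant query in Prometheus 1.x returns a series only if that series has a sample within the 5 minutes before the evaluation time (the default staleness window). The function name and the sample timestamps are hypothetical, chosen only for illustration.

```python
# Assumed lookback window for instant queries (default staleness delta in Prometheus 1.x).
STALENESS_SECONDS = 5 * 60


def visible_at(sample_times, eval_time, lookback=STALENESS_SECONDS):
    """Return True if any sample falls inside the lookback window ending at eval_time."""
    return any(eval_time - lookback <= t <= eval_time for t in sample_times)


# Hypothetical timestamps (in seconds) of scrapes that actually contained the metric.
node_a = [0, 15, 30, 600]  # the metric showed up again shortly before the query
node_b = [0, 15]           # the metric has been missing from every scrape since t=15

print(visible_at(node_a, 610))  # True
print(visible_at(node_b, 610))  # False
```

Under this assumption, a node whose scrapes succeed but omit the metric for more than 5 minutes disappears from the query result, which matches the symptom described.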

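Curling the endpoint repeatedly shows that certain metrics are returned only some of the time. The Prometheus query above therefore returns data for all nodes only if every node had at least one scrape in the last 5 minutes that happened to include the metric, which is not always the case.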
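The repeated-curl check can be made measurable with a small helper. The sketch below is an assumption-laden illustration, not part of the original setup: it takes a list of response bodies in the Prometheus text exposition format, already fetched by whatever means, and reports the fraction that contain a given metric. The function names and the two sample bodies are hypothetical.

```python
def has_metric(exposition_text, metric_name):
    """Return True if any sample line in the exposition text belongs to metric_name."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line starts with the metric name, followed by '{' or whitespace.
        parts = line.split("{", 1)[0].split()
        if parts and parts[0] == metric_name:
            return True
    return False


def presence_ratio(scrapes, metric_name):
    """Fraction of scrape bodies (exposition texts) that contain metric_name."""
    if not scrapes:
        return 0.0
    hits = sum(has_metric(text, metric_name) for text in scrapes)
    return hits / len(scrapes)


# Hypothetical bodies from two consecutive requests to the same node.
scrapes = [
    'container_fs_limit_bytes{device="/dev/sda1",id="/"} 1e+10\n'
    'container_cpu_usage_seconds_total{id="/"} 5\n',
    'container_cpu_usage_seconds_total{id="/"} 6\n',
]

print(presence_ratio(scrapes, "container_fs_limit_bytes"))  # 0.5
```

A ratio well below 1.0 for a node whose requests all succeed would confirm that the metric is being dropped by the endpoint itself rather than by failed scrapes.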

One workaround is to average over a window longer than 5 minutes, but that is not ideal.
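One way to read this workaround is to wrap the selector in PromQL's `avg_over_time` with a range longer than the 5-minute staleness window. The Python sketch below only builds such a query string. The 10-minute window and the helper name are assumptions, not values taken from the original setup.

```python
def averaged_query(selector, window="10m"):
    """Wrap a series selector in avg_over_time over the given range window."""
    return "avg_over_time({}[{}])".format(selector, window)


selector = (
    'container_fs_limit_bytes'
    '{device=~"^/dev/.*$",id="/",kubernetes_io_hostname=~"^.*"}'
)

print(averaged_query(selector))
```

The trade-off is that a node that has really stopped reporting stays in the result for the whole window, which is one reason the workaround is not ideal.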

kubectl version:

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"} 
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} 

Prometheus version: 1.7.1

Prometheus config:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093
    scheme: http
    timeout: 10s
rule_files:
- /etc/prometheus-rules/alert.rules
scrape_configs:
- job_name: kubernetes-nodes
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: []
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}:4194/proxy/metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [id]
    separator: ;
    regex: ^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$
    target_label: rkt_container_name
    replacement: ${2}-${1}
    action: replace
  - source_labels: [id]
    separator: ;
    regex: ^/system\.slice/(.+)\.service$
    target_label: systemd_service_name
    replacement: ${1}
    action: replace
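To show what two of the `replace` rules in this config produce, here is a rough Python sketch. It assumes Prometheus matches the regex against the whole source value and expands `${N}` references in the replacement. Python's `re` stands in for the RE2 engine Prometheus actually uses, and the helper name, the node name `node-1` and the cgroup ids are hypothetical.

```python
import re


def relabel_replace(value, regex, replacement):
    """Mimic a 'replace' action: fully anchored match, then expand ${N} group references."""
    match = re.fullmatch(regex, value)
    if match is None:
        return None  # no match: the target label would be left untouched
    # Convert Prometheus-style ${1} references to Python's \g<1> template syntax.
    template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", replacement)
    return match.expand(template)


# Node name -> metrics path served through the API server proxy.
print(relabel_replace("node-1", r"(.+)", "/api/v1/nodes/${1}:4194/proxy/metrics"))
# /api/v1/nodes/node-1:4194/proxy/metrics

# cgroup id -> systemd_service_name label.
print(relabel_replace("/system.slice/docker.service",
                      r"^/system\.slice/(.+)\.service$", "${1}"))
# docker

# An id outside system.slice does not match, so no label would be written.
print(relabel_replace("/kubepods/abc", r"^/system\.slice/(.+)\.service$", "${1}"))
# None
```

With these rules every node is scraped through `kubernetes.default.svc:443` on a per-node proxy path, so a successful scrape only shows that the proxy request worked, not that cAdvisor returned every metric.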

Answer

This is a known bug in how cAdvisor uses the Prometheus client library.