Troubleshooting

When applying the Untab manifest on my GKE cluster, Kubernetes shows an error like this: "clusterroles.rbac.authorization.k8s.io "untab-agent" is forbidden: attempt to grant extra privileges"

By default, GKE users do not have the permissions required to create the ClusterRole objects that Untab needs. To resolve this issue, grant your user cluster-admin permissions, replacing the names my-binding and user@example.org as appropriate:

kubectl create clusterrolebinding my-binding --clusterrole=cluster-admin --user=user@example.org
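
To confirm that the binding took effect before re-applying the manifest, you can ask the API server whether your user may now create ClusterRole objects. This verification step is a suggestion of ours, not part of the official setup; it should print "yes":

kubectl auth can-i create clusterroles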

The agent pod fails to start

The agent pod includes three containers:

  • agent

  • prometheus

  • kube-state-metrics

First, determine which of these containers is failing. To do this, run:

kubectl get pod -n untab $(kubectl get pod -n untab -l app=untab-agent -o jsonpath="{.items[0].metadata.name}") -o go-template="{{range .status.containerStatuses}}{{.name}}: {{.lastState.terminated.reason}}{{\"\n\"}}{{end}}"

You should see output like this:

agent: <no value>
kube-state-metrics: <no value>
prometheus: Error

In the example above, we can see that it's the Prometheus container that is failing. Now, check the log output of that container to see what the problem might be:

kubectl logs -n untab $(kubectl get pod -n untab -l app=untab-agent -o jsonpath="{.items[0].metadata.name}") prometheus

If you are unable to determine the cause or fix the problem, please contact us for support and include the output from the two commands above.

Some node-exporter pods fail to start

As part of the untab namespace, Untab deploys node-exporter (unless set to use an existing set). This is an open-source tool that allows Untab to collect node-level utilization metrics. It runs as a DaemonSet, which means that there will be one Pod on each node.

If one or more node-exporter pods fail to start, check the status of the pods:

kubectl get pods -n untab

If you see the status of some pods as "Pending", this is most likely caused by a lack of available resources on those nodes. Each node needs at least 0.1 CPU cores and 200MiB of RAM available in order to run node-exporter.
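
To confirm that scheduling is the problem, you can describe one of the Pending pods and check the Events section at the end of the output; an unschedulable pod typically shows a FailedScheduling event such as "Insufficient cpu" or "Insufficient memory". This check is a general Kubernetes technique rather than an Untab-specific step; replace node-exporter-xxxxx with an actual pod name from the previous command:

kubectl describe pod -n untab node-exporter-xxxxx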

If the pods are showing as "Error", then check the logs for one of the failing node-exporter instances:

kubectl logs -n untab node-exporter-xxxxx

If you are unable to determine the cause or fix the problem, please contact us for support and include the output from the two commands above.

Cluster status shows error: "prometheus job kubernetes-nodes-node-exporter failed on X/Y nodes"

First, check whether all instances of node-exporter are running successfully using the command kubectl get pods -n untab. You should see output like this:

NAME                          READY   STATUS    RESTARTS   AGE
node-exporter-gzxpj           2/2     Running   0          10s
node-exporter-jtqs6           2/2     Running   0          10s
node-exporter-pshll           2/2     Running   0          10s
...
untab-agent-fd5744b64-bgfjd   3/3     Running   0          10s

If some node-exporter instances are not shown as "Running", please see the question "Some node-exporter pods fail to start" above.

If all node exporters are running, then this issue is most likely caused by firewall or security group rules. Please ensure that the master nodes in the cluster can access all nodes on port 9111.
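
As an illustration, on Google Cloud you could open this port with a firewall rule like the one below. This example is ours rather than part of the Untab setup, and the rule name, network, and source range are placeholders to adapt to your environment:

gcloud compute firewall-rules create allow-untab-node-exporter --network=<cluster-network> --allow=tcp:9111 --source-ranges=<master-cidr>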

You can get more information on the error by looking at the Prometheus console inside the agent. You can do this using the following steps:

  • Run: kubectl port-forward -n untab $(kubectl get pod -n untab -l app=untab-agent -o jsonpath="{.items[0].metadata.name}") 9090

  • Open your browser at: http://localhost:9090/targets

  • You should see the status of all of the metric targets Untab collects, along with any encountered errors.
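
If you prefer the command line to a browser, the same target status is available from Prometheus's HTTP API while the port-forward is running. Piping through jq here is our convenience, not an Untab requirement:

curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health, error: .lastError}'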

Cluster status shows error: "prometheus job kubernetes-nodes-kubelet failed on X/Y nodes"

The Prometheus instance inside the untab-agent pod collects metrics from the Kubelets on each node by querying the /metrics and /metrics/cadvisor endpoints. These queries are proxied through the API server.

This error indicates that Prometheus was unable to connect to the Kubelet on one or more nodes.
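
To test the same path yourself, you can query a node's Kubelet endpoints through the API server proxy; replace <node-name> with a name from kubectl get nodes. This check is our suggestion rather than an Untab feature: if these commands fail for a node, the problem lies between the API server and that node's Kubelet rather than in the agent.

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | head
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | head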

You can get more information on the error by looking at the Prometheus console inside the agent. You can do this using the following steps:

  • Run: kubectl port-forward -n untab $(kubectl get pod -n untab -l app=untab-agent -o jsonpath="{.items[0].metadata.name}") 9090

  • Open your browser at: http://localhost:9090/targets

  • You should see the status of all of the metric targets Untab collects, along with any encountered errors.

Cluster status shows metrics warning "no recent data"

The agent collects metrics via a Prometheus instance inside the agent Pod. Metrics are polled every 5 minutes and are uploaded to Untab's servers. The message "no recent data" indicates that no metrics were received in the last hour.

Once the agent is running, it can take up to 15 minutes for the first set of metrics to become available.

The first thing to check is whether the agent is running: see Checking agent status.

If the agent and node-exporter pods have been running for about 15 minutes but the "no recent data" message is still shown, it is possible that Prometheus is having trouble uploading the metrics to Untab. To check whether this is the case, take a look at the Prometheus logs using the following command:

kubectl logs -n untab $(kubectl get pod -n untab -l app=untab-agent -o jsonpath="{.items[0].metadata.name}") prometheus
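
Upload problems usually show up in these logs as repeated error lines. A quick filter, which is our convenience rather than an Untab-documented step, narrows the output down:

kubectl logs -n untab $(kubectl get pod -n untab -l app=untab-agent -o jsonpath="{.items[0].metadata.name}") prometheus | grep -i error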

The exact solution will depend on the nature of the errors. In most cases, this is caused by outbound firewall rules. If your cluster uses a proxy to reach external URLs, you will need to use a customized manifest (see here for instructions).

Something else

If you are having an issue that is not covered in this guide, please contact us by email.
