I ran into a problem where Helm deployments were failing with a timeout error, even though the Linkerd cluster monitoring dashboard showed resource utilization well below our limits.

[Figure: Linkerd Kubernetes cluster monitoring Grafana dashboard (via Prometheus)]
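
For context on where the error comes from: when Helm is told to wait for a release's resources to become ready (the --wait flag), an install or upgrade fails with a timeout if the new Pods never get there. The release and chart names below are placeholders, just a sketch of that kind of invocation:

# Wait for the release's Pods to become ready (up to 5 minutes by default);
# Pods that can never be scheduled surface here as a timeout error.
$ helm upgrade --install my-release ./my-chart --wait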

From here I wondered if Deployment resource requests were set higher than they needed to be. The Kubernetes scheduler relies on resource requests, not actual usage, to decide whether a node has room for new workloads: it compares the requests already reserved on a node against that node's allocatable capacity. If every node's requests are close to capacity, new Pods cannot be scheduled and sit in Pending, no matter how low real utilization is.
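
One quick way to check for this is to look for Pods stuck in Pending and the scheduler's FailedScheduling events, which typically read "Insufficient cpu" or "Insufficient memory":

# Pods the scheduler cannot place stay in Pending.
$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The scheduler records the reason as FailedScheduling events.
$ kubectl get events --all-namespaces --field-selector reason=FailedScheduling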

I googled for an easy way to see the total resource requests on each node in the cluster. I found this article, which referenced this issue on GitHub, and ran the following command.

$ kubectl get nodes --no-headers | awk '{print $1}' | xargs -I {} sh -c 'echo {}; kubectl describe node {} | grep Allocated -A 5 | grep -ve Event -ve Allocated -ve percent -ve -- ; echo'


gke-current-cluster-node-name-a9df8
  Resource                   Requests      Limits
  cpu                        7273m (91%)   15260m (192%)
  memory                     7615Mi (28%)  15373Mi (57%)

gke-current-cluster-node-name-afdf8
  Resource                   Requests         Limits
  cpu                        7318m (92%)      15210m (192%)
  memory                     7723784Ki (28%)  15411976Ki (56%)

gke-current-cluster-node-name-a6df8
  Resource                   Requests         Limits
  cpu                        7278m (92%)      15080m (190%)
  memory                     7896840Ki (29%)  15544072Ki (57%)


Success! My hypothesis was confirmed: CPU requests were already reserving 91-92% of each node's allocatable CPU, so the scheduler had no room left to place the new Pods and Helm timed out waiting for them. From here I lowered the CPU requests based on what I observed with kubectl top pod, and the Helm install of the new Deployments completed without a timeout error.
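
For reference, the adjustment looked roughly like the following. The release name, chart path, and values keys are placeholders and depend on the chart; the point is to size CPU requests from observed usage rather than guesses:

# Actual per-container usage, to compare against the configured requests.
$ kubectl top pod -n my-namespace --containers

# Re-deploy with requests sized closer to observed usage
# (hypothetical values path; check the chart's values.yaml).
$ helm upgrade --install my-release ./my-chart \
    --set resources.requests.cpu=250m \
    --set resources.requests.memory=256Mi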