Overview
After you upgrade CircleCI server, old pods may not terminate cleanly and may keep running alongside the newly deployed pods, causing services to fail. If the pod status looks like the output below, where each service has two pods from different ReplicaSets and both are in CrashLoopBackOff, you are likely hitting this issue.
$ kubectl get pods -n <namespace>
NAME                                 READY   STATUS             RESTARTS      AGE
api-service-84894f748c-5bw7z         0/1     CrashLoopBackOff   9 (59s ago)   54m
api-service-f45d8b7c4-2fbc9          0/1     CrashLoopBackOff   9 (96s ago)   54m
audit-log-service-5857698db4-qtghq   0/1     CrashLoopBackOff   9 (43s ago)   54m
audit-log-service-5b6b5dcb9c-qskzd   0/1     CrashLoopBackOff   9 (86s ago)   54m
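In a namespace with many pods, duplicates can be hard to spot by eye. The following is a minimal sketch, not an official tool: `duplicated_deployments` is a hypothetical helper name, and it assumes pods follow the usual `<deployment>-<replicaset-hash>-<pod-suffix>` naming seen in the output above.

```shell
# Hypothetical helper (not part of kubectl or CircleCI server).
# Reads `kubectl get pods --no-headers` output on stdin and prints each
# Deployment name that has pods from more than one ReplicaSet.
# Assumes <deployment>-<replicaset-hash>-<pod-suffix> pod names.
duplicated_deployments() {
  awk '{
    n = split($1, parts, "-")
    if (n < 3) next                      # skips names such as rabbitmq-0
    deploy = parts[1]
    for (i = 2; i <= n - 2; i++) deploy = deploy "-" parts[i]
    rs = deploy "-" parts[n - 1]         # ReplicaSet name
    if (!(rs in seen)) { seen[rs] = 1; count[deploy]++ }
  }
  END { for (d in count) if (count[d] > 1) print d }' | sort
}

# Usage (NAMESPACE is a placeholder):
#   kubectl get pods -n NAMESPACE --no-headers | duplicated_deployments
```

A service listed by this helper has pods from both the old and the new rollout. During a healthy rolling update this can also happen briefly, so treat the result as a hint rather than proof.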
Check if a pod is failing to connect to RabbitMQ
Start by checking the logs of one of the duplicated pods. The example below shows the logs from an API service pod, which contain a "Connection refused" error raised while the service tries to connect to RabbitMQ.
$ kubectl logs <pod_name> -n <namespace>
2025-10-07T08:33:07.689+0000 [] [main] ERROR circleci.backplane.trace backplane.rabbitmq/connect; attempt=15; canary=false; deploy_environment=production; duration_ms=30014.716913; exception.message=Connection refused; exception.type=class java.net.ConnectException; hostname=api-service-f45d8b7c4-2fbc9; k8s_pod_name=api-service-f45d8b7c4-2fbc9; k8s_pod_namespace=circleci-server-cj; k8s_replicaset=api-service-f45d8b7c4; meta.location=circleci.backplane.rabbitmq:99; revision=0b28f56bd4d926218074253c7d218887d6f4086d; service=api-service; span_kind=internal; status_code=2; status_desc=retries exceeded; version=1.0.23610
...
2025-10-07T08:33:07.691+0000 [] [main] ERROR circleci.backplane.exceptions Exiting due to uncaught exception; ...
java.net.ConnectException: Connection refused
...
at circleci.backplane.rabbitmq$connect.invokeStatic(rabbitmq.clj:99)
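To confirm the same failure quickly on other pods, the log check can be reduced to a count. This is a small sketch under assumptions: `rabbitmq_refused_count` is a hypothetical helper name, and it matches only the `backplane.rabbitmq/connect` log format shown above.

```shell
# Hypothetical helper: reads pod logs on stdin and prints how many
# backplane.rabbitmq/connect log lines report "Connection refused".
rabbitmq_refused_count() {
  # `|| true` keeps the exit status at 0 when the count is zero.
  grep 'backplane.rabbitmq/connect' | grep -c 'Connection refused' || true
}

# Usage (POD_NAME and NAMESPACE are placeholders):
#   kubectl logs POD_NAME -n NAMESPACE | rabbitmq_refused_count
```

A non-zero count suggests the pod is crashing because RabbitMQ is unreachable, not because of a fault in the service itself.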
Check if the RabbitMQ pod is pending
Next, look at the events for the RabbitMQ pod. In this example, the pod is Pending and has not been assigned to any node because the cluster has insufficient CPU resources.
$ kubectl describe pod <pod_name> -n <namespace>
Name: rabbitmq-0
Namespace: circleci-server
...
Status: Pending
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 67m (x11 over 75m) default-scheduler 0/4 nodes are available: 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Warning FailedScheduling 56m default-scheduler no nodes available to schedule pods
Warning FailedScheduling 43m (x79 over 56m) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 43m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 3 Insufficient cpu. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for incoming pod.
Warning FailedScheduling 43m default-scheduler 0/4 nodes are available: 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Warning FailedScheduling 30m (x11 over 43m) default-scheduler 0/4 nodes are available: 4 Insufficient cpu. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
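RabbitMQ may not be the only pod stuck waiting for a node. As a rough sketch, the hypothetical helper `pending_pods` below filters `kubectl get pods --no-headers` output for pods whose STATUS column is Pending; it assumes the default column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on
# stdin and prints the names of pods whose STATUS column is Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Usage (NAMESPACE is a placeholder):
#   kubectl get pods -n NAMESPACE --no-headers | pending_pods
```

Each pod it prints can then be inspected with `kubectl describe pod` to read its FailedScheduling events.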
Solution
To resolve this, add more nodes to the cluster or resize the existing nodes so that the cluster has enough CPU to schedule the RabbitMQ pod.
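Before and after resizing, it can help to check how much CPU the nodes offer. The sketch below is an illustration only: `total_allocatable_millicores` is a hypothetical helper that sums the allocatable CPU reported by `kubectl get nodes`, treating values with an `m` suffix as millicores and plain numbers as whole cores.

```shell
# Hypothetical helper: reads "<node-name> <allocatable-cpu>" lines on
# stdin and prints the cluster-wide total in millicores.
total_allocatable_millicores() {
  awk '{
    cpu = $2
    if (cpu ~ /m$/) { sub(/m$/, "", cpu); total += cpu }   # e.g. 3920m
    else            { total += cpu * 1000 }                # e.g. 4 cores
  }
  END { printf "%d\n", total }'
}

# Usage:
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu \
#     | total_allocatable_millicores
```

The total is only a rough gauge. Scheduling depends on the unreserved CPU on each individual node, which `kubectl describe nodes` reports under "Allocated resources".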