Container Runner Job Stuck on Lifecycle Stage Due To Client-Side Throttling

Overview

Users may encounter client-side throttling issues with CircleCI's container runner on Kubernetes, particularly when the Kubernetes API server is under heavy load, for example when a large number of resources are defined in the runner's Helm values. This can result in jobs becoming stuck at the "Task lifecycle" stage. Users are likely to see an error like the following in their container-agent logs.

waited for 3s due to client-side throttling, not priority and fairness, request:
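
To confirm the issue, search the container-agent logs for this message. A minimal sketch, assuming the agent was installed with the default Helm release name container-agent (replace <namespace> with the namespace where the runner is deployed):

  kubectl logs -n <namespace> deployment/container-agent | grep "client-side throttling"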

Prerequisites

  • Access to the Kubernetes cluster where the CircleCI container runner is deployed.
  • Familiarity with Kubernetes and Helm configurations.
  • Ability to modify the values.yaml file for the container runner.

Solution

To address the client-side throttling issue, consider the following steps:

Increase Agent Replica Count: Distribute the API request load by increasing the number of container agent replicas. This can help prevent any single pod from reaching the throttling limits.

    1. Update your values.yaml file with the following configuration:
      agent:
        replicaCount: 2
    2. Deploy the change with the command below; this adjustment spreads the API requests more evenly across multiple pods. You can then verify the rollout as shown after this list.
      helm upgrade container-agent container-agent/container-agent -n <namespace> -f values.yaml
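
Once the upgrade completes, you can check that the additional replica is running. This is a sketch that assumes the standard app.kubernetes.io/name label applied by the container-agent Helm chart:

  kubectl get pods -n <namespace> -l app.kubernetes.io/name=container-agent
  # Expect two pods in the Running state once the rollout finishes.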
