Container Runner Job Stuck on Lifecycle Stage Due To Client-Side Throttling

Overview

Users may encounter client-side throttling issues with CircleCI's container runner on Kubernetes, particularly when the Kubernetes API server is under heavy load, for example when a large number of resources are defined in the runner's Helm values. This can result in jobs becoming stuck at the "Task lifecycle" stage. Users are likely to see an error like the following in their container-agent logs.

waited for 3s due to client-side throttling, not priority and fairness, request:
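
To confirm the issue, search the container-agent logs for this message. A minimal sketch, assuming the agent was installed with the default Helm release name container-agent (replace <namespace> with the namespace where the runner is deployed):

  kubectl logs -n <namespace> deployment/container-agent | grep "client-side throttling"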

Prerequisites

  • Access to the Kubernetes cluster where the CircleCI container runner is deployed.
  • Familiarity with Kubernetes and Helm configurations.
  • Ability to modify the values.yaml file for the container runner.

Solution

To address the client-side throttling issue, consider the following steps:

Increase Agent Replica Count: Distribute the API request load by increasing the number of container agent replicas. This can help prevent any single pod from reaching the throttling limits.

    1. Update your values.yaml file with the following configuration:
      agent:
        replicaCount: 2
    2. Deploy the change with the command below; this adjustment spreads the API requests more evenly across multiple pods. You can then verify the rollout as shown after this list.
      helm upgrade container-agent container-agent/container-agent -n <namespace> -f values.yaml
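
Once the upgrade completes, you can check that the additional replica is running. This is a sketch that assumes the standard app.kubernetes.io/name label applied by the container-agent Helm chart:

  kubectl get pods -n <namespace> -l app.kubernetes.io/name=container-agent
  # Expect two pods in the Running state once the rollout finishes.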
