Troubleshooting Self-Hosted Runners (Machine Runner & Container Runner)

Overview

This article is a central troubleshooting reference for both runner types:

  • Machine Runner 3.x — an agent installed directly on a VM or physical machine (Linux, macOS, Windows)
  • Container Runner — a Helm-deployed agent that schedules jobs as pods in a Kubernetes cluster

If you are still using Launch Agent 1.x, migrate before troubleshooting anything else: see Issue 4: Launch Agent 1.x Jobs Are Failing (EOL) below.


Quick Pre-Checks

Before diving into specific issues, confirm the following:

| Check | How |
|---|---|
| Runner is registered and visible | Org Settings → Self-Hosted Runners: confirm the resource class appears and shows at least one runner |
| Runner version | circleci-runner --version (machine runner) or helm list -n <namespace> (container runner) |
| Resource class name in config matches exactly | Names are case-sensitive: my-org/my-runner ≠ my-org/My-Runner |
| Runner has outbound internet access to runner.circleci.com | Outbound port 443 (HTTPS) must be open |
| Runner token is valid and not rotated | If the token was recently rotated, update the runner configuration with the new token and restart the runner process |

Issue 1: "We cannot run this job using the selected resource class"

Symptom: The job fails immediately with:

We cannot run this job using the selected resource class.

Cause A — Resource class does not exist

Verify the resource class was created:

circleci runner resource-class list <your-namespace>

If missing, create it:

circleci runner resource-class create <your-namespace>/<resource-class-name> "description"

Cause B — Runner is not enabled for your plan

Self-hosted runners require a Scale, Custom, or Server plan. Performance and Free plans do not have access. Check at Org Settings → Plan.

Cause C — Typo in config.yml

The resource class in your config must exactly match what was created. Check for capitalization differences, leading/trailing spaces, or namespace mismatches:

# Must match the registered resource class exactly
resource_class: my-org/my-runner-name
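Causes A and C can be checked together with a small shell function. This is a sketch, not a CircleCI tool: check_resource_class is a hypothetical name, and it assumes you have saved the registered resource class names, one per line, to a text file (for example by copying them by hand from the output of circleci runner resource-class list). It distinguishes an exact match, a capitalization-only mismatch, stray whitespace, and a class that does not exist.

```bash
#!/usr/bin/env bash
# check_resource_class: hypothetical helper, not part of the CircleCI CLI.
# $1 = resource_class value copied from config.yml
# $2 = text file listing registered resource classes, one per line (assumed)
# Exit status: 0 exact match, 1 missing, 2 case mismatch, 3 stray whitespace.
check_resource_class() {
  local wanted="$1" registered_file="$2"
  local ws_re='^[[:space:]]|[[:space:]]$'
  local near

  # Leading or trailing whitespace can never match a registered name.
  if [[ $wanted =~ $ws_re ]]; then
    echo "whitespace: remove leading/trailing spaces from '$wanted'"
    return 3
  fi

  # Exact, case-sensitive, whole-line match.
  if grep -Fxq -- "$wanted" "$registered_file"; then
    echo "ok"
    return 0
  fi

  # Same name with different capitalization (my-org/My-Runner vs my-org/my-runner).
  near=$(grep -Fxi -- "$wanted" "$registered_file" | head -n 1)
  if [[ -n $near ]]; then
    echo "case-mismatch: registered as '$near'"
    return 2
  fi

  # Not registered at all: suggest the create command from Cause A.
  echo "missing: circleci runner resource-class create $wanted \"description\""
  return 1
}
```

For example, check_resource_class my-org/My-Runner registered.txt reports a case mismatch when only my-org/my-runner is registered. Each outcome has its own exit status, so the function can also gate a pre-deploy script.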

Issue 2: Jobs Queued or Stuck in "Not Running" / "Preparing Environment"

Check 1 — Confirm at least one runner is online

Go to Org Settings → Self-Hosted Runners. If the resource class shows "No runners" or all runners appear offline, the runner process has stopped or lost connectivity.

Check 2 — Review maxConcurrentTasks

Each resource class has a maxConcurrentTasks limit (default: 20). If this limit is reached, additional jobs queue even if runner machines appear idle. Contact CircleCI Support to request an increase.
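The queueing arithmetic is simple enough to sanity-check by hand. The sketch below uses a hypothetical helper, expected_queue_depth, that takes the number of jobs requesting the resource class at the same time and the limit (defaulting to the 20 stated above) and prints how many jobs should be waiting. If noticeably more jobs are queued than this predicts, something else may be occupying slots, such as the stale task claims described in Issue 3, Cause A.

```bash
#!/usr/bin/env bash
# expected_queue_depth: hypothetical helper for the maxConcurrentTasks arithmetic.
# $1 = jobs requesting this resource class at the same time
# $2 = maxConcurrentTasks for the resource class (defaults to 20, per the text above)
expected_queue_depth() {
  local requested="$1" limit="${2:-20}"
  local excess=$(( requested - limit ))
  # Jobs beyond the limit wait in the queue, even if runner machines look idle.
  if (( excess > 0 )); then
    echo "$excess"
  else
    echo 0
  fi
}
```

For example, 25 simultaneous jobs against a limit of 20 leaves 5 queued.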

Check 3 — Inspect runner logs

See Runner Log File Locations below. Look for:

  • failed to claim task — runner cannot reach the CircleCI backend
  • context deadline exceeded — network timeout to runner.circleci.com
  • token is invalid — runner token was rotated; restart the runner with the new token
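Scanning a long log for those three messages is easy to automate. The following is a sketch with a hypothetical function name, diagnose_runner_log; it matches the message strings from the list above literally and ignores everything else, so an empty-handed result does not rule out other problems.

```bash
#!/usr/bin/env bash
# diagnose_runner_log: hypothetical helper. Reads runner log text on stdin and
# prints one hint per known message from the list above (matched literally).
diagnose_runner_log() {
  local log found=0
  log=$(cat)

  if grep -qF 'failed to claim task' <<<"$log"; then
    echo "claim: runner cannot reach the CircleCI backend"
    found=1
  fi
  if grep -qF 'context deadline exceeded' <<<"$log"; then
    echo "timeout: network timeout to runner.circleci.com"
    found=1
  fi
  if grep -qF 'token is invalid' <<<"$log"; then
    echo "token: token was rotated; update the config and restart the runner"
    found=1
  fi

  if (( ! found )); then
    echo "none of the known messages found"
  fi
}
```

On a systemd machine, for example: journalctl -u circleci-runner --since "1 hour ago" | diagnose_runner_log. Any saved log file can be piped in the same way.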

Check 4 — For container runner, check pod status

kubectl get pods -n <namespace>

kubectl logs deployment/container-agent -n <namespace>

If the container-agent pod is not in Running state, see Issues 5 and 6 below.
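Reading the STATUS column of kubectl get pods is more telling than filtering on pod phase, because a pod in CrashLoopBackOff can still report phase Running. The sketch below, unhealthy_agent_pods (a hypothetical name), assumes the default column layout (NAME READY STATUS RESTARTS AGE) and that the agent's pods are named after the container-agent deployment, as Deployment-managed pods normally are.

```bash
#!/usr/bin/env bash
# unhealthy_agent_pods: hypothetical helper. Reads `kubectl get pods` output on
# stdin and prints "<pod> <STATUS>" for container-agent pods not in Running.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_agent_pods() {
  # Skip the header row; compare the third column (STATUS) literally.
  awk 'NR > 1 && $1 ~ /^container-agent/ && $3 != "Running" { print $1, $3 }'
}
```

Usage: kubectl get pods -n <namespace> | unhealthy_agent_pods. Empty output means every agent pod reports Running; a pod that is Running but not ready (READY 0/1) is not flagged by this check.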


Issue 3: Runner Appears Online but Jobs Are Not Being Claimed

Cause A — Runner is at maxConcurrentTasks capacity

If a previous batch of jobs did not release cleanly (e.g., machine rebooted mid-job), tasks may still be counted as active in the backend. Contact Support to clear stuck task claims.

Cause B — Runner cannot reach the task assignment endpoint

The runner must be able to reach:

  • runner.circleci.com:443
  • *.circle-artifacts.com (for artifact and cache operations)

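Test from the runner machine. Any HTTP response, even an error status, shows the endpoint is reachable; a timeout, DNS failure, or TLS error means it is not: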

curl -I https://runner.circleci.com/api/v3/runner/unclaim
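To probe several hosts at once without curl, bash's built-in /dev/tcp redirection can open a raw TCP connection. This is a sketch under assumptions: it needs bash and GNU timeout (standard on most Linux distributions, not on stock macOS), it only proves that a TCP connection to port 443 opens (no TLS or HTTP check), and the wildcard *.circle-artifacts.com cannot be probed directly, so pass concrete hostnames if you have them. can_reach and check_runner_endpoints are hypothetical names.

```bash
#!/usr/bin/env bash
# can_reach: try to open a TCP connection using bash's /dev/tcp redirection.
# Needs bash and GNU coreutils `timeout`. Proves TCP reachability only.
can_reach() {
  local host="$1" port="${2:-443}" secs="${3:-5}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_runner_endpoints: hypothetical helper. Probes each host on port 443;
# with no arguments it checks runner.circleci.com only. Wildcards such as
# *.circle-artifacts.com cannot be probed; pass concrete hostnames instead.
check_runner_endpoints() {
  local hosts=("$@") host
  (( ${#hosts[@]} > 0 )) || hosts=(runner.circleci.com)
  for host in "${hosts[@]}"; do
    if can_reach "$host" 443; then
      echo "reachable   $host:443"
    else
      echo "UNREACHABLE $host:443"
    fi
  done
}
```

If your network requires an HTTP proxy, a direct TCP probe fails even when the runner can connect through the proxy; rely on the curl test in that case.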

Cause C — Clock skew on the runner machine

TLS connections and runner authentication depend on an accurate system clock. If the clock is significantly skewed, authentication can fail without a clear error message. Verify that NTP is configured and the clock is accurate (on Linux, timedatectl shows whether the system clock is synchronized).
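One way to estimate skew without logging in anywhere else is to compare local time with the Date header of an HTTPS response. The sketch below assumes curl and GNU date (date -d, so Linux); check_clock_skew and skew_seconds are hypothetical names, and the 300-second default threshold is an illustrative assumption, not a documented CircleCI limit.

```bash
#!/usr/bin/env bash
# skew_seconds: absolute difference between two Unix timestamps, in seconds.
skew_seconds() {
  local d=$(( $1 - $2 ))
  echo $(( d < 0 ? -d : d ))
}

# check_clock_skew: hypothetical helper. Compares local UTC time with the HTTP
# Date header of an HTTPS URL. Assumes curl and GNU date (`date -d`, Linux).
# The 300-second default threshold is an illustrative assumption, and the
# Date header only has 1-second resolution.
check_clock_skew() {
  local url="${1:-https://runner.circleci.com}" max="${2:-300}"
  local header remote now skew

  # If the clock is far enough off that TLS verification fails, curl returns
  # nothing here, which is itself a strong hint.
  header=$(curl -sI --max-time 10 "$url" | tr -d '\r' | grep -i '^date:' | cut -d' ' -f2-)
  if [[ -z $header ]]; then
    echo "no Date header received from $url"
    return 2
  fi

  remote=$(date -u -d "$header" +%s) || return 2
  now=$(date -u +%s)
  skew=$(skew_seconds "$now" "$remote")

  if (( skew > max )); then
    echo "clock skewed by ${skew}s; check NTP (timedatectl)"
    return 1
  fi
  echo "clock ok (within ${skew}s)"
}
```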


Issue 4: Launch Agent 1.x Jobs Are Failing (EOL)

Support for Launch Agent 1.x ended on September 17, 2024. Any runner still running a 1.x version will fail.
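When you are not sure which generation a machine runs, the version string is enough to tell. runner_support_status is a hypothetical helper: pass it whatever version text you have (for example the output of circleci-runner --version; its exact format is not assumed), and it extracts the first major.minor number it finds. Following this article, major version 1 is end of life and 3 is Machine Runner 3; anything else is reported as unknown rather than guessed.

```bash
#!/usr/bin/env bash
# runner_support_status: hypothetical helper. Extracts the first "major.minor"
# number from the version text passed in and classifies it per this article:
# 1.x = Launch Agent (end of life), 3.x = Machine Runner 3.
runner_support_status() {
  local re='([0-9]+)\.[0-9]+' major=""
  if [[ $1 =~ $re ]]; then
    major="${BASH_REMATCH[1]}"
  fi
  case "$major" in
    1) echo "eol: Launch Agent 1.x, migrate to Machine Runner 3.x" ;;
    3) echo "ok: Machine Runner 3.x" ;;
    *) echo "unknown: could not classify '$1'" ;;
  esac
}
```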

Symptoms:

  • Jobs fail immediately with no useful error in the job output
  • Runner logs show authentication or connection errors with no clear cause

Action required: Migrate to Machine Runner 3.x

The migration is straightforward — the configuration file is 1:1 compatible. No config changes are required.

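# On Linux, apt and yum can only find circleci-runner after the CircleCI package repository has been added; see the migration docs linked below.

# macOS (Homebrew)
brew install circleci-runner

# Linux (Debian/Ubuntu)
apt install circleci-runner

# Linux (RHEL/CentOS)
yum install circleci-runner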

After installing, your existing config file (launch-agent-config.yaml) works without modification:

circleci-runner start --config launch-agent-config.yaml
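On a fleet of mixed machines, the right install command can be picked automatically. The sketch below, runner_install_cmd (a hypothetical name), only prints one of the commands listed above and never runs it. It assumes Linux distributions follow the os-release convention (ID and ID_LIKE fields in /etc/os-release) and that the package repository is already configured.

```bash
#!/usr/bin/env bash
# runner_install_cmd: hypothetical helper. Prints (does not run) the install
# command from this article that fits the current OS.
# $1 = kernel name (defaults to `uname -s`)
# $2 = os-release file (defaults to /etc/os-release; assumes the distro follows
#      the os-release convention with ID / ID_LIKE fields)
runner_install_cmd() {
  local kernel="${1:-$(uname -s)}" os_release="${2:-/etc/os-release}" ids=""

  if [[ $kernel == Darwin ]]; then
    echo "brew install circleci-runner"
    return 0
  fi

  # Collect ID and ID_LIKE values, lowercased, quotes removed, space-separated.
  if [[ -r $os_release ]]; then
    ids=$(grep -E '^(ID|ID_LIKE)=' "$os_release" | cut -d= -f2- \
      | tr -d '"' | tr '[:upper:]' '[:lower:]' | tr '\n' ' ')
  fi

  case " $ids " in
    *" debian "*|*" ubuntu "*)            echo "apt install circleci-runner" ;;
    *" rhel "*|*" centos "*|*" fedora "*) echo "yum install circleci-runner" ;;
    *) echo "no matching command; see the migration docs" >&2; return 1 ;;
  esac
}
```

Matching on ID_LIKE as well as ID lets derivatives such as Rocky Linux (which lists rhel in ID_LIKE) resolve to the RHEL/CentOS command.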

Full migration docs: https://circleci.com/docs/guides/execution-runner/migrate-from-launch-agent-to-machine-runner-3-on-linux/


Issue 5: Container Runner — Jobs Stuck in "Task Lifecycle" Stage (K8s Throttling)

Symptom: Jobs hang in the "Task lifecycle" stage. Container-agent logs show:

waited for 3s due to client-side throttling, not priority and fairness, request: ...

Cause: The single container-agent pod is saturating the Kubernetes API rate limits under high task concurrency.
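Before changing the replica count, it helps to confirm the throttling message is frequent rather than a one-off. The sketch assumes logs are fetched with kubectl logs --timestamps, which prefixes each line with an RFC 3339 timestamp; count_throttle_events and throttle_events_per_minute are hypothetical names. There is no documented threshold here; a sustained stream of these lines while jobs are stuck is the signal described above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the throttling message shown above.
# count_throttle_events: total matching lines on stdin.
count_throttle_events() {
  # grep -c prints 0 but exits 1 when nothing matches; don't treat that as an error.
  grep -c 'due to client-side throttling' || true
}

# throttle_events_per_minute: expects lines prefixed with an RFC 3339 timestamp
# (as produced by `kubectl logs --timestamps`) and prints "<minute> <count>".
throttle_events_per_minute() {
  awk '/due to client-side throttling/ { n[substr($1, 1, 16)]++ }
       END { for (m in n) print m, n[m] }' | LC_ALL=C sort
}
```

Usage: kubectl logs deployment/container-agent -n <namespace> --timestamps --since=30m | throttle_events_per_minute.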

Fix: Increase the replica count in values.yaml:

agent:
  replicaCount: 2

Apply the change:

helm upgrade container-agent container-agent/container-agent -n <namespace> -f values.yaml

Issue 6: Container Runner — Pods Remain in "Pending" State

| Cause | How to check |
|---|---|
| Node out of memory (OOM) | kubectl describe node <node-name>: under Conditions, MemoryPressure has Status True |
| Node disk pressure | kubectl describe node <node-name>: under Conditions, DiskPressure has Status True |
| No nodes match pod affinity/tolerations | kubectl describe pod <task-pod-name> -n <namespace>: look for FailedScheduling events (the PodScheduled condition shows reason Unschedulable) |
| Image pull failure | kubectl describe pod <task-pod-name> -n <namespace>: look for ImagePullBackOff or ErrImagePull |

For image pull issues with a private registry, see How to use imagePullSecrets on Container Runner.
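The checks in the table above can be reduced to two small filters over kubectl describe output. This is a sketch with hypothetical names, node_pressure and pending_reasons. It relies on the Conditions section of kubectl describe node being a table whose first two columns are Type and Status, and it only looks for the reason strings mentioned in the table.

```bash
#!/usr/bin/env bash
# node_pressure: hypothetical helper. Reads `kubectl describe node` output on
# stdin; in its Conditions table the first two columns are Type and Status,
# so print each *Pressure condition whose Status is True.
node_pressure() {
  awk '$1 ~ /^(MemoryPressure|DiskPressure|PIDPressure)$/ && $2 == "True" { print $1 }'
}

# pending_reasons: hypothetical helper. Reads `kubectl describe pod` output on
# stdin and lists the scheduling / image-pull reasons from the table above.
pending_reasons() {
  grep -oE 'FailedScheduling|Unschedulable|ImagePullBackOff|ErrImagePull' | LC_ALL=C sort -u
}
```

For example, for node in $(kubectl get nodes -o name); do echo "$node: $(kubectl describe "$node" | node_pressure)"; done checks every node, and kubectl describe pod <task-pod-name> -n <namespace> | pending_reasons summarizes a single pending pod.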


Issue 7: OIDC Tokens Not Available in Runner Jobs

Symptom: $CIRCLE_OIDC_TOKEN is empty or the job fails when trying to use it.

Cause: OIDC token generation writes a file to /tmp. If /tmp is mounted with the noexec flag (common in hardened environments), this fails silently.

Diagnose:

mount | grep /tmp
# Look for "noexec" in the output
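A plain grep for /tmp can also match unrelated mounts such as /tmpdata. The sketch below, tmp_is_noexec (a hypothetical name), checks only the entry mounted exactly on /tmp and assumes the Linux mount output format, for example "tmpfs on /tmp type tmpfs (rw,nosuid,noexec)". If /tmp is not its own mount point, it inherits the options of the filesystem that contains it, which this check does not cover.

```bash
#!/usr/bin/env bash
# tmp_is_noexec: hypothetical helper. Reads `mount` output on stdin and
# succeeds (exit 0) only if the filesystem mounted exactly on /tmp lists noexec.
# Assumes the Linux format: "<device> on <mountpoint> type <fs> (<options>)".
tmp_is_noexec() {
  awk '$2 == "on" && $3 == "/tmp" && /[(,]noexec[,)]/ { found = 1 }
       END { exit !found }'
}
```

Usage: mount | tmp_is_noexec && echo "/tmp is mounted noexec". On util-linux systems, findmnt -T /tmp -no OPTIONS prints the options of whichever filesystem contains /tmp, which also covers the case where /tmp is not a separate mount.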

Fix options:

  1. Remove the noexec flag from /tmp if your security policy permits.
  2. Configure the runner to use an alternative working directory that allows execution.
  3. Use a native credential mechanism (AWS IAM instance profiles, GCP Workload Identity) instead of OIDC on that runner.

Issue 8: "fork/exec /bin/bash: bad file descriptor" (Container Runner)

Symptom:

failed to start cmd: fork/exec /bin/bash: bad file descriptor

Cause: The job's Docker image does not have /bin/bash, or the image entrypoint conflicts with the runner's task agent.

Fix:

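  1. Ensure the image includes bash (for example, RUN apt-get update && apt-get install -y bash on Debian/Ubuntu-based images, or RUN apk add --no-cache bash on Alpine-based images), or use an image that already includes it.
  2. Explicitly set the shell in your job config:

     jobs:
       my-job:
         shell: /bin/sh -eo pipefail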
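To find out whether an image actually ships /bin/bash before choosing between these fixes, you can run a one-off container locally. image_has_bash is a hypothetical helper that assumes a local Docker CLI with access to the image and that the image contains /bin/sh; overriding the entrypoint with /bin/sh also sidesteps any entrypoint that conflicts with the check. Minimal images (for example distroless ones) may have neither shell, in which case the check fails.

```bash
#!/usr/bin/env bash
# image_has_bash: hypothetical helper. Returns 0 if /bin/bash exists and is
# executable inside the image, non-zero otherwise (including Docker errors),
# and 2 if the docker CLI is not installed.
image_has_bash() {
  local image="$1"
  command -v docker >/dev/null 2>&1 || { echo "docker not found" >&2; return 2; }
  # Replace the image entrypoint with /bin/sh so it cannot interfere.
  docker run --rm --entrypoint /bin/sh "$image" -c 'test -x /bin/bash'
}
```

Usage: image_has_bash my-registry/my-image:tag && echo "bash present".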

Issue 9: SSH Debugging Not Working on Self-Hosted Runners

Container Runner does not support SSH debugging. This is a current product limitation — "Rerun job with SSH" is not available for container runner jobs.

Machine Runner does support SSH reruns. If it's not working, verify:

  • Project Settings → Advanced → Enable SSH reruns is turned on
  • The runner machine is network-accessible from your IP on the SSH port

Runner Log File Locations

Machine Runner 3.x

| OS | Log location |
|---|---|
| Linux (systemd) | journalctl -u circleci-runner -f |
| Linux (file) | /var/log/circleci-runner/circleci-runner.log |
| macOS | ~/Library/Logs/com.circleci.runner/circleci-runner.log |
| Windows | C:\ProgramData\CircleCI\circleci-runner.log |

To increase log verbosity, set log_level: debug in the runner config file and restart the service.

Container Runner

# Container agent logs
kubectl logs deployment/container-agent -n <namespace> --tail=200

# Logs for a specific task pod
kubectl logs <task-pod-name> -n <namespace>

# Events (most useful for Pending pods)
kubectl describe pod <task-pod-name> -n <namespace>

When Escalating to Support

Include the following in your ticket to avoid back-and-forth:

  • Runner type: Machine Runner or Container Runner
  • Runner version: circleci-runner --version or Helm chart version (helm list -n <namespace>)
  • Resource class name exactly as it appears in config.yml
  • OS and version (machine runner) or Kubernetes version and cloud provider (container runner)
  • Runner logs from the time window of the failure
  • The specific failing job URL from app.circleci.com
  • Output of circleci runner resource-class list <namespace>
  • Whether the issue is intermittent or consistent
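Several items on that list can be gathered in one pass. The sketch below, support_bundle (a hypothetical name), runs the commands this article already mentions and writes their output to stdout so you can redirect it into a file for the ticket. Tools that are not installed are reported rather than treated as fatal; the job URL, resource class name, and intermittent-or-consistent detail still have to be added by hand.

```bash
#!/usr/bin/env bash
# _try: print a command and its output; note it if the tool is not installed.
_try() {
  echo "\$ $*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "(exit status $?)"
  else
    echo "($1 not installed on this machine)"
  fi
  echo
}

# support_bundle: hypothetical helper that gathers the automatable items from
# the checklist above. Missing tools are reported, not fatal.
# Usage: support_bundle machine > ticket.txt
#        support_bundle container <k8s-namespace> > ticket.txt
support_bundle() {
  local kind="$1" ns="$2"
  echo "Collected (UTC): $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  case "$kind" in
    machine)
      echo "Runner type: Machine Runner"
      _try uname -a
      _try circleci-runner --version
      # Last hour of service logs; widen --since to cover the failure window.
      _try journalctl -u circleci-runner --since "1 hour ago" --no-pager
      ;;
    container)
      if [[ -z $ns ]]; then
        echo "usage: support_bundle container <k8s-namespace>" >&2
        return 1
      fi
      echo "Runner type: Container Runner"
      _try helm list -n "$ns"
      _try kubectl version
      _try kubectl logs deployment/container-agent -n "$ns" --tail=200
      ;;
    *)
      echo "usage: support_bundle <machine|container> [k8s-namespace]" >&2
      return 1
      ;;
  esac
  echo "Add by hand: resource class name, failing job URL, resource-class list output, intermittent or consistent."
}
```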

Additional Resources


 
