Troubleshooting Common Errors in Runner Logs

Overview

When you monitor your Runner logs, certain HTTP status codes can indicate issues that require attention. This guide helps you identify and troubleshoot common error patterns.

Key Error Codes to Monitor

When reviewing your Runner logs, pay special attention to these HTTP status codes:

  • 500: Internal Server Error
  • 404: Not Found
  • 403: Forbidden
  • 429: Too Many Requests

These status codes typically signal underlying problems that may affect your workflows.
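
As a rough sketch, the function below tallies these four codes in a saved copy of the logs. It assumes the status code appears in each log line as a standalone three-digit token, which may not match your log format. The function name count_status_codes and the file name runner.log are hypothetical.

```shell
# Hypothetical helper: count occurrences of the four status codes in a log file.
# Assumption: each code appears as a standalone token (for example "status=500").
# Caveat: the same digits in a timestamp or ID (such as ".500") are counted too.
count_status_codes() {
    # -o prints each match on its own line; -w matches whole words only,
    # so longer numbers such as "15004" are not counted.
    grep -Eow '500|404|403|429' "$1" | sort | uniq -c | awk '{print $2, $1}'
}
```

For example, you could save the logs with kubectl logs -n circleci <full pod name> > runner.log and then run count_status_codes runner.log. Each output line is a status code followed by its count.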

Common Error Scenarios and Solutions

Network Connectivity Issues (404, 403)

Intermittent 404 or 403 errors in particular may indicate network connectivity problems with the container agent.

Troubleshooting Steps:

  1. Connect to the pod using:
    kubectl exec --stdin --tty -n circleci <full pod name> -- /bin/sh
  2. Run an extended ping test from inside the pod to verify network stability
  3. Check for any network policies or firewall rules that might be blocking connections
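
Steps 2 and 3 might look like the sketch below. It assumes ping is available in the container image, and <target host> is a placeholder for whichever endpoint you want to test; this guide specifies neither. packet_loss_percent is a hypothetical helper that extracts the loss figure from the ping summary line.

```shell
# Hypothetical helper: read ping output on stdin and print the packet loss percentage.
# Assumption: the summary line contains "<number>% packet loss",
# as printed by the iputils and BusyBox versions of ping.
packet_loss_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Inside the pod (step 2): send 300 probes and keep only the loss figure.
#   ping -c 300 <target host> | packet_loss_percent
#
# From your workstation (step 3): list the network policies in the namespace.
#   kubectl get networkpolicy -n circleci
```

A loss figure above zero over a long run suggests the connection is unstable rather than blocked outright, which would fit the intermittent pattern described above.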

Resource Exhaustion (429)

Status code 429 often indicates that your pods are reaching their resource limits in the cluster. When this happens, the pod might terminate jobs to free up resources.

Troubleshooting Steps:

  1. Review your current resource allocation
  2. Adjust resource limits in either:
    • Your values.yaml file, or
    • Your config.yaml file
  3. Consider scaling your cluster if you consistently hit resource limits
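
One way to approach step 1 is to look for pods that Kubernetes has already stopped under resource pressure, then compare live usage against the configured limits. The sketch below rests on assumptions: OOMKilled and Evicted are standard Kubernetes pod statuses, but whether your Runner pods surface resource exhaustion this way is not confirmed by this guide, and kubectl top requires the metrics-server add-on. resource_pressure_pods is a hypothetical helper.

```shell
# Hypothetical helper: filter `kubectl get pods` output (stdin) down to pods
# whose STATUS column (third field) suggests resource pressure.
resource_pressure_pods() {
    # NR > 1 skips the header row.
    awk 'NR > 1 && ($3 == "OOMKilled" || $3 == "Evicted") {print $1, $3}'
}

# Possible usage:
#   kubectl get pods -n circleci | resource_pressure_pods
#   kubectl top pod -n circleci                        # live CPU and memory usage
#   kubectl describe pod -n circleci <full pod name>   # configured Limits and Requests
```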

Task Agent Not Running (500)

When you encounter 500 errors, the task pod may be failing because its liveness probe is not passing.

Troubleshooting Steps:

  1. Check the task agent status and logs
  2. Adjust the liveness probe defaults in the values.yaml for the container runner's Helm chart
  3. Verify that the container has sufficient resources to start and run properly
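
For step 1, the pod's events are usually the quickest place to confirm a probe failure. The sketch below assumes the standard kubelet event text "Liveness probe failed"; liveness_failures is a hypothetical helper, not part of the Helm chart or of kubectl.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# only the event lines that report liveness probe failures.
liveness_failures() {
    # `|| true` keeps the exit status at zero when no failures are found.
    grep -F 'Liveness probe failed' || true
}

# Possible usage:
#   kubectl describe pod -n circleci <full pod name> | liveness_failures
#   kubectl logs -n circleci <full pod name> --previous   # logs from the last terminated container
```

If the helper prints nothing, the 500 errors probably have another cause, and the task agent logs are the next place to look.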

General Troubleshooting Approach

While the above scenarios cover common issues, you may encounter other error codes that fit similar patterns. When troubleshooting any error in Runner logs:

  1. Identify the specific error code
  2. Check for patterns (is the error consistent or intermittent?)
  3. Review relevant logs surrounding the error timestamp
  4. Apply the appropriate troubleshooting steps based on error characteristics
  5. Document your findings to help with future troubleshooting
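
Steps 1 to 3 can be sketched against a saved log file as follows. This assumes the status code appears in the log line as a standalone token; errors_with_context and its arguments are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: print each log line containing the given status code,
# with line numbers and N lines of surrounding context (default 5).
errors_with_context() {
    code=$1
    file=$2
    context=${3:-5}
    # -n adds line numbers, -w matches the code as a whole word,
    # -C prints the requested number of lines before and after each match.
    grep -n -w -C "$context" -- "$code" "$file"
}
```

For example, errors_with_context 429 runner.log 10 shows ten lines on either side of each 429. In the output, matching lines are marked with a colon after the line number and context lines with a hyphen, which makes it easier to judge whether the errors cluster together or appear in isolation.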

Remember that this is not an exhaustive list of all possible errors, but these guidelines should help you address the most common issues you'll encounter in Runner logs.
