[Server] Troubleshooting Complete Jobs Showing Incomplete

August 01, 2023 19:04

Problem Description

Parallel jobs have finished running, but the workflow page still shows them as incomplete. Their timers keep incrementing, and the jobs hold back the rest of the workflow.

Solution

By default, AWS rebalances an Auto Scaling group (ASG) to keep a similar number of instances in each of the Availability Zones (AZs) it spans. See the AWS documentation:

Availability Zone rebalancing

This rebalancing hard-stops a Nomad client without any drain delay, killing all the jobs actively running on it. Each rebalance is logged as an activity on the ASG.
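To confirm that rebalancing is the cause, you can look through the ASG's activity history for rebalancing entries. The following is a minimal sketch, not an official tool. It assumes the boto3 library and AWS credentials are available, that `asg_name` is the name of your Nomad client ASG, and that rebalancing activities can be recognized by the substring "balanc" in the activity's `Cause` text. That substring match is an assumption, so check it against the wording in your own activity history.

```python
from typing import Any, Dict, List


def find_rebalance_activities(activities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the activities whose Cause text mentions zone balancing.

    Assumption: rebalancing activities contain "balanc" (as in "balance" or
    "balancing") in their Cause field. Adjust the match if your ASG's
    activity history is worded differently.
    """
    return [a for a in activities if "balanc" in str(a.get("Cause", "")).lower()]


def fetch_recent_activities(asg_name: str) -> List[Dict[str, Any]]:
    """Fetch the most recent page of scaling activities for one ASG."""
    import boto3  # third-party; imported here so the filter above works without it

    client = boto3.client("autoscaling")
    response = client.describe_scaling_activities(AutoScalingGroupName=asg_name)
    return response["Activities"]


def report(asg_name: str) -> None:
    """Print the start time and description of each suspected rebalance."""
    for activity in find_rebalance_activities(fetch_recent_activities(asg_name)):
        print(activity.get("StartTime"), activity.get("Description"))
```

Calling `report("my-nomad-clients")`, where the ASG name is a placeholder, prints one line per suspected rebalance. If the timestamps line up with the jobs that stayed incomplete, rebalancing is a likely cause.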

Try running your Nomad clients in a single Availability Zone to limit the impact of these rebalancing events. If that resolves the issue, consider using multiple ASGs, one per AZ, to eliminate the rebalances.
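The one-ASG-per-AZ layout can be sketched as follows. This is an illustration under stated assumptions, not a definitive setup: the group name prefix, subnet IDs, launch template name, and sizes are all hypothetical placeholders. The function only builds the request parameters, and each resulting dictionary uses parameter names accepted by boto3's `create_auto_scaling_group`. Because every group is given a single subnet, it spans a single AZ and has nothing to rebalance across.

```python
from typing import Any, Dict, List


def per_az_asg_requests(
    name_prefix: str,
    subnets_by_az: Dict[str, str],
    launch_template_name: str,
    min_size: int,
    max_size: int,
) -> List[Dict[str, Any]]:
    """Build one ASG request per Availability Zone, each limited to one subnet."""
    requests = []
    for az, subnet_id in sorted(subnets_by_az.items()):
        requests.append(
            {
                "AutoScalingGroupName": f"{name_prefix}-{az}",
                "MinSize": min_size,
                "MaxSize": max_size,
                # A single subnet keeps the group inside a single AZ.
                "VPCZoneIdentifier": subnet_id,
                "LaunchTemplate": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
            }
        )
    return requests


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    planned = per_az_asg_requests(
        "nomad-clients",
        {"us-east-1a": "subnet-aaa", "us-east-1b": "subnet-bbb"},
        "nomad-client-template",
        min_size=1,
        max_size=4,
    )
    for request in planned:
        print(request["AutoScalingGroupName"], request["VPCZoneIdentifier"])
```

Each dictionary could then be passed as `client.create_auto_scaling_group(**request)`. If your ASG is managed by Terraform or another provisioning tool, make the equivalent change there instead.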

Additional Resources

Amazon EC2 Auto Scaling benefits
