How to detect when a process is killed by the OOM killer

Introduction:

When processes running in a job consume an amount of memory near the limit provided by the resource class being used, they can be killed by the host machine's out-of-memory (OOM) killer.

It is possible to detect when a process was terminated by monitoring several files within the /sys/fs/cgroup/memory/ directory.
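
As a rough illustration of how close a job is to its limit, the sketch below reads two of those files and prints the usage as a whole-number percentage of the limit. The function name memory_usage_percent and the CGROUP_DIR variable are made up for this example, and the default path assumes the cgroup layout described above.

```shell
# Hypothetical helper: express memory usage as a percentage of the limit.
# CGROUP_DIR is an illustrative variable; its default is the path used in this article.
CGROUP_DIR="${CGROUP_DIR:-/sys/fs/cgroup/memory}"

memory_usage_percent() {
  local usage limit
  # Both files contain a single integer number of bytes.
  usage=$(cat "$CGROUP_DIR/memory.usage_in_bytes")
  limit=$(cat "$CGROUP_DIR/memory.limit_in_bytes")
  # Integer arithmetic, so the result is rounded down.
  echo $(( usage * 100 / limit ))
}
```

Calling memory_usage_percent inside a job prints a single number, which can be easier to scan in the step output than two raw byte counts.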

Prerequisites:

Please ensure that you are using the Docker executor, and that you already have a working config.yml file committed to your repository.

Instructions:

  1. Step 1 - Create a new run step at the beginning of your job
  2. Step 2 - Set background: true so the step runs in the background alongside the rest of the job
  3. Step 3 - Add the following script under command in the run step

    - run:
          command: |
            # Loop 100 times to monitor memory usage
            for i in {1..100}; do
              # Display the current memory usage in bytes
              echo -n "Memory Usage: "
              cat /sys/fs/cgroup/memory/memory.usage_in_bytes
              
              # Display the memory limit in bytes
              echo -n "Memory Limit: "
              cat /sys/fs/cgroup/memory/memory.limit_in_bytes
              
              # Display the Out-Of-Memory control settings
              echo -n "OOM Control: "
              # Line 3 of memory.oom_control is the oom_kill counter
              sed -n 3p /sys/fs/cgroup/memory/memory.oom_control
              
              # Display the current date and time
              echo -n "Current Date: "
              date
              
              # Sleep for 1 second before the next iteration
              sleep 1
              
              # Add a separator for better readability
              echo "-------------------------"
            done
          background: true
  4. Step 4 - Adjust lines 2 and 21 of the script (the for loop and the sleep command) to set how long you want the script to run
    • By default, the script loops 100 times and sleeps for one second in each iteration, so it runs for roughly 100 seconds.
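
If you would rather not edit the loop by hand, the duration can be pulled out into variables. The following is only a sketch: ITERATIONS, INTERVAL, and monitor_memory are illustrative names rather than CircleCI settings, and the function samples a single file to keep the example short.

```shell
# Hypothetical variant of the monitoring loop with an adjustable duration.
ITERATIONS="${ITERATIONS:-100}"   # number of samples to take
INTERVAL="${INTERVAL:-1}"         # seconds to sleep between samples

monitor_memory() {
  # The file to sample; the default is the usage file from this article.
  local file="${1:-/sys/fs/cgroup/memory/memory.usage_in_bytes}"
  local i
  # Brace expansion such as {1..$ITERATIONS} does not expand variables in bash,
  # so a C-style loop is used instead.
  for (( i = 1; i <= ITERATIONS; i++ )); do
    echo "Sample $i: $(cat "$file") bytes"
    sleep "$INTERVAL"
  done
}
```

With the defaults this takes 100 samples one second apart, matching the original script; setting ITERATIONS=600 before calling monitor_memory would extend it to about ten minutes.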

Outcome:

The above step runs in the background until the loop in the script finishes, which takes about 100 seconds by default.

Each iteration prints the current memory usage, the memory limit, the oom_kill line from memory.oom_control, and the current date. The number after oom_kill is the number of processes that have been terminated.

Additional Notes:

This script will only detect if a process was terminated, and not which process it was.
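
To make a kill easier to spot in a long log, one option is to print a message only when the oom_kill counter grows between samples. The sketch below is one possible way to do that, not part of the original script: OOM_FILE, current_oom_kills, and report_oom_kills are hypothetical names, and it assumes memory.oom_control contains an oom_kill line as described above.

```shell
# Hypothetical helpers: report only when the oom_kill counter increases.
# OOM_FILE is an illustrative variable; override it if your cgroup path differs.
OOM_FILE="${OOM_FILE:-/sys/fs/cgroup/memory/memory.oom_control}"

# Print the current oom_kill count, or 0 if the file or field is missing.
current_oom_kills() {
  local count
  count=$(awk '$1 == "oom_kill" {print $2}' "$OOM_FILE" 2>/dev/null || true)
  echo "${count:-0}"
}

# Compare a previous and a current count and report any increase.
report_oom_kills() {
  local previous=$1 current=$2
  if [ "$current" -gt "$previous" ]; then
    echo "OOM killer terminated $((current - previous)) process(es)"
  fi
}
```

Inside the monitoring loop, you could store the result of current_oom_kills before sleeping, read it again afterwards, and pass both values to report_oom_kills. Like the original script, this still does not identify which process was killed.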

 
