How to detect when a process is killed by the OOM killer

Introduction:

When processes running in a job consume an amount of memory near the limit provided by the resource class being used, they can be killed by the host machine's out-of-memory (OOM) killer.

It is possible to detect when a process was terminated by monitoring several cgroup control files, located in the /sys/fs/cgroup/ directory (cgroups v2) or the /sys/fs/cgroup/memory/ directory (cgroups v1).

Prerequisites:

Please ensure that you are using the Docker executor, and that you already have a working config.yml file committed to your repository.

As of 2025-04-03, you will need to use the cgroups v2 variant of the background step below, because the cgroup control files changed with the recent Docker executor upgrades.
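If you are unsure which variant applies to your job, one way to check is to look for the cgroup.controllers file, which is present in a cgroups v2 hierarchy but not under cgroups v1. The following is a minimal sketch under that assumption; the function name detect_cgroup_version and its optional directory argument are illustrative and not part of CircleCI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "v2", "v1", or "unknown" depending on which
# cgroup control files are present. The optional argument (default
# /sys/fs/cgroup) only exists so the check can be pointed elsewhere.
detect_cgroup_version() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cgroup.controllers" ]; then
    # cgroup.controllers only exists in the unified (v2) hierarchy
    echo "v2"
  elif [ -d "$root/memory" ]; then
    # v1 exposes a separate directory per controller, e.g. memory/
    echo "v1"
  else
    echo "unknown"
  fi
}

detect_cgroup_version
```

Running this in a throwaway run step should print the variant to use before you add the monitoring step.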

Instructions:

  1. Step 1 - Create a new run step at the beginning of your job
  2. Step 2 - Set background: true so that the step runs continuously alongside the rest of the job
  3. Step 3 - Add the script under command in the run step

    • cgroups v2

      - run:
          name: monitor resources
          command: |
            # Loop 100 times to monitor memory usage 
            for i in {1..100}; do
               # Display the current memory usage in bytes 
               echo -n "Memory Usage: "
               cat /sys/fs/cgroup/memory.current
      
               # Display the memory hard limit in bytes 
               echo -n "Memory Limit (max): "
               cat /sys/fs/cgroup/memory.max
      
               # Display the memory throttling threshold (new in v2)
               echo -n "Memory High Threshold: "
               cat /sys/fs/cgroup/memory.high
      
               # Display memory events (replaces oom_control)
               echo "Memory Events:"
               cat /sys/fs/cgroup/memory.events
      
               # Display memory pressure information (new in v2)
               echo "Memory Pressure (PSI):"
               cat /sys/fs/cgroup/memory.pressure
      
               # Display the current date and time
               echo -n "Current Date: "
               date  
      
               # Sleep for 1 second before the next iteration
               sleep 1
      
               # Add a separator for better readability
               echo "-------------------------"
            done  
          background: true

    • cgroups v1 (deprecated)

      - run:
          command: |
            # Loop 100 times to monitor memory usage
            for i in {1..100}; do
              # Display the current memory usage in bytes
              echo -n "Memory Usage: "
              cat /sys/fs/cgroup/memory/memory.usage_in_bytes
              
              # Display the memory limit in bytes
              echo -n "Memory Limit: "
              cat /sys/fs/cgroup/memory/memory.limit_in_bytes
              
              # Display the OOM kill counter: line 3 of memory.oom_control is
              # "oom_kill <count>", the number of processes killed so far
              echo -n "OOM Control: "
              sed -n 3p /sys/fs/cgroup/memory/memory.oom_control
              
              # Display the current date and time
              echo -n "Current Date: "
              date
              
              # Sleep for 1 second before the next iteration
              sleep 1
              
              # Add a separator for better readability
              echo "-------------------------"
            done
          background: true
  4. Step 4 - Adjust the loop count (for i in {1..100}) and the sleep interval (sleep 1) to set how long you want the script to run
    • By default, the script loops 100 times and sleeps for one second per iteration, so it runs for roughly 100 seconds.
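To make the run time easier to adjust, the loop count and sleep interval can be lifted into variables. This is only a sketch: ITERATIONS, INTERVAL, and monitor_file are illustrative names, not CircleCI settings, and the function prints a single file rather than the full set shown in the steps above.

```shell
#!/usr/bin/env bash
# ITERATIONS and INTERVAL are illustrative names, not CircleCI settings.
# Total run time is roughly ITERATIONS x INTERVAL seconds.
ITERATIONS="${ITERATIONS:-100}"
INTERVAL="${INTERVAL:-1}"

# Hypothetical helper: print the given file once per interval,
# for a fixed number of iterations.
monitor_file() {
  local file="$1" iterations="$2" interval="$3"
  local i=1
  while [ "$i" -le "$iterations" ]; do
    echo "--- $(date) ---"
    cat "$file"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example call for cgroups v2 (left commented out so that loading this
# sketch does not start a 100-second loop):
# monitor_file /sys/fs/cgroup/memory.events "$ITERATIONS" "$INTERVAL"
```

For example, setting ITERATIONS to 600 and INTERVAL to 5 would cover a job of about 50 minutes while producing far less output per minute.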

Outcome:

The above step will run in the background until the loop in the script completes, which takes roughly 100 seconds with the default settings.

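In the step output, the number of processes terminated is indicated after oom_kill. Under cgroups v2 this line appears in the Memory Events section (from memory.events); under cgroups v1 it appears after the OOM Control label.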

Additional Notes:

This script will only detect if a process was terminated, and not which process it was.
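Because only a counter is available, one practical approach is to read it before and after a memory-heavy command and report the difference. The sketch below assumes the file contains a line of the form "oom_kill <count>", which holds for memory.events under cgroups v2 and for memory.oom_control under cgroups v1 on kernels that expose the counter. The function names oom_kill_count and report_new_oom_kills are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value on the "oom_kill" line of a file.
# The exact match on the first field skips similar keys such as "oom".
oom_kill_count() {
  awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Hypothetical helper: compare a previously recorded count with the
# current one and print a message only if the counter increased.
report_new_oom_kills() {
  local before="$1" after="$2"
  if [ "$after" -gt "$before" ]; then
    echo "OOM killer terminated $((after - before)) process(es)"
  fi
}

# Example usage for cgroups v2 (commented out; paths as in the steps above):
# before=$(oom_kill_count /sys/fs/cgroup/memory.events)
# ... run the memory-heavy command here ...
# after=$(oom_kill_count /sys/fs/cgroup/memory.events)
# report_new_oom_kills "$before" "$after"
```

This still does not identify which process was killed; it only narrows down which command was running when the counter changed.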

 
