Overview
If you try to run Docker containers with GPU access on the CircleCI GPU executor, you may get the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0407] error waiting for container: context canceled
This happens because `nvidia-container-toolkit` was removed from the images when CircleCI switched to images that make multiple CUDA versions available at runtime.
Solution
Step 1: Add a step to your config.yml that installs `nvidia-container-toolkit` and restarts Docker
- run:
    name: Install nvidia-container-toolkit and Restart Docker
    command: |
      sudo apt-get update
      sudo apt-get install -y nvidia-container-toolkit
      sudo systemctl restart docker
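If the same config may also run on an image that already includes the toolkit, the install can be made conditional. This is a minimal variant sketch, not part of CircleCI's guidance, and it assumes the toolkit is installed as the Debian package `nvidia-container-toolkit` (as in the step above) so that `dpkg -s` can report whether it is present:

```yaml
# Variant sketch (assumption: toolkit is managed as an apt/dpkg package).
# Skips the apt-get update/install and the Docker restart when the package
# is already installed on the image.
- run:
    name: Install nvidia-container-toolkit if missing and Restart Docker
    command: |
      if ! dpkg -s nvidia-container-toolkit >/dev/null 2>&1; then
        sudo apt-get update
        sudo apt-get install -y nvidia-container-toolkit
        sudo systemctl restart docker
      else
        echo "nvidia-container-toolkit already installed; skipping"
      fi
```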
Step 2: Verify Compatibility
To check that Docker can use the GPUs, you can run a test container from the `nvidia/cuda:11.4.3-base-ubuntu20.04` image. The container runs the `nvidia-smi` command, which displays GPU information.
- run:
    name: Test GPU Docker
    command: docker run --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi
If `nvidia-container-toolkit` is functioning correctly and Docker can use the GPU resources, you should see the GPU information displayed in the step output.
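Putting both steps together, a complete `config.yml` might look like the sketch below. The job and workflow names, the `linux-cuda-12:default` machine image and the `gpu.nvidia.medium` resource class are assumptions for illustration; check which CUDA images and GPU resource classes are available to your plan in CircleCI's GPU documentation before using them.

```yaml
# Sketch only: image tag, resource class, and job/workflow names are
# assumptions, not values confirmed by this article.
version: 2.1

jobs:
  gpu-docker-test:
    machine:
      image: linux-cuda-12:default      # assumed multi-CUDA machine image tag
    resource_class: gpu.nvidia.medium   # assumed GPU resource class name
    steps:
      - checkout
      # Step 1: reinstall the toolkit that the multi-CUDA images no longer ship
      - run:
          name: Install nvidia-container-toolkit and Restart Docker
          command: |
            sudo apt-get update
            sudo apt-get install -y nvidia-container-toolkit
            sudo systemctl restart docker
      # Step 2: confirm a container started with --gpus can see the GPU
      - run:
          name: Test GPU Docker
          command: docker run --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi

workflows:
  gpu:
    jobs:
      - gpu-docker-test
```

Because `docker run` exits non-zero when it cannot attach the GPU, the test step fails the job if the toolkit is still missing, so it doubles as a guard before any later GPU workload steps.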
Additional Resources
- Additional information about the nvidia-container-toolkit.
- Linux CUDA Images Support Policy
- Using a GPU Resource Class on CircleCI