Add CDI device selector and Nvidia GPU passthrough for Linux #301
base: dev
Changes from 4 commits
96fa386
12ebda8
a8f671b
86ab9c6
b16dc42
2c2aca7
719c90f
793631a
24cc238
af8f707
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -150,7 +150,18 @@ async def __start_container(self) -> None: | |
| self.__container.start() | ||
| except APIError as e: | ||
| logger.debug(e) | ||
| logger.critical(f"Docker raised a critical error when starting the container [green]{self.name}[/green], error message is: {e.explanation}") | ||
| explanation = e.explanation | ||
| if explanation is None: | ||
| explanation = "" | ||
|
Macbucheron1 marked this conversation as resolved.
Outdated
|
||
| elif isinstance(explanation, bytes): | ||
| explanation = explanation.decode("utf-8", errors="ignore") | ||
| message = str(explanation) | ||
|
Macbucheron1 marked this conversation as resolved.
Outdated
|
||
| lower_message = message.lower() | ||
| message = message.replace('[', '\\[') | ||
| logger.error(f"Docker raised a critical error when starting the container [green]{self.name}[/green], error message is: {message}") | ||
| if "cdi device injection failed" in lower_message and "nvidia.com/gpu=all" in lower_message: | ||
| logger.warning("Hint: verify NVIDIA CDI is configured (e.g. nvidia-container-toolkit installed and Docker CDI enabled).") | ||
|
Member
Can we check with
PS: can we link the user to the NVIDIA documentation on how to install the NVIDIA toolkit, for users who don't know how?
Author
We can only check whether Docker currently sees NVIDIA CDI devices or not. That does not strictly tell us whether the NVIDIA toolkit is enabled, since the CDI spec may simply not have been generated or discovered yet.
Docker exposes CDI support and discovered devices in
If we want to handle NVIDIA separately, we could also check for the presence of
$ docker info
Client:
Version: 29.2.1
Context: default
...
CDI spec directories:
/etc/cdi
/var/run/cdi
Discovered Devices:
cdi: nvidia.com/gpu=0
cdi: nvidia.com/gpu=all
...
And using the SDK:
$ python3
Python 3.13.12 (main, Feb 3 2026, 17:53:27) [GCC 15.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import docker
>>> info = docker.from_env().info()
>>> print(info.get("DiscoveredDevices",[]))
[{'Source': 'cdi', 'ID': 'nvidia.com/gpu=0'}, {'Source': 'cdi', 'ID': 'nvidia.com/gpu=all'}]
Added the link to the NVIDIA doc in 24cc238. Also removed the mention of Docker CDI being enabled, since it has been enabled by default since v27. |
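The SDK check from the comment above could be wrapped in a small helper. This is only a sketch, not code from this PR: the names `nvidia_cdi_devices`, `docker_sees_nvidia_cdi` and `NVIDIA_CDI_PREFIX` are hypothetical, and it assumes the `DiscoveredDevices` payload has the shape shown in the session above. As the comment notes, an empty result only means Docker does not currently see an NVIDIA CDI device; it does not prove the toolkit is missing.

```python
from typing import Any, Mapping

# Assumption: NVIDIA CDI device IDs start with this prefix, as in the output above.
NVIDIA_CDI_PREFIX = "nvidia.com/gpu="


def nvidia_cdi_devices(info: Mapping[str, Any]) -> list[str]:
    """Return the IDs of NVIDIA CDI devices listed in a `docker info` payload."""
    # The key may be absent or None on daemons without discovered devices.
    devices = info.get("DiscoveredDevices") or []
    return [
        d["ID"]
        for d in devices
        if d.get("Source") == "cdi"
        and str(d.get("ID", "")).startswith(NVIDIA_CDI_PREFIX)
    ]


def docker_sees_nvidia_cdi() -> bool:
    """Query the local Docker daemon and report whether it sees an NVIDIA CDI device."""
    import docker  # imported lazily so the pure helper above has no dependency

    return bool(nvidia_cdi_devices(docker.from_env().info()))
```

Keeping the filtering separate from the daemon call means the logic can be exercised with a plain dictionary, without a running Docker daemon.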
||
| logger.critical("Error while starting exegol container. Exiting.") | ||
| if not self.config.legacy_entrypoint: # TODO improve startup compatibility check | ||
| try: | ||
| # Try to find log / startup messages. Will time out after 2 seconds if the image don't support status update through container logs. | ||
|
|
||
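For reviewers who want to exercise the new error handling outside of `__start_container`, the steps in the hunk above can be sketched as a standalone function. The name `format_api_error` and its tuple return value are hypothetical and not part of this PR; the logic mirrors the diff (normalise `None` and `bytes`, detect the CDI failure on the lowercased text, then escape `[` so the logger does not treat it as markup).

```python
from typing import Union


def format_api_error(explanation: Union[str, bytes, None]) -> tuple[str, bool]:
    """Return (message with '[' escaped, True if it looks like an NVIDIA CDI failure)."""
    # APIError.explanation may be None or bytes, so normalise it to str first.
    if explanation is None:
        explanation = ""
    elif isinstance(explanation, bytes):
        explanation = explanation.decode("utf-8", errors="ignore")
    message = str(explanation)

    # Match on the unescaped, lowercased text, as the diff does.
    lower_message = message.lower()
    is_cdi_failure = (
        "cdi device injection failed" in lower_message
        and "nvidia.com/gpu=all" in lower_message
    )

    # Escape opening brackets so they are not parsed as markup tags.
    return message.replace("[", "\\["), is_cdi_failure
```

Returning the flag alongside the message lets the caller decide whether to log the NVIDIA hint, which keeps the function free of logging side effects.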