Add SM and Tensor-core activity plot series (NVML GPM) #483
Open
rocker-zhang wants to merge 1 commit into
Adds opt-in chart series for SM activity and Tensor-core activity (plus SM occupancy and DRAM bandwidth %), addressing the request in issue Syllo#163.

The data comes from the NVML GPM API (`nvmlGpmSampleGet` / `nvmlGpmMetricsGet`), which lives inside `libnvidia-ml` and is `dlsym`'d like the rest of the NVIDIA backend, so there is no new dependency (no DCGM daemon, no CUDA). Two samples are differenced each refresh; NVML returns 0..100 directly, so the new series reuse the existing percentage plot path (as power% / clock% already do).

- Opt-in from the F2 chart menu; the backend samples GPM only while a series is enabled, so an idle nvtop never arms the shared perfmon counters.
- GPM support is detected from the actual `nvmlGpmSampleGet` return code; on GPUs that do not support it the series are simply absent (no error, no clutter).
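The refresh step described above (detect support from the first sample call, then difference the two most recent samples) could look roughly like the sketch below. It is not the PR's code: the callback types `sample_get_fn` and `metrics_get_fn`, the `gpm_state` struct, and the fake backend are all hypothetical stand-ins so the example builds without `nvml.h`, and the rule "a failure on the first call means unsupported" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL stand-ins: simplified callback types in place of the dlsym'd
 * nvmlGpmSampleGet / nvmlGpmMetricsGet, so the sketch builds without nvml.h.
 * Both return 0 on success, mirroring NVML_SUCCESS == 0. */
typedef int (*sample_get_fn)(void *device, double *sample);
typedef int (*metrics_get_fn)(const double *older, const double *newer,
                              double *percent);

enum gpm_support { GPM_UNKNOWN, GPM_SUPPORTED, GPM_UNSUPPORTED };

struct gpm_state {
  enum gpm_support support;
  double samples[2]; /* stand-in for two allocated GPM sample handles */
  int newest;        /* index of the latest valid sample, -1 if none */
};

void gpm_state_init(struct gpm_state *st) {
  st->support = GPM_UNKNOWN;
  st->samples[0] = st->samples[1] = 0.0;
  st->newest = -1;
}

/* One refresh tick. Returns true and fills *percent once two samples exist. */
bool gpm_refresh(struct gpm_state *st, void *device, sample_get_fn sample_get,
                 metrics_get_fn metrics_get, double *percent) {
  if (st->support == GPM_UNSUPPORTED)
    return false; /* series stays absent */
  int target = (st->newest == 0) ? 1 : 0; /* overwrite the older slot */
  if (sample_get(device, &st->samples[target]) != 0) {
    /* Assumption: a failure on the very first call means "no GPM here". */
    if (st->support == GPM_UNKNOWN)
      st->support = GPM_UNSUPPORTED;
    return false;
  }
  st->support = GPM_SUPPORTED;
  int older = st->newest;
  st->newest = target;
  if (older < 0)
    return false; /* first sample: nothing to difference against yet */
  return metrics_get(&st->samples[older], &st->samples[target], percent) == 0;
}

/* Fake backend for demonstration only: a "device" is a pointer to a counter,
 * and the "metric" is the plain difference between two counter readings. */
int fake_sample_get(void *device, double *sample) {
  if (device == NULL)
    return 1; /* pretend this GPU has no GPM support */
  *sample = *(double *)device;
  return 0;
}

int fake_metrics_get(const double *older, const double *newer,
                     double *percent) {
  *percent = *newer - *older;
  return 0;
}
```

With the fake backend, the first call to `gpm_refresh` only records a baseline and returns false; every later call yields a value. In a real backend the value would come from NVML already scaled to 0..100, so it could be handed to the percentage plot path unchanged.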
- Readings match `dcgmi` (`DCGM_FI_PROF_SM_ACTIVE` / `PIPE_TENSOR_ACTIVE`) to 3 decimals under a cuBLAS GEMM on a Blackwell board.

Note: GPM reads the same hardware perfmon counters that DCGM uses, so on a node already running a DCGM exporter the two can perturb each other's readings. This is why sampling is gated to "only while the series is displayed".
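One way to picture the "only while the series is displayed" gating is a bitmask of enabled series that the refresh loop checks before it touches the counters. The sketch below is an illustration under assumptions, not the PR's implementation: the `gpm_series` flags, the `gpm_gate` struct, and both functions are hypothetical names, and dropping the baseline when the last series is disabled is a design choice made here so that a stale sample is never differenced after re-enabling.

```c
#include <stdbool.h>

/* HYPOTHETICAL series flags; the names are illustrative, not nvtop's. */
enum gpm_series {
  SERIES_SM_ACTIVITY = 1u << 0,
  SERIES_TENSOR_ACTIVITY = 1u << 1,
  SERIES_SM_OCCUPANCY = 1u << 2,
  SERIES_DRAM_BW = 1u << 3
};

struct gpm_gate {
  unsigned enabled;       /* bitmask of series currently shown in the chart */
  bool has_baseline;      /* a previous sample exists to difference against */
  unsigned samples_taken; /* counts sampling calls, for illustration only */
};

/* Toggle one or more series from the chart menu. */
void gate_set_series(struct gpm_gate *g, unsigned series, bool on) {
  if (on)
    g->enabled |= series;
  else
    g->enabled &= ~series;
  if (g->enabled == 0)
    g->has_baseline = false; /* drop the stale baseline once nothing is shown */
}

/* Called once per refresh. Returns true when a fresh value can be produced. */
bool gate_tick(struct gpm_gate *g) {
  if (g->enabled == 0)
    return false;     /* nothing displayed: never touch the counters */
  g->samples_taken++; /* here a real backend would take a GPM sample */
  bool ready = g->has_baseline;
  g->has_baseline = true;
  return ready;
}
```

With no series enabled, `gate_tick` returns immediately and `samples_taken` stays at zero, which corresponds to an idle nvtop leaving the shared perfmon counters alone. After a series is switched on, the first tick only records a baseline and the second is the first to produce a value.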