
Fix managed memory misclassified as kDLCUDAHost in DLPack device mapping#1863

Merged
cpcloud merged 5 commits into NVIDIA:main from rparolin:fix/managed-memory-dlpack-device-type
Apr 7, 2026

Conversation

Collaborator

@rparolin rparolin commented Apr 6, 2026

Summary

  • _smv_get_dl_device() classified all device+host-accessible buffers as kDLCUDAHost, including managed (unified) memory which should be kDLCUDAManaged
  • CCCL's make_tma_descriptor rejects kDLCUDAHost with "Device type must be kDLCUDA or kDLCUDAManaged", breaking TMA descriptor creation from managed buffers
  • Preserve the is_managed flag (already queried via CU_POINTER_ATTRIBUTE_IS_MANAGED in _query_memory_attrs()) in _MemAttrs, expose it on Buffer, and use it in _smv_get_dl_device() to return kDLCUDAManaged
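The corrected branching can be sketched in plain Python. The check order follows the PR description (managed memory is also device- and host-accessible, so it must be tested first), and the numeric values come from DLPack's `dlpack.h`; the function and attribute names here are illustrative, not the actual cuda.core internals:

```python
from enum import IntEnum

# Device-type values as defined in DLPack's dlpack.h.
class DLDeviceType(IntEnum):
    kDLCUDA = 2          # ordinary CUDA device memory
    kDLCUDAHost = 3      # pinned (page-locked) host memory
    kDLCUDAManaged = 13  # managed (unified) memory

def classify_dl_device(is_device_accessible: bool,
                       is_host_accessible: bool,
                       is_managed: bool) -> DLDeviceType:
    """Illustrative classifier: is_managed must be checked before the
    device+host-accessible case, because managed memory is also both."""
    if is_managed:
        return DLDeviceType.kDLCUDAManaged
    if is_device_accessible and is_host_accessible:
        return DLDeviceType.kDLCUDAHost
    if is_device_accessible:
        return DLDeviceType.kDLCUDA
    raise ValueError("buffer is not accessible from a CUDA device")
```

The original bug is the missing first branch: without it, a managed buffer falls through to the device+host-accessible case and is reported as `kDLCUDAHost`.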

Fixes: https://nvbugspro.nvidia.com/bug/6044342

🤖 Generated with Claude Code

…vice mapping

_smv_get_dl_device() treated all buffers that are both device- and
host-accessible as kDLCUDAHost. Managed (unified) memory is also both-
accessible, so it was misclassified. CCCL's make_tma_descriptor then
rejected the descriptor with "Device type must be kDLCUDA or
kDLCUDAManaged".

Preserve the is_managed flag already queried via
CU_POINTER_ATTRIBUTE_IS_MANAGED in _query_memory_attrs(), expose it on
Buffer, and use it in _smv_get_dl_device() to return kDLCUDAManaged for
managed memory.

Fixes: https://nvbugspro.nvidia.com/bug/6044342

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rparolin rparolin added this to the cuda.core v0.7.0 milestone Apr 6, 2026
@rparolin rparolin added the bug (Something isn't working) and cuda.core (Everything related to the cuda.core module) labels Apr 6, 2026
@rparolin rparolin requested review from cpcloud and leofang April 6, 2026 15:49
@rparolin rparolin self-assigned this Apr 6, 2026
Contributor

@cpcloud cpcloud left a comment


Thanks for working on this. I don't think the current patch closes the managed-buffer tensor-map path yet.

StridedMemoryView.from_any_interface(buffer) on a Buffer still goes through Buffer.__dlpack_device__() and _dlpack.setup_dl_tensor_device(), and both of those still map managed memory to kDLCUDAHost. Because _smv_get_dl_device() prefers view.dl_tensor.device when a DLPack capsule is present, the new branch here never corrects that case.

Please update the buffer-side DLPack export to emit kDLCUDAManaged as well, and keep the classification logic in one place so the Buffer and StridedMemoryView paths stay aligned. The existing managed-buffer tensor-map test already exercises this via _as_view(managed_buf), so that should become the regression guard once the buffer-side export is fixed too.
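The gap described above comes down to a precedence rule: when a DLPack capsule supplied the tensor, its recorded device wins, so a fix applied only on the attribute-based fallback never corrects a capsule that was exported with the wrong device type. A minimal stand-in, with hypothetical names rather than the real cuda.core internals:

```python
# DLPack device-type constants (from dlpack.h).
kDLCUDAHost, kDLCUDAManaged = 3, 13

def smv_get_dl_device(capsule_device, fallback_device):
    # Prefer the device recorded in the DLPack capsule when one exists;
    # only fall back to attribute-based classification otherwise.
    if capsule_device is not None:
        return capsule_device
    return fallback_device

# A managed buffer whose capsule was exported with the old buggy
# device type stays misclassified, even if the fallback is correct:
assert smv_get_dl_device(kDLCUDAHost, kDLCUDAManaged) == kDLCUDAHost
```

This is why the review asks for the buffer-side export to be fixed as well: correcting only the fallback leaves the capsule path wrong.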

rparolin and others added 2 commits April 6, 2026 15:54
Update setup_dl_tensor_device() and Buffer.__dlpack_device__() to emit
kDLCUDAManaged for managed memory, closing the gap where the Buffer ->
DLPack capsule -> StridedMemoryView path still misclassified managed
buffers as kDLCUDAHost. Add cross-reference comments to keep the three
classification sites aligned.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract the duplicated device-type mapping logic from
Buffer.__dlpack_device__(), setup_dl_tensor_device(), and
_smv_get_dl_device() into a single classify_dl_device() function
in _dlpack.pyx. All three call sites now delegate to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
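The extraction described in this commit amounts to one shared classifier that every DLPack export path calls. A sketch of the delegation pattern, with illustrative names (the real code lives in Cython in `_dlpack.pyx`):

```python
# DLPack device-type constants (from dlpack.h).
kDLCUDA, kDLCUDAHost, kDLCUDAManaged = 2, 3, 13

def classify_dl_device(buf):
    """Single source of truth for buffer -> DLDeviceType mapping."""
    if buf.is_managed:
        return kDLCUDAManaged
    if buf.is_device_accessible and buf.is_host_accessible:
        return kDLCUDAHost
    if buf.is_device_accessible:
        return kDLCUDA
    raise ValueError("buffer is not accessible from a CUDA device")

class Buffer:
    def __init__(self, device_id, *, managed=False,
                 device_accessible=True, host_accessible=False):
        self.device_id = device_id
        self.is_managed = managed
        self.is_device_accessible = device_accessible
        self.is_host_accessible = host_accessible

    def __dlpack_device__(self):
        # Delegates instead of duplicating the branching; the other two
        # call sites (setup_dl_tensor_device, _smv_get_dl_device) would
        # delegate the same way, so the three paths cannot drift apart.
        return (classify_dl_device(self), self.device_id)
```

With one classifier, a future change (say, a new memory kind) only has to be made in one place for all three export paths to agree.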
@rparolin rparolin requested a review from cpcloud April 6, 2026 23:05
rparolin and others added 2 commits April 6, 2026 16:06
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix test_buffer_dunder_dlpack_device_success to expect kDLCUDAManaged
  for unified memory instead of the old buggy kDLCUDAHost.
- Fix test_buffer_dlpack_failure_clean_up error message to match the
  unified classify_dl_device error.
- Add test_managed_buffer_dlpack_roundtrip_device_type to cover the
  Buffer -> DLPack capsule -> StridedMemoryView end-to-end path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator Author

rparolin commented Apr 6, 2026

@cpcloud please re-review

Contributor

@cpcloud cpcloud left a comment


The buffer-side DLPack export paths are fixed now, and centralizing the device classification removes the drift risk I was worried about. The new managed-buffer roundtrip test also covers the end-to-end Buffer -> DLPack -> StridedMemoryView path, so this looks good to me.

@cpcloud cpcloud merged commit 4d8ee87 into NVIDIA:main Apr 7, 2026
87 checks passed
@github-actions

github-actions bot commented Apr 7, 2026

Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

bug (Something isn't working), cuda.core (Everything related to the cuda.core module)


3 participants