feat: Add custom memory pool support for tensor allocation #96

elibol merged 2 commits into NVlabs:main
Conversation
Force-pushed 2f37344 to b909509
- Add `MemPool` wrapping `CUmemoryPool` with device ownership and RAII drop
- `set_device_pool()` registers a pool per device; rejects cross-device pools
- `ExecutionContext::new()` auto-resolves the pool from the stream's device via `pool_for_stream()`
- All scheduling paths (`.sync`, `.await`, `.schedule`, `.sync_on`, `.async_on`) carry the pool
- Add cuda-async integration tests covering lifecycle, scheduling freeze, and cross-device rejection
I've refactored the implementation since the last push; it's cleaner and less intrusive now. @elibol PTAL when you're available, thank you!
No immediate concerns! We already reviewed the approach extensively. I will begin looking at these once we have v0.0.2 cut. I'd like to have your PRs merged in after that, so we can iterate and address bugs while folks try out v0.0.2. Hoping to get it out soon. We might just cut v0.1.0 immediately after v0.0.2. Just trying to align versions and features in a manageable way. I'll prioritize this and your other PR next week 🙂
Sounds good to me, thank you for the update.

LGTM! Merging!
props.allocType = cuda_bindings::CUmemAllocationType_enum_CU_MEM_ALLOCATION_TYPE_PINNED;
props.handleTypes = cuda_bindings::CUmemAllocationHandleType_enum_CU_MEM_HANDLE_TYPE_NONE;
props.location.type_ = cuda_bindings::CUmemLocationType_enum_CU_MEM_LOCATION_TYPE_DEVICE;
props.location.__bindgen_anon_1.id = self.ordinal as c_int;
I have multiple CUDA environments, so the generated bindings vary accordingly.
@ur4t Thanks for the investigation! Could you share which CUDA version you're using? Does that include pre-13.x?
> Could you share which CUDA version you're using? Does that include pre-13.x?
@goog00 CUDA 13.0, installed along with PyTorch. I have also checked cuda.h in CUDA 13.1; it uses a plain int. Only CUDA 13.2 uses a union wrapper.
@ur4t Thank you for checking! I'll open a follow-up PR to handle the version compatibility across different CUDA releases.
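As a rough illustration only (not the actual follow-up PR): one way to absorb the bindgen difference is a small accessor behind a Cargo feature. The feature name `cuda_13_2_bindings` and the exact struct paths below are assumptions, not confirmed project API:

```rust
// Hypothetical sketch: hide the difference between CUDA headers where
// CUmemLocation's id is a plain int vs. wrapped in a bindgen anonymous union.
// The `cuda_13_2_bindings` feature name is invented for illustration.
fn set_location_id(location: &mut cuda_bindings::CUmemLocation, ordinal: i32) {
    #[cfg(feature = "cuda_13_2_bindings")]
    {
        // CUDA 13.2-style bindings: id sits inside an anonymous union.
        location.__bindgen_anon_1.id = ordinal as std::os::raw::c_int;
    }
    #[cfg(not(feature = "cuda_13_2_bindings"))]
    {
        // Earlier bindings: id is a plain int field.
        location.id = ordinal as std::os::raw::c_int;
    }
}
```

A build script that probes the installed CUDA headers could enable the feature automatically, so users would not have to pick the right flag by hand.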
Summary
- `MemPool` RAII wrapper in `cuda-core` with create/default/threshold/destroy lifecycle
- `set_device_pool`/`get_device_pool`/`clear_device_pool` on `AsyncDeviceContext`; the pool is frozen into `ExecutionContext` at scheduling time, so in-flight ops are unaffected by later changes
- `ExecutionContext::alloc_async()` routes through `cuMemAllocFromPoolAsync` when a pool is set and falls back to `cuMemAllocAsync` otherwise; all tensor/copy callsites updated
- `set_device_pool` validates device affinity; `.schedule()` now reads the pool (it was always `None` before), aligned with the `.sync()`/`.await` paths
- No behavior change when no pool is configured
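For orientation, here is a rough usage sketch of the API described above. It is not code from this PR; the constructor arguments, the `zeros` tensor constructor, and the error handling are assumptions made for illustration:

```rust
// Hypothetical usage sketch based on the names in this PR's summary;
// exact signatures and module paths are assumptions, not the project's API.
use cuda_async::AsyncDeviceContext;
use cuda_core::MemPool;

async fn run() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed constructors taking a device ordinal.
    let ctx = AsyncDeviceContext::new(0)?;
    let pool = MemPool::new(0)?;

    // Register the pool for the device; set_device_pool validates device
    // affinity and rejects a pool created on a different device.
    ctx.set_device_pool(pool)?;

    // Ops scheduled from here on freeze the pool into their ExecutionContext,
    // so alloc_async routes through cuMemAllocFromPoolAsync for them even if
    // the device pool is changed or cleared later.
    let _tensor = ctx.zeros([1024, 1024]).schedule().await?; // `zeros` is assumed

    // Clearing the pool only affects ops scheduled after this point;
    // subsequent allocations fall back to cuMemAllocAsync.
    ctx.clear_device_pool();
    Ok(())
}
```

The point of freezing the pool at scheduling time is that a later `set_device_pool`/`clear_device_pool` call cannot change where already-scheduled work allocates from.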
Test plan
- `cargo test -p cuda-async --test pool_allocation` passes
- `./scripts/run_all.sh` passes