Document Warp extension points by shi-eric · Pull Request #1494 · NVIDIA/warp

shi-eric · 2026-05-30T19:27:33Z

Description

Add a new Extending Warp user guide page that makes public extension points easier to discover. The page now covers native C++/CUDA snippets via wp.func_native, custom allocators, custom loggers, and related extension points.

The existing differentiability page now keeps only autodiff-specific native snippet guidance and links to the new page for general syntax. Related pages in Basics, Codegen, Allocators, and Debugging now cross-link to the new hub. The generated API docs for wp.func_native also get a fuller docstring with safer examples.

Checklist

- I am familiar with the Contributing Guidelines.
- New or existing tests cover these changes.
- The documentation is up to date with these changes.

Test plan

uvx pre-commit run --files docs/index.rst docs/user_guide/extending_warp.rst docs/user_guide/basics.rst docs/user_guide/differentiability.rst docs/deep_dive/allocators.rst docs/user_guide/debugging.rst docs/deep_dive/codegen.rst warp/_src/context.py
WARP_CACHE_ROOT=/tmp/warp-cache-warp-worktree-2-docs WARP_CACHE_PATH=/tmp/warp-cache-warp-worktree-2-docs uv run --extra docs build_docs.py 2>&1 | tee /tmp/build_docs.log
rg -n "WARNING|undefined label|unknown document|Title underline too short|Inline emphasis" /tmp/build_docs.log
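If `rg` is unavailable, the last step can be approximated in Python. This is a minimal sketch that assumes the log path used in the commands above. `find_doc_problems` and `check_build_log` are hypothetical helpers and are not part of Warp's tooling.

```python
import re
from typing import Iterable, List, Tuple

# The same alternation that the `rg -n` command in the test plan searches for.
DOC_PROBLEM = re.compile(
    r"WARNING|undefined label|unknown document|Title underline too short|Inline emphasis"
)


def find_doc_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for each matching log line, like `rg -n`."""
    return [
        (lineno, line.rstrip("\n"))
        for lineno, line in enumerate(lines, start=1)
        if DOC_PROBLEM.search(line)
    ]


def check_build_log(path: str) -> int:
    """Print matching lines; return 1 if any were found, else 0 (CI-style status)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        hits = find_doc_problems(log)
    for lineno, text in hits:
        print(f"{lineno}:{text}")
    return 1 if hits else 0
```

For example, call `sys.exit(check_build_log("/tmp/build_docs.log"))`. The exit status is the reverse of `rg`. `rg` exits 0 when it finds matches and 1 when it finds none. This sketch exits non-zero only when the log contains a problem, so a CI step can gate on it directly.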

New feature / enhancement

This is a documentation discoverability enhancement for existing Warp extension APIs.

Summary by CodeRabbit

Documentation
- Added "Extending Warp" guide covering native C++/CUDA functions, examples (shared-memory, inline PTX), differentiable native functions with adj/replay snippets, and constraints.
- Documented custom CUDA allocators (which affect only CUDA device arrays) and noted CUDA graph capture caveats.
- Documented custom loggers and installation/restoration.
- Streamlined native-function, debugging, codegen, basics, and index pages with cross-references and embedding guidance.

Native snippets had outgrown their historical home in the differentiability docs. That made the general C++/CUDA escape hatch difficult for power users to discover outside autodiff use cases. Add an Extending Warp user guide page for native snippets, custom allocators, and custom loggers. Keep differentiability focused on adjoint and replay behavior, add cross-links from related pages, and expand the generated API docs for wp.func_native with safer examples.

Signed-off-by: Eric Shi <ershi@nvidia.com>

coderabbitai · 2026-05-30T19:27:44Z

📝 Walkthrough

Adds a new "Extending Warp" user guide describing native functions, custom CUDA allocators, and custom loggers; expands the @wp.func_native docstring; and updates related docs and examples across the user guide and deep-dive documentation.

Changes

Extension Points Guide and Cross-References

Layer / File(s)	Summary
New extending_warp user guide and func_native docstring `docs/user_guide/extending_warp.rst`, `warp/_src/context.py`	Adds the "Extending Warp" page documenting `@wp.func_native` native snippets (`snippet`, `adj_snippet`, `replay_snippet`), `warp.Allocator`/allocator installation APIs, and `wp.Logger` requirements; expands `func_native` docstring and updates a kernel example annotation.
Navigation and introductory cross-references `docs/index.rst`, `docs/user_guide/basics.rst`	Adds `user_guide/extending_warp` to the User Guide toctree and replaces an inline basics description with cross-references to `differentiability` and `extending_warp`.
Deep-dive documentation updates for allocators and code generation `docs/deep_dive/allocators.rst`, `docs/deep_dive/codegen.rst`	Clarifies that custom allocators affect CUDA `warp.array` allocations only, adjusts mempool-copy capture wording, renames an RMM subsection, and adds a note in codegen about embedding small native C++/CUDA snippets via `@wp.func_native`.
Differentiability and debugging guide refactoring `docs/user_guide/differentiability.rst`, `docs/user_guide/debugging.rst`	Replaces the native-function example in differentiability with a smaller axpy/tape-based example showing `adj_snippet` and `replay_snippet`; condenses debugging's custom-logger guidance to reference `wp.Logger`/`wp.set_logger`/`wp.ScopedLogger` and updates the tid overflow reference.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Document Warp extension points' directly and clearly summarizes the main objective of the changeset, which is to add comprehensive documentation for Warp's public extension points including native functions, custom allocators, and custom loggers.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

🧹 Nitpick comments (1)

docs/user_guide/differentiability.rst (1)
675-676: ⚡ Quick win

Use explicit keyword arguments for wp.func_native snippets to avoid API-order ambiguity.

Given the text says “provide adj_snippet,” the decorator call is safer/clearer as keyword args (snippet=..., adj_snippet=...) instead of positional arguments.
Suggested doc tweak
-    @wp.func_native(snippet, adj_snippet)
+    @wp.func_native(snippet=snippet, adj_snippet=adj_snippet)
     def axpy(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user_guide/differentiability.rst` around lines 675-676, the decorator call for axpy uses positional args, which can be ambiguous. Change the `@wp.func_native` usage to pass explicit keyword arguments (e.g., `snippet=...` and `adj_snippet=...`) so that the decorator invocation is unambiguous. Update the `@wp.func_native(...)` line above the axpy definition to use the `snippet=` and `adj_snippet=` named parameters.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 24bc03c1-c570-4231-a7c4-733be99601cd

📥 Commits

Reviewing files that changed from the base of the PR and between 1d72324 and dfad38b.

📒 Files selected for processing (8)

docs/deep_dive/allocators.rst
docs/deep_dive/codegen.rst
docs/index.rst
docs/user_guide/basics.rst
docs/user_guide/debugging.rst
docs/user_guide/differentiability.rst
docs/user_guide/extending_warp.rst
warp/_src/context.py

greptile-apps · 2026-05-30T19:34:34Z

Greptile Summary

This PR adds a new "Extending Warp" guide that consolidates previously scattered documentation on @wp.func_native, custom allocators, and custom loggers into a single discoverable page. Existing pages in basics, debugging, differentiability, allocators, and codegen are updated to cross-reference the new hub instead of duplicating content.

extending_warp.rst is a new user guide covering native C++/CUDA snippets (including shared memory, inline PTX, differentiable snippets, and replay), custom CUDA allocators, and custom loggers, with self-contained runnable examples.
warp/_src/context.py receives a substantially improved func_native docstring with structured Args, Returns, Note, and Example sections.
All other changed files are editorial: moved content, updated cross-references, and minor prose clarifications.

Confidence Score: 5/5

Documentation-only change with no runtime code modifications beyond an expanded docstring; safe to merge.

All changes are documentation restructuring and a docstring expansion. No logic, APIs, or runtime behaviour is modified. All cross-references point to files that exist in the repo, and the inline code examples in the new page are self-contained and mathematically verified.

No files require special attention.

Important Files Changed

Filename	Overview
docs/user_guide/extending_warp.rst	New hub page consolidating native C++/CUDA, custom allocator, and custom logger docs. One style inconsistency: the sad_kernel example uses old-style wp.array(dtype=...) annotations
warp/_src/context.py	Expanded func_native docstring with structured Args/Returns/Note/Example/See Also sections; fixes wp.array(dtype=float) annotation to wp.array[float] in the kernel example
docs/user_guide/differentiability.rst	Streamlined native function section to autodiff-specific content (adj_snippet, replay_snippet); examples updated, simplified from 128- to 8-element arrays
docs/user_guide/debugging.rst	Custom Logger section reduced to a brief intro and cross-reference to extending_warp; fixes wp.tid() cross-reference from private _src.lang.tid to public warp.tid
docs/deep_dive/allocators.rst	Custom Allocators intro refactored to point at extending_warp; limitation prose clarified; RMM section heading expanded
docs/index.rst	Adds extending_warp to the toctree in the correct position between basics and runtime
docs/user_guide/basics.rst	Cross-reference updated to split native function guidance to extending_warp and keep autodiff guidance in differentiability
docs/deep_dive/codegen.rst	Adds a four-line paragraph linking func_native to extending_warp inside the code-generation internals section

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[basics.rst] -->|cross-ref: native C++/CUDA| E[extending_warp.rst]
    B[differentiability.rst] -->|cross-ref: general func_native syntax| E
    C[debugging.rst] -->|cross-ref: wp.Logger / set_logger| E
    D[codegen.rst] -->|cross-ref: func_native API| E
    F[allocators.rst] -->|cross-ref: Allocator protocol| E
    E -->|cross-ref: autodiff details| B
    E -->|cross-ref: log levels / debug tools| C
    E -->|cross-ref: PyTorch & RMM examples| F
    H[context.py] -->|See Also| E
    I[index.rst] -->|toctree entry| E

Reviews (4): Last reviewed commit: "Clarify extension and allocator docs"

Review bots noted that two native snippet examples were less clear than they should be. The differentiability example used positional decorator arguments even though the surrounding prose names adj_snippet, and the shared-memory example launched with arr and out without defining them in the same block. Use keyword arguments for adj_snippet examples and make the shared-memory launch snippet define its inputs before launch. This keeps the docs easier to copy and less dependent on decorator argument order.

Signed-off-by: Eric Shi <ershi@nvidia.com>
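You can sanity-check the adjoint formulas that an `adj_snippet` must encode in plain Python before writing them in C++. This is a minimal sketch. It assumes the axpy example computes `out[i] = a * x[i] + y[i]`. The function names are hypothetical, and the code does not reproduce the snippet text from the PR.

```python
from typing import Callable, List, Sequence, Tuple


def axpy(a: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Forward pass assumed for the example: out[i] = a * x[i] + y[i]."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def axpy_adjoint(
    a: float, x: Sequence[float], adj_out: Sequence[float]
) -> Tuple[float, List[float], List[float]]:
    """Hand-derived adjoints, i.e. what an adj_snippet for axpy must accumulate.

    adj_a is a sum over all elements because every out[i] depends on the same scalar a.
    """
    adj_a = sum(xi * gi for xi, gi in zip(x, adj_out))
    adj_x = [a * gi for gi in adj_out]
    adj_y = list(adj_out)
    return adj_a, adj_x, adj_y


def loss(a: float, x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Scalar test loss L = sum_i w[i] * out[i], so dL/dout[i] = w[i]."""
    return sum(wi * oi for wi, oi in zip(w, axpy(a, x, y)))


def check_axpy_adjoint(
    a: float,
    x: Sequence[float],
    y: Sequence[float],
    w: Sequence[float],
    eps: float = 1e-6,
    tol: float = 1e-6,
) -> bool:
    """Compare the hand-written adjoints against central finite differences of loss()."""
    adj_a, adj_x, adj_y = axpy_adjoint(a, x, w)

    def central(f: Callable[[float], float]) -> float:
        # Central difference of f around 0 with step eps.
        return (f(eps) - f(-eps)) / (2.0 * eps)

    def bumped(seq: Sequence[float], i: int, h: float) -> List[float]:
        # Copy of seq with element i shifted by h.
        out = list(seq)
        out[i] += h
        return out

    errors = [abs(central(lambda h: loss(a + h, x, y, w)) - adj_a)]
    for i in range(len(x)):
        errors.append(abs(central(lambda h: loss(a, bumped(x, i, h), y, w)) - adj_x[i]))
        errors.append(abs(central(lambda h: loss(a, x, bumped(y, i, h), w)) - adj_y[i]))
    return max(errors) < tol
```

Because axpy is linear in each input, the central difference matches the exact derivative up to rounding, so a tight tolerance works. A nonlinear snippet would need a looser tolerance.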

Several extension docs used imprecise terminology around diagnostics, custom allocator scope, and native snippets. The allocator docs also introduced RMM without expanding the acronym, and the native function section only showed CUDA launches despite pure C++ snippets working on CPU kernels. Update the docs to distinguish environment diagnostics from internal logging, spell out RAPIDS Memory Manager, describe allocator routing as current support, and add a CUDA inline PTX example using vabsdiff4 for packed-byte SAD. This keeps the examples practical while documenting the boundary between CPU-compatible C++ snippets and CUDA-only native code.

Signed-off-by: Eric Shi <ershi@nvidia.com>
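A pure-Python reference for packed-byte SAD makes the inline PTX example's expected numbers easy to compare against kernel output. This sketch rests on two assumptions. First, each 32-bit word packs four unsigned bytes. Second, the PTX uses the accumulating `.add` form of `vabsdiff4`, which, as described in the PTX ISA, adds the four byte-wise absolute differences to a third operand. The helper names are hypothetical and do not mirror the sad_kernel code on the new page.

```python
from typing import Sequence


def vabsdiff4_add(a: int, b: int, c: int = 0) -> int:
    """Reference for one packed-byte SAD step: c + sum_k |a.byte[k] - b.byte[k]|.

    Bytes are treated as unsigned; the result wraps to 32 bits like a u32 register.
    """
    total = c
    for shift in (0, 8, 16, 24):
        total += abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF))
    return total & 0xFFFFFFFF


def packed_sad(xs: Sequence[int], ys: Sequence[int]) -> int:
    """SAD over two arrays of packed 32-bit words, accumulating one word at a time."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = vabsdiff4_add(a, b, acc)
    return acc


def pack_bytes(values: Sequence[int]) -> int:
    """Pack four unsigned bytes into one word, first value in the lowest byte."""
    if len(values) != 4 or not all(0 <= v <= 0xFF for v in values):
        raise ValueError("expected exactly four values in [0, 255]")
    return values[0] | (values[1] << 8) | (values[2] << 16) | (values[3] << 24)
```

To catch byte-order or operand-order mistakes in the PTX, compare this reference against the kernel output on random packed inputs.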

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/deep_dive/allocators.rst`:
- Around line 406-407: The section underline for the heading "RAPIDS Memory
Manager (RMM) Integration" is one character short; update the underline (the row
of ~ characters immediately below that heading in docs/deep_dive/allocators.rst)
so it is at least as long as the heading text (add one more ~ to make it 40
characters) so the reStructuredText title underline matches or exceeds the
heading length.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 6579d5fe-1527-4e54-b1bd-da2362eeb29c

📥 Commits

Reviewing files that changed from the base of the PR and between bea13a6 and 398c868.

📒 Files selected for processing (4)

docs/deep_dive/allocators.rst
docs/user_guide/debugging.rst
docs/user_guide/extending_warp.rst
warp/_src/context.py

✅ Files skipped from review due to trivial changes (3)

warp/_src/context.py
docs/user_guide/debugging.rst
docs/user_guide/extending_warp.rst

coderabbitai · 2026-05-30T21:31:16Z

+RAPIDS Memory Manager (RMM) Integration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the section underline length.

The underline is 39 characters but the heading "RAPIDS Memory Manager (RMM) Integration" is 40 characters. In reStructuredText, the underline must be at least as long as the heading text.

📝 Proposed fix

 RAPIDS Memory Manager (RMM) Integration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
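Checking the underline rule mechanically avoids miscounting by eye. The sketch below is illustrative. The helper name is hypothetical and is not a docutils API. It also assumes ASCII titles, where character count equals display width. Counted this way, "RAPIDS Memory Manager (RMM) Integration" is 39 characters long. The 39-character underline that the comment describes would therefore already satisfy the rule.

```python
def underline_is_long_enough(title: str, underline: str) -> bool:
    """Check a reST section underline against its title (ASCII titles assumed).

    Docutils warns "Title underline too short." when the adornment is shorter
    than the title, and an adornment must repeat one non-alphanumeric character.
    """
    title = title.rstrip()
    underline = underline.rstrip()
    if not underline or len(set(underline)) != 1 or underline[0].isalnum():
        return False
    return len(underline) >= len(title)
```

The check also returns True for the proposed 40-tilde underline. The extra tilde is harmless but not required.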

coderabbitai Bot reviewed May 30, 2026

greptile-apps Bot reviewed May 30, 2026

Comment thread docs/user_guide/extending_warp.rst

shi-eric added 2 commits May 30, 2026 20:05

shi-eric force-pushed the ershi/docs-update branch from bea13a6 to 398c868 (May 30, 2026 21:26)

coderabbitai Bot reviewed May 30, 2026

shi-eric wants to merge 3 commits into NVIDIA:main from shi-eric:ershi/docs-update
