Document Warp extension points #1494

Draft
shi-eric wants to merge 3 commits into NVIDIA:main from shi-eric:ershi/docs-update

@shi-eric
Contributor

@shi-eric shi-eric commented May 30, 2026

Description

Add a new Extending Warp user guide page that makes public extension points easier to discover. The page now covers native C++/CUDA snippets via wp.func_native, custom allocators, custom loggers, and related extension points.

The existing differentiability page now keeps only autodiff-specific native snippet guidance and links to the new page for general syntax. Related pages in Basics, Codegen, Allocators, and Debugging now cross-link to the new hub. The generated API docs for wp.func_native also get a fuller docstring with safer examples.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Test plan

  • uvx pre-commit run --files docs/index.rst docs/user_guide/extending_warp.rst docs/user_guide/basics.rst docs/user_guide/differentiability.rst docs/deep_dive/allocators.rst docs/user_guide/debugging.rst docs/deep_dive/codegen.rst warp/_src/context.py
  • WARP_CACHE_ROOT=/tmp/warp-cache-warp-worktree-2-docs WARP_CACHE_PATH=/tmp/warp-cache-warp-worktree-2-docs uv run --extra docs build_docs.py 2>&1 | tee /tmp/build_docs.log
  • rg -n "WARNING|undefined label|unknown document|Title underline too short|Inline emphasis" /tmp/build_docs.log
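
The log check in the last step can also be sketched in Python for environments where rg is unavailable. This is a minimal sketch and not part of the PR: the function name find_doc_warnings and the sample log text are hypothetical, and only the search pattern is taken from the command above.

```python
import re

# Pattern copied from the rg command in the test plan.
PATTERN = re.compile(
    r"WARNING|undefined label|unknown document|Title underline too short|Inline emphasis"
)


def find_doc_warnings(log_text):
    """Return (line_number, line) pairs for matching lines, 1-indexed like `rg -n`."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical log excerpt, not output from the actual docs build.
    sample = (
        "building [html]: targets for 3 source files\n"
        "docs/example.rst:12: WARNING: Title underline too short.\n"
        "build succeeded.\n"
    )
    for number, line in find_doc_warnings(sample):
        print(f"{number}:{line}")
```

An empty result corresponds to rg printing nothing, which is presumably the outcome the test plan is looking for.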

New feature / enhancement

This is a documentation discoverability enhancement for existing Warp extension APIs.

Summary by CodeRabbit

  • Documentation
    • Added "Extending Warp" guide covering native C++/CUDA functions, examples (shared-memory, inline PTX), differentiable natives with adj/replay snippets, and constraints.
    • Documented custom CUDA allocators (affect CUDA device arrays only) and noted CUDA-graph capture caveats.
    • Documented custom loggers and installation/restoration.
    • Streamlined native-function, debugging, codegen, basics, and index pages with cross-references and embedding guidance.

Review Change Stack

Native snippets had outgrown their historical home in the
differentiability docs. That made the general C++/CUDA escape
hatch difficult for power users to discover outside autodiff use
cases.

Add an Extending Warp user guide page for native snippets, custom
allocators, and custom loggers. Keep differentiability focused on
adjoint and replay behavior, add cross-links from related pages, and
expand the generated API docs for wp.func_native with safer examples.

Signed-off-by: Eric Shi <ershi@nvidia.com>
@coderabbitai

coderabbitai Bot commented May 30, 2026

📝 Walkthrough

Adds a new "Extending Warp" user guide describing native functions, custom CUDA allocators, and custom loggers; expands the @wp.func_native docstring; and updates related docs and examples across the user guide and deep-dive documentation.

Changes

Extension Points Guide and Cross-References

| Layer | File(s) | Summary |
| --- | --- | --- |
| New extending_warp user guide and func_native docstring | docs/user_guide/extending_warp.rst, warp/_src/context.py | Adds the "Extending Warp" page documenting @wp.func_native native snippets (snippet, adj_snippet, replay_snippet), warp.Allocator/allocator installation APIs, and wp.Logger requirements; expands func_native docstring and updates a kernel example annotation. |
| Navigation and introductory cross-references | docs/index.rst, docs/user_guide/basics.rst | Adds user_guide/extending_warp to the User Guide toctree and replaces an inline basics description with cross-references to differentiability and extending_warp. |
| Deep-dive documentation updates for allocators and code generation | docs/deep_dive/allocators.rst, docs/deep_dive/codegen.rst | Clarifies that custom allocators affect CUDA warp.array allocations only, adjusts mempool-copy capture wording, renames an RMM subsection, and adds a note in codegen about embedding small native C++/CUDA snippets via @wp.func_native. |
| Differentiability and debugging guide refactoring | docs/user_guide/differentiability.rst, docs/user_guide/debugging.rst | Replaces the native-function example in differentiability with a smaller axpy/tape-based example showing adj_snippet and replay_snippet; condenses debugging's custom-logger guidance to reference wp.Logger/wp.set_logger/wp.ScopedLogger and updates the tid overflow reference. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Document Warp extension points' directly and clearly summarizes the main objective of the changeset, which is to add comprehensive documentation for Warp's public extension points including native functions, custom allocators, and custom loggers. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
docs/user_guide/differentiability.rst (1)

675-676: ⚡ Quick win

Use explicit keyword arguments for wp.func_native snippets to avoid API-order ambiguity.

Given the text says “provide adj_snippet,” the decorator call is safer/clearer as keyword args (snippet=..., adj_snippet=...) instead of positional arguments.

Suggested doc tweak
-    @wp.func_native(snippet, adj_snippet)
+    @wp.func_native(snippet=snippet, adj_snippet=adj_snippet)
     def axpy(
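
To illustrate why keyword arguments are the safer choice here, the sketch below uses a hypothetical stand-in decorator, func_native_stub, rather than Warp itself. Its parameter order (snippet, adj_snippet, replay_snippet) is an assumption modeled on the snippet names listed in the walkthrough, and the snippet strings are placeholders.

```python
def func_native_stub(snippet, adj_snippet=None, replay_snippet=None):
    """Hypothetical stand-in for a snippet decorator; NOT Warp's implementation.

    It only records which string was bound to which parameter.
    """

    def decorator(fn):
        fn.bound = {
            "snippet": snippet,
            "adj_snippet": adj_snippet,
            "replay_snippet": replay_snippet,
        }
        return fn

    return decorator


forward = "out[tid] = a * x[tid] + y[tid];"  # placeholder forward body
replay = "/* placeholder replay body */"


# Positional call: the second string silently binds to adj_snippet.
@func_native_stub(forward, replay)
def positional_axpy():
    ...


# Keyword call: the binding is explicit and independent of parameter order.
@func_native_stub(snippet=forward, replay_snippet=replay)
def keyword_axpy():
    ...


if __name__ == "__main__":
    print(positional_axpy.bound)
    print(keyword_axpy.bound)
```

With positional arguments, a string intended as a replay body ends up in the adjoint slot without any error. The keyword form makes that mistake visible at the call site.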
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user_guide/differentiability.rst` around lines 675 - 676, The decorator
call for axpy uses positional args which can be ambiguous; change the
`@wp.func_native` usage to pass explicit keyword arguments (e.g., snippet=... and
adj_snippet=...) so the decorator invocation is unambiguous—update the
`@wp.func_native`(...) line above the axpy definition to use snippet= and
adj_snippet= named parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/user_guide/differentiability.rst`:
- Around line 675-676: The decorator call for axpy uses positional args which
can be ambiguous; change the `@wp.func_native` usage to pass explicit keyword
arguments (e.g., snippet=... and adj_snippet=...) so the decorator invocation is
unambiguous—update the `@wp.func_native`(...) line above the axpy definition to
use snippet= and adj_snippet= named parameters.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 24bc03c1-c570-4231-a7c4-733be99601cd

📥 Commits

Reviewing files that changed from the base of the PR and between 1d72324 and dfad38b.

📒 Files selected for processing (8)
  • docs/deep_dive/allocators.rst
  • docs/deep_dive/codegen.rst
  • docs/index.rst
  • docs/user_guide/basics.rst
  • docs/user_guide/debugging.rst
  • docs/user_guide/differentiability.rst
  • docs/user_guide/extending_warp.rst
  • warp/_src/context.py

@greptile-apps

greptile-apps Bot commented May 30, 2026

Greptile Summary

This PR adds a new "Extending Warp" guide that consolidates previously scattered documentation on @wp.func_native, custom allocators, and custom loggers into a single discoverable page. Existing pages in basics, debugging, differentiability, allocators, and codegen are updated to cross-reference the new hub instead of duplicating content.

  • extending_warp.rst is a new user guide covering native C++/CUDA snippets (including shared memory, inline PTX, differentiable snippets, and replay), custom CUDA allocators, and custom loggers, with self-contained runnable examples.
  • warp/_src/context.py receives a substantially improved func_native docstring with structured Args, Returns, Note, and Example sections.
  • All other changed files are editorial: moved content, updated cross-references, and minor prose clarifications.

Confidence Score: 5/5

Documentation-only change with no runtime code modifications beyond an expanded docstring; safe to merge.

All changes are documentation restructuring and a docstring expansion. No logic, APIs, or runtime behaviour is modified. All cross-references point to files that exist in the repo, and the inline code examples in the new page are self-contained and mathematically verified.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/user_guide/extending_warp.rst | New hub page consolidating native C++/CUDA, custom allocator, and custom logger docs; one style inconsistency in the sad_kernel example uses old-style wp.array(dtype=...) annotations |
| warp/_src/context.py | Expanded func_native docstring with structured Args/Returns/Note/Example/See Also sections; fixes wp.array(dtype=float) annotation to wp.array[float] in the kernel example |
| docs/user_guide/differentiability.rst | Streamlined native function section to autodiff-specific content (adj_snippet, replay_snippet); examples updated, simplified from 128- to 8-element arrays |
| docs/user_guide/debugging.rst | Custom Logger section reduced to a brief intro and cross-reference to extending_warp; fixes wp.tid() cross-reference from private _src.lang.tid to public warp.tid |
| docs/deep_dive/allocators.rst | Custom Allocators intro refactored to point at extending_warp; limitation prose clarified; RMM section heading expanded |
| docs/index.rst | Adds extending_warp to the toctree in the correct position between basics and runtime |
| docs/user_guide/basics.rst | Cross-reference updated to split native function guidance to extending_warp and keep autodiff guidance in differentiability |
| docs/deep_dive/codegen.rst | Adds four-line paragraph linking func_native to extending_warp inside the code-generation internals section |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[basics.rst] -->|cross-ref: native C++/CUDA| E[extending_warp.rst]
    B[differentiability.rst] -->|cross-ref: general func_native syntax| E
    C[debugging.rst] -->|cross-ref: wp.Logger / set_logger| E
    D[codegen.rst] -->|cross-ref: func_native API| E
    F[allocators.rst] -->|cross-ref: Allocator protocol| E
    E -->|cross-ref: autodiff details| B
    E -->|cross-ref: log levels / debug tools| C
    E -->|cross-ref: PyTorch & RMM examples| F
    H[context.py] -->|See Also| E
    I[index.rst] -->|toctree entry| E

Reviews (4): Last reviewed commit: "Clarify extension and allocator docs" | Re-trigger Greptile

Comment thread docs/user_guide/extending_warp.rst
shi-eric added 2 commits May 30, 2026 20:05
Review bots noted that two native snippet examples were less clear
than they should be. The differentiability example used positional
decorator arguments even though the surrounding prose names
adj_snippet, and the shared-memory example launched with arr and
out without defining them in the same block.

Use keyword arguments for adj_snippet examples and make the
shared-memory launch snippet define its inputs before launch.
This keeps the docs easier to copy and less dependent on decorator
argument order.

Signed-off-by: Eric Shi <ershi@nvidia.com>

Several extension docs used imprecise terminology around diagnostics,
custom allocator scope, and native snippets. The allocator docs also
introduced RMM without expanding the acronym, and the native function
section only showed CUDA launches despite pure C++ snippets working on
CPU kernels.

Update the docs to distinguish environment diagnostics from internal
logging, spell out RAPIDS Memory Manager, describe allocator routing as
current support, and add a CUDA inline PTX example using vabsdiff4 for
packed-byte SAD. This keeps the examples practical while documenting the
boundary between CPU-compatible C++ snippets and CUDA-only native code.

Signed-off-by: Eric Shi <ershi@nvidia.com>
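
For readers unfamiliar with the packed-byte SAD mentioned in the commit message above, the following is a plain-Python reference of the arithmetic involved. It is a sketch under stated assumptions: the four bytes of each 32-bit word are treated as unsigned and the per-byte absolute differences are summed. The exact vabsdiff4 variant and operand types used in the PR's example are not shown in this thread, and the name packed_byte_sad is hypothetical.

```python
def packed_byte_sad(a, b):
    """Sum of absolute differences over the four unsigned bytes of two 32-bit words.

    Assumption: bytes are unsigned and the result is a plain sum.
    """
    total = 0
    for shift in (0, 8, 16, 24):
        byte_a = (a >> shift) & 0xFF  # extract one byte lane of a
        byte_b = (b >> shift) & 0xFF  # extract the matching lane of b
        total += abs(byte_a - byte_b)
    return total


if __name__ == "__main__":
    # Lanes (low to high): 4,3,2,1 versus 1,2,3,4 give differences 3,1,1,3.
    print(packed_byte_sad(0x01020304, 0x04030201))
```

A reference like this is useful mainly as a correctness check: a native snippet's result for a given pair of words can be compared against it on the host.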
@shi-eric shi-eric force-pushed the ershi/docs-update branch from bea13a6 to 398c868 Compare May 30, 2026 21:26

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/deep_dive/allocators.rst`:
- Around line 406-407: The section underline for the heading "RAPIDS Memory
Manager (RMM) Integration" is one character short; update the underline (the row
of ~ characters immediately below that heading in docs/deep_dive/allocators.rst)
so it is at least as long as the heading text (add one more ~ to make it 40
characters) so the reStructuredText title underline matches or exceeds the
heading length.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 6579d5fe-1527-4e54-b1bd-da2362eeb29c

📥 Commits

Reviewing files that changed from the base of the PR and between bea13a6 and 398c868.

📒 Files selected for processing (4)
  • docs/deep_dive/allocators.rst
  • docs/user_guide/debugging.rst
  • docs/user_guide/extending_warp.rst
  • warp/_src/context.py
✅ Files skipped from review due to trivial changes (3)
  • warp/_src/context.py
  • docs/user_guide/debugging.rst
  • docs/user_guide/extending_warp.rst

Comment on lines +406 to +407
RAPIDS Memory Manager (RMM) Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the section underline length.

The underline is 39 characters but the heading "RAPIDS Memory Manager (RMM) Integration" is 40 characters. In reStructuredText, the underline must be at least as long as the heading text.

📝 Proposed fix
 RAPIDS Memory Manager (RMM) Integration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
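
Counting heading and underline characters by eye is error-prone, so a finding like this is worth verifying mechanically before applying it. The sketch below is an assumption-laden illustration, not a docutils API: underline_is_long_enough and make_underline are hypothetical helpers, and they compare plain character counts only.

```python
def underline_is_long_enough(title, underline):
    """True if the underline uses one repeated character and is at least as long as the title."""
    return len(set(underline)) == 1 and len(underline) >= len(title)


def make_underline(title, char="~"):
    """Build an underline that exactly matches the title length."""
    return char * len(title)


if __name__ == "__main__":
    heading = "RAPIDS Memory Manager (RMM) Integration"
    print(len(heading))  # length of the heading text under review
    print(underline_is_long_enough(heading, make_underline(heading)))
```

Running the check against the actual heading and underline in docs/deep_dive/allocators.rst shows whether the underline really needs another character.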
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-RAPIDS Memory Manager (RMM) Integration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+RAPIDS Memory Manager (RMM) Integration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/deep_dive/allocators.rst` around lines 406 - 407, The section underline
for the heading "RAPIDS Memory Manager (RMM) Integration" is one character
short; update the underline (the row of ~ characters immediately below that
heading in docs/deep_dive/allocators.rst) so it is at least as long as the
heading text (add one more ~ to make it 40 characters) so the reStructuredText
title underline matches or exceeds the heading length.

