Document C++ and CUDA workflows #1494

Draft
shi-eric wants to merge 1 commit into NVIDIA:main from shi-eric:ershi/docs-update

Contributor

@shi-eric shi-eric commented May 30, 2026

Description

Add a new Extending Warp user guide page that makes public extension points easier to discover. The page now covers native C++/CUDA snippets via wp.func_native, custom allocators, custom loggers, and related extension points.

The existing differentiability page now keeps only autodiff-specific native snippet guidance and links to the new page for general syntax. Related pages in Basics, Codegen, Allocators, and Debugging now cross-link to the new hub. The generated API docs for wp.func_native also get a fuller docstring with safer examples.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Test plan

  • uvx pre-commit run --files docs/index.rst docs/user_guide/extending_warp.rst docs/user_guide/basics.rst docs/user_guide/differentiability.rst docs/deep_dive/allocators.rst docs/user_guide/debugging.rst docs/deep_dive/codegen.rst warp/_src/context.py
  • WARP_CACHE_ROOT=/tmp/warp-cache-warp-worktree-2-docs WARP_CACHE_PATH=/tmp/warp-cache-warp-worktree-2-docs uv run --extra docs build_docs.py 2>&1 | tee /tmp/build_docs.log
  • rg -n "WARNING|undefined label|unknown document|Title underline too short|Inline emphasis" /tmp/build_docs.log
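The last step of the test plan scans the build log for known Sphinx problems. As a rough illustration only, the same check can be sketched in Python. The pattern list is copied from the `rg` command above; the function name `find_doc_warnings` and the sample log lines are hypothetical and are not part of this PR.

```python
import re

# Alternation copied from the rg command in the test plan.
LOG_PATTERN = re.compile(
    r"WARNING|undefined label|unknown document|"
    r"Title underline too short|Inline emphasis"
)


def find_doc_warnings(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for matching log lines.

    Line numbers are 1-based, mirroring the output of `rg -n`.
    """
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if LOG_PATTERN.search(line)
    ]


# Hypothetical log content, used only to exercise the function.
sample_log = "\n".join(
    [
        "building [html]: targets for 3 source files",
        "docs/example.rst:10: WARNING: Title underline too short.",
        "build succeeded.",
    ]
)

for number, line in find_doc_warnings(sample_log):
    print(f"{number}:{line}")
```

An empty result corresponds to `rg` printing nothing, which is the outcome the test plan is looking for.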

New feature / enhancement

This is a documentation discoverability enhancement for existing Warp extension APIs.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive advanced customization guide covering native C++/CUDA code embedding, custom allocators, and custom loggers
    • Clarified custom allocator behavior and memory type handling in CUDA environments
    • Expanded native function decorator documentation with detailed parameter specifications
    • Updated differentiability documentation with native function examples and guidance


coderabbitai Bot commented May 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

This pull request introduces a new comprehensive "Extending Warp" user guide documenting native function integration, custom allocators, and custom loggers, with coordinated updates to existing documentation, decorator docstrings, and navigation to establish consistent cross-references across the Warp documentation.

Changes

Customization APIs Documentation

| Layer / File(s) | Summary |
| --- | --- |
| New customization guide and func_native documentation: docs/user_guide/advanced_customization.rst, warp/_src/context.py | Introduces the comprehensive advanced_customization.rst guide covering native function integration (snippets, CUDA shared memory, PTX, return values, differentiability, limitations), custom CUDA allocators (protocol, routing, graph-capture notes), and custom loggers. Expands func_native decorator docstring with detailed parameter contracts, snippet semantics, constraints, and examples. Updates kernel documentation example annotation syntax from wp.array(dtype=float) to wp.array[float]. |
| Navigation and cross-reference setup: docs/index.rst, docs/user_guide/basics.rst | Adds advanced_customization to the User Guide toctree. Replaces inline description of user-defined functions in basics with cross-references to differentiability and advanced_customization guides. |
| Related documentation updates: docs/deep_dive/allocators.rst, docs/deep_dive/codegen.rst, docs/user_guide/differentiability.rst, docs/user_guide/debugging.rst | Clarifies custom allocator scoping and limitations in allocators documentation (applies only to CUDA warp.array allocations, with explicit callouts for CPU, pinned memory, and internal temporaries). Adds codegen note about intentional native snippet injection via @wp.func_native. Refactors differentiability native-function example to use a smaller axpy pattern with explicit adjoint and replay snippets. Condenses debugging custom-logger guidance with cross-references. Updates tid overflow warning link in Debug Mode Compilation. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title 'Document C++ and CUDA workflows' is vague and does not accurately reflect the primary change. The PR is about documenting Warp's extension points (native snippets, allocators, loggers) with a new 'Extending Warp' guide, not broadly about 'C++ and CUDA workflows'. | Consider a more specific title such as 'Document Warp extension points: native functions, allocators, and loggers' or 'Add Extending Warp guide for customization APIs'. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
docs/user_guide/differentiability.rst (1)

675-676: ⚡ Quick win

Use explicit keyword arguments for wp.func_native snippets to avoid API-order ambiguity.

Given the text says “provide adj_snippet,” the decorator call is safer/clearer as keyword args (snippet=..., adj_snippet=...) instead of positional arguments.
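To illustrate the ambiguity the comment describes, here is a minimal sketch using a stand-in decorator factory. `func_native_standin` is hypothetical and only records its arguments; it is not Warp's implementation, and its parameter order (`snippet`, `adj_snippet`, `replay_snippet`) is an assumption based on the names mentioned in this PR. The snippet strings are placeholders, not the ones in the docs.

```python
def func_native_standin(snippet, adj_snippet=None, replay_snippet=None):
    """Hypothetical stand-in: records which string was bound to which role."""

    def decorator(func):
        func.snippets = {
            "snippet": snippet,
            "adj_snippet": adj_snippet,
            "replay_snippet": replay_snippet,
        }
        return func

    return decorator


# Placeholder strings standing in for forward and adjoint native code.
forward = "/* forward code */"
adjoint = "/* adjoint code */"


# Positional call with the arguments accidentally swapped: no error is
# raised, and the adjoint string silently becomes the forward snippet.
@func_native_standin(adjoint, forward)
def axpy_positional():
    ...


# Keyword call: the binding is explicit and independent of argument order.
@func_native_standin(adj_snippet=adjoint, snippet=forward)
def axpy_keyword():
    ...


print(axpy_positional.snippets["snippet"] == adjoint)  # True (the mistake)
print(axpy_keyword.snippets["snippet"] == forward)     # True (as intended)
```

The keyword form also keeps the example readable without having to look up the parameter order.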

Suggested doc tweak
-    @wp.func_native(snippet, adj_snippet)
+    @wp.func_native(snippet=snippet, adj_snippet=adj_snippet)
     def axpy(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user_guide/differentiability.rst` around lines 675 - 676, The decorator
call for axpy uses positional args which can be ambiguous; change the
`@wp.func_native` usage to pass explicit keyword arguments (e.g., snippet=... and
adj_snippet=...) so the decorator invocation is unambiguous—update the
`@wp.func_native`(...) line above the axpy definition to use snippet= and
adj_snippet= named parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/user_guide/differentiability.rst`:
- Around line 675-676: The decorator call for axpy uses positional args which
can be ambiguous; change the `@wp.func_native` usage to pass explicit keyword
arguments (e.g., snippet=... and adj_snippet=...) so the decorator invocation is
unambiguous—update the `@wp.func_native`(...) line above the axpy definition to
use snippet= and adj_snippet= named parameters.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 24bc03c1-c570-4231-a7c4-733be99601cd

📥 Commits

Reviewing files that changed from the base of the PR and between 1d72324 and dfad38b.

📒 Files selected for processing (8)
  • docs/deep_dive/allocators.rst
  • docs/deep_dive/codegen.rst
  • docs/index.rst
  • docs/user_guide/basics.rst
  • docs/user_guide/debugging.rst
  • docs/user_guide/differentiability.rst
  • docs/user_guide/extending_warp.rst
  • warp/_src/context.py


greptile-apps Bot commented May 30, 2026

Greptile Summary

This PR adds a new C++ and CUDA Workflows user guide page (cpp_cuda_workflows.rst) that consolidates public extension entry points — native snippets via wp.func_native, AOT compilation, APIC replay, and native library headers. The differentiability.rst page is trimmed to autodiff-specific guidance and links out to the new hub, and cross-links are added in basics.rst, codegen.rst, debugging.rst, and allocators.rst. The func_native docstring in context.py is also significantly expanded.

  • New cpp_cuda_workflows.rst covers four extension workflows with runnable examples, inline PTX with assertion checks, and a limitations list; the page is wired into index.rst between the configuration and debugging entries.
  • differentiability.rst's Custom Native Functions section is refactored: the long CUDA shared-memory and atomic-counter demos are removed and replaced with a focused CPU-capable axpy example that retains the skipif: wp.get_cuda_device_count() == 0 guard inherited from the original section.
  • context.py's func_native docstring is rewritten with three code examples, a Note block, and a See Also cross-reference.

Confidence Score: 5/5

This is a documentation-only change with no runtime code modifications; it is safe to merge.

All changes are documentation and docstrings. The new page's testcode blocks are either guarded by skipif or verified with np.testing.assert_array_equal. All cross-references resolve to existing targets. The only finding is a skipif guard on a CPU-capable doctest that is more restrictive than necessary — it doesn't cause any incorrect behavior, just reduces test coverage on CPU-only CI.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/user_guide/cpp_cuda_workflows.rst | New hub page for C++/CUDA extension workflows; code examples include a verified PTX testcode block and prose-only code-block examples for shared memory and differentiable snippets. |
| warp/_src/context.py | Expanded func_native docstring with Args, Returns, Note, and three code examples; also corrects a wp.array(dtype=float) → wp.array[float] annotation in the kernel decorator example. |
| docs/user_guide/differentiability.rst | Custom Native Functions section slimmed to a focused axpy adjoint example; inherits the CUDA-only skipif guard even though the new snippet uses pure C++ and would run on CPU. |
| docs/index.rst | Adds user_guide/cpp_cuda_workflows to the User Guide TOC between configuration and debugging. |
| docs/deep_dive/codegen.rst | Adds a forward-reference blurb to wp.func_native and a .. _ahead_of_time_compilation_workflows: label for the cross-reference in the new page. |
| docs/deep_dive/allocators.rst | Minor prose fixes: sentence-boundary split, semicolon → period, section heading expanded to spell out RAPIDS Memory Manager. |
| docs/user_guide/basics.rst | Updates the cross-reference at the end of User Functions to split differentiability guidance from native C++/CUDA workflows. |
| docs/user_guide/debugging.rst | Fixes two minor issues: semicolon → period in deprecation note, and corrects the wp.tid() cross-reference from the private warp._src.lang.tid path to the public warp.tid. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User needs non-Python integration] --> B{What kind?}
    B -->|Embed CUDA/C++ in kernel| C[cpp_cuda_workflows.rst\nNative Snippets section]
    B -->|Compile kernels ahead of time| D[cpp_cuda_workflows.rst\nAOT Workflows section]
    B -->|Replay from C++ without Python| E[cpp_cuda_workflows.rst\nAPI Capture Replay section]
    C --> F[wp.func_native in context.py\nExpanded docstring + examples]
    C --> G[differentiability.rst\nAdjoint / replay_snippet guidance]
    D --> H[codegen.rst\nahead_of_time_compilation_workflows label]
    E --> I[runtime.rst\napic_save_load label]
    F --> J[basics.rst / debugging.rst\nAllocators.rst cross-links]

Reviews (7): Last reviewed commit: "Document C++ and CUDA workflows" | Re-trigger Greptile

Comment thread docs/user_guide/cpp_cuda_workflows.rst
@shi-eric shi-eric force-pushed the ershi/docs-update branch from bea13a6 to 398c868 on May 30, 2026 21:26

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/deep_dive/allocators.rst`:
- Around line 406-407: The section underline for the heading "RAPIDS Memory
Manager (RMM) Integration" is one character short; update the underline (the row
of ~ characters immediately below that heading in docs/deep_dive/allocators.rst)
so it is at least as long as the heading text (add one more ~ to make it 40
characters) so the reStructuredText title underline matches or exceeds the
heading length.
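The rule behind this comment (a reStructuredText title underline must be at least as long as its title) can be checked mechanically. The following is a simplified sketch, not the check Sphinx or CodeRabbit actually runs: `short_underlines` is a hypothetical helper, and it treats any line made of a single repeated punctuation character as an underline for the line above it.

```python
def short_underlines(rst_text: str) -> list[tuple[int, str]]:
    """Return (underline_line_number, title) for underlines shorter than their title.

    Simplified heuristic: a line counts as an underline when it consists of
    one repeated punctuation character and the previous line is non-empty.
    """
    underline_chars = set("=-~^\"'`#*+")
    lines = rst_text.splitlines()
    problems = []
    for index in range(1, len(lines)):
        title = lines[index - 1].rstrip()
        underline = lines[index].rstrip()
        if not title or not underline:
            continue
        is_underline = (
            underline[0] in underline_chars and set(underline) == {underline[0]}
        )
        if is_underline and len(underline) < len(title):
            problems.append((index + 1, title))
    return problems


heading = "RAPIDS Memory Manager (RMM) Integration"

# Underline one character shorter than the heading: flagged.
too_short = heading + "\n" + "~" * (len(heading) - 1)
# Underline exactly as long as the heading: accepted.
long_enough = heading + "\n" + "~" * len(heading)

print(short_underlines(too_short))
print(short_underlines(long_enough))
```

Computing the underline from `len(heading)` rather than typing it by hand avoids the off-by-one that this comment flags.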
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 6579d5fe-1527-4e54-b1bd-da2362eeb29c

📥 Commits

Reviewing files that changed from the base of the PR and between bea13a6 and 398c868.

📒 Files selected for processing (4)
  • docs/deep_dive/allocators.rst
  • docs/user_guide/debugging.rst
  • docs/user_guide/extending_warp.rst
  • warp/_src/context.py
✅ Files skipped from review due to trivial changes (3)
  • warp/_src/context.py
  • docs/user_guide/debugging.rst
  • docs/user_guide/extending_warp.rst

Comment thread docs/deep_dive/allocators.rst
@shi-eric shi-eric force-pushed the ershi/docs-update branch 2 times, most recently from 1cb56c5 to 07d09bc on June 1, 2026 23:39
@shi-eric shi-eric changed the title from "Document Warp extension points" to "Document C++ and CUDA workflows" Jun 1, 2026
The branch previously split extension-point docs across several commits.
It also introduced an Advanced Customization page that mixed native
snippets, custom allocators, and logger configuration.

Keep allocator and logger guidance in their topical docs. Use the new
user-guide page for C++ and CUDA workflows: native snippets, AOT
outputs, C++ example entry points, and APIC replay. This gives
non-Python workflows a discoverable home without moving unrelated
customization docs away from the pages users already consult.

Signed-off-by: Eric Shi <ershi@nvidia.com>
@shi-eric shi-eric force-pushed the ershi/docs-update branch from 07d09bc to 3c924f4 on June 2, 2026 21:48