
feat(torch): add generated operator bases #622

Open: voltjia wants to merge 1 commit into master from feat/torch-operator-bases

Collaborator

@voltjia voltjia commented May 27, 2026

Summary

  • Add generated C++ operator base headers under src/base/, regenerated from the current PyTorch codegen output merged in PR #619 (feat(torch): expose optional codegen parameters).
  • Keep the generated base files flat under src/base/; no generator, test, wrapper, build-system, or CI files are changed in this PR.
  • Omit generated bases whose ATen schemas are known to vary across installed PyTorch builds, so they can remain generated by the local codegen environment instead of frozen as stable public bases.

Motivation

The PyTorch codegen work needs a checked-in operator-base layer that matches the current generator behavior, including optional-parameter overload support from PR #619. This PR contains only the generated public base headers, making the downstream base layer reviewable separately from the generator changes.

Closes # N/A — no dedicated issue.

Type of Change

  • feat — new feature / new operator / new platform.
  • N/A — fix — bug fix.
  • N/A — perf — performance improvement (no behavioral change).
  • N/A — refactor — code restructuring without behavior change.
  • N/A — test — adding or fixing tests only.
  • N/A — docs — documentation only.
  • N/A — build / ci — build system or CI configuration.
  • N/A — chore — tooling, formatting, or other non-code changes.
  • N/A — Breaking change.

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • N/A — Build system / CMake / CI; no build-system or CI files are changed.
  • Python bindings / user-facing API

Test Results on Supported Platforms

All rows used a full bare python3 -m pytest -v run, without tests/, --devices, or -n. Each build regenerated PyTorch operator sources first, installed with WITH_TORCH=ON, and smoke-checked representative generated PyTorch operators after install. Build times are from the pip install phase recorded by the local validation runner; pytest times are from the timed pytest command; total time is build + pytest.

| Platform | Built | pytest Result | Build | Pytest | Total | Notes / Hardware |
|---|---|---|---|---|---|---|
| NVIDIA | Yes | 9279 passed, 8565 skipped | 1126s | 396s | 1522s | Full bare pytest. PyTorch backend compiled and generated torch-op tests were included. |
| Iluvatar | Yes | 7777 passed, 8549 skipped | 821s | 639s | 1460s | Full bare pytest. PyTorch backend compiled and generated torch-op tests were included. |
| MetaX | Yes | 8771 passed, 7555 skipped | 1448s | 436s | 1884s | Full bare pytest. PyTorch backend compiled and generated torch-op tests were included. |
| Cambricon | Yes | 5974 passed, 9968 skipped | 2308s | 999s | 3307s | Full bare pytest. PyTorch backend compiled and generated torch-op tests were included. |
| Moore | Yes | 8537 passed, 7807 skipped | 2322s | 671s | 2993s | Full bare pytest. PyTorch backend compiled and generated torch-op tests were included. |
| Ascend | Yes | 7454 passed, 8830 skipped | 1125s | 615s | 1740s | Full bare pytest. PyTorch backend compiled and generated torch-op tests were included. |

Full `pytest` output (optional)
NVIDIA:    9279 passed, 8565 skipped in 390.66s (0:06:30)
Iluvatar:  7777 passed, 8549 skipped in 635.02s (0:10:35)
MetaX:     8771 passed, 7555 skipped in 418.39s (0:06:58)
Cambricon: 5974 passed, 9968 skipped in 990.75s (0:16:30)
Moore:     8537 passed, 7807 skipped in 662.66s (0:11:02)
Ascend:    7454 passed, 8830 skipped in 597.06s (0:09:57)
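
The stated relationship, total time = build + pytest, can be checked mechanically against the figures reported for each platform. The following is a minimal sketch rather than part of the validation runner: the `PlatformRun` struct, the `kRuns` array, and the helper names are hypothetical, and the numbers are copied from the results recorded in this PR description.

```cpp
#include <array>
#include <cassert>

// One row of the reported results. All names here are illustrative only.
struct PlatformRun {
  const char* platform;
  int passed;
  int skipped;
  int build_seconds;
  int pytest_seconds;
  int total_seconds;
};

// Figures copied from the results recorded in this PR description.
constexpr std::array<PlatformRun, 6> kRuns = {{
    {"NVIDIA", 9279, 8565, 1126, 396, 1522},
    {"Iluvatar", 7777, 8549, 821, 639, 1460},
    {"MetaX", 8771, 7555, 1448, 436, 1884},
    {"Cambricon", 5974, 9968, 2308, 999, 3307},
    {"Moore", 8537, 7807, 2322, 671, 2993},
    {"Ascend", 7454, 8830, 1125, 615, 1740},
}};

// Number of collected test cases in one run (passed plus skipped).
inline int CollectedTests(const PlatformRun& run) {
  return run.passed + run.skipped;
}

// Returns true when every row satisfies total = build + pytest.
inline bool TotalsAreConsistent() {
  for (const PlatformRun& run : kRuns) {
    if (run.build_seconds + run.pytest_seconds != run.total_seconds) {
      return false;
    }
  }
  return true;
}

// Aborts in debug builds if any reported total is inconsistent.
inline void CheckTable() { assert(TotalsAreConsistent()); }
```

Calling `CheckTable()` from a small driver confirms the totals, and `CollectedTests` gives the per-platform collected count, which is useful when comparing skip-count drift between two runs.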

The test counts are expected to match the PyTorch codegen coverage from PR #619 because this PR only checks in generated base headers from that generator. The only observed difference from the latest PR #619 table is on Ascend: one tests/test_torch_ops.py inner case is skipped instead of passed:

tests/test_torch_ops.py::test_op[npu-dtype1-0.01-0.01-13x4-inner]

The generated inner base, binding metadata, and PyTorch backend source are identical between PR #619's generated output and this PR's checked-in base. Ascend still builds successfully, smoke checks show the PyTorch slot active for Ascend, and the full pytest run exits successfully. This is recorded as a non-blocking skip-count drift rather than a build or execution regression.

Benchmark / Performance Impact

N/A — this PR checks in generated base headers only. The table above records build and test wall time for each platform.

Notes for Reviewers

This PR is rebased on the latest master, after PR #619 was merged. The generated base files are intentionally checked in as generator output. File paths are kept flat under src/base/.

The generated bases intentionally omit src/base/all.h, src/base/any.h, and src/base/internal_scaled_mm.h in this PR because their ATen schemas vary across installed PyTorch builds; these bases are better regenerated by the local codegen environment than frozen as stable public bases.


Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional, documented in this PR, and reflected in affected callers/tests.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific

  • Code follows the Google C++ Style Guide strictly.
  • clang-format --dry-run --Werror passes on all modified src/base/*.h files.
  • clang-tidy concerns (per .clang-tidy) have been reviewed — no new warnings beyond the existing baseline.
  • Operator parameter order is inputs first, outputs last; attributes are between inputs and outputs; naming follows PyTorch → ONNX → CUDA API precedence (CONTRIBUTING.md §C++).
  • No exceptions are thrown. Error paths use assert with messages that include at least __FILE__, __LINE__, and __func__ (CONTRIBUTING.md §C++).
  • N/A — No new C++ error or warning message was added.
  • N/A — Kernel files are named correctly; this PR adds operator bases, not kernels.
  • N/A — Kernel and kernel launcher separation is unchanged; this PR adds operator bases, not kernels.
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • Exactly one blank line between members within a class (CONTRIBUTING.md §C++).
  • Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).
  • New operators are added via src/base/<op>.h (inheriting Operator<Op>), with generated PyTorch backends provided by PR #619 (feat(torch): expose optional codegen parameters) (CONTRIBUTING.md §Adding an Operator).
  • No raw new/delete; RAII / smart pointers / existing allocators are used.
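
Several of the conventions above (inheriting `Operator<Op>`, inputs first and outputs last with attributes in between, initializer order matching declaration order, and assert-style errors that carry `__FILE__`, `__LINE__`, and `__func__`) can be illustrated together. The following is a minimal sketch under stated assumptions, not the content of any generated header: the `sketch` namespace, the `Shape` alias, the `SKETCH_ASSERT` macro, the body of `Operator<Op>`, and the `Add` class with its strict shape-equality check (no broadcasting) are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical assert-style check; the message includes `__FILE__`,
// `__LINE__`, and `__func__`. Not the repository's actual macro.
#define SKETCH_ASSERT(cond, msg)                                  \
  do {                                                            \
    if (!(cond)) {                                                \
      std::fprintf(stderr, "%s:%d: %s: %s\n", __FILE__, __LINE__, \
                   __func__, (msg));                              \
      assert(false);                                              \
    }                                                             \
  } while (false)

namespace sketch {

using Shape = std::vector<std::size_t>;

// Hypothetical CRTP base standing in for the repository's `Operator<Op>`.
template <typename Op>
class Operator {
 public:
  // Dispatches statically to the derived operator's `Name()`.
  std::string OperatorName() const {
    return static_cast<const Op*>(this)->Name();
  }
};

// Hypothetical operator base: inputs first, then the attribute, then the
// output.
class Add : public Operator<Add> {
 public:
  Add(Shape input, Shape other, double alpha, Shape out)
      : input_shape_{std::move(input)},
        other_shape_{std::move(other)},
        alpha_{alpha},
        out_shape_{std::move(out)} {
    // Assumption for this sketch only: shapes must match exactly.
    SKETCH_ASSERT(input_shape_ == other_shape_,
                  "`input` and `other` must have the same shape.");
    SKETCH_ASSERT(input_shape_ == out_shape_,
                  "`out` must have the same shape as `input`.");
  }

  std::string Name() const { return "add"; }

  double alpha() const { return alpha_; }

  // Number of elements in the output, computed from its shape.
  std::size_t NumElements() const {
    std::size_t count = 1;
    for (std::size_t dim : out_shape_) {
      count *= dim;
    }
    return count;
  }

 protected:
  // Declared in the same order as the constructor initializer list.
  Shape input_shape_;

  Shape other_shape_;

  double alpha_;

  Shape out_shape_;
};

}  // namespace sketch
```

In this sketch, constructing `sketch::Add op({13, 4}, {13, 4}, 2.0, {13, 4});` succeeds, and `op.OperatorName()` resolves through the base class without a virtual call. The check is a macro rather than a function so that `__func__` names the caller.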

Python Specific

  • N/A — This PR does not modify Python files.

Testing

  • pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
  • N/A — Every supported platform was tested.
  • New functionality is covered by the generated PyTorch operator test harness from PR #619 (feat(torch): expose optional codegen parameters) and by the all-platform full pytest runs recorded above.
  • N/A — This PR does not add Python tests.
  • N/A — This PR does not add flaky parallel-only tests.
  • N/A — This is not a bug-fix-only PR.

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory on every supported platform listed above.
  • compile_commands.json still regenerates through the existing CMake/scikit-build configuration path.
  • N/A — No new backend / device was added.
  • Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
  • Both CI workflows (clang-format.yml, ruff.yml) are expected to remain green; this PR only changes generated C++ headers under src/base/.
  • N/A — No new runtime dependency was added.

Documentation

  • N/A — No user workflow, build flag, or developer workflow documentation changed.
  • New generated operator bases are documented through their checked-in header signatures.
  • N/A — No user-visible breaking change is introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, IP addresses, or personal hardware identifiers have been committed or included in this PR description.
  • N/A — No third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9444f9c to 9864ff2 Compare May 27, 2026 19:51
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 33e537c to ffc3d68 Compare May 27, 2026 19:54
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9864ff2 to c0db647 Compare May 27, 2026 20:27
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from ffc3d68 to d89ce8e Compare May 27, 2026 20:28
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from c0db647 to 3e3e319 Compare May 27, 2026 21:15
@voltjia voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from fe50963 to c5a3a38 Compare May 27, 2026 21:51
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from c5a3a38 to 312cd42 Compare May 27, 2026 22:25
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 3e3e319 to 2a5d6af Compare May 27, 2026 23:33
@voltjia voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from 34db70e to f5f6a15 Compare May 28, 2026 03:39
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from d41f01d to 9f591db Compare May 28, 2026 03:55
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from f5f6a15 to ee42c3c Compare May 28, 2026 03:56
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 9f591db to 70094a1 Compare May 28, 2026 07:41
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from ee42c3c to 9299ffb Compare May 28, 2026 07:44
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 70094a1 to 87e86ab Compare May 28, 2026 08:02
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 9299ffb to 1c61728 Compare May 28, 2026 08:04
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 87e86ab to e62e2b2 Compare June 1, 2026 07:55
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 1c61728 to 63a85dc Compare June 1, 2026 07:55
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from e62e2b2 to 48a3f2c Compare June 1, 2026 08:17
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 63a85dc to 846a477 Compare June 1, 2026 08:17
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 48a3f2c to 5582e8a Compare June 1, 2026 10:53
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from 846a477 to 19cb477 Compare June 1, 2026 11:02
@voltjia voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 5582e8a to 4e3cd58 Compare June 2, 2026 08:22
@voltjia voltjia force-pushed the feat/torch-operator-bases branch 2 times, most recently from 4b857e1 to d343d02 Compare June 2, 2026 09:21
@voltjia voltjia force-pushed the feat/torch-operator-bases branch from d343d02 to e0e57a9 Compare June 2, 2026 14:01
@voltjia voltjia changed the base branch from feat/torch-codegen-optional-overloads to master June 2, 2026 14:03
@voltjia voltjia requested a review from a team June 2, 2026 14:03