ci: fix platform-specific builds #630

Draft
zhangyue207 wants to merge 1 commit into InfiniTensor:master from zhangyue207:ci/platform-build-fixes

@zhangyue207
Collaborator

Summary

  • Make each hardware CI setup explicitly select WITH_CPU=ON plus its target WITH_<PLATFORM>=ON with AUTO_DETECT_DEVICES=OFF.
  • Add a dedicated Ninja job pool for generated operator_call_instantiations_*.cc sources.
  • Link the Moore build against an available LLVM/OpenMP runtime when present to satisfy __kmpc_* symbols from Moore-compiled objects.
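
As a rough illustration of the first item, explicit selection could be expressed as a CMake initial-cache script loaded with `cmake -C`. The file name and the choice of Iluvatar as the example target are assumptions; only the option names WITH_CPU, WITH_ILUVATAR, and AUTO_DETECT_DEVICES come from this PR, and the actual CI setup may pass the same values differently (for example as -D flags).

```cmake
# ci/iluvatar-cache.cmake  (hypothetical file name, for illustration only)
# Usage sketch: cmake -C ci/iluvatar-cache.cmake -S . -B build -G Ninja

# Do not probe the host for devices; rely on the explicit choices below.
set(AUTO_DETECT_DEVICES OFF CACHE BOOL "Disable device auto-detection" FORCE)

# Every hardware job keeps the CPU backend enabled.
set(WITH_CPU ON CACHE BOOL "Build the CPU backend" FORCE)

# Enable exactly the platform this CI job targets.
set(WITH_ILUVATAR ON CACHE BOOL "Build the Iluvatar backend" FORCE)
```

With auto-detection off, a runner that happens to expose both NVIDIA and Iluvatar devices would still configure only the backend named in the script.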

Motivation

Several PRs based on 3371c14 expose platform failures that are not caused by the operator PR diffs:

  • Iluvatar CI can auto-detect both NVIDIA and Iluvatar devices, then fail CMake's mutually exclusive GPU backend check.
  • Since public C++ operator-call instantiations were added, NVIDIA CI can compile several generated operator_call_instantiations_*.cc CUDA translation units concurrently with the pybind unity binding sources.
  • Moore CI can build successfully but fail test collection when importing infini.ops with undefined symbol: __kmpc_for_static_fini.
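
For the Moore failure, a minimal sketch of conditional OpenMP runtime linkage might look like the fragment below. The target name `infiniops` and the variable `MOORE_OPENMP_RUNTIME` are hypothetical, and the library names searched (`omp`, `iomp5`) are an assumption about which runtimes are installed on the runner; the PR's actual logic may differ.

```cmake
# Hypothetical target name; substitute the project's real extension target.
set(OPS_TARGET infiniops)

if(WITH_MOORE AND TARGET ${OPS_TARGET})
  # Search for an LLVM-style OpenMP runtime. libomp and libiomp5 both
  # export the __kmpc_* entry points (e.g. __kmpc_for_static_fini).
  find_library(MOORE_OPENMP_RUNTIME NAMES omp iomp5)

  if(MOORE_OPENMP_RUNTIME)
    # Link only when a runtime was actually found.
    target_link_libraries(${OPS_TARGET} PRIVATE ${MOORE_OPENMP_RUNTIME})
  else()
    # find_library sets <VAR>-NOTFOUND on failure; leave the link line as is.
    message(STATUS "No LLVM/OpenMP runtime found; not linking one.")
  endif()
endif()
```

Linking only "when present" keeps the configure step from failing on machines without such a runtime, at the cost of deferring any missing-symbol error to import time.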

Closes #

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

Platform Built pytest Result Notes / Hardware
NVIDIA Pending CI Validates generated operator-call instantiation job pool.
Iluvatar Pending CI Validates explicit platform selection.
MetaX Pending CI Validates explicit platform selection.
Cambricon Pending CI Validates explicit platform selection.
Moore Pending CI Validates explicit platform selection and OpenMP runtime linkage.
Ascend Pending CI Validates explicit platform selection.
Full `pytest` output (optional)
Pending hardware CI.

Benchmark / Performance Impact

N/A. The generated operator-call instantiation job pool only caps the heavyweight generated instantiation sources; it does not globally limit the whole build.

Notes for Reviewers

This PR intentionally avoids setting global build parallelism through CMAKE_BUILD_PARALLEL_LEVEL or MAX_JOBS. The concurrency cap is scoped to generated operator-call instantiation sources, matching the new heavy source class added by public C++ operator-call instantiation generation.
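
A scoped cap of this kind can be sketched with CMake's Ninja job pools. This is an illustration under stated assumptions, not the PR's diff: the pool size of 2, the target name, and the variable `GENERATED_INSTANTIATION_SOURCES` are all hypothetical. Because `JOB_POOL_COMPILE` is a target property, the sketch isolates the generated sources in their own OBJECT library.

```cmake
# Declare a named pool that admits at most 2 concurrent jobs.
# The limit is an assumed value; pools only take effect with Ninja generators.
set_property(GLOBAL APPEND PROPERTY JOB_POOLS operator_call_instantiations=2)

# GENERATED_INSTANTIATION_SOURCES is a hypothetical list holding the
# generated operator_call_instantiations_*.cc files.
if(GENERATED_INSTANTIATION_SOURCES)
  add_library(operator_call_instantiations OBJECT
    ${GENERATED_INSTANTIATION_SOURCES})

  # Only compile steps of this target are throttled by the pool;
  # every other target keeps the build's normal parallelism.
  set_property(TARGET operator_call_instantiations
    PROPERTY JOB_POOL_COMPILE operator_call_instantiations)
endif()
```

Unlike CMAKE_BUILD_PARALLEL_LEVEL or MAX_JOBS, a pool like this leaves the rest of the build graph unconstrained, which matches the intent described above.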


Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • N/A. No public API changes.

General Code Hygiene (applies to all languages)

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

N/A. No C++ source files changed.

Python Specific (if Python files changed)

N/A. No Python files changed.

Testing

  • pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
  • For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
  • N/A. No tests or operator functionality were added.

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory with pip install .[dev] on at least one affected platform.
  • compile_commands.json still regenerates (CMake option CMAKE_EXPORT_COMPILE_COMMANDS=ON in pyproject.toml — required by the code-lint skill and clang-tidy -p).
  • N/A. No new backends or devices were added.
  • Only one CUDA-like GPU backend is selectable at a time — this PR makes CI setup select one explicitly.
  • Both CI workflows (clang-format.yml, ruff.yml) are green locally (or expected to be green on CI).
  • No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies] (or justified in the PR description).

Documentation

N/A. This change adjusts CI/build plumbing only.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • Third-party code is license-compatible and attributed.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@zhangyue207 force-pushed the ci/platform-build-fixes branch 6 times, most recently from 2b6a938 to 2d1da72, on June 2, 2026 06:35
@zhangyue207 force-pushed the ci/platform-build-fixes branch from 2d1da72 to 55db437 on June 2, 2026 07:04