Skip to content

[CI] Auto-generate manifest-diff report from multi-arch CI on PR/push#4908

Open
amd-hsivasun wants to merge 9 commits into
mainfrom
amd/hsivasun/bump-pr-blamelist
Open

[CI] Auto-generate manifest-diff report from multi-arch CI on PR/push#4908
amd-hsivasun wants to merge 9 commits into
mainfrom
amd/hsivasun/bump-pr-blamelist

Conversation

@amd-hsivasun
Copy link
Copy Markdown
Contributor

@amd-hsivasun amd-hsivasun commented Apr 28, 2026

Motivation

Today the manifest-diff report exists as a standalone reusable workflow (#4636), but nothing fires it automatically. Reviewers and bisecters who want to know "what submodules moved between these two TheRock commits" have to dispatch the workflow by hand or run the script locally.

Wiring it into Multi-Arch CI gives every PR and every push to a release-relevant branch a per-event report linked from the run's Step Summary. The job runs as a parallel sibling and has no needs:, and ci_summary does not depend on it — so it can never delay build/test or hold up the workflow's final status. continue-on-error: true keeps a transient API failure from red-lighting the CI run.

Technical Details

Scope. Wired into multi_arch_ci.yml only. ASAN (multi_arch_ci_asan.yml) and release (multi_arch_release.yml, rockrel-driven runs) are intentionally not wired in this PR — see Out of scope below for why and for the issue tracking the follow-ups (#5219).

multi_arch_ci.yml. New top-level sibling job manifest_diff:

  • Has no needs:, so it runs in parallel with linux_build_and_test / windows_build_and_test and does not block them.
  • Gated by if: github.repository == 'ROCm/TheRock'. This is not a fork-PR gate — PRs from forks run in the base-repo context, where github.repository is ROCm/TheRock, so they pass the gate (validated end-to-end on test PR [TEST - DO NOT MERGE] manifest-diff fork-PR coverage validation #5266 — see Test Plan). The gate only skips runs where multi_arch_ci.yml itself is firing inside a downstream fork of TheRock that has Actions enabled. (Same idiom as multi_arch_build_native_linux_packages.yml:143.)

The job calls ./.github/workflows/manifest-diff.yml, and the per-event inputs resolve to a (start, end) commit pair that the script then diffs:

  • pull_request:
    • End commit = pull_request.head.sha (the tip of the PR's source branch — the SHA the PR is actually proposing).
    • Start commit = merge-base of the end commit and pull_request.base.ref. We don't compute this in YAML — we pass the base branch name as pr_base_ref and the script asks the GitHub Compare API for the merge-base. This is rebase-safe: even after the base branch advances or the PR rebases, the report compares against where the PR actually diverged.
  • push:
    • End commit = github.sha (the new tip the branch was just moved to).
    • Start commit = github.event.before (the SHA the branch pointed at before this push — i.e. the previous tip, even on a force-push).
  • branch: only consumed by the script's --find-last-run mode, which the standard PR/push paths don't use. We pass it through (pull_request.base.ref || github.ref_name) so that manual dispatchers who select find_last_run get a sensibly-scoped lookup.

End-to-end pipeline:

flowchart LR
    A[event triggers<br/>multi_arch_ci.yml] --> B{event_name}
    B -- pull_request --> C[end_ref = pull_request.head.sha<br/>pr_base_ref = pull_request.base.ref]
    B -- push --> D[end_ref = github.sha<br/>start_ref = github.event.before]
    C --> E[script: merge_base via<br/>GitHub Compare API → start commit]
    D --> F[script: explicit start commit]
    E --> G[generate_manifest_diff_report.py<br/>diffs start..end]
    F --> G
    G --> H[reports/index.html]
    H --> I[configure_aws_artifacts_credentials<br/>composite action]
    I --> J[upload_test_report_script.py<br/>→ S3 + Step Summary link]
    B -. workflow_dispatch .-> C2[end_ref / pr_base_ref / find_last_run /<br/>start_ref / branch / workflow_mode<br/>passed by dispatcher]
    C2 -.-> G
Loading

manifest-diff.yml converted to TheRock's workflow pattern for fork-PR-safe artifact uploads:

  • runs-on: azure-linux-scale-rocm (was ubuntu-24.04).
  • Container ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:702a5... with -v /runner/config:/home/awsconfig/ mounting baseline AWS credentials.
  • AWS_SHARED_CREDENTIALS_FILE: /home/awsconfig/credentials.ini.
  • Inline aws-actions/configure-aws-credentials step replaced with the existing composite ./.github/actions/configure_aws_artifacts_credentials, which transparently chooses OIDC for base-repo runs and the mounted baseline creds for fork PRs.
  • git config --global --add safe.directory $PWD step after checkout (required for git config --blob calls inside the container, where the working tree is owned by the host UID — without this the report comes back empty without erroring).
  • The inner generate-report job in manifest-diff.yml is marked continue-on-error: true so Compare API hiccups never red-light the parent. The caller manifest_diff job in multi_arch_ci.yml is intentionally not also marked CoE — the inner-job CoE is what propagates to the run's overall conclusion.

The reusable workflow's input surface (start_ref, end_ref, pr_base_ref, branch, find_last_run, accepted_statuses, workflow_mode) is all generic strings — no caller-specific assumptions in the workflow layer.

Script changes (generate_manifest_diff_report.py). The script has three mutually-exclusive modes for resolving the start commit; --end is always required. The new flags slot into a precedence ladder evaluated in resolve_commits():

Flag Role Used by
--end SHA End commit (always required). All paths.
--pr-base-ref BRANCH (new) PR mode. When set, the script asks the GitHub Compare API for merge_base(end, BRANCH) and uses that as the start commit. Rebase-safe. Highest precedence. The CI's pull_request path.
--find-last-run WORKFLOW.yml (new) "Compare to last run" mode. When set (and --pr-base-ref is not), the script queries the GitHub API for the most recent run of WORKFLOW.yml on --branch whose conclusion is in --accepted-statuses, and uses that run's head SHA as the start commit. Returns (None, None) and exits 0 if no matching run exists. Middle precedence. Manual dispatch ("compare to last green nightly").
--accepted-statuses LIST (new) Comma-separated conclusion filter for --find-last-run (default success). Only meaningful with --find-last-run. With --find-last-run.
--branch NAME (new) Branch scope for --find-last-run lookups (default main). Has no effect on the other two modes. With --find-last-run.
--start SHA Explicit start commit. Used only when neither --pr-base-ref nor --find-last-run is set. Lowest precedence. The CI's push path (we pass github.event.before here); ad-hoc bisects.
--workflow-mode Modifier on --start / --end: re-interprets each as a workflow run ID instead of a commit SHA (the script then resolves each to its head SHA). Composes with the precedence ladder by changing how --start/--end are parsed before the start-mode is selected. Manual dispatch comparing two workflow runs.

In short: the CI uses PR mode on pull_request events and explicit-SHAs mode on push events. --find-last-run / --accepted-statuses / --branch exist for the manual-dispatch use cases and are passed through but unused on the standard PR/push paths.

Path filter. Added manifest-diff.yml to _GITHUB_WORKFLOWS_CI_FILENAMES in build_tools/github_actions/configure_ci_path_filters.py so workflow-only PRs still trigger CI.

Docs. New docs/development/manifest_diff.md describing the design, per-event start-ref derivation, manual-dispatch knobs, local CLI usage, and the explicit Scope section linking #5219. Linked from docs/development/ci_overview.md.

Coverage matrix

Scenario Fires? Workflow runs cleanly? Report meaningful?
TheRock-internal PR Yes Yes Yes
TheRock fork PR Yes Yes (composite action skips OIDC, uses mounted baseline creds) Yes
Push to main / multi_arch/** / release/therock-* Yes Yes Yes
rockrel / release / nightly No — those run via multi_arch_release.yml, which we did not modify (and which couldn't pass the right inputs even if we did — see Out of scope #2) N/A N/A

Test Plan

Local.

pre-commit run --all-files
python3 -m pytest \
  build_tools/tests/generate_manifest_diff_report_test.py \
  build_tools/github_actions/tests/github_actions_api_test.py \
  build_tools/github_actions/tests/configure_ci_path_filters_test.py

Both clean.

TheRock-internal PR (this PR). The manifest_diff job runs in parallel with the Linux/Windows build-and-test jobs, completes successfully, detects all 17 submodules, generates the HTML report, uploads to S3, and links it from the run's Step Summary. Verified after adding the safe.directory step — without it the in-container git config --blob calls fail silently and the report comes back empty.

Fork PR. Validated end-to-end on test PR #5266 (head on amd-hsivasun/TheRock, base ROCm/TheRock:main, since closed). The manifest_diff sibling job fired (the if: github.repository == 'ROCm/TheRock' gate passes because the workflow runs in base context), the composite action logged is_pr_from_fork: True and used the baseline-creds path mounted at /home/awsconfig/credentials.ini, and the full job (the six user-defined steps from manifest-diff.yml plus runner setup/post-steps, including the S3 upload) completed in 1m36s. Run · report.

Manual dispatch — --find-last-run mode. Validated end-to-end via workflow_dispatch on manifest-diff.yml@amd/hsivasun/bump-pr-blamelist with find_last_run=multi_arch_ci.yml, branch=main, accepted_statuses=success. The script queried the GitHub API for the most recent successful multi_arch_ci.yml run on main, resolved its head SHA (363e784d) as the start commit, diffed against the branch HEAD (ec0fd682), and uploaded the report to S3. Run · report. Also confirms the --pr-base-ref and --start/--end paths via the same dispatch surface.

Out of scope

These are intentionally not in this PR. Each needs its own design pass before being wired up — all tracked in #5219.

  1. External-repo-aware report (rocm-libraries, rocm-systems). Two distinct gaps here. Workflow level: manifest-diff.yml references ./.github/actions/configure_aws_artifacts_credentials and build_tools/generate_manifest_diff_report.py as local paths, so a cross-repo uses: ROCm/TheRock/.github/workflows/manifest-diff.yml@<ref> from rl/rs would also need to checkout TheRock at the same ref (or those dependencies need to move to a published action / package). Script level: generate_manifest_diff_report.py hardcodes the top-level repo as ROCm/TheRock and reads TheRock's pinned submodule SHAs, so even with the workflow callable cross-repo, an rl/rs PR would either get a Compare API 404 or a misleading "no submodule changes" report. The script-level fix is --repository plus an override-aware manifest mode that consumes the existing external_repo_config plumbing in setup_multi_arch.yml.

  2. rockrel / release / nightly (multi_arch_release.yml). Not just "add a sibling job". multi_arch_release.yml is workflow_call + workflow_dispatch; neither trigger has pull_request.head.sha or github.event.before, and github.sha inside the called workflow resolves to the caller's commit (rockrel's), not a TheRock commit. So a naive copy of the manifest_diff sibling-job pattern would see all of its inputs resolve to either empty or wrong values. To make this work, rockrel would need to plumb explicit TheRock start_ref / end_ref (or a find_last_run baseline like "last successful nightly") through new workflow_call inputs on multi_arch_release.yml, which then forwards them to manifest-diff.yml. This is the design question covered by the same follow-up issue.

  3. ASAN (multi_arch_ci_asan.yml). Closer to rockrel than to a copy/paste. multi_arch_ci_asan.yml triggers on schedule + workflow_dispatch + pull_request — note schedule rather than push. The pull_request arm of the per-event input mapping does port directly, but the schedule arm has no github.event.before or pull_request.base.ref to derive a start commit from, so a literal sibling-job copy/paste would resolve to an empty start commit on nightly runs and the script would exit 0 with no report. The right shape for ASAN is the pull_request mapping plus a find_last_run-based baseline (e.g. "last successful nightly") for the schedule arm. Deferred for scope.

Submission Checklist

@amd-hsivasun amd-hsivasun requested a review from ScottTodd as a code owner April 28, 2026 22:27
@amd-hsivasun amd-hsivasun marked this pull request as draft April 28, 2026 22:27
@amd-hsivasun amd-hsivasun force-pushed the amd/hsivasun/bump-pr-blamelist branch from 424f9bc to 3f3bd59 Compare April 28, 2026 22:46
@amd-hsivasun amd-hsivasun added the manifest-diff Run manifest-diff blame-list on this PR label Apr 28, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

Manifest blame list report

View report

@amd-hsivasun amd-hsivasun force-pushed the amd/hsivasun/bump-pr-blamelist branch 3 times, most recently from fc2e67d to fdaf34e Compare April 30, 2026 21:56
@amd-hsivasun amd-hsivasun force-pushed the amd/hsivasun/bump-pr-blamelist branch from fdaf34e to 1b868db Compare May 8, 2026 19:17
@amd-hsivasun amd-hsivasun changed the base branch from amd/hsivasun/integrate-manifest-diff-ci to main May 8, 2026 19:19
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design review before testing of such changes please. I see a few concerning things here (as before on the earlier changes - we should flush all of those during a detailed design review instead of going point by point during draft PR testing like this...)

Comment thread .github/workflows/manifest-diff.yml Outdated
Comment thread build_tools/github_actions/bump_automation.py Outdated
@amd-hsivasun amd-hsivasun force-pushed the amd/hsivasun/bump-pr-blamelist branch from 1b868db to 93efa13 Compare May 12, 2026 18:04
@amd-hsivasun amd-hsivasun changed the title [CI] Add manifest diff report to Bump PRs [CI] Auto-generate manifest-diff report from multi-arch CI on PR/push/schedule May 12, 2026
@amd-hsivasun amd-hsivasun force-pushed the amd/hsivasun/bump-pr-blamelist branch from 82444b3 to c327453 Compare May 12, 2026 23:02
@amd-hsivasun amd-hsivasun requested a review from ScottTodd May 12, 2026 23:04
@amd-hsivasun amd-hsivasun marked this pull request as ready for review May 12, 2026 23:04
@amd-hsivasun amd-hsivasun removed the manifest-diff Run manifest-diff blame-list on this PR label May 13, 2026
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really review this effectively until the workflows actually parse and run with logs to inspect. Please ensure that this minimum bar is met before requesting a review.

Comment thread .github/workflows/setup_multi_arch.yml Outdated
Comment thread .github/workflows/setup_multi_arch.yml Outdated
Comment thread .github/workflows/setup_multi_arch.yml Outdated
@amd-hsivasun amd-hsivasun marked this pull request as draft May 14, 2026 22:32
@amd-hsivasun amd-hsivasun changed the title [CI] Auto-generate manifest-diff report from multi-arch CI on PR/push/schedule [CI] Auto-generate manifest-diff report from multi-arch CI on PR/push May 15, 2026
@amd-hsivasun amd-hsivasun marked this pull request as ready for review May 15, 2026 08:23
@amd-hsivasun amd-hsivasun marked this pull request as draft May 15, 2026 08:25
@amd-hsivasun amd-hsivasun marked this pull request as ready for review May 15, 2026 17:01
@amd-hsivasun amd-hsivasun requested a review from ScottTodd May 15, 2026 21:51
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get a code review from another developer before me. I can't be the first pass reviewer on so many PRs.

(there are SIX reviewers on this PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: TODO

Development

Successfully merging this pull request may close these issues.

2 participants