[CI] Auto-generate manifest-diff report from multi-arch CI on PR/push#4908
Open
amd-hsivasun wants to merge 9 commits into
ScottTodd
requested changes
May 8, 2026
Design review before testing of such changes please. I see a few concerning things here (as before on the earlier changes - we should flush all of those during a detailed design review instead of going point by point during draft PR testing like this...)
ScottTodd
requested changes
May 13, 2026
I can't really review this effectively until the workflows actually parse and run with logs to inspect. Please ensure that this minimum bar is met before requesting a review.
ScottTodd
reviewed
May 18, 2026
Please get a code review from another developer before me. I can't be the first pass reviewer on so many PRs.
(there are SIX reviewers on this PR)
Motivation
Today the manifest-diff report exists as a standalone reusable workflow (#4636), but nothing fires it automatically. Reviewers and bisecters who want to know "what submodules moved between these two TheRock commits" have to dispatch the workflow by hand or run the script locally.
Wiring it into Multi-Arch CI gives every PR and every push to a release-relevant branch a per-event report linked from the run's Step Summary. The job runs as a parallel sibling and has no `needs:`, and `ci_summary` does not depend on it, so it can never delay build/test or hold up the workflow's final status. `continue-on-error: true` keeps a transient API failure from red-lighting the CI run.
Technical Details
Scope. Wired into `multi_arch_ci.yml` only. ASAN (`multi_arch_ci_asan.yml`) and release (`multi_arch_release.yml`, rockrel-driven runs) are intentionally not wired in this PR; see Out of scope below for why and for the issue tracking the follow-ups (#5219).
`multi_arch_ci.yml`. New top-level sibling job `manifest_diff`:
- No `needs:`, so it runs in parallel with `linux_build_and_test` / `windows_build_and_test` and does not block them.
- Gated on `if: github.repository == 'ROCm/TheRock'`. This is not a fork-PR gate: PRs from forks run in the base-repo context, where `github.repository` is `ROCm/TheRock`, so they pass the gate (validated end-to-end on test PR #5266, "[TEST - DO NOT MERGE] manifest-diff fork-PR coverage validation"; see Test Plan). The gate only skips runs where `multi_arch_ci.yml` itself is firing inside a downstream fork of TheRock that has Actions enabled. (Same idiom as `multi_arch_build_native_linux_packages.yml:143`.)
The job calls `./.github/workflows/manifest-diff.yml`, and the per-event inputs resolve to a (start, end) commit pair that the script then diffs:
- `pull_request`:
  - End: `pull_request.head.sha` (the tip of the PR's source branch, i.e. the SHA the PR is actually proposing).
  - Start: the merge-base with `pull_request.base.ref`. We don't compute this in YAML; we pass the base branch name as `pr_base_ref` and the script asks the GitHub Compare API for the merge-base. This is rebase-safe: even after the base branch advances or the PR rebases, the report compares against where the PR actually diverged.
- `push`:
  - End: `github.sha` (the new tip the branch was just moved to).
  - Start: `github.event.before` (the SHA the branch pointed at before this push, i.e. the previous tip, even on a force-push).
- `branch`: only consumed by the script's `--find-last-run` mode, which the standard PR/push paths don't use. We pass it through (`pull_request.base.ref || github.ref_name`) so that manual dispatchers who select `find_last_run` get a sensibly-scoped lookup.
End-to-end pipeline:
flowchart LR
  A[event triggers<br/>multi_arch_ci.yml] --> B{event_name}
  B -- pull_request --> C[end_ref = pull_request.head.sha<br/>pr_base_ref = pull_request.base.ref]
  B -- push --> D[end_ref = github.sha<br/>start_ref = github.event.before]
  C --> E[script: merge_base via<br/>GitHub Compare API → start commit]
  D --> F[script: explicit start commit]
  E --> G[generate_manifest_diff_report.py<br/>diffs start..end]
  F --> G
  G --> H[reports/index.html]
  H --> I[configure_aws_artifacts_credentials<br/>composite action]
  I --> J[upload_test_report_script.py<br/>→ S3 + Step Summary link]
  B -. workflow_dispatch .-> C2[end_ref / pr_base_ref / find_last_run /<br/>start_ref / branch / workflow_mode<br/>passed by dispatcher]
  C2 -.-> G
`manifest-diff.yml` converted to TheRock's workflow pattern for fork-PR-safe artifact uploads:
- `runs-on: azure-linux-scale-rocm` (was `ubuntu-24.04`).
- `ghcr.io/rocm/therock_build_manylinux_x86_64@sha256:702a5...` with `-v /runner/config:/home/awsconfig/` mounting baseline AWS credentials.
- `AWS_SHARED_CREDENTIALS_FILE: /home/awsconfig/credentials.ini`.
- `aws-actions/configure-aws-credentials` step replaced with the existing composite `./.github/actions/configure_aws_artifacts_credentials`, which transparently chooses OIDC for base-repo runs and the mounted baseline creds for fork PRs.
- `git config --global --add safe.directory $PWD` step after checkout (required for `git config --blob` calls inside the container, where the working tree is owned by the host UID; without this the report comes back empty without erroring).
- `generate-report` job in `manifest-diff.yml` is marked `continue-on-error: true` so Compare API hiccups never red-light the parent. The caller `manifest_diff` job in `multi_arch_ci.yml` is intentionally not also marked CoE; the inner-job CoE is what propagates to the run's overall conclusion.
The reusable workflow's input surface (`start_ref`, `end_ref`, `pr_base_ref`, `branch`, `find_last_run`, `accepted_statuses`, `workflow_mode`) is all generic strings, with no caller-specific assumptions in the workflow layer.
Script changes (`generate_manifest_diff_report.py`). The script has three mutually-exclusive modes for resolving the start commit; `--end` is always required. The new flags slot into a precedence ladder evaluated in `resolve_commits()`:
- `--end SHA`: the end commit; always required.
- `--pr-base-ref BRANCH` (new): computes `merge_base(end, BRANCH)` and uses that as the start commit. Rebase-safe. Highest precedence. Used by the `pull_request` path.
- `--find-last-run WORKFLOW.yml` (new): when set (and `--pr-base-ref` is not), the script queries the GitHub API for the most recent run of `WORKFLOW.yml` on `--branch` whose conclusion is in `--accepted-statuses`, and uses that run's head SHA as the start commit. Returns `(None, None)` and exits 0 if no matching run exists. Middle precedence.
- `--accepted-statuses LIST` (new): run conclusions accepted by `--find-last-run` (default `success`). Only meaningful with `--find-last-run`.
- `--branch NAME` (new): branch scope for `--find-last-run` lookups (default `main`). Has no effect on the other two modes.
- `--start SHA`: used when neither `--pr-base-ref` nor `--find-last-run` is set. Lowest precedence. Used by the `push` path (we pass `github.event.before` here) and by ad-hoc bisects.
- `--workflow-mode`: modifier for `--start` / `--end` that re-interprets each as a workflow run ID instead of a commit SHA (the script then resolves each to its head SHA). Composes with the precedence ladder by changing how `--start` / `--end` are parsed before the start-mode is selected.
In short: the CI uses PR mode on `pull_request` events and explicit-SHAs mode on `push` events. `--find-last-run` / `--accepted-statuses` / `--branch` exist for the manual-dispatch use cases and are passed through but unused on the standard PR/push paths.
Path filter. Added `manifest-diff.yml` to `_GITHUB_WORKFLOWS_CI_FILENAMES` in `build_tools/github_actions/configure_ci_path_filters.py` so workflow-only PRs still trigger CI.
Docs. New `docs/development/manifest_diff.md` describing the design, per-event start-ref derivation, manual-dispatch knobs, local CLI usage, and the explicit Scope section linking #5219. Linked from `docs/development/ci_overview.md`.
Coverage matrix
- `main` / `multi_arch/**` / `release/therock-*`
- rockrel / release / nightly: `multi_arch_release.yml`, which we did not modify (and which couldn't pass the right inputs even if we did; see Out of scope #2)
Test Plan
Local.
Both clean.
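For anyone who wants to sanity-check the behavior described under Technical Details without touching CI, the per-event input mapping and the start-commit precedence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in `generate_manifest_diff_report.py` and not something that was run for this PR: `inputs_for_event` and `select_start_mode` are hypothetical names, the mode labels are invented for the example, and returning `"none"` when no start source is given is an assumption.

```python
from typing import Optional


def inputs_for_event(event_name: str, event: dict, sha: str) -> dict:
    """Hypothetical helper: map a GitHub event payload to workflow inputs."""
    if event_name == "pull_request":
        pr = event["pull_request"]
        # End is the PR head; the start is derived later from the base branch.
        return {"end_ref": pr["head"]["sha"], "pr_base_ref": pr["base"]["ref"]}
    if event_name == "push":
        # End is the new branch tip; start is the tip before this push.
        return {"end_ref": sha, "start_ref": event["before"]}
    raise ValueError(f"unhandled event: {event_name}")


def select_start_mode(
    pr_base_ref: Optional[str] = None,
    find_last_run: Optional[str] = None,
    start: Optional[str] = None,
) -> str:
    """Hypothetical helper: choose how the start commit is resolved.

    Mirrors the precedence described in the PR text:
    pr-base-ref first, then find-last-run, then an explicit start.
    """
    if pr_base_ref:
        return "merge-base"
    if find_last_run:
        return "last-run"
    if start:
        return "explicit"
    # Assumption: nothing to diff against when no start source is given.
    return "none"


if __name__ == "__main__":
    pr_event = {"pull_request": {"head": {"sha": "abc123"}, "base": {"ref": "main"}}}
    inputs = inputs_for_event("pull_request", pr_event, sha="unused")
    mode = select_start_mode(
        pr_base_ref=inputs.get("pr_base_ref"),
        start=inputs.get("start_ref"),
    )
    print(inputs, mode)
```

Keeping the mapping and the precedence as two small pure functions makes each rule checkable in isolation, which is the same separation the PR draws between the YAML layer (pass inputs through) and the script (resolve commits).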
TheRock-internal PR (this PR). The `manifest_diff` job runs in parallel with the Linux/Windows build-and-test jobs, completes successfully, detects all 17 submodules, generates the HTML report, uploads to S3, and links it from the run's Step Summary. Verified after adding the `safe.directory` step; without it the in-container `git config --blob` calls fail silently and the report comes back empty.
Fork PR. Validated end-to-end on test PR #5266 (head on `amd-hsivasun/TheRock`, base `ROCm/TheRock:main`, since closed). The `manifest_diff` sibling job fired (the `if: github.repository == 'ROCm/TheRock'` gate passes because the workflow runs in base context), the composite action logged `is_pr_from_fork: True` and used the baseline-creds path mounted at `/home/awsconfig/credentials.ini`, and the full job (the six user-defined steps from `manifest-diff.yml` plus runner setup/post-steps, including the S3 upload) completed in 1m36s. Run · report.
Manual dispatch, `--find-last-run` mode. Validated end-to-end via `workflow_dispatch` on `manifest-diff.yml@amd/hsivasun/bump-pr-blamelist` with `find_last_run=multi_arch_ci.yml`, `branch=main`, `accepted_statuses=success`. The script queried the GitHub API for the most recent successful `multi_arch_ci.yml` run on `main`, resolved its head SHA (`363e784d`) as the start commit, diffed against the branch HEAD (`ec0fd682`), and uploaded the report to S3. Run · report. Also confirms the `--pr-base-ref` and `--start` / `--end` paths via the same dispatch surface.
Out of scope
These are intentionally not in this PR. Each needs its own design pass before being wired up — all tracked in #5219.
1. External-repo-aware report (`rocm-libraries`, `rocm-systems`). Two distinct gaps here. Workflow level: `manifest-diff.yml` references `./.github/actions/configure_aws_artifacts_credentials` and `build_tools/generate_manifest_diff_report.py` as local paths, so a cross-repo `uses: ROCm/TheRock/.github/workflows/manifest-diff.yml@<ref>` from rocm-libraries / rocm-systems would also need to check out TheRock at the same ref (or those dependencies need to move to a published action / package). Script level: `generate_manifest_diff_report.py` hardcodes the top-level repo as `ROCm/TheRock` and reads TheRock's pinned submodule SHAs, so even with the workflow callable cross-repo, a rocm-libraries / rocm-systems PR would either get a Compare API 404 or a misleading "no submodule changes" report. The script-level fix is `--repository` plus an override-aware manifest mode that consumes the existing `external_repo_config` plumbing in `setup_multi_arch.yml`.
2. rockrel / release / nightly (`multi_arch_release.yml`). Not just "add a sibling job". `multi_arch_release.yml` is `workflow_call` + `workflow_dispatch`; neither trigger has `pull_request.head.sha` or `github.event.before`, and `github.sha` inside the called workflow resolves to the caller's commit (rockrel's), not a TheRock commit. So a naive copy of the `manifest_diff` sibling-job pattern would see all of its inputs resolve to either empty or wrong values. To make this work, rockrel would need to plumb explicit TheRock `start_ref` / `end_ref` (or a `find_last_run` baseline like "last successful nightly") through new `workflow_call` inputs on `multi_arch_release.yml`, which then forwards them to `manifest-diff.yml`. This is the design question covered by the same follow-up issue.
3. ASAN (`multi_arch_ci_asan.yml`). Closer to rockrel than to a copy/paste. `multi_arch_ci_asan.yml` triggers on `schedule` + `workflow_dispatch` + `pull_request`; note `schedule` rather than `push`. The `pull_request` arm of the per-event input mapping does port directly, but the `schedule` arm has no `github.event.before` or `pull_request.base.ref` to derive a start commit from, so a literal sibling-job copy/paste would resolve to an empty start commit on nightly runs and the script would exit 0 with no report. The right shape for ASAN is the `pull_request` mapping plus a `find_last_run`-based baseline (e.g. "last successful nightly") for the `schedule` arm. Deferred for scope.
Submission Checklist