[ROCm] [Release] Block rocm release pipeline from running at every commit and fix ECR limit issue #37671

Open
tjtanaa wants to merge 8 commits into vllm-project:main from tjtanaa:blocknightlypipeline

@tjtanaa (Collaborator) commented Mar 20, 2026

Purpose

Fix 1:

The ROCm release pipeline is not yet ready to run on every commit. This PR makes it always block, so it only runs when we manually trigger it before releasing the Docker image.

Running it on every commit will be enabled in PR #37283.
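As a rough illustration of the gating in Fix 1, the toy Python model below treats an unconditional block step as a gate: downstream steps only run after a maintainer unblocks it. This is a hypothetical sketch, not Buildkite's scheduler or data model, and the step keys (block-rocm-release, build-rocm-base, build-rocm-release-image) are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Toy pipeline step (hypothetical; not Buildkite's real data model)."""
    key: str
    is_block: bool = False
    depends_on: list[str] = field(default_factory=list)


def steps_that_run(steps: list[Step], unblocked: set[str]) -> list[str]:
    """Return keys of steps that would run, assuming steps are listed in dependency order."""
    ran: list[str] = []
    for step in steps:
        # A step waits until every one of its dependencies has run.
        if not all(dep in ran for dep in step.depends_on):
            continue
        # A block step only passes once someone unblocks it manually.
        if step.is_block and step.key not in unblocked:
            continue
        ran.append(step.key)
    return ran


# Hypothetical ROCm release steps gated by an unconditional block step.
ROCM_RELEASE_STEPS = [
    Step("block-rocm-release", is_block=True),
    Step("build-rocm-base", depends_on=["block-rocm-release"]),
    Step("build-rocm-release-image", depends_on=["build-rocm-base"]),
]

if __name__ == "__main__":
    print(steps_that_run(ROCM_RELEASE_STEPS, unblocked=set()))
    print(steps_that_run(ROCM_RELEASE_STEPS, unblocked={"block-rocm-release"}))
```

On an ordinary commit nothing is unblocked, so the function returns an empty list; unblocking block-rocm-release for a release lets all three steps run.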

Fix 2:

AWS ECR enforces a hard limit on the number of tags a single image can carry, and the release build hit it: https://buildkite.com/vllm/release/builds/14204/steps/canvas?sid=019d0af8-0c96-40c8-8b67-cda0716d4524&tab=output

Error message: exceeds the maximum allowed number of tags per image which is '1000'

The fix is to always reuse the same tag for a given ROCm base image: public.ecr.aws/q9t5s3a7/vllm-release-repo:$${CACHE_KEY}-rocm-base

Re-pushing is a no-op when the tag already points to the same Docker image digest.

annotate-release.sh and annotate-rocm-release.sh have been updated accordingly: they now instruct maintainers to pull public.ecr.aws/q9t5s3a7/vllm-release-repo:$${CACHE_KEY}-rocm-base instead.

Output of annotate-release.sh: https://buildkite.com/vllm/release-pipeline-shadow/builds/3261/annotations
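To see why a stable tag avoids the ECR error, here is a toy Python model of a registry that enforces the 1000-tags-per-image limit quoted in the error message above. It assumes the old scheme pushed a fresh, per-build tag (the build-N-rocm-base names are hypothetical) onto an unchanged base image digest, while the new scheme re-pushes one cache-keyed tag, which is a no-op when the digest is unchanged. It is a sketch of the counting argument, not how ECR is implemented.

```python
from typing import Callable

MAX_TAGS_PER_IMAGE = 1000  # limit quoted in the ECR error message


class ToyRegistry:
    """Toy registry that caps how many tags may point at one image digest."""

    def __init__(self, max_tags: int = MAX_TAGS_PER_IMAGE) -> None:
        self.max_tags = max_tags
        self.tag_to_digest: dict[str, str] = {}

    def tag_count(self, digest: str) -> int:
        """Number of tags currently pointing at `digest`."""
        return sum(1 for d in self.tag_to_digest.values() if d == digest)

    def push(self, tag: str, digest: str) -> bool:
        """Return True if the registry changed, False if the push was a no-op."""
        if self.tag_to_digest.get(tag) == digest:
            return False  # same tag already points at the same digest
        if self.tag_count(digest) >= self.max_tags:
            raise RuntimeError(
                "exceeds the maximum allowed number of tags per image "
                f"which is '{self.max_tags}'"
            )
        self.tag_to_digest[tag] = digest
        return True


def run_builds(num_builds: int, make_tag: Callable[[int], str]) -> ToyRegistry:
    """Push the same, unchanged ROCm base image once per build."""
    registry = ToyRegistry()
    base_digest = "sha256:rocm-base"  # hypothetical digest that never changes
    for build in range(num_builds):
        registry.push(make_tag(build), base_digest)
    return registry


def per_build_tag(build: int) -> str:
    """Hypothetical old scheme: a new tag on every build."""
    return f"build-{build}-rocm-base"


def cache_key_tag(_build: int) -> str:
    """New scheme: one tag per cache key (value from the Test Result section)."""
    return "b58dc988fa0856d2-9d3bce57-rocm-base"
```

Re-pushing the cache-keyed tag 1500 times leaves a single tag on the image, while the per-build scheme fails on the 1001st build with the same message seen in the Buildkite log.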

Test Plan

Test Result

This build validates Fix 2: https://buildkite.com/vllm/release-pipeline-shadow/builds/3261/steps/canvas

The tag in the annotation, b58dc988fa0856d2-9d3bce57-rocm-base, matches the tag of the ROCm Docker image that was built (b58dc988fa0856d2-9d3bce57-rocm-base).

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@mergify mergify Bot added ci/build rocm Related to AMD ROCm labels Mar 20, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Mar 20, 2026
@tjtanaa tjtanaa requested a review from khluu March 20, 2026 10:11

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request aims to prevent the ROCm release pipeline from running on every commit by removing the condition on an input step, which makes that step unconditional and thus always blocks the pipeline. My review points out that this is likely a workaround for an underlying issue where subsequent steps are not correctly conditioned. I've recommended addressing the root cause for a more maintainable and clearer pipeline configuration, which I've classified as a high-severity issue.

Comment thread .buildkite/release-pipeline.yaml
tjtanaa added 3 commits March 20, 2026 11:35
@tjtanaa tjtanaa changed the title [ROCm] [Release] Block rocm release pipeline from running at every commit [ROCm] [Release] Block rocm release pipeline from running at every commit and fix ECR limit issue Mar 20, 2026
tjtanaa added 4 commits March 20, 2026 12:34
@tjtanaa tjtanaa added this to the v0.18.0 cherry picks milestone Mar 20, 2026
@harshitgavita-07

Closing PR and reverting all changes

@mergify

mergify Bot commented Mar 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 26, 2026
@mergify mergify Bot removed the needs-rebase label May 7, 2026
@mergify

mergify Bot commented May 7, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 7, 2026