[ROCm] [Release] Block rocm release pipeline from running at every commit and fix ECR limit issue #37671
tjtanaa wants to merge 8 commits into
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Code Review
This pull request aims to prevent the ROCm release pipeline from running on every commit by removing a condition that makes an input step unconditional, thus blocking the pipeline. My review points out that this is likely a workaround for an underlying issue where subsequent steps are not correctly conditioned. I've recommended addressing the root cause for a more maintainable and clearer pipeline configuration, which I've classified as a high-severity issue.
Closing PR and reverting all changes
This pull request has merge conflicts that must be resolved before it can be
Purpose
Fix 1:
The ROCm release pipeline is not yet ready to run on every commit. For now, the pipeline is always blocked and is triggered manually when a Docker image is about to be released.
Running on every commit will be enabled by PR #37283.
Fix 2:
On AWS ECR, there is a hard limit on the number of tags that can be created for a single image, as shown in https://buildkite.com/vllm/release/builds/14204/steps/canvas?sid=019d0af8-0c96-40c8-8b67-cda0716d4524&tab=output
Error message:
exceeds the maximum allowed number of tags per image which is '1000'
The fix is to always reuse the same tag for the same ROCm base image:
public.ecr.aws/q9t5s3a7/vllm-release-repo:$${CACHE_KEY}-rocm-base
This will be a no-op if the tag and the Docker image hash are the same.
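To illustrate the tagging scheme, here is a minimal sketch of how a stable tag could be derived from the cache key alone. The function name rocm_base_tag is hypothetical and is not taken from the pipeline scripts; only the repository URL and the -rocm-base suffix come from this PR. Because the tag depends only on the cache key, repeated builds of the same base image map to one tag instead of adding new tags to the same image.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the actual pipeline): compose the stable
# ROCm base image tag from a cache key, so the same base image always
# resolves to the same tag.
rocm_base_tag() {
  local cache_key="$1"
  # Repository taken from the PR description.
  local repo="public.ecr.aws/q9t5s3a7/vllm-release-repo"
  printf '%s:%s-rocm-base\n' "$repo" "$cache_key"
}

# Example: the cache key reported in the Test Result section.
rocm_base_tag "b58dc988fa0856d2-9d3bce57"
```

Calling the function twice with the same cache key yields the same string, which is why pushing again does not consume another tag slot.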
annotate-release.sh and annotate-rocm-release.sh have been updated to reflect this change (instructing maintainers to pull public.ecr.aws/q9t5s3a7/vllm-release-repo:$${CACHE_KEY}-rocm-base instead).
Output of annotate-release.sh: https://buildkite.com/vllm/release-pipeline-shadow/builds/3261/annotations
Test Plan
Test Result
Build validated the changes for fix 2: https://buildkite.com/vllm/release-pipeline-shadow/builds/3261/steps/canvas
The annotation hash key is b58dc988fa0856d2-9d3bce57-rocm-base, and the ROCm Docker image that was built is also tagged b58dc988fa0856d2-9d3bce57-rocm-base.