[Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm #37283
Merged

Changes from 14 commits

29 commits (most shown with author tjtanaa):
- c194db4 preliminary attempt on nightly rocm docker
- 0fee22f preliminary attempt on nightly rocm docker
- b8186cb fix release branch
- 80a178c Merge remote-tracking branch 'origin/main' into nightly-rocm
- 704696f use the ECR to download docker image instead
- 70467a2 resolve 2990518
- 97d0acc setup for mock release
- 36c72b2 fix syntax error
- 8c9b340 remove redundant docker pull
- 5600a99 only download wheels
- af83bb8 add dry run
- 63504ae fix denied adding to an image in the repository with name 'vllm-relea…
- 8d24732 fix denied adding to an image in the repository with name 'vllm-relea…
- 0e4701f sync main
- 2f1ebf7 add logs wto the tag deletion op for debugging
- 92a8422 debug why tag is not delete
- c86facc do not create new ECR tag for base
- fb27d75 fix base image to always use the tag ECR_IMAGE_TAG
- 96434bd make the PR ready
- fca0e1a clean up lines
- 4695ecc Merge remote-tracking branch 'origin/main' into nightly-rocm
- 57e2078 remove the PYTHON_VERSION and PYTORCH_ROCM_ARCH extraction logic
- d3c6330 make the PR ready for review
- d3fd8fe remove dry run
- ebaa629 fix comments
- d72adca Merge remote-tracking branch 'origin/main' into nightly-rocm
- 4d942e4 change to use small_cpu_queue_release
- a2f2e43 Merge branch 'main' into nightly-rocm
- 1319b1c Merge branch 'main' into nightly-rocm
New file: cleanup-ecr-rocm-base-tags.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| #!/bin/bash | ||
| # Clean up old per-commit rocm-base tags from ECR Public, keeping a rolling | ||
| # window of the most recent N commits' tags plus the cache key tag. | ||
| # | ||
| # Usage: cleanup-ecr-rocm-base-tags.sh <ecr-image-ref> [window-size] | ||
| # ecr-image-ref: full ECR reference of the base image (cache-key tag to preserve) | ||
| # window-size: number of recent commit tags to keep (default 300) | ||
| set -euo pipefail | ||
| | ||
| ECR_IMAGE_REF="${1:?Usage: $0 <ecr-image-ref> [window-size]}" | ||
| WINDOW_SIZE="${2:-300}" | ||
| REPO_NAME="vllm-release-repo" | ||
| REGION="us-east-1" | ||
| | ||
| # Extract the cache key tag (always preserved) | ||
| CACHE_TAG="${ECR_IMAGE_REF##*:}" | ||
| | ||
| # Get image digest from the locally-pulled image ("|| true" keeps set -e/pipefail from aborting before the empty-digest check) | ||
| DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' "$ECR_IMAGE_REF" 2>/dev/null | awk -F@ '{print $2}' || true) | ||
| if [ -z "$DIGEST" ]; then | ||
| echo "WARNING: Could not get digest for $ECR_IMAGE_REF, skipping cleanup" | ||
| exit 0 | ||
| fi | ||
| | ||
| # Get all tags for this specific digest from ECR | ||
| IMAGE_DETAIL=$(aws ecr-public describe-images \ | ||
| --repository-name "$REPO_NAME" \ | ||
| --region "$REGION" \ | ||
| --image-ids imageDigest="$DIGEST" \ | ||
| --output json 2>/dev/null || echo '{"imageDetails":[]}') | ||
| | ||
| # Extract all -rocm-base tags (excluding the cache key tag) | ||
| COMMIT_BASE_TAGS=$(echo "$IMAGE_DETAIL" | jq -r \ | ||
| --arg cache_tag "$CACHE_TAG" \ | ||
| '.imageDetails[0].imageTags[]? // empty | ||
| | select(endswith("-rocm-base")) | ||
| | select(. != $cache_tag)') | ||
| | ||
| TAG_COUNT=$(echo "$COMMIT_BASE_TAGS" | grep -c . || true) | ||
| echo "Found $TAG_COUNT per-commit rocm-base tags (plus cache key tag: $CACHE_TAG)" | ||
| | ||
| if [ "$TAG_COUNT" -le "$WINDOW_SIZE" ]; then | ||
| echo "Within window ($WINDOW_SIZE), no cleanup needed" | ||
| exit 0 | ||
| fi | ||
| | ||
| # Get the most recent N commit SHAs from git history ("|| true" lets the empty check below handle git failures) | ||
| RECENT_COMMITS=$(git log --format=%H -n "$WINDOW_SIZE" 2>/dev/null | sort || true) | ||
| if [ -z "$RECENT_COMMITS" ]; then | ||
| echo "WARNING: Could not get git history, skipping cleanup" | ||
| exit 0 | ||
| fi | ||
| | ||
| # Identify tags to delete: commit SHA not in recent history | ||
| TAGS_TO_DELETE="" | ||
| KEEP_COUNT=0 | ||
| DELETE_COUNT=0 | ||
| while IFS= read -r tag; do | ||
| [ -z "$tag" ] && continue | ||
| COMMIT_SHA="${tag%-rocm-base}" | ||
| if echo "$RECENT_COMMITS" | grep -q "^${COMMIT_SHA}$"; then | ||
| KEEP_COUNT=$((KEEP_COUNT + 1)) | ||
| else | ||
| TAGS_TO_DELETE="${TAGS_TO_DELETE}${tag}"$'\n' | ||
| DELETE_COUNT=$((DELETE_COUNT + 1)) | ||
| fi | ||
| done <<< "$COMMIT_BASE_TAGS" | ||
| | ||
| echo "Keeping $KEEP_COUNT tags (recent commits), deleting $DELETE_COUNT old tags" | ||
| | ||
| if [ "$DELETE_COUNT" -eq 0 ]; then | ||
| echo "Nothing to delete" | ||
| exit 0 | ||
| fi | ||
| | ||
| # Delete in batches of 100 (ECR batch-delete-image limit) | ||
| echo "$TAGS_TO_DELETE" | grep -v '^$' | while mapfile -t -n 100 BATCH && [ ${#BATCH[@]} -gt 0 ]; do | ||
| IMAGE_IDS="" | ||
| for tag in "${BATCH[@]}"; do | ||
| [ -z "$tag" ] && continue | ||
| IMAGE_IDS="$IMAGE_IDS imageTag=$tag" | ||
| done | ||
| if [ -n "$IMAGE_IDS" ]; then | ||
| aws ecr-public batch-delete-image \ | ||
| --repository-name "$REPO_NAME" \ | ||
| --region "$REGION" \ | ||
| --image-ids $IMAGE_IDS 2>/dev/null || echo "WARNING: batch-delete failed for some tags" | ||
| fi | ||
| done | ||
| | ||
| echo "Cleanup complete: deleted $DELETE_COUNT old rocm-base tags, kept $KEEP_COUNT + cache key" | ||
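The cleanup script's keep/delete decision and its 100-tag batching can be exercised without AWS or git. The following is a minimal offline sketch, not the script itself: `select_tags_to_delete` and `batch_image_ids` are hypothetical helper names, the recent-commit list and the tag list are passed in as plain strings, and the `<commit-sha>-rocm-base` tag naming is assumed from the script above. Like the original, it needs bash 4+ for `mapfile`.

```shell
#!/usr/bin/env bash
# Offline sketch of the tag-selection and batching logic in
# cleanup-ecr-rocm-base-tags.sh. Helper names are hypothetical; no git or
# AWS commands are run.
set -euo pipefail

# Print every "<sha>-rocm-base" tag whose SHA is not in the recent-commit window.
#   $1: newline-separated recent commit SHAs (as from `git log --format=%H -n N`)
#   $2: newline-separated image tags
#   $3: cache-key tag, which is never selected for deletion
select_tags_to_delete() {
  local recent="$1" tags="$2" cache_tag="$3" tag sha
  while IFS= read -r tag; do
    # Skip blank lines, the cache-key tag, and tags that are not rocm-base tags.
    if [[ -z "$tag" || "$tag" == "$cache_tag" || "$tag" != *-rocm-base ]]; then
      continue
    fi
    sha="${tag%-rocm-base}"
    # -q quiet, -x whole-line match, -F fixed string: exact SHA lookup.
    if ! grep -qxF -- "$sha" <<< "$recent"; then
      printf '%s\n' "$tag"
    fi
  done <<< "$tags"
}

# Read tags from stdin and print one line per batch of at most $1 tags,
# formatted as "imageTag=<t1> imageTag=<t2> ...".
batch_image_ids() {
  local size="$1" tag
  local -a batch ids
  # mapfile -n reads at most $size lines per iteration; stop on an empty read.
  while mapfile -t -n "$size" batch && (( ${#batch[@]} > 0 )); do
    ids=()
    for tag in "${batch[@]}"; do
      if [[ -n "$tag" ]]; then
        ids+=("imageTag=$tag")
      fi
    done
    if (( ${#ids[@]} > 0 )); then
      # "${ids[*]}" joins the array elements with single spaces.
      printf '%s\n' "${ids[*]}"
    fi
  done
}

# Demo with made-up SHAs: "aaa" is inside the window, "bbb" and "ccc" are not.
recent=$'aaa\nddd'
tags=$'aaa-rocm-base\nbbb-rocm-base\nccc-rocm-base\ncache-key-rocm-base'
select_tags_to_delete "$recent" "$tags" "cache-key-rocm-base" | batch_image_ids 100
```

Using `grep -qxF` in place of the script's `grep -q "^${COMMIT_SHA}$"` makes the lookup a literal whole-line match; for hexadecimal commit SHAs, which contain no regex metacharacters, the two behave the same.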
New file: .buildkite/scripts/push-nightly-builds-rocm.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| #!/bin/bash | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
| # | ||
| # Push the ROCm nightly base image and nightly image from ECR to Docker Hub | ||
| # as vllm/vllm-openai-rocm:base-nightly and vllm/vllm-openai-rocm:nightly, plus the | ||
| # per-commit tags vllm/vllm-openai-rocm:base-nightly-<commit> and vllm/vllm-openai-rocm:nightly-<commit>. | ||
| # Run when NIGHTLY=1 after build-rocm-release-image has pushed to ECR. | ||
| # | ||
| # Local testing (no push to Docker Hub): | ||
| # BUILDKITE_COMMIT=<commit-with-rocm-image-in-ecr> DRY_RUN=1 bash .buildkite/scripts/push-nightly-builds-rocm.sh | ||
| # Requires: a configured AWS CLI (for the ECR Public login) and Docker; a full run also needs a Docker Hub login. | ||
| | ||
| set -ex | ||
| | ||
| # Use BUILDKITE_COMMIT from env (required; set to a commit that has ROCm image in ECR for local test) | ||
| BUILDKITE_COMMIT="${BUILDKITE_COMMIT:?Set BUILDKITE_COMMIT to the commit SHA that has the ROCm image in ECR (e.g. from a previous release pipeline run)}" | ||
| DRY_RUN="${DRY_RUN:-0}" | ||
| | ||
| BASE_ORIG_TAG="${BUILDKITE_COMMIT}-rocm-base" | ||
| ORIG_TAG="${BUILDKITE_COMMIT}-rocm" | ||
| BASE_TAG_NAME="base-nightly" | ||
| TAG_NAME="nightly" | ||
| BASE_TAG_NAME_COMMIT="base-nightly-${BUILDKITE_COMMIT}" | ||
| TAG_NAME_COMMIT="nightly-${BUILDKITE_COMMIT}" | ||
| | ||
| echo "Pushing ROCm images from ECR tags $BASE_ORIG_TAG and $ORIG_TAG to Docker Hub as $BASE_TAG_NAME, $BASE_TAG_NAME_COMMIT, $TAG_NAME and $TAG_NAME_COMMIT" | ||
| [[ "$DRY_RUN" == "1" ]] && echo "[DRY_RUN] Skipping push to Docker Hub" | ||
| | ||
| # Login to ECR and pull the image built by build-rocm-release-image | ||
| aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7 | ||
| docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:"$BASE_ORIG_TAG" | ||
| docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:"$ORIG_TAG" | ||
| | ||
| # Tag for Docker Hub (base-nightly and base-nightly-<commit>, nightly and nightly-<commit>) | ||
| docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:"$BASE_ORIG_TAG" vllm/vllm-openai-rocm:"$BASE_TAG_NAME" | ||
| docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:"$BASE_ORIG_TAG" vllm/vllm-openai-rocm:"$BASE_TAG_NAME_COMMIT" | ||
| docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:"$ORIG_TAG" vllm/vllm-openai-rocm:"$TAG_NAME" | ||
| docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:"$ORIG_TAG" vllm/vllm-openai-rocm:"$TAG_NAME_COMMIT" | ||
| | ||
| if [[ "$DRY_RUN" == "1" ]]; then | ||
| echo "[DRY_RUN] Would push vllm/vllm-openai-rocm:$BASE_TAG_NAME and vllm/vllm-openai-rocm:$BASE_TAG_NAME_COMMIT" | ||
| echo "[DRY_RUN] Would push vllm/vllm-openai-rocm:$TAG_NAME and vllm/vllm-openai-rocm:$TAG_NAME_COMMIT" | ||
| echo "[DRY_RUN] Local tags created. Exiting without push." | ||
| exit 0 | ||
| fi | ||
| | ||
| # Push to Docker Hub (docker-login plugin runs before this step in CI) | ||
| docker push vllm/vllm-openai-rocm:"$BASE_TAG_NAME" | ||
| docker push vllm/vllm-openai-rocm:"$BASE_TAG_NAME_COMMIT" | ||
| docker push vllm/vllm-openai-rocm:"$TAG_NAME" | ||
| docker push vllm/vllm-openai-rocm:"$TAG_NAME_COMMIT" | ||
| | ||
| echo "Pushed vllm/vllm-openai-rocm:$BASE_TAG_NAME and vllm/vllm-openai-rocm:$BASE_TAG_NAME_COMMIT" | ||
| echo "Pushed vllm/vllm-openai-rocm:$TAG_NAME and vllm/vllm-openai-rocm:$TAG_NAME_COMMIT" | ||
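The push script boils down to four source-to-target tag pairs per commit. The sketch below, assuming the repository names hard-coded in the script, prints those pairs and the `docker tag`/`docker push` commands they imply instead of running them, similar in spirit to the script's `DRY_RUN=1` mode. `rocm_nightly_tag_pairs` and `print_push_plan` are hypothetical names, and the default commit `0123abc` is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: compute the ECR -> Docker Hub tag mapping used by
# push-nightly-builds-rocm.sh and print the docker commands instead of
# running them. Function names are hypothetical; nothing is pulled or pushed.
set -euo pipefail

ECR_REPO="public.ecr.aws/q9t5s3a7/vllm-release-repo"
HUB_REPO="vllm/vllm-openai-rocm"

# Print "<source-ref> <target-ref>" pairs for a given commit SHA.
# printf reuses its two-field format until all eight arguments are consumed.
rocm_nightly_tag_pairs() {
  local commit="$1"
  printf '%s %s\n' \
    "$ECR_REPO:${commit}-rocm-base" "$HUB_REPO:base-nightly" \
    "$ECR_REPO:${commit}-rocm-base" "$HUB_REPO:base-nightly-${commit}" \
    "$ECR_REPO:${commit}-rocm"      "$HUB_REPO:nightly" \
    "$ECR_REPO:${commit}-rocm"      "$HUB_REPO:nightly-${commit}"
}

# Print (do not execute) the docker tag/push commands for each pair.
print_push_plan() {
  local src dst
  while read -r src dst; do
    echo "docker tag $src $dst"
    echo "docker push $dst"
  done < <(rocm_nightly_tag_pairs "$1")
}

print_push_plan "${1:-0123abc}"
```

Run with a commit SHA as the first argument, it prints eight commands (one `docker tag` and one `docker push` per pair) and touches no registry.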