
fix(sbom): fetch last 10 releases for Dakota history #838

Merged
castrojo merged 1 commit into projectbluefin:main from castrojo:fix/dakota-sbom-backfill
May 14, 2026
@castrojo castrojo commented May 14, 2026

Problem

processLatestTagStream() only fetched the current :latest image, so the driver-versions page always showed a single Dakota entry.

Dakota stopped using latest.YYYYMMDD date tags in February 2026 and switched to 40-char commit-SHA image tags — one per build. The existing code had no way to discover these.

Fix

processLatestTagStream() now:

  1. Fetches all GHCR tags for the image
  2. Filters to the 10 most recently pushed commit-SHA tags (40-char hex)
  3. Processes :latest plus those 10 tags, each via getImageCreatedDate() → latest-YYYYMMDD cache key → SBOM fetch
  4. Seeds from existing cache so history accumulates across nightly runs
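
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/fetch-github-sbom.js: the function name, the `sbom` field, and the stubbed lookups are hypothetical, the lookups are synchronous for brevity (the real helpers are awaited), and it assumes a cache hit simply skips the fetch.

```javascript
// Hypothetical sketch of steps 3 and 4. The lookups are injected as plain
// synchronous functions so the control flow is easy to follow.
function accumulateReleases(existingReleases, imageRefs, getCreatedDate, fetchSbom) {
  // Step 4: start from what earlier nightly runs already cached.
  const releases = { ...existingReleases };

  for (const imageRef of imageRefs) {
    // Step 3: image creation date -> cache key -> SBOM fetch.
    const dateStr = getCreatedDate(imageRef);
    if (!dateStr) continue; // no usable date, so no cache key

    const cacheKey = `latest-${dateStr}`;
    if (releases[cacheKey]) continue; // assumed cache hit: skip the fetch

    releases[cacheKey] = { tag: imageRef, sbom: fetchSbom(imageRef) };
  }
  return releases;
}

// Usage with stubbed lookups (all values are made up for illustration).
const stubDates = { "img:latest": "20260514", "img:aaa": "20260513", "img:bbb": null };
const fetchedRefs = [];
const mergedReleases = accumulateReleases(
  { "latest-20260514": { tag: "img:latest", sbom: "cached" } },
  ["img:latest", "img:aaa", "img:bbb"],
  (ref) => stubDates[ref],
  (ref) => {
    fetchedRefs.push(ref);
    return `sbom-for-${ref}`;
  },
);
console.log(Object.keys(mergedReleases)); // [ 'latest-20260514', 'latest-20260513' ]
console.log(fetchedRefs); // [ 'img:aaa' ]
```

Only the uncached ref triggers a fetch, and the entry from the earlier run survives untouched, which is what lets history build up over successive runs.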

Result: up to 10 historical Dakota releases in the driver-versions page, refreshed nightly.

Verification

npm run build:ci  →  [SUCCESS] Generated static files

Summary by CodeRabbit

  • Chores
    • Improved Docker image reference processing and caching logic to handle multiple recent versions with date-based cache optimization.


processLatestTagStream() now processes :latest plus the 10 most
recently pushed commit-SHA image tags. Each tag is a distinct
build — getImageCreatedDate() extracts the date and it becomes
a separate cache entry (latest-YYYYMMDD).

This gives the driver-versions page a rolling 10-release history
for Dakotaraptor instead of a single current-state entry.

Assisted-by: Claude Sonnet 4.6 via pi
coderabbitai Bot commented May 14, 2026

Walkthrough

The change extends Dakota SBOM caching from a single :latest image tag to multiple image refs. The function now fetches GHCR tags, builds a list containing the :latest ref plus the 10 most recent commit-SHA tags, and derives a latest-YYYYMMDD cache key from each image's creation date. It then iterates over the refs and stores each release entry under its own cache key, with the image ref as the tag value instead of a constant "latest".
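
To make the key derivation concrete, here is a small sketch. It assumes the creation date arrives as an RFC 3339 timestamp (as in the standard OCI `org.opencontainers.image.created` annotation) and that the day is taken in UTC; `cacheKeyFromCreated` is a hypothetical name, and the actual getImageCreatedDate() may behave differently.

```javascript
// Hypothetical helper: turn an RFC 3339 creation timestamp into the
// latest-YYYYMMDD cache key format described above.
function cacheKeyFromCreated(created) {
  if (typeof created !== "string") return null; // missing annotation
  const parsed = new Date(created);
  if (Number.isNaN(parsed.getTime())) return null; // unparseable value
  // toISOString() is always UTC, e.g. "2026-05-14T03:12:45.000Z".
  const yyyymmdd = parsed.toISOString().slice(0, 10).replace(/-/g, "");
  return `latest-${yyyymmdd}`;
}

console.log(cacheKeyFromCreated("2026-05-14T03:12:45Z")); // latest-20260514
console.log(cacheKeyFromCreated("2026-05-14T22:40:00Z")); // latest-20260514
console.log(cacheKeyFromCreated("not-a-date")); // null
```

Note that the two timestamps from the same UTC day produce the same key, so a date-only key cannot tell two same-day builds apart.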

Changes

Dakota multi-image SBOM cache keying

| Layer / File(s) | Summary |
| --- | --- |
| Cache seeding, tag enumeration, and per-image cache-hit logic (scripts/fetch-github-sbom.js) | processLatestTagStream() now seeds releases from the existing cache, enumerates GHCR tags to build an imageRefs list (:latest plus last 10 commit-SHA tags), derives cacheKey from each image's creation-date annotation, and loops over these refs applying cache-hit logic per key. Release entries are then stored with tag: imageRef instead of a constant "latest", and the loop is closed after each entry assignment. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • projectbluefin/documentation#815: Updates enrichDakotaNvidiaFromSbom() to consume the new per-image latest-* cache keys and tag values that this PR produces in processLatestTagStream().
  • projectbluefin/documentation#807: Installs oras in the CI workflow, which enables scripts/fetch-github-sbom.js to read the image creation-date annotations needed to derive the new latest-YYYYMMDD cache keys.
  • projectbluefin/documentation#795: Also modifies processLatestTagStream() to switch to :latest-based attestation and caching flow for Dakota, making related changes to the same function's cache and tag handling logic.

Poem

🐰 A cache of images, now keyed by date,
Multiple refs await their fate,
From GHCR tags, Dakota's best are drawn,
Per-image keys ensure no dawn is lost,
One loop to bind them all—imageRef at last!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: updating the SBOM fetch logic to retrieve multiple historical Dakota releases instead of just the latest one. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/fetch-github-sbom.js`:
- Around line 357-359: The current cache key uses only dateStr (from
getImageCreatedDate) producing `latest-YYYYMMDD` which lets multiple same-day
image refs overwrite each other; change the cacheKey generation to include a
unique identifier (e.g., append a time component from dateStr, the imageRef, or
the commit SHA) so each imageRef produces a distinct key (update the cacheKey
calculation where dateStr and cacheKey are computed and the other occurrence
around lines 417-419 that assigns releases[cacheKey]). Ensure the key remains
deterministic and safe for map lookups but is unique per image (use a short
imageRef/sha or ISO datetime).
- Around line 341-355: The new multi-ref imageRefs logic is never exercised
because processLatestTagStream() is only invoked when spec.usesLatestTag is true
but Dakota streams use streamPrefix: "latest" instead; update the condition that
triggers processLatestTagStream to also accept specs with streamPrefix ===
"latest" (or set spec.usesLatestTag = true when streamPrefix === 'latest') so
the code that builds imageRefs (the commitTags/latest GHCR refs) runs for those
streams as well; ensure you modify the call site that checks spec.usesLatestTag
(and any related branching) rather than the imageRefs construction itself so
existing behavior is preserved for other streams.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e5ba1a8-e63d-413c-be73-723f93955987

📥 Commits

Reviewing files that changed from the base of the PR and between f501843 and 30c84dc.

📒 Files selected for processing (1)
  • scripts/fetch-github-sbom.js

Comment on lines +341 to +355
// Seed from existing cache — accumulates history across nightly runs.
const existingReleases = existing?.streams?.[spec.id]?.releases || {};
const releases = { ...existingReleases };

// Build the list of image refs to process: :latest plus the 10 most recent
// commit-SHA tags (each is a distinct tagged build pushed to GHCR).
const allTags = await fetchGhcrTags(spec.org, spec.package);
const commitTags = allTags
  .filter((t) => /^[0-9a-f]{40}$/.test(t))
  .slice(-10); // last 10 = most recently pushed
const imageRefs = [
  `ghcr.io/${spec.org}/${spec.package}:latest`,
  ...commitTags.map((t) => `ghcr.io/${spec.org}/${spec.package}:${t}`),
];
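
The `.slice(-10)` in the hunk above relies on fetchGhcrTags() returning tags oldest-first, which this page does not confirm. If that ordering cannot be guaranteed, one option is to sort on creation date instead of list position. The sketch below is an assumption-laden illustration: `newestCommitTags` and the `createdByTag` map are hypothetical, and building that map would cost one date lookup per tag.

```javascript
// Hypothetical helper: choose the newest commit-SHA tags by creation date
// instead of by position in the tag list.
function newestCommitTags(allTags, createdByTag, limit = 10) {
  return allTags
    .filter((t) => /^[0-9a-f]{40}$/.test(t)) // commit-SHA tags only
    .filter((t) => typeof createdByTag[t] === "string") // need a date to sort on
    .sort((a, b) => {
      // ISO 8601 UTC timestamps compare correctly as plain strings.
      const x = createdByTag[a];
      const y = createdByTag[b];
      return x < y ? 1 : x > y ? -1 : 0; // newest first
    })
    .slice(0, limit);
}

// Usage with made-up tags and dates.
const shaA = "a".repeat(40);
const shaB = "b".repeat(40);
const shaC = "c".repeat(40);
const pickedTags = newestCommitTags(
  [shaA, "latest", shaB, shaC],
  {
    [shaA]: "2026-05-12T00:00:00Z",
    [shaB]: "2026-05-14T00:00:00Z",
    [shaC]: "2026-05-13T00:00:00Z",
  },
  2,
);
console.log(pickedTags); // shaB first, then shaC
```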

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Latest-stream backfill logic is not reachable with current stream specs.

processLatestTagStream() is only called when spec.usesLatestTag is truthy (Line 454), but Dakota specs in this file are configured via streamPrefix: "latest" and don’t set usesLatestTag. That makes the new multi-ref logic effectively dead for Dakota.

💡 Proposed fix
@@
   {
     id: "dakota-latest",
@@
     streamPrefix: "latest",
+    usesLatestTag: true,
     keyRepo: "projectbluefin/dakota",
     keyless: true,
   },
   {
     id: "dakota-nvidia-latest",
@@
     streamPrefix: "latest",
+    usesLatestTag: true,
     keyRepo: "projectbluefin/dakota",
     keyless: true,
   },

Also applies to: 454-456
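
The review also mentions an alternative to flagging each spec: widening the condition at the call site. A minimal sketch of that predicate is below. `usesLatestTagFlow` is a hypothetical name, the spec objects are trimmed to the fields that matter here, and the "example-stable" spec is invented for contrast.

```javascript
// Hypothetical predicate for the call site: route a stream through
// processLatestTagStream() when it is flagged explicitly or uses the
// "latest" stream prefix.
function usesLatestTagFlow(spec) {
  return Boolean(spec.usesLatestTag) || spec.streamPrefix === "latest";
}

const exampleSpecs = [
  { id: "dakota-latest", streamPrefix: "latest" },
  { id: "dakota-nvidia-latest", streamPrefix: "latest", usesLatestTag: true },
  { id: "example-stable", streamPrefix: "stable" }, // made-up spec for contrast
];
const routedIds = exampleSpecs.filter(usesLatestTagFlow).map((s) => s.id);
console.log(routedIds); // [ 'dakota-latest', 'dakota-nvidia-latest' ]
```

The trade-off is that any other stream with streamPrefix "latest" would be routed the same way, whereas the explicit flag in the proposed diff opts in one spec at a time.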


Comment on lines +357 to +359
const dateStr = await getImageCreatedDate(imageRef);
const cacheKey = dateStr ? `latest-${dateStr}` : null;
if (!cacheKey) continue;
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Date-only cache keys can overwrite multiple same-day Dakota builds.

Using latest-YYYYMMDD as the sole key means different refs pushed on the same day collapse into one entry (releases[cacheKey] = ...), so you can lose history even when processing 10 commit-SHA tags.

💡 Proposed fix
-    const dateStr = await getImageCreatedDate(imageRef);
-    const cacheKey = dateStr ? `latest-${dateStr}` : null;
-    if (!cacheKey) continue;
+    const dateStr = await getImageCreatedDate(imageRef);
+    const refTag = imageRef.split(":").pop() || "unknown";
+    let cacheKey = dateStr
+      ? `latest-${dateStr}`
+      : `latest-unknown-${refTag.slice(0, 12)}`;
+    if (releases[cacheKey] && releases[cacheKey].imageRef !== imageRef) {
+      cacheKey = `${cacheKey}-${refTag.slice(0, 12)}`;
+    }

Also applies to: 417-419
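
A simpler variant of the same idea is to always append a short ref suffix, so the key is a pure function of the date and the image ref. This is only a sketch under assumptions: `uniqueCacheKey` is a hypothetical name, the image path is a placeholder, and any consumer that expects keys of the exact form latest-YYYYMMDD would need to tolerate the suffix.

```javascript
// Hypothetical key builder: date plus a short ref suffix, so the key is
// deterministic per image ref and same-day builds do not collide.
function uniqueCacheKey(dateStr, imageRef) {
  const refTag = imageRef.split(":").pop(); // tag portion of the ref
  const suffix = refTag.slice(0, 12); // short, still distinctive for SHAs
  return `latest-${dateStr || "unknown"}-${suffix}`;
}

// Usage with a placeholder image path and made-up SHAs.
const refA = `ghcr.io/example-org/example-image:${"a".repeat(40)}`;
const refB = `ghcr.io/example-org/example-image:${"b".repeat(40)}`;
console.log(uniqueCacheKey("20260514", refA)); // latest-20260514-aaaaaaaaaaaa
console.log(uniqueCacheKey("20260514", refB)); // latest-20260514-bbbbbbbbbbbb
console.log(uniqueCacheKey(null, refA)); // latest-unknown-aaaaaaaaaaaa
```

Unlike a suffix added only on collision, this key does not depend on which ref is processed first. One cost is that :latest and the commit-SHA tag pointing at the same image would be stored under two separate keys.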


@castrojo castrojo merged commit 644b405 into projectbluefin:main May 14, 2026
3 checks passed