
Remove _remote.repositories from cache + use offline mode for forked PRs #1391

Merged
gopalldb merged 36 commits into databricks:main from gopalldb:fix/maven-cache-forked-prs
Apr 8, 2026

Conversation

Collaborator

@gopalldb gopalldb commented Apr 8, 2026

Summary

Two fixes for forked PR dependency resolution:

  1. Warmer: Remove _remote.repositories marker files before saving the cache. These files track which repo ID (jfrog-central) each artifact was downloaded from. When forked PRs restore the cache in offline mode, Maven refuses artifacts because they were "not downloaded from central".

  2. Fork composite action: Use Maven offline mode (-o via .mvn/maven.config) plus an empty settings.xml. With a clean cache (no repo ID markers), offline mode resolves everything from ~/.m2/repository with zero network requests.

Also deleted the existing stale cache entry to force a fresh save.
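
The warmer-side cleanup described in the first fix can be sketched as follows. This is a minimal illustration, not the workflow's actual step (the PR does not show it); the function name `remove_repo_markers` is hypothetical, and the only detail taken from the description is the marker file name `_remote.repositories`.

```python
from pathlib import Path

# Marker file name taken from the PR description.
MARKER_NAME = "_remote.repositories"


def remove_repo_markers(repo_root: Path) -> int:
    """Delete every marker file under repo_root and return how many were removed.

    Hypothetical helper: the real warmer step may use a shell one-liner instead.
    """
    removed = 0
    # Materialize the matches first so the tree is not modified while walking it.
    for marker in list(repo_root.rglob(MARKER_NAME)):
        if marker.is_file():
            marker.unlink()
            removed += 1
    return removed
```

In the warmer this would presumably run against `Path.home() / ".m2" / "repository"` after all dependencies are resolved and before the cache is saved.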

Test plan

NO_CHANGELOG=true

This pull request was AI-assisted by Isaac.

gopalldb and others added 30 commits April 6, 2026 15:50
Reverts databricks#1350. GitHub-hosted runner IPs are blocked by the Databricks
org IP allow list, causing gh CLI API calls to fail and preventing
the required status checks from matching (runner label mismatch).

Restores all 22 workflow files to use databricks-protected-runner-group
with linux-ubuntu-latest and windows-server-latest labels.

Also configures JFrog Artifactory as Maven mirror via OIDC token
exchange, since Databricks runners cannot access public registries
directly (supply chain security policy).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Forked PRs cannot authenticate to JFrog Artifactory via OIDC (GitHub
restricts id-token for fork workflows). This change enables forked PR
CI by pre-caching dependencies from a privileged workflow.

New files:
- .github/actions/setup-maven/action.yml: Reusable composite action
  that detects forked PRs and either authenticates to JFrog (same-repo)
  or restores the dependency cache (fork)
- .github/workflows/warmMavenCache.yml: Privileged workflow that
  resolves all dependencies via JFrog and saves the cache. Triggers on
  pom.xml changes to main, daily schedule, and manual dispatch with
  optional PR number for warming from a fork's pom.xml
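
The fork detection that the composite action relies on is not shown in this PR. One common way to do it, sketched here under that assumption, is to compare the head and base repository names in the `pull_request` event payload. The function name `is_fork_pr` is hypothetical.

```python
def is_fork_pr(event: dict) -> bool:
    """Return True when a pull_request event's head repo differs from its base repo.

    Assumes the standard GitHub event payload layout
    (pull_request.head.repo.full_name and pull_request.base.repo.full_name).
    Non-PR events (push, schedule, manual dispatch) are never treated as forks.
    """
    pr = event.get("pull_request")
    if not pr:
        return False
    head_repo = (pr.get("head") or {}).get("repo") or {}
    base_repo = (pr.get("base") or {}).get("repo") or {}
    # A missing head repo (for example, a deleted fork) compares unequal and
    # therefore takes the unauthenticated, cache-only path.
    return head_repo.get("full_name") != base_repo.get("full_name")
```

Treating an unknown head repository as a fork is a deliberately conservative choice in this sketch: it avoids attempting OIDC authentication when the origin of the code is unclear.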

Modified workflows to use the composite action:
- prCheck.yml (formatting, unit tests, packaging)
- prIntegrationTests.yml
- coverageReport.yml

Cache key: {os}-maven-deps-{hash(pom.xml)} with prefix restore-keys.
Forked PRs read cache from the default branch per GitHub Actions rules.
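
To illustrate the content-addressed key scheme, here is a rough sketch that derives a key of the same shape from every pom.xml under a project root. It is an approximation for illustration only: the digest is not byte-identical to what the GitHub Actions `hashFiles` expression produces, and the function name `maven_cache_key` is hypothetical.

```python
import hashlib
from pathlib import Path


def maven_cache_key(project_root: Path, os_name: str) -> str:
    """Build a key shaped like {os}-maven-deps-{hash(pom.xml)}.

    The hash covers the relative path and content of every pom.xml, so any
    dependency change in any module yields a different key.
    """
    digest = hashlib.sha256()
    for pom in sorted(project_root.rglob("pom.xml")):
        digest.update(pom.relative_to(project_root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(pom.read_bytes())
        digest.update(b"\0")  # separator between files
    return f"{os_name}-maven-deps-{digest.hexdigest()}"
```

A restore-keys prefix such as `{os}-maven-deps-` would then match older entries when no exact key exists, which is what lets a fork with an unchanged or slightly changed pom.xml still get a mostly warm cache.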

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
When a PR with dependency changes merges to main, the cache warmer now
runs a cleanup job that deletes maven-deps cache entries from previous
pom.xml versions. This prevents stale dependency caches from occupying
space when concurrent PRs have different dependency versions.

Cache lifecycle:
- Each unique pom.xml hash gets its own cache entry (content-addressable)
- Multiple concurrent PRs coexist in cache with different keys
- On merge to main, stale entries (not matching main's current hash)
  are deleted via gh cache delete
- GitHub also auto-evicts caches not accessed in 7 days
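
The stale-entry selection in the lifecycle above amounts to a simple filter. The sketch below assumes the list of existing keys has already been fetched (for example, from the output of a cache listing command); the function name `stale_cache_keys` and its parameters are hypothetical.

```python
from typing import Iterable, List


def stale_cache_keys(existing_keys: Iterable[str], current_key: str,
                     marker: str = "maven-deps-") -> List[str]:
    """Return maven-deps cache keys that do not match main's current key.

    Keys without the marker substring belong to other caches and are left alone.
    """
    return [key for key in existing_keys
            if marker in key and key != current_key]
```

Each returned key would then be passed to the deletion step (gh cache delete in the commit message).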

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
GitHub Actions automatically evicts cache entries not accessed in 7
days. This is sufficient for cleaning up stale PR dependency caches.
The explicit cleanup job adds complexity without meaningful benefit.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
…utionTests

- prCheckJDK8.yml: Add fork detection + composite action (triggers on
  pull_request to jdk-8 branch, so forked PRs are affected)
- concurrencyExecutionTests.yml: Use composite action for consistency
  (always is-fork=false since it only triggers on push/dispatch)

Both workflows now use .github/actions/setup-maven instead of inline
JFrog OIDC + cache boilerplate.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
- Drop Windows from cache warmer matrix — Windows runners in
  databricks-protected-runner-group lack bash (command not found)
- Remove runner.os from cache key — Maven JARs/POMs are platform-
  independent, so one cache entry serves both Linux and Windows
- Cache key is now: maven-deps-{hash(pom.xml)}

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
- Drop Windows from cache warmer matrix — Windows runners in
  databricks-protected-runner-group lack bash (command not found)
- Remove runner.os from cache key — Maven JARs/POMs are platform-
  independent, so one cache entry serves both Linux and Windows
- Cache key is now: maven-deps-{hash(pom.xml)}

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Keep -Ddependency-check.skip=true from main in the coverage test command.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
The thin/uber jar modules depend on databricks-jdbc-core SNAPSHOT which
must be installed into ~/.m2/repository first. Changed from
mvn compile to mvn install -DskipTests so inter-module SNAPSHOT
artifacts are available during dependency resolution.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
dependency:resolve fails on inter-module SNAPSHOTs (databricks-jdbc-core)
because they don't exist in JFrog — they're local build artifacts.
Since dependency:resolve runs first with set -euo pipefail, the install
command never executes.

Fix: use mvn install alone, which handles both external dependency
resolution from JFrog AND inter-module SNAPSHOT installation.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Keep install-only approach (no dependency:resolve) to avoid
inter-module SNAPSHOT resolution failures.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
The cache was missing maven-toolchains-plugin (and potentially other
plugins like spotless, jacoco) because mvn install only resolves
plugins needed for the install lifecycle. Plugins activated by specific
goals or profiles (used in PR unit-test and formatting workflows) were
not cached, causing 401 errors for forked PRs.

Fix: after install, also run dependency:resolve-plugins and trigger
spotless/jacoco plugin downloads to ensure all PR workflow dependencies
are cached.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
The cache was missing test-time artifacts (surefire-junit-platform,
jacoco agent, spotless, toolchains plugin, maven-metadata.xml) because
mvn install -DskipTests only resolves compile-time dependencies.

Fix: after install, run the same Maven commands that PR workflows use
(with a no-op test filter) to trigger resolution of all plugins and
providers. This covers:
- surefire-junit-platform (resolved at test execution time)
- maven-toolchains-plugin (resolved when toolchains goal is active)
- spotless plugin + formatters (resolved during spotless:check)
- jacoco agent + report plugins (resolved during jacoco:report)
- plugin group maven-metadata.xml files

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Windows runners in databricks-protected-runner-group don't have git
pre-installed. The "Enable long paths" step ran before actions/checkout
(which installs git), so "git config --system core.longpaths true"
failed with "git: command not found".

Fix: use Windows registry (New-ItemProperty LongPathsEnabled) which
doesn't require git. Also attempt git config as fallback if git is
available.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Databricks protected Windows runners (windows-server-latest) don't have
git pre-installed, causing failures at:
1. "Enable long paths" (git config --system)
2. actions/checkout (requires git)
3. shell: bash steps (requires git bash)

Fix: download and install PortableGit from git-for-windows before any
git-dependent steps. This provides git.exe, bash.exe, and standard
Unix utilities. Also enables long paths via both git config and Windows
registry.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Windows runners in databricks-protected-runner-group lack git.
This is a pre-existing issue to be resolved with the runner team.
Keep Windows in the matrix so failures are visible.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Keep multi-step dependency resolution approach that caches all plugins
(surefire, spotless, jacoco, toolchains).

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
…tion

surefire-junit-platform is resolved lazily at test execution time, not
at plugin initialization. The previous approach (-Dtest=NoSuchTest)
failed before surefire downloaded the provider JAR, so it was never
cached.

Fix: run a real lightweight test
(DatabricksParameterMetaDataTest#testInitialization) to force surefire
to fully resolve and download its JUnit platform provider.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Even with artifacts in the local cache, Maven checks the remote
repository for updates (plugin POM metadata). For forked PRs the
mirror has no credentials, so these checks get 401 errors.

Fix: configure repository and pluginRepository with updatePolicy=never
in an active profile for forked PRs only. This tells Maven to use
cached artifacts without contacting the remote for updates. The
non-fork path (JFrog OIDC) is unchanged.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
The updatePolicy=never approach didn't work because Maven plugin
resolution goes through the mirror directly, ignoring repository
update policies.

Fix: for forked PRs, point the mirror to file://{local-repo}. This
forces Maven to resolve everything from ~/.m2/repository (the restored
cache) without any network requests. Same-repo PRs continue using
JFrog OIDC unchanged.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
gopalldb added 5 commits April 8, 2026 16:32
Maven tracks which repository ID an artifact was downloaded from. The
cache warmer uses mirror ID 'jfrog-central', so cached artifacts have
_remote.repositories entries pointing to jfrog-central. Using a
different mirror ID (local-cache) causes Maven to re-verify artifacts,
which fails for some plugins (maven-clean-plugin).

Fix: use the same ID 'jfrog-central' for the file:// mirror so Maven
recognizes cached artifacts without re-verification.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
The file:// mirror approach failed because Maven's file:// protocol
doesn't properly serve artifacts from the local repo layout. The
real issue: _remote.repositories marker files in ~/.m2/repository
track which remote repo each artifact came from. Maven re-verifies
artifacts against the remote, which fails without credentials.

Fix: delete _remote.repositories files from the restored cache and
use an empty settings.xml. Without markers, Maven treats all cached
artifacts as locally installed — no remote verification needed.

Verified locally: BUILD SUCCESS with zero network requests.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Empty settings.xml caused Maven to fall back to Maven Central (blocked
on runners). Need both: remove _remote.repositories markers (prevent
re-verification) AND file:// mirror with jfrog-central ID (prevent
fallback to Maven Central).

Verified locally: BUILD SUCCESS with zero network requests.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
file:// mirror doesn't work for Maven plugin resolution (can't find
JARs despite them being in the local repo). Instead, use Maven offline
mode (-o via .mvn/maven.config) combined with _remote.repositories
removal. This prevents all network requests while resolving everything
from the local cache.
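
As a sketch of the offline-mode half of this fix, the helper below adds `-o` to `.mvn/maven.config` in a checkout. It is illustrative only: the composite action's real step is not shown here, and the function name `enable_offline_mode` is hypothetical. It writes one argument per line, which newer Maven releases require.

```python
from pathlib import Path


def enable_offline_mode(project_root: Path) -> Path:
    """Ensure .mvn/maven.config under project_root contains the -o flag.

    Existing arguments are preserved; the flag is added only once.
    """
    config = project_root / ".mvn" / "maven.config"
    config.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    if config.exists():
        # Keep existing non-empty arguments, one per line.
        lines = [ln.strip() for ln in config.read_text(encoding="utf-8").splitlines()
                 if ln.strip()]
    if "-o" not in lines and "--offline" not in lines:
        lines.append("-o")
    config.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config
```

Because Maven reads `.mvn/maven.config` on every invocation in that project, the existing workflow commands do not need to be edited to pass the flag individually.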

Verified locally: BUILD SUCCESS with zero network requests.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
The _remote.repositories files track which repo ID (jfrog-central)
each artifact was downloaded from. When forked PRs restore the cache
and use offline mode, Maven refuses artifacts because they were
"not downloaded from central". Removing markers at restore time was
too late.

Fix: remove _remote.repositories in the warmer BEFORE saving the
cache. The saved cache is clean — forked PRs get artifacts with no
repo ID association, so offline mode works.

Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
@gopalldb gopalldb requested a review from a team as a code owner April 8, 2026 11:36
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
Co-authored-by: Isaac
Signed-off-by: Gopal Lal <gopal.lal@databricks.com>
@gopalldb gopalldb merged commit 7e4a4ad into databricks:main Apr 8, 2026
3 of 15 checks passed

2 participants