Skip to content

feat: optimize detached HEAD merge base for jj users#470

Closed
121watts wants to merge 3 commits intomainfrom
watts/dep-3974-detached-head-merge-base
Closed

feat: optimize detached HEAD merge base for jj users#470
121watts wants to merge 3 commits intomainfrom
watts/dep-3974-detached-head-merge-base

Conversation

@121watts
Copy link
Copy Markdown
Contributor

@121watts 121watts commented Mar 26, 2026

Summary

For detached-HEAD workflows (standard jj/Jujutsu pattern), findMergeBase now finds the closest pushed ancestor commit instead of falling all the way back to origin/main. This produces smaller, more precise patches when running depot ci run with local changes.

What was happening

When HEAD is detached, findMergeBase skipped the remote tracking branch check entirely (since there's no named branch) and fell back to git merge-base HEAD origin/main. For jj users who always work in detached HEAD but have pushed branches, this meant the patch included the entire branch diff from main -- potentially thousands of lines -- instead of just the unpushed local changes.

What happens now

When HEAD is detached, findClosestRemoteAncestor enumerates all refs under refs/remotes/origin/, checks which ones are ancestors of HEAD, and picks the one with the shortest distance (fewest commits between it and HEAD). This finds the most recent pushed commit that's an ancestor of the current working state.

Falls back to the existing default-branch merge base behavior if no remote tracking ref is an ancestor of HEAD.

Anything else?

This is a pure optimization -- the worst case is identical to current behavior (falls back to origin/main). Edge cases with jj's ref model (bookmarks vs working copy) should be safe since we're only reading standard git refs. Tested with a simulated jj-style workflow: push branch, add local commits, detach HEAD.

Closes DEP-3974

Made with Cursor


Note

Medium Risk
Changes merge-base selection for detached-HEAD repos, which can alter which commits are treated as “unpushed” and therefore what patch content gets uploaded/applied. Fallback behavior remains, but mis-selecting an ancestor ref could produce incorrect or unexpectedly small/large patches in edge cases.

Overview
Updates findMergeBase so that when HEAD is detached it searches refs/remotes/origin/* for the closest ancestor ref (fewest commits behind HEAD) and uses that SHA as the merge base, reducing patch size for jj-style workflows.

Adds findClosestRemoteAncestor to enumerate remote tracking refs, filter to ancestors via git merge-base --is-ancestor, and pick the smallest rev-list --count distance; extends tests with a detached-HEAD case where a pushed feature branch ancestor should be chosen over the default-branch fallback.

Reviewed by Cursor Bugbot for commit 4e171d0. Bugbot is set up for automated code reviews on this repo. Configure here.

When HEAD is detached (standard jj workflow), findMergeBase now walks
remote tracking refs under refs/remotes/origin/ to find the closest
ancestor of HEAD. This produces a minimal patch containing only truly
unpushed local changes, instead of falling back to origin/main and
including the entire branch diff.

The algorithm: enumerate all remote tracking refs, check which are
ancestors of HEAD, and pick the one with the fewest commits between it
and HEAD. Falls back to default branch merge base if no remote ancestor
is found.

Closes DEP-3974

Made-with: Cursor
@linear
Copy link
Copy Markdown

linear bot commented Mar 26, 2026

@121watts 121watts requested a review from lukevmorris March 26, 2026 17:47
Comment on lines +433 to +434
dist := 0
fmt.Sscanf(strings.TrimSpace(string(countOut)), "%d", &dist)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unchecked Sscanf error could cause incorrect distance calculation

fmt.Sscanf can fail to parse the count output, but the error is ignored. If parsing fails, dist remains 0, which could incorrectly make this ref appear to be at distance 0 from HEAD. This would cause the algorithm to select the wrong ref as the "closest" ancestor.

Fix: Check the Sscanf error and skip refs with parse failures:

n, err := fmt.Sscanf(strings.TrimSpace(string(countOut)), "%d", &dist)
if err != nil || n != 1 {
    continue
}
Suggested change
dist := 0
fmt.Sscanf(strings.TrimSpace(string(countOut)), "%d", &dist)
dist := 0
n, err := fmt.Sscanf(strings.TrimSpace(string(countOut)), "%d", &dist)
if err != nil || n != 1 {
continue
}

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Spawns O(n) git processes for every remote ref
    • Replaced O(n) process spawning with O(distance) commit walking using a SHA map lookup, reducing from 500+ git processes to 2 git commands.
  • ✅ Resolved by another fix: Unchecked Sscanf silently defaults distance to zero
    • The new algorithm eliminates the need for Sscanf entirely by using map-based SHA lookup instead of distance counting.

Create PR

Or push these changes by commenting:

@cursor push 03694e9cc1
Preview (03694e9cc1)
diff --git a/pkg/cmd/ci/run.go b/pkg/cmd/ci/run.go
--- a/pkg/cmd/ci/run.go
+++ b/pkg/cmd/ci/run.go
@@ -402,49 +402,48 @@
 		return "", "", err
 	}
 
-	bestRef := ""
-	bestSHA := ""
-	bestDist := -1
-
+	// Build a map of SHA -> ref name for all remote refs
+	refsBySHA := make(map[string]string)
 	for _, line := range strings.Split(strings.TrimSpace(string(refsOut)), "\n") {
+		if line == "" {
+			continue
+		}
 		parts := strings.SplitN(line, " ", 2)
 		if len(parts) != 2 {
 			continue
 		}
 		refSHA, ref := parts[0], parts[1]
-
-		// Skip origin/HEAD (symbolic ref, not a real branch)
 		if ref == "origin/HEAD" {
 			continue
 		}
+		refsBySHA[refSHA] = ref
+	}
 
-		// Check if this ref is an ancestor of HEAD
-		err := exec.Command("git", "-C", workflowDir, "merge-base", "--is-ancestor", refSHA, "HEAD").Run()
-		if err != nil {
-			continue
-		}
+	if len(refsBySHA) == 0 {
+		return "", "", fmt.Errorf("no remote refs found")
+	}
 
-		// Count commits between this ref and HEAD
-		countOut, err := exec.Command("git", "-C", workflowDir, "rev-list", "--count", refSHA+"..HEAD").Output()
-		if err != nil {
+	// Walk commits from HEAD until we find one that matches a remote ref
+	// This is O(distance) rather than O(number-of-refs)
+	logOut, err := exec.Command("git", "-C", workflowDir,
+		"log", "--pretty=%H", "--first-parent", "HEAD").Output()
+	if err != nil {
+		return "", "", err
+	}
+
+	distance := 0
+	for _, line := range strings.Split(strings.TrimSpace(string(logOut)), "\n") {
+		commitSHA := strings.TrimSpace(line)
+		if commitSHA == "" {
 			continue
 		}
-
-		dist := 0
-		fmt.Sscanf(strings.TrimSpace(string(countOut)), "%d", &dist)
-
-		if bestDist < 0 || dist < bestDist {
-			bestDist = dist
-			bestRef = ref
-			bestSHA = refSHA
+		if ref, found := refsBySHA[commitSHA]; found {
+			return ref, commitSHA, nil
 		}
+		distance++
 	}
 
-	if bestRef == "" {
-		return "", "", fmt.Errorf("no remote ancestor found")
-	}
-
-	return bestRef, bestSHA, nil
+	return "", "", fmt.Errorf("no remote ancestor found")
 }
 
 func detectPatch(workflowDir string) *patchInfo {

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 4e171d0. Configure here.

bestRef = ref
bestSHA = refSHA
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawns O(n) git processes for every remote ref

Medium Severity

findClosestRemoteAncestor spawns two git processes (merge-base --is-ancestor and rev-list --count) per remote tracking ref. For repositories with hundreds or thousands of remote branches under refs/remotes/origin/, this results in thousands of process forks, potentially causing multi-second or even multi-minute delays in a CLI tool that users run interactively. A more efficient approach (e.g., walking commits from HEAD and checking against a set of known ref SHAs) would be O(distance) rather than O(number-of-refs).

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 4e171d0. Configure here.

}

dist := 0
fmt.Sscanf(strings.TrimSpace(string(countOut)), "%d", &dist)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unchecked Sscanf silently defaults distance to zero

Low Severity

fmt.Sscanf return value is ignored. If parsing fails for any reason, dist stays at its initialized value of 0, which is the minimum possible distance. This would cause the function to select that ref as the "closest" ancestor, even though the actual distance is unknown and could be much larger, resulting in an incorrect merge base selection.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 4e171d0. Configure here.

@121watts 121watts marked this pull request as draft April 13, 2026 15:58
@121watts 121watts closed this Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants