Skip to content

Replace shell based linux sandbox setup with go code#107

Draft
dwt wants to merge 18 commits intomainfrom
fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux
Draft

Replace shell based linux sandbox setup with go code#107
dwt wants to merge 18 commits intomainfrom
fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux

Conversation

@dwt
Copy link
Copy Markdown
Collaborator

@dwt dwt commented Apr 4, 2026

This prevents errors, if the shell environment is very restricted. This
mostly happens by accident if multicall binaries (coreutils,
uutils/coreutils or busybox) lead to overblocking because some of the
shared binaries are blocked and therefore block all of them.

Closes #90 - or at least is supposed to. Didn't have time to check that yet.

Also I am not very confident this is the right way to code this yet, which is why I've left the old one in there for now to make it easy to compare and contrast.

Some things we should probably discuss:

I tried to minimize dependencies inside the sandbox, which is why socat is no longer required in the sandbox. This means that its job is now done by goroutines, which I do not understand at all yet. :) Is this actually a good idea? Are there drawbacks to the way the goroutines handle the traffic forwarding?

Code strucuture is... well. it is. :/

This should however suffice to get the conversation going.

dwt added 6 commits April 4, 2026 11:18
This provents errors, if the shell environment is very restricted. This
mostly happens by accident if multicall binaries (coreutils,
uutils/coreutils or busybox) lead to overblocking because some of the
shared binaries are blocked and therefore block all of them.
@dwt
Copy link
Copy Markdown
Collaborator Author

dwt commented Apr 4, 2026

yes, this fixes #90.

@dwt dwt force-pushed the fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux branch from ed949cc to b796bd8 Compare April 4, 2026 15:31
@dwt dwt force-pushed the fix/remove-dependency-on-shell-for-sandbox-initialization-on-linux branch from b796bd8 to 3bc7ea7 Compare April 4, 2026 15:33
@dwt dwt marked this pull request as ready for review April 4, 2026 15:56
@dwt dwt requested a review from jy-tan as a code owner April 4, 2026 15:56
@dwt dwt marked this pull request as draft April 4, 2026 15:56
@dwt
Copy link
Copy Markdown
Collaborator Author

dwt commented Apr 4, 2026

At least this needs the removal of the old code path to be merged.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 11 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/fence/main.go">

<violation number="1" location="cmd/fence/main.go:685">
P2: Reuse the already-resolved exec path instead of calling `exec.LookPath` a second time after Landlock is applied. The first call resolves the path before Landlock restrictions are active; the second call repeats the lookup under Landlock, where directory traversal is more constrained. For non-standard locations (e.g., `/tmp/fence/bin/shell` as mentioned in the comment), this could fail if `LookPath` needs to stat directories not covered by the read rules. Hoisting the resolution out of the Linux-only block avoids both the redundancy and the potential failure.</violation>
</file>

<file name="cmd/fence/linux_bootstrap.go">

<violation number="1" location="cmd/fence/linux_bootstrap.go:167">
P1: Socket paths are not passed to `ApplyLandlockFromConfigWithExec` — `nil` is used instead of the actual socket paths. After Landlock is applied, the bridge goroutines still need to `Dial` those Unix sockets for each new connection. If the sockets aren't under an already-allowed directory (like `/tmp`), Landlock will block those dials and break proxy connectivity.

The `socketPaths` from `startBridgesAndSetEnv` should be threaded through to `applyLandlock` and passed as the third argument.</violation>

<violation number="2" location="cmd/fence/linux_bootstrap.go:356">
P2: When one `io.Copy` direction finishes, the function returns and both connections are closed, potentially discarding in-flight data from the other direction. For a proxy bridge, this can truncate responses. Consider waiting for both directions (`<-done; <-done`) or using `TCPConn.CloseWrite()` to half-close so the other direction can drain.</violation>
</file>

<file name="cmd/fence/linux_bootstrap_test.go">

<violation number="1" location="cmd/fence/linux_bootstrap_test.go:60">
P3: Avoid hardcoded TCP ports in tests; they can be occupied on CI/dev machines and make the tests flaky. Allocate a free port dynamically (e.g., listen on "127.0.0.1:0" to get a port) and pass that port into the bridge and dial calls.</violation>
</file>

<file name="internal/sandbox/linux.go">

<violation number="1" location="internal/sandbox/linux.go:566">
P1: Both `shellPath` and `fenceExePath` are passed to `--ro-bind` without resolving symlinks via `resolvePathForMount()`. On usr-merged distros where `/bin` → `/usr/bin`, bwrap will fail because the destination path contains symlink components. The shell-based bootstrap avoids this by resolving paths and using fixed destinations under `/tmp/fence/bin/`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread cmd/fence/linux_bootstrap.go Outdated
Comment thread internal/sandbox/linux.go Outdated
Comment thread cmd/fence/main.go Outdated
Comment thread cmd/fence/linux_bootstrap.go Outdated
Comment thread cmd/fence/linux_bootstrap_test.go Outdated
Copy link
Copy Markdown
Contributor

@jy-tan jy-tan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this! i think it's directionally correct, just some comments

Comment thread cmd/fence/linux_bootstrap.go
Comment thread cmd/fence/linux_bootstrap.go Outdated
}
}

func startBridgesAndSetEnv(ctx context.Context, opts bootstrapOptions) []string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function leaves out setting NO_PROXY / no_proxy, which is present in original buildLinuxBootstrapScript. this could affect loopback behavior.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look, and I believe NO_PROXY/no_proxy is actually being set in the Go code, but perhaps I am missing something?.

The Go bootstrap sets it here:

fence/cmd/fence/linux_bootstrap.go:155-162

if opts.httpSocket != "" || opts.socksSocket != "" {
	if err := setEnvVars(envGroup{
		keys:  []string{"NO_PROXY", "no_proxy"},
		value: "localhost,127.0.0.1",
	}); err != nil {
		return nil, fatalError(ExitWrapperSetupFailed, "failed to set no_proxy env vars: %v", err)
	}
}

This matches the original shell script's value:

fence/internal/sandbox/linux.go:695-700

export NO_PROXY=localhost,127.0.0.1
export no_proxy=localhost,127.0.0.1

To me this looks like the Go code preserves the same NO_PROXY behavior as the original buildLinuxBootstrapScript.

When I set my agent on this, it also found GenerateProxyEnvVars in utils.go, which generates a more expansive value. As far as I can tell it is currently only used in the Mac Sandbox.

fence/internal/sandbox/utils.go:67-77

noProxy := strings.Join([]string{
	"localhost",
	"127.0.0.1",
	"::1",
	"*.local",
	".local",
	"169.254.0.0/16",
	"10.0.0.0/8",
	"172.16.0.0/12",
	"192.168.0.0/16",
}, ",")

This includes ::1 (IPv6 loopback), .local/*.local, and private network ranges. I'm unsure whether these additional entries are appropriate for the Linux bootstrap context though — inside the sandbox, the proxy bridges only listen on 127.0.0.1, so traffic to private networks would still need to traverse the proxy to reach the host. Exempting 10.0.0.0/8 etc. from the proxy could potentially cause connections to fail silently if a sandboxed process tries to reach a private address that isn't actually routable inside the sandbox. On the other hand, ::1 seems like it could be a reasonable addition if the sandbox has IPv6 loopback available.

Comment thread internal/sandbox/linux.go Outdated
@jy-tan
Copy link
Copy Markdown
Contributor

jy-tan commented Apr 8, 2026

also, tests like that use runUnderSandbox() / runUnderLinuxSandboxDirect() uses a test binary in /tmp afaik, and therefore falls back to the shell bootstrap path.

we should have another true e2e test similar, maybe something like TestLinux_GoBootstrapWrapper_RuntimeExecDeny_DoesNotCrashOnBinAliasPath , similar to TestLinux_LandlockWrapperPreservesRepoLocalFenceBinary (build a binary called fence and is not placed in /tmp), which asserts that the debug log contains the marker Using Go-based linux-bootstrap wrapper.

dwt added 4 commits April 8, 2026 21:34
…ad of calling os.Exit()

This should make it easier to test them.
Now we always mount it in /tmp and execute it from there
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build fails in the nix linux build sandbox

2 participants