Skip to content

fix: add rootless DinD daemon.json path fallback in preflight#2045

Merged
ericksoa merged 24 commits intomainfrom
fix/docker-socket-proxy
Apr 21, 2026
Merged

fix: add rootless DinD daemon.json path fallback in preflight#2045
ericksoa merged 24 commits intomainfrom
fix/docker-socket-proxy

Conversation

@ericksoa
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa commented Apr 18, 2026

Summary

After removing the experimental k8s manifests in #2107, this PR retains the one non-k8s change from the original docker-socket-proxy work: readDockerDefaultCgroupnsMode() in preflight.ts now checks both the standard /etc/docker/daemon.json and the rootless DinD path /home/rootless/.config/docker/daemon.json when reading the default cgroupns mode.

Previously only /etc/docker/daemon.json was checked, so rootless DinD setups would always report "unknown" for cgroupns mode.

Changes

  • src/lib/preflight.ts: readDockerDefaultCgroupnsMode() iterates over both daemon.json paths, returning the first valid mode found.

Test plan

  • npm run typecheck:cli passes
  • npm test passes

Summary by CodeRabbit

  • Bug Fixes
    • Improved detection of Docker configuration settings by checking multiple standard configuration locations, making the system more resilient to different Docker installation types and setups. The tool now gracefully falls back through multiple paths instead of failing immediately if a configuration location is unavailable.

ericksoa and others added 2 commits April 17, 2026 17:13
Squashed PR #2044 as base for socket proxy work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Workspace no longer has direct access to the Docker socket. A
tecnativa/docker-socket-proxy sidecar filters API calls — only
container, image, network, volume, info, and version endpoints are
allowed. Exec and build are explicitly denied.

The proxy runs as nobody (UID 65534) with GID 1000 (matching fsGroup)
so it can connect to the rootless DinD socket without running as root.
The socket volume is mounted at a dedicated /docker-socket path with
readOnly to avoid conflicts with HAProxy runtime files.

Also extends the Docker readiness timeout from 60s to 90s to account
for rootless DinD + proxy startup, updates the README architecture
diagram, and adds security test coverage for the proxy hardening and
socket isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d31f79e0-1d1c-4594-bfcc-ad5595e3704f

📥 Commits

Reviewing files that changed from the base of the PR and between 61467e7 and cd98cac.

📒 Files selected for processing (1)
  • src/lib/preflight.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/preflight.ts

📝 Walkthrough

Walkthrough

The readDockerDefaultCgroupnsMode() function now checks multiple Docker daemon configuration paths sequentially. Instead of reading only /etc/docker/daemon.json, it attempts to read both /etc/docker/daemon.json and /home/rootless/.config/docker/daemon.json, returning the first valid default-cgroupns-mode value found, or "unknown" if neither path yields a match.

Changes

Cohort / File(s) Summary
Docker Configuration Fallback Logic
src/lib/preflight.ts
Modified readDockerDefaultCgroupnsMode() to iterate over two candidate daemon.json paths with fallback behavior; read/parse failures on the first path now allow checking the second path instead of immediately returning "unknown".

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Two paths the Docker daemon roam,
First standard, then rootless home,
We search with care, step by step,
Till cgroupns mode we safely keep,
No more one answer, now we explore,
Resilience is what we're hopping for! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: add rootless DinD daemon.json path fallback in preflight' directly and clearly describes the main change: adding support for a rootless Docker daemon.json path as a fallback in the preflight check function.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/docker-socket-proxy

Comment @coderabbitai help to get the list of available commands and usage tips.

ericksoa and others added 2 commits April 17, 2026 17:16
The CI legacy-path guard blocks edits to test/*.test.js files.
Move the new proxy assertions to a .ts file and restore the .js
to its original content so the cumulative diff shows the .js as
unchanged and the .ts as newly added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
test/security-configuration-hardening.test.js (1)

51-58: Cover the proxy UID/GID contract too.

The proxy's ability to open the rootless socket depends on runAsUser: 65534 and runAsGroup: 1000. If either regresses, this deployment can break while the hardening test still stays green.

Suggested assertions
     expect(proxySection).toMatch(/allowPrivilegeEscalation:\s*false/);
     expect(proxySection).toMatch(/capabilities:\s*[\r\n]+\s*drop:\s*[\r\n]+\s*-\s*ALL/);
     expect(proxySection).toMatch(/seccompProfile:\s*[\r\n]+\s*type:\s*RuntimeDefault/);
     expect(proxySection).toMatch(/runAsNonRoot:\s*true/);
+    expect(proxySection).toMatch(/runAsUser:\s*65534/);
+    expect(proxySection).toMatch(/runAsGroup:\s*1000/);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-configuration-hardening.test.js` around lines 51 - 58, Add
assertions to the proxy-section hardening test to verify the UID/GID contract by
checking that proxySection contains the expected numeric user and group;
specifically, update the test that references proxySection in
test/security-configuration-hardening.test.js to include two additional
expectations that match runAsUser: 65534 and runAsGroup: 1000 (use regex
patterns similar to the existing assertions to tolerate whitespace).
k8s/nemoclaw-k8s.yaml (1)

213-223: Harden init-docker-config the same way as the rest of the pod.

This init container only writes one file into a mounted volume, so leaving it with the default writable root filesystem and default capabilities is looser than it needs to be.

Low-risk hardening
     - name: init-docker-config
       image: busybox
+      securityContext:
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: true
+        capabilities:
+          drop:
+            - ALL
       command:
         - sh
         - -c
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/nemoclaw-k8s.yaml` around lines 213 - 223, The init container
init-docker-config should be hardened: add a securityContext to the container
(in the init container spec for name: init-docker-config) that enforces
runAsNonRoot: true with a non-root runAsUser (e.g. 1000), set
allowPrivilegeEscalation: false, drop all capabilities (capabilities: { drop:
["ALL"] }), enable a seccompProfile (type: RuntimeDefault) and set
readOnlyRootFilesystem: true while keeping the /config/docker volumeMount
writable so the single file write still works; this tightens privileges without
changing the current behavior of writing daemon.json to the mounted volume.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@k8s/nemoclaw-k8s.yaml`:
- Around line 67-80: The docker-proxy sidecar (container name "docker-proxy",
image tecnativa/docker-socket-proxy:0.6.0) exposes port 2375 on the pod network,
making the filtered Docker API reachable from other pods; either remove the
ports: - containerPort: 2375 stanza to avoid advertising the TCP port or,
preferably, add a NetworkPolicy that only allows ingress to pods with this
sidecar from the specific "workspace" pod/label/namespace (deny all other pod
ingress), referencing the pod label used by the deployment; update the manifest
to include that NetworkPolicy (or tighten the container's listen address to
localhost via docker-socket-proxy config if available) so only the intended pod
can reach port 2375.
- Around line 213-220: readDockerDefaultCgroupnsMode() currently only reads
/etc/docker/daemon.json and misses rootless Docker installs; update that
function to attempt the rootless path
(/home/rootless/.config/docker/daemon.json) in addition to the system path,
falling back gracefully if one is missing, and return the parsed
default-cgroupns-mode when found (ensure you still parse JSON and handle
read/parse errors the same way as the existing /etc/docker handling). Use the
existing function name readDockerDefaultCgroupnsMode and keep the same return
shape/behavior so callers remain unchanged.

In `@k8s/README.md`:
- Around line 214-215: Update the startup timing sentence in k8s/README.md to
match the manifest's retry policy: replace "Usually resolves after 30-60
seconds" with a statement that the workspace may take up to 90 seconds to start
(45 retries × 2s), and keep the note that rootless DinD can take slightly longer
due to user namespace setup so users don't debug prematurely.
- Around line 1-3: Add the required SPDX license header as the very first line
of the file (e.g., an SPDX-License-Identifier line) and remove the emoji from
the experimental callout; locate the "NemoClaw on Kubernetes" title and the
following blockquote starting with "⚠️ Experimental" and replace that blockquote
text with the same content but without the emoji and with the SPDX header
inserted above the title.
- Around line 167-178: The architecture ASCII diagram contains "openshell CLI"
with incorrect casing; update that token to the documented product casing
"OpenShell CLI" wherever it appears in the diagram (look for the string
"openshell CLI" inside the k8s/README.md diagram block) to comply with the
naming guideline for OpenShell/OpenClaw/NemoClaw.

In `@test/security-configuration-hardening.test.js`:
- Around line 72-74: The test is too loose by only checking for any tcp:// host
in workspaceSection; update the assertion so it requires the exact local
endpoint tcp://localhost:2375 (reference the workspaceSection variable and the
expect(...).toMatch call) and keep the existing negative unix: check;
specifically replace the loose regex that matches tcp://... with one that
matches value:\s*tcp:\/\/localhost:2375 so the test fails if DOCKER_HOST is set
to any other TCP target.

---

Nitpick comments:
In `@k8s/nemoclaw-k8s.yaml`:
- Around line 213-223: The init container init-docker-config should be hardened:
add a securityContext to the container (in the init container spec for name:
init-docker-config) that enforces runAsNonRoot: true with a non-root runAsUser
(e.g. 1000), set allowPrivilegeEscalation: false, drop all capabilities
(capabilities: { drop: ["ALL"] }), enable a seccompProfile (type:
RuntimeDefault) and set readOnlyRootFilesystem: true while keeping the
/config/docker volumeMount writable so the single file write still works; this
tightens privileges without changing the current behavior of writing daemon.json
to the mounted volume.

In `@test/security-configuration-hardening.test.js`:
- Around line 51-58: Add assertions to the proxy-section hardening test to
verify the UID/GID contract by checking that proxySection contains the expected
numeric user and group; specifically, update the test that references
proxySection in test/security-configuration-hardening.test.js to include two
additional expectations that match runAsUser: 65534 and runAsGroup: 1000 (use
regex patterns similar to the existing assertions to tolerate whitespace).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c6c05082-8946-4683-8d64-93feb32feb65

📥 Commits

Reviewing files that changed from the base of the PR and between 56af860 and f4f912a.

📒 Files selected for processing (4)
  • k8s/README.md
  • k8s/nemoclaw-k8s.yaml
  • src/lib/onboard.ts
  • test/security-configuration-hardening.test.js

Comment thread k8s/nemoclaw-k8s.yaml Outdated
Comment thread k8s/nemoclaw-k8s.yaml Outdated
Comment thread k8s/README.md Outdated
Comment thread k8s/README.md Outdated
Comment thread k8s/README.md Outdated
Comment thread test/security-configuration-hardening.test.js Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/security-configuration-hardening.test.ts (1)

14-16: Harden workspace section extraction regex to avoid nested-match truncation.

Line 15 currently stops on any - name: line, including nested entries (for example env vars), which can truncate workspaceSection unexpectedly as the manifest evolves.

♻️ Proposed fix
-      /- name: workspace[\s\S]*?(?=\n\s*-\s*name: |\n\s*initContainers:|\n\s*volumes:|$)/,
+      /- name: workspace[\s\S]*?(?=\n\s{4}-\s*name: |\n\s*initContainers:|\n\s*volumes:|$)/,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-configuration-hardening.test.ts` around lines 14 - 16, The
workspace extraction regex can prematurely stop on nested "- name:" lines;
update the pattern used for workspaceMatch to capture the leading indentation
and use that indentation in the lookahead so only same-level siblings terminate
the match (e.g. change the regex to capture an indentation group like /(^\s*)-
name: workspace[\s\S]*?(?=\n\1-\s*name: |\n\1initContainers:|\n\1volumes:|$)/m),
then rebuild workspaceSection from workspaceMatch as before (referencing the
workspaceMatch and workspaceSection variables).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/security-configuration-hardening.test.ts`:
- Around line 14-16: The workspace extraction regex can prematurely stop on
nested "- name:" lines; update the pattern used for workspaceMatch to capture
the leading indentation and use that indentation in the lookahead so only
same-level siblings terminate the match (e.g. change the regex to capture an
indentation group like /(^\s*)- name: workspace[\s\S]*?(?=\n\1-\s*name:
|\n\1initContainers:|\n\1volumes:|$)/m), then rebuild workspaceSection from
workspaceMatch as before (referencing the workspaceMatch and workspaceSection
variables).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d11e9a50-ff7f-4897-96f0-8adfcb32318a

📥 Commits

Reviewing files that changed from the base of the PR and between f4f912a and b2b5f04.

📒 Files selected for processing (1)
  • test/security-configuration-hardening.test.ts

ericksoa and others added 3 commits April 17, 2026 17:33
The TypeScript strict null checks flagged `workspaceMatch` and
`proxyMatch` as possibly null despite the preceding `toBeNull()` guard.
Add `!` post-fix assertions since Vitest's `expect().not.toBeNull()`
narrows at runtime but not at the type level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add readOnlyRootFilesystem to docker-proxy and init-docker-config containers
- Remove containerPort declaration from docker-proxy to avoid advertising 2375
  on the pod IP
- Support rootless Docker daemon.json path in readDockerDefaultCgroupnsMode()
- Add SPDX header and remove emoji from k8s README
- Fix OpenShell CLI casing in architecture diagram
- Update startup timing to match 90s retry policy (45 retries x 2s)
- Tighten test assertion to require exact tcp://localhost:2375 endpoint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
k8s/nemoclaw-k8s.yaml (1)

65-122: ⚠️ Potential issue | 🟠 Major

docker-proxy is still reachable from other pods unless ingress is explicitly restricted.

Removing containerPort hides advertisement, but it does not block traffic to Pod IP:2375. Please add a NetworkPolicy (deny-all ingress, or allow only explicitly needed peers) for app: nemoclaw.

🔐 Suggested hardening (deny all ingress to this pod)
+---
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: nemoclaw-deny-ingress
+  namespace: nemoclaw
+spec:
+  podSelector:
+    matchLabels:
+      app: nemoclaw
+  policyTypes:
+    - Ingress
+  ingress: []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/nemoclaw-k8s.yaml` around lines 65 - 122, The docker-proxy sidecar
(container name "docker-proxy") remains reachable on PodIP:2375 from other pods;
add a Kubernetes NetworkPolicy that selects pods with label "app: nemoclaw" and
denies all ingress by default (or alternatively, explicitly allow only required
peers/namespaces) to prevent pod-to-pod access to the proxy; implement this by
creating a NetworkPolicy resource that either has an empty ingress list (deny
all) or a minimal ingress rule permitting known peers, and ensure its
podSelector matches app: nemoclaw so the docker-proxy pod is protected.
🧹 Nitpick comments (1)
test/security-configuration-hardening.test.ts (1)

73-75: Tighten DOCKER_HOST assertion to enforce exact endpoint value.

Current regex matches substrings; anchoring to the full value line avoids accidental passes.

🎯 Suggested fix
-    expect(workspaceSection).toMatch(
-      /DOCKER_HOST[\s\S]*?value:\s*tcp:\/\/localhost:2375/,
-    );
+    expect(workspaceSection).toMatch(
+      /DOCKER_HOST[\s\S]*?value:\s*["']?tcp:\/\/localhost:2375["']?(?:\r?\n|$)/,
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-configuration-hardening.test.ts` around lines 73 - 75, The
current expectation in security-configuration-hardening.test.ts uses a loose
regex for workspaceSection and DOCKER_HOST that can match substrings; tighten it
so the value line is matched exactly by updating the regex in the
expect(workspaceSection).toMatch(...) call to anchor the value line (use
multiline mode) e.g. ensure the pattern includes
^\s*value:\s*tcp:\/\/localhost:2375\s*$ with the m flag and is applied within
the DOCKER_HOST block (reference workspaceSection and the DOCKER_HOST value
line) so only an exact endpoint value passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/security-configuration-hardening.test.ts`:
- Around line 15-16: The workspace-section regex is too permissive: update the
lookahead in the regex literal currently matching "/- name:
workspace[\s\S]*?(?=\n\s*-\s*name: |\n\s*initContainers:|\n\s*volumes:|$)/" to
use the container-level indentation boundary (\s{4}) so nested "- name:" entries
don't prematurely stop the match; e.g., change the lookahead to
"(?=\n\s{4}-\s*name: |\n\s{4}initContainers:|\n\s{4}volumes:|$)". Ensure the
change is applied to the same regex used in the test (the /- name:
workspace[...].../ literal).

---

Duplicate comments:
In `@k8s/nemoclaw-k8s.yaml`:
- Around line 65-122: The docker-proxy sidecar (container name "docker-proxy")
remains reachable on PodIP:2375 from other pods; add a Kubernetes NetworkPolicy
that selects pods with label "app: nemoclaw" and denies all ingress by default
(or alternatively, explicitly allow only required peers/namespaces) to prevent
pod-to-pod access to the proxy; implement this by creating a NetworkPolicy
resource that either has an empty ingress list (deny all) or a minimal ingress
rule permitting known peers, and ensure its podSelector matches app: nemoclaw so
the docker-proxy pod is protected.

---

Nitpick comments:
In `@test/security-configuration-hardening.test.ts`:
- Around line 73-75: The current expectation in
security-configuration-hardening.test.ts uses a loose regex for workspaceSection
and DOCKER_HOST that can match substrings; tighten it so the value line is
matched exactly by updating the regex in the
expect(workspaceSection).toMatch(...) call to anchor the value line (use
multiline mode) e.g. ensure the pattern includes
^\s*value:\s*tcp:\/\/localhost:2375\s*$ with the m flag and is applied within
the DOCKER_HOST block (reference workspaceSection and the DOCKER_HOST value
line) so only an exact endpoint value passes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3b315c59-2b85-4560-b240-6ddf00c5cb14

📥 Commits

Reviewing files that changed from the base of the PR and between 681c577 and 61467e7.

📒 Files selected for processing (4)
  • k8s/README.md
  • k8s/nemoclaw-k8s.yaml
  • src/lib/preflight.ts
  • test/security-configuration-hardening.test.ts
✅ Files skipped from review due to trivial changes (1)
  • src/lib/preflight.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • k8s/README.md

Comment thread test/security-configuration-hardening.test.ts Outdated
Take main's String() wrapper over PR's .toString() — both fix the
same stdout coercion and main's landed first.
# Conflicts:
#	k8s/nemoclaw-k8s.yaml
…tion

Use \s{4} instead of \s* in the workspace section lookahead to avoid
matching nested `- name:` entries, consistent with the proxy test regex.

Resolves CodeRabbit review feedback.
@ericksoa ericksoa self-assigned this Apr 19, 2026
@wscurran wscurran added security Something isn't secure priority: high Important issue that should be resolved in the next release K8s Use this label to identify Kubernetes deployment issues with NemoClaw. labels Apr 20, 2026
Set BIND_CONFIG=127.0.0.1:2375 so the proxy is unreachable from
other pods in the cluster, closing off the last network path to
the Docker API.
The test asserted against k8s/nemoclaw-k8s.yaml which was removed
in #2107. The remaining change is the rootless DinD daemon.json
path fallback in preflight.ts.
@ericksoa ericksoa changed the title fix(k8s): interpose docker socket proxy between workspace and daemon fix: add rootless DinD daemon.json path fallback in preflight Apr 21, 2026
@ericksoa ericksoa merged commit d12fd60 into main Apr 21, 2026
10 checks passed
@cv cv deleted the fix/docker-socket-proxy branch April 21, 2026 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

K8s Use this label to identify Kubernetes deployment issues with NemoClaw. priority: high Important issue that should be resolved in the next release security Something isn't secure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants