Merged
24 commits
06a6eaa
fix(k8s): replace privileged DinD with rootless DinD
ericksoa Apr 18, 2026
0fc74e8
fix(k8s): interpose docker socket proxy between workspace and daemon
ericksoa Apr 18, 2026
f4f912a
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 18, 2026
b2b5f04
test: migrate security-hardening test to .ts and revert .js edit
ericksoa Apr 18, 2026
6015b32
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 18, 2026
681c577
fix(test): add non-null assertions for regex match results
ericksoa Apr 18, 2026
61467e7
fix(k8s): address CodeRabbit review feedback on docker socket proxy
ericksoa Apr 18, 2026
69487e6
merge: resolve conflict with main in onboard.ts
ericksoa Apr 18, 2026
393939f
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 18, 2026
3f6bd24
fix(test): tighten workspace-section regex to container-level indenta…
ericksoa Apr 18, 2026
d3ff441
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 19, 2026
d9c3c93
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 20, 2026
680a22a
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 20, 2026
c6737a6
fix(k8s): bind docker-socket-proxy to localhost only
ericksoa Apr 20, 2026
0b4d070
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 20, 2026
cf09c01
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 20, 2026
494471a
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 20, 2026
686fa71
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 21, 2026
b3bbf66
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 21, 2026
aeb5961
merge: resolve conflicts with main after k8s/ removal (#2107)
ericksoa Apr 21, 2026
af9891e
fix: remove k8s files already deleted by #2107
ericksoa Apr 21, 2026
cd98cac
fix: remove k8s security test (manifest deleted in #2107)
ericksoa Apr 21, 2026
cf39790
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 21, 2026
42b8ea7
Merge branch 'main' into fix/docker-socket-proxy
ericksoa Apr 21, 2026
72 changes: 44 additions & 28 deletions k8s/README.md
@@ -1,9 +1,12 @@
# NemoClaw on Kubernetes

> **⚠️ Experimental**: This deployment method is intended for **trying out NemoClaw on Kubernetes**, not for production use. It requires a **privileged pod** running **Docker-in-Docker (DinD)** to create isolated sandbox environments. Operational requirements (storage, runtime, security policies) vary by cluster configuration.
> **⚠️ Experimental**: This deployment method is intended for **trying out NemoClaw on Kubernetes**, not for production use. It uses rootless Docker-in-Docker (DinD) to create isolated sandbox environments. Operational requirements (storage, runtime, security policies) vary by cluster configuration.

The sample manifest now uses a few safer defaults out of the box:

- uses rootless DinD (`docker:24-dind-rootless`) — no privileged containers
- interposes a Docker socket proxy between workspace and the daemon, blocking exec, build, and other dangerous API endpoints
- workspace has no direct access to the Docker socket
- disables Kubernetes service account token automounting
- disables service-link environment injection
- runs the workspace container with `allowPrivilegeEscalation: false`, `capabilities.drop: [ALL]`, and `RuntimeDefault` seccomp
@@ -20,7 +23,7 @@ Run [NemoClaw](https://github.com/NVIDIA/NemoClaw) on Kubernetes with GPU infere

- Kubernetes cluster with `kubectl` access
- An OpenAI-compatible inference endpoint (Dynamo vLLM, vLLM, etc.)
- Permissions to create **privileged pods** (required for Docker-in-Docker)
- Kernel 5.11+ on nodes (required for rootless DinD with overlay2)
- Sufficient node resources (~8GB memory, 2 CPUs for DinD container)
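
The kernel requirement can be checked ahead of time from a node's `uname -r` output. The following is a minimal sketch, assuming a release string of the usual `major.minor.patch-suffix` form; `parseKernelRelease` and `kernelAtLeast` are hypothetical helpers for illustration, not part of NemoClaw.

```typescript
// Hypothetical helper: extract [major, minor] from a kernel release string
// such as "5.15.0-91-generic". Returns null if the string has no such prefix.
function parseKernelRelease(release: string): [number, number] | null {
  const m = /^(\d+)\.(\d+)/.exec(release);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// Hypothetical helper: true when `release` is at least `major.minor`.
// Unparseable strings are treated as "too old" so the caller fails closed.
function kernelAtLeast(release: string, major: number, minor: number): boolean {
  const parsed = parseKernelRelease(release);
  if (parsed === null) return false;
  const [maj, min] = parsed;
  return maj > major || (maj === major && min >= minor);
}
```

Calling `kernelAtLeast(release, 5, 11)` with the output of `uname -r` from each node gives a quick yes/no before deploying. Containers share the node's kernel, so the same value is visible from inside a pod.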

### 1. Deploy NemoClaw
@@ -155,34 +158,37 @@ sandbox@my-assistant:~$ openclaw agent --agent main -m "What is 7 times 8?"
## Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ NemoClaw Pod │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ Docker-in-Docker│ │ Workspace Container │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌───────────┐ │ │ nemoclaw CLI │ │ │
│ │ │ │ k3s │ │◄───│ openshell CLI │ │ │
│ │ │ │ cluster │ │ │ │ │ │
│ │ │ │ │ │ │ socat proxy ───────────────│───│──┼──► Dynamo/vLLM
│ │ │ │ ┌───────┐ │ │ │ localhost:8000 │ │ │
│ │ │ │ │Sandbox│ │ │ │ │ │ │
│ │ │ │ └───────┘ │ │ │ host.openshell.internal │ │ │
│ │ │ └───────────┘ │ │ routes to socat │ │ │
│ │ └─────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ NemoClaw Pod │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────┐ ┌─────────────────────┐ │ │
│ │ │ Rootless DinD │ │ Docker │ │ Workspace │ │ │
│ │ │ │ │ Socket │ │ │ │ │
│ │ │ ┌────────────┐ │ │ Proxy │ │ nemoclaw CLI │ │ │
│ │ │ │ k3s │ │◄─│ │◄─│ openshell CLI │ │ │
│ │ │ │ cluster │ │ │ TCP 2375 │ │ │ │ │
│ │ │ │ │ │ │ │ │ socat ─────────────│──┼──► Dynamo/vLLM
│ │ │ │ ┌───────┐ │ │ │ Filters: │ │ localhost:8000 │ │ │
│ │ │ │ │Sandbox│ │ │ │ EXEC=0 │ │ │ │ │
│ │ │ │ └───────┘ │ │ │ BUILD=0 │ │ host.openshell │ │ │
│ │ │ └────────────┘ │ │ │ │ .internal → socat │ │ │
│ │ └──────────────────┘ └──────────┘ └─────────────────────┘ │ │
│ │ ▲ unix socket ▲ tcp://localhost:2375 │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
```

**How it works:**

1. NemoClaw runs in a privileged pod with Docker-in-Docker
2. OpenShell creates a nested k3s cluster for sandbox isolation
3. A socat proxy bridges K8s DNS to the nested environment
4. Inside the sandbox, `host.openshell.internal:8000` routes to the inference endpoint
1. Rootless DinD runs as a non-root user (UID 1000) in a non-privileged container, so host escape via `nsenter` is blocked
2. A Docker socket proxy filters API access: workspace cannot exec into inner containers or run builds
3. Workspace talks to the proxy over TCP; it has no direct socket access
4. OpenShell creates a nested k3s cluster for sandbox isolation
5. A socat proxy bridges K8s DNS to the nested environment
6. Inside the sandbox, `host.openshell.internal:8000` routes to the inference endpoint
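
The filtering in step 2 can be pictured as a small allow/deny function. The sketch below is a simplified model, not the proxy's actual rule set: the flag names and values are copied from the manifest, but `sectionFor`, `isAllowed`, and the rule that `/containers/{id}/exec` is guarded by `EXEC` are assumptions made for illustration.

```typescript
type ProxyFlags = Record<string, "0" | "1">;

// Flags as set on the docker-proxy container in the sample manifest.
const manifestFlags: ProxyFlags = {
  POST: "1", CONTAINERS: "1", IMAGES: "1", NETWORKS: "1", VOLUMES: "1",
  INFO: "1", VERSION: "1", EXEC: "0", BUILD: "0",
};

// Assumption: a request is classified by its first path segment after an
// optional API version prefix (e.g. /v1.43), with exec-create on a container
// treated as EXEC rather than CONTAINERS.
function sectionFor(path: string): string {
  const noQuery = path.split("?")[0] ?? "";
  const noVersion = noQuery.replace(/^\/v\d+(\.\d+)*/, "");
  if (/^\/containers\/[^/]+\/exec(\/|$)/.test(noVersion)) return "EXEC";
  return (noVersion.split("/")[1] ?? "").toUpperCase();
}

// Simplified decision: writes need POST=1, and the section flag must be "1".
// Sections with no flag set are denied in this model.
function isAllowed(flags: ProxyFlags, method: string, path: string): boolean {
  const readOnly = method === "GET" || method === "HEAD";
  if (!readOnly && flags["POST"] !== "1") return false;
  return flags[sectionFor(path)] === "1";
}
```

Under this model, listing containers or querying `/info` passes, while exec and build requests are rejected. That matches the intent stated in the list above, though the real proxy's matching rules may be more detailed.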

---

@@ -196,16 +202,26 @@ kubectl describe pod nemoclaw -n nemoclaw

Common issues:

- Missing privileged security context
- Insufficient memory (needs ~8GB for DinD)
- Kernel too old for rootless DinD (need 5.11+ for overlay2)

### Docker daemon not starting

```bash
kubectl logs nemoclaw -n nemoclaw -c dind
```

Usually resolves after 30-60 seconds.
Usually resolves after 30-60 seconds. Rootless DinD may take slightly longer than privileged DinD due to user namespace setup.
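
For a sense of how long the workspace waits before giving up: the manifest's startup loop polls `docker info` up to 45 times with a 2 second sleep between attempts, roughly 90 seconds in total. Here is a sketch of that loop with the readiness check and the sleep injected as parameters; `waitForReady` and `maxWaitSeconds` are hypothetical names, and the shell version additionally re-runs `docker info` once after the loop, which is omitted here.

```typescript
// Hypothetical helper: poll `check` up to `attempts` times, calling `sleep`
// after each failed attempt. Returns the 1-based attempt that succeeded,
// or null if every attempt failed.
function waitForReady(
  check: () => boolean,
  attempts: number,
  sleep: () => void,
): number | null {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (check()) return attempt;
    sleep();
  }
  return null;
}

// Upper bound on time spent sleeping, ignoring how long each check takes.
function maxWaitSeconds(attempts: number, sleepSeconds: number): number {
  return attempts * sleepSeconds;
}
```

With the manifest's values, `maxWaitSeconds(45, 2)` is 90, compared with 60 for the previous 30-attempt loop.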

### Docker socket proxy issues

```bash
kubectl logs nemoclaw -n nemoclaw -c docker-proxy
```

If the workspace container reports "Docker not ready" but the `dind` logs look healthy, the proxy may be failing. Check the proxy logs for permission errors (socket access) or port conflicts.

If the NemoClaw installer fails with HTTP 403 errors from Docker, the proxy is blocking an API endpoint the installer needs. Check the proxy logs to see which endpoint was denied, then enable it by setting the corresponding env var to `"1"` in the docker-proxy container.

### Inference not working

120 changes: 104 additions & 16 deletions k8s/nemoclaw-k8s.yaml
@@ -9,30 +9,118 @@ metadata:
labels:
app: nemoclaw
spec:
securityContext:
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
automountServiceAccountToken: false
enableServiceLinks: false
containers:
# Docker daemon (DinD)
- name: dind
image: docker:24-dind
image: docker:24-dind-rootless
securityContext:
privileged: true
privileged: false
runAsUser: 1000
runAsNonRoot: true
allowPrivilegeEscalation: true # required for newuidmap/newgidmap
seccompProfile:
type: Unconfined # required for user namespace setup by rootlesskit
capabilities:
drop:
- ALL
add:
- SETUID # newuidmap
- SETGID # newgidmap
- SETFCAP # file capabilities during image builds
- NET_ADMIN # k3s/flannel CNI network setup
- NET_RAW # k3s/flannel VXLAN and kube-proxy
env:
- name: DOCKER_TLS_CERTDIR
value: ""
command: ["dockerd", "--host=unix:///var/run/docker.sock"]
# Rootlesskit: use the pod network namespace directly instead of
# creating an isolated slirp4netns namespace. Without this, k3s
# and the socat proxy live in different network namespaces and
# inference routing from the sandbox to host.openshell.internal
# breaks.
- name: DOCKERD_ROOTLESS_ROOTLESSKIT_NET
value: "host"
# Rootlesskit: propagate mount events from the pod into the
# rootless mount namespace so k3s can bind-mount volumes into
# sandbox containers.
- name: DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS
value: "--propagation=rslave"
volumeMounts:
- name: docker-storage
mountPath: /var/lib/docker
mountPath: /home/rootless/.local/share/docker
- name: docker-socket
mountPath: /var/run
mountPath: /run/user/1000
- name: docker-config
mountPath: /etc/docker
mountPath: /home/rootless/.config/docker
resources:
requests:
memory: "8Gi"
cpu: "2"

# Docker socket proxy — filters API access from workspace to daemon.
# Only the proxy has the raw socket; workspace talks TCP to the proxy.
- name: docker-proxy
image: tecnativa/docker-socket-proxy:0.6.0
securityContext:
allowPrivilegeEscalation: false
runAsNonRoot: true
runAsUser: 65534 # nobody
runAsGroup: 1000 # matches fsGroup — grants access to rootless socket
capabilities:
drop:
- ALL
seccompProfile:
type: RuntimeDefault
ports:
- containerPort: 2375
env:
- name: DOCKER_SOCKET
value: /docker-socket/docker.sock
# Deny everything by default, then allow only what NemoClaw needs
- name: LOG
value: "1" # log all requests for audit
- name: POST
value: "1" # allow POST (needed for create/start)
- name: CONTAINERS
value: "1" # /containers/* — create, start, stop, inspect
- name: IMAGES
value: "1" # /images/* — pull images
- name: NETWORKS
value: "1" # /networks/* — container networking
- name: VOLUMES
value: "1" # /volumes/* — data volumes
- name: INFO
value: "1" # /info — daemon health check
- name: VERSION
value: "1" # /version — docker version
# Sections not listed here keep the proxy image's defaults, which deny most of the API.
# If the installer needs BUILD or EXEC, set the corresponding var to "1" and document why.
- name: EXEC
value: "0" # explicitly deny exec into inner containers
- name: BUILD
value: "0" # deny build (installer pulls pre-built images)
volumeMounts:
- name: docker-socket
mountPath: /docker-socket
readOnly: true
livenessProbe:
tcpSocket:
port: 2375
initialDelaySeconds: 5
periodSeconds: 10
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "250m"

# Workspace - runs official NemoClaw installer
- name: workspace
image: node:22
@@ -63,11 +151,11 @@ spec:

# Wait for Docker
echo "[3/4] Waiting for Docker daemon..."
for i in $(seq 1 30); do
for i in $(seq 1 45); do
if docker info >/dev/null 2>&1; then break; fi
sleep 2
done
docker info >/dev/null 2>&1 || { echo "Docker not ready"; exit 1; }
docker info >/dev/null 2>&1 || { echo "Docker not ready (daemon or proxy may have failed to start)"; exit 1; }
echo "Docker ready"

# Default to a dummy compatible API key for unauthenticated endpoints
@@ -90,7 +178,7 @@ spec:
exec sleep infinity
env:
- name: DOCKER_HOST
value: unix:///var/run/docker.sock
value: tcp://localhost:2375
# Dynamo endpoint (raw host:port for socat) - UPDATE THIS FOR YOUR CLUSTER
- name: DYNAMO_HOST
value: "vllm-disagg-frontend.dynamo.svc.cluster.local:8000"
@@ -115,11 +203,6 @@ spec:
value: "suggested"
- name: NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE
value: "1"
volumeMounts:
- name: docker-socket
mountPath: /var/run
- name: docker-config
mountPath: /etc/docker
resources:
requests:
memory: "4Gi"
@@ -129,10 +212,15 @@ spec:
# Configure Docker daemon for cgroup v2
- name: init-docker-config
image: busybox
command: ["sh", "-c", "echo '{\"default-cgroupns-mode\":\"host\"}' > /etc/docker/daemon.json"]
command:
- sh
- -c
- |
mkdir -p /config/docker
echo '{"default-cgroupns-mode":"host"}' > /config/docker/daemon.json
volumeMounts:
- name: docker-config
mountPath: /etc/docker
mountPath: /config/docker

volumes:
- name: docker-storage
2 changes: 1 addition & 1 deletion src/lib/onboard.ts
@@ -138,7 +138,7 @@ function verifyGatewayContainerRunning() {
`docker inspect --type container --format '{{.State.Running}}' ${containerName}`,
{ ignoreError: true, suppressOutput: true },
);
if (result.status === 0 && (result.stdout || "").trim() === "true") {
if (result.status === 0 && (result.stdout || "").toString().trim() === "true") {
return "running";
}
// Container exists but is stopped (exit 0, Running !== "true")
76 changes: 76 additions & 0 deletions test/security-configuration-hardening.test.ts
@@ -0,0 +1,76 @@
// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

import { describe, it, expect } from "vitest";
import fs from "node:fs";
import path from "node:path";

const ROOT = path.join(import.meta.dirname, "..");
const K8S_MANIFEST = path.join(ROOT, "k8s", "nemoclaw-k8s.yaml");

describe("security configuration hardening", () => {
it("hardens the Kubernetes sample manifest with safer defaults", () => {
const manifest = fs.readFileSync(K8S_MANIFEST, "utf8");
const workspaceMatch = manifest.match(
/- name: workspace[\s\S]*?(?=\n\s{4}-\s*name: |\n\s*initContainers:|\n\s*volumes:|$)/,
);
expect(workspaceMatch).not.toBeNull();
const workspaceSection = workspaceMatch![0];
expect(manifest).toMatch(/automountServiceAccountToken:\s*false/);
expect(manifest).toMatch(/enableServiceLinks:\s*false/);
expect(workspaceSection).toMatch(/allowPrivilegeEscalation:\s*false/);
expect(workspaceSection).toMatch(/capabilities:\s*[\r\n]+\s*drop:\s*[\r\n]+\s*-\s*ALL/);
expect(workspaceSection).toMatch(/seccompProfile:\s*[\r\n]+\s*type:\s*RuntimeDefault/);
expect(manifest).toMatch(/- name: NEMOCLAW_POLICY_MODE[\s\S]*value:\s*"suggested"/);
expect(manifest).toContain('export COMPATIBLE_API_KEY="${COMPATIBLE_API_KEY:-dummy}"');
const compatibleApiKeySection = manifest.match(
/- name: COMPATIBLE_API_KEY[\s\S]*?(?=\n\s*-\s*name: |\n\s*volumeMounts:|\n\s*command:|$)/,
)?.[0];
expect(compatibleApiKeySection).toBeTruthy();
expect(compatibleApiKeySection).toMatch(
/secretKeyRef:[\s\S]*name:\s*nemoclaw-compatible-api-key/,
);
expect(compatibleApiKeySection).toMatch(/optional:\s*true/);
expect(manifest).toContain("curl --proto '=https' --tlsv1.2 --fail --show-error --silent");
expect(manifest).toContain("--output /tmp/nemoclaw-install.sh");
expect(manifest).toContain("chmod 700 /tmp/nemoclaw-install.sh");
expect(manifest).toContain("bash /tmp/nemoclaw-install.sh");
expect(manifest).not.toMatch(/curl\b[^\n|]*\|\s*(?:ba|z|k)?sh\b/i);
});

it("interposes a docker socket proxy between workspace and the daemon", () => {
const manifest = fs.readFileSync(K8S_MANIFEST, "utf8");

// Extract docker-proxy section
const proxyMatch = manifest.match(
/- name: docker-proxy[\s\S]*?(?=\n\s{4}-\s*name: |\n\s*initContainers:|\n\s*volumes:|$)/,
);
expect(proxyMatch).not.toBeNull();
const proxySection = proxyMatch![0];

// Proxy is hardened
expect(proxySection).toMatch(/allowPrivilegeEscalation:\s*false/);
expect(proxySection).toMatch(/capabilities:\s*[\r\n]+\s*drop:\s*[\r\n]+\s*-\s*ALL/);
expect(proxySection).toMatch(/seccompProfile:\s*[\r\n]+\s*type:\s*RuntimeDefault/);
expect(proxySection).toMatch(/runAsNonRoot:\s*true/);

// Proxy mounts socket read-only
expect(proxySection).toMatch(/readOnly:\s*true/);

// Dangerous Docker API endpoints are denied
expect(proxySection).toMatch(/name:\s*EXEC[\s\S]*?value:\s*"0"/);
expect(proxySection).toMatch(/name:\s*BUILD[\s\S]*?value:\s*"0"/);

// Workspace does NOT mount the docker-socket volume
const workspaceMatch = manifest.match(
/- name: workspace[\s\S]*?(?=\n\s{4}-\s*name: |\n\s*initContainers:|\n\s*volumes:|$)/,
);
expect(workspaceMatch).not.toBeNull();
const workspaceSection = workspaceMatch![0];
expect(workspaceSection).not.toMatch(/name:\s*docker-socket/);

// Workspace talks to the proxy over TCP, not to the raw socket
expect(workspaceSection).toMatch(/DOCKER_HOST[\s\S]*?value:\s*tcp:\/\//);
expect(workspaceSection).not.toMatch(/DOCKER_HOST[\s\S]*?value:\s*unix:\/\//);
});
});
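
A section regex that stops at any `- name:` entry would cut a container block short at its first env var; anchoring the lookahead to four spaces of indentation, as the second test does, avoids that. The sketch below demonstrates the difference on a made-up manifest fragment, assuming container entries are indented four spaces and env entries eight. The fragment and the `loose`/`tight` names are illustrative only.

```typescript
// Made-up manifest fragment for illustration; not the real nemoclaw-k8s.yaml.
const sample = [
  "spec:",
  "  containers:",
  "    - name: docker-proxy",
  "      env:",
  "        - name: EXEC",
  '          value: "0"',
  "    - name: workspace",
  "      env:",
  "        - name: DOCKER_HOST",
  "          value: tcp://localhost:2375",
  "      securityContext:",
  "        allowPrivilegeEscalation: false",
  "  volumes:",
  "    - name: docker-socket",
].join("\n");

// Loose: stops at the next "- name:" at ANY indentation, including env vars.
const loose = /- name: workspace[\s\S]*?(?=\n\s*-\s*name: |\n\s*volumes:|$)/;
// Tight: stops only at a "- name:" preceded by exactly four whitespace characters.
const tight = /- name: workspace[\s\S]*?(?=\n\s{4}-\s*name: |\n\s*volumes:|$)/;

const looseSection = sample.match(loose)?.[0] ?? "";
const tightSection = sample.match(tight)?.[0] ?? "";
```

On this fragment the loose pattern captures only the first two lines of the workspace block, so an assertion on `allowPrivilegeEscalation` would fail even though the setting is present. The tight pattern captures the block up to `volumes:`.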