agent6 treats the LLM as untrusted. This document is the layer-by-layer breakdown of how that assumption is enforced and what the known limits are.
For now: open a GitHub issue prefixed [security], or, for embargoed
issues, email the maintainer listed in pyproject.toml. Once agent6 has
a stable distribution, this will move to a private channel + GitHub
Security Advisories.
Please include:
- agent6 version (`agent6 --version`).
- Linux kernel version + distro (`uname -a` + `/etc/os-release`).
- The output of `agent6 check sandbox`.
- Minimal reproduction, ideally as a failing test under `tests/security/`.
Adversary: a fully malicious worker model, or an honest model that has been prompt-injected by a file in the repository under analysis.
We assume the adversary controls:
- the text of every LLM response,
- the choice of tool calls and their arguments (within the published JSON schema),
- the content of any file the agent reads during the run.
We do NOT assume the adversary controls:
- the kernel,
- the agent6 source code (i.e., supply-chain compromise of the agent6 binary itself is out of scope; sign your releases),
- the Anthropic / OpenAI API endpoints.
Under that adversary, agent6 aims to make the following true:
- No writes outside the project working directory.
- No reads of files outside the project working directory (+ read-only sibling paths, if explicitly configured).
- No network egress except to the configured provider endpoints. This is
  enforced structurally for the agent process when
  `sandbox.agent_network = "providers"` (the default; see Defense Layer 1b);
  `"local"` narrows it to loopback providers, `"open"` lifts it. Jailed
  commands (`run_command`, machine `tool` states) are governed separately by
  `sandbox.tool_network` (default `"block"`); see Defense Layer 1b and §8.
- No `git push`, no `--force`, no history rewrite, no `reset --hard`.
- No persistence after the run terminates (no daemon, no cron, no `.bashrc`
  mutation; the jail's mount namespace is the only place children can write
  to anyway).
Agent-process Landlock is applied at the start of agent6 run/resume, before any provider or
network object is built, on the hardened profile only. The strict
profile does not take this path: it runs every child command in its own
user+mount+pid+net namespace (a strictly stronger boundary than Landlocking
the parent) and confines provider egress with the broker (Defense Layer
1b); applying agent-process Landlock there would additionally break the
jail's pivot_root(2)/mount(2) on kernels at Landlock ABI ≥ 7. Where it
applies it restricts the Python process itself (irrevocably, inherited by
every child it spawns):
| Landlock rule | Allowed |
|---|---|
| FS read | cwd, $HOME, /usr, /etc, /tmp, the common /dev char devices, and /run + /proc when present |
| FS write | cwd, /tmp, the /dev char devices, and /proc when present |
| TCP connect (kernel ≥ 6.7) | the ports of configured providers: 443 for each anthropic entry, the base_url port for each openai entry (default 443) |
Landlock's network hook filters by destination port only (it has no
host/IP primitive), so it blocks connections on other ports but does not
pin egress to a specific host; for that use sandbox.agent_network = "providers"
(Defense Layer 1b). On older kernels (no TCP rules) agent6 warns and runs
FS-only Landlock; don't run there on a host whose UID can read credentials the
agent could exfiltrate.
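
The TCP connect row reduces to a small port computation. A minimal sketch, assuming each provider is described by a `kind` and an optional `base_url`; the function names and the dict shape are illustrative, not agent6's actual API:

```python
from typing import Optional
from urllib.parse import urlsplit


def provider_port(kind: str, base_url: Optional[str] = None) -> int:
    """Return the TCP port one provider entry contributes to the allow-list."""
    if kind == "anthropic":
        return 443
    if kind == "openai":
        port = urlsplit(base_url).port if base_url else None
        # Per the table, a base_url without an explicit port falls back to 443.
        return port if port is not None else 443
    raise ValueError(f"unknown provider kind: {kind!r}")


def allowed_connect_ports(providers: list[dict]) -> set[int]:
    """Union of provider ports: the only thing a port-based rule can express."""
    return {provider_port(p["kind"], p.get("base_url")) for p in providers}
```

Two providers on port 443 collapse into a single rule, which illustrates why this layer alone cannot pin egress to a host.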
When enabled (strict profile only; it relies on unprivileged user
namespaces), agent6 run confines its own process to host-level egress:
- While still in the host network namespace (netns) and single-threaded, the
  agent binds one `AF_UNIX` listening socket per allow-listed provider
  `host:port` and `fork()`s a small broker child. The broker stays in the
  host netns; for each connection accepted on a given socket it dials the
  single fixed `host:port` that socket represents (resolved per-connect, so
  the allow-list is robust to CDN IP rotation) and splices bytes. TLS is
  end-to-end: the broker only ever sees ciphertext.
- The agent then calls `unshare(CLONE_NEWUSER | CLONE_NEWNET)` to enter a
  fresh, empty network namespace (loopback only: no veth, no default route).
  Its sole path off-host is the set of unix sockets, each of which is
  hard-wired to one provider endpoint chosen by the operator, never by the
  (untrusted) LLM at connect time.
This is fail-closed: the kernel network namespace is the real boundary. A missing route means no connectivity (the agent cannot connect at all), never a silent leak. Because the upstream of each socket is fixed at bind time, the egress allow-list is structural rather than a filter the agent could be tricked into widening. On hosts that only support the hardened profile agent6 refuses to run rather than execute unconfined.
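
The "one socket, one fixed upstream" idea can be sketched in a few lines. This is an illustration only: it uses threads inside one process instead of a `fork()`ed broker child, does no namespace work, and the names `bind_pinned_socket` and `serve_pinned` are hypothetical.

```python
import os
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until EOF, then half-close the destination."""
    try:
        while chunk := src.recv(65536):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def bind_pinned_socket(path: str) -> socket.socket:
    """Create the AF_UNIX listener that stands for one upstream endpoint."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    os.chmod(path, 0o600)
    listener.listen()
    return listener


def serve_pinned(listener: socket.socket, host: str, port: int) -> None:
    """Relay every accepted connection to the single fixed (host, port)."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # accept failed, e.g. the listener was closed
        try:
            # The name is resolved on each connect, so IP rotation is tolerated.
            upstream = socket.create_connection((host, port), timeout=10)
        except OSError:
            client.close()
            continue
        upstream.settimeout(None)
        # Bytes are copied verbatim; with TLS the relay only sees ciphertext.
        threading.Thread(target=_pump, args=(client, upstream), daemon=True).start()
        threading.Thread(target=_pump, args=(upstream, client), daemon=True).start()
```

The client never passes a destination to the relay, so nothing it sends can change where the bytes go; the destination is a property of the socket path it connected to.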
sandbox.agent_network = "local" uses the same broker but pins it to loopback
provider endpoints only (local models such as Ollama) and refuses a non-local
provider; sandbox.agent_network = "open" skips the broker entirely. For agent6 machine run, each agent state runs in its own subprocess that performs this
same broker setup for itself (the engine is a thin host-netns supervisor), so a
machine agent's egress is confined exactly as a normal run's is.
Curator and other AF_UNIX-based helpers are unaffected (unix sockets
cross the netns boundary). MCP servers that need their own outbound
network access will not have it under providers; that is a
deliberate limitation, not a bug.
sandbox.allow_urls (operator-controlled egress additions). The
allow-list above is, by default, exactly the configured provider
endpoints. An operator may widen it with sandbox.allow_urls: a set of
host / host:port / URL entries that get their own broker sockets
alongside the providers (effective egress = union of provider endpoints
and allow_urls). Security properties are unchanged: each added socket is
still hard-wired at bind time to one operator-chosen host:port, resolved
per-connect, and the LLM cannot add, widen, or redirect an entry: it is a
static config field, never written from model output. The default is empty
(secure by default), entries are validated at config-load time, and the
field is only consulted under sandbox.agent_network = "providers" (ignored
under local/open). It widens only the agent path, never a jailed command.
Merge is last-overlay-wins: the most-specific config tier that sets
allow_urls replaces it wholesale, so a repo or machine overlay cannot
silently append to a narrower global allow-list; it must restate the
full set, keeping the effective allow-list auditable via config show.
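
A minimal sketch of the two `allow_urls` rules above: normalising an entry to a `host:port` pair, and last-overlay-wins merging. The function names are hypothetical, and the default ports for bare hosts (443, or 80 for `http://` URLs) are an assumption, since the text does not state them.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_allow_entry(entry: str) -> tuple[str, int]:
    """Normalise `host`, `host:port`, or a URL to a (host, port) pair."""
    text = entry.strip()
    parts = urlsplit(text if "://" in text else "//" + text)
    if not parts.hostname:
        raise ValueError(f"no host in allow_urls entry: {entry!r}")
    port = parts.port  # raises ValueError for a malformed or out-of-range port
    if port is None:
        # Assumption: default port when the entry does not give one.
        port = 80 if parts.scheme == "http" else 443
    return parts.hostname, port


def effective_allow_urls(tiers: list[Optional[list[str]]]) -> list[str]:
    """Tiers run from least to most specific; None means "tier does not set it"."""
    result: list[str] = []
    for tier in tiers:
        if tier is not None:
            result = list(tier)  # replace wholesale, never append
    return result
```

Because a more specific tier replaces the list instead of extending it, an overlay that wants one extra host has to restate every entry it still needs.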
Every apply_edit is in-process, but every run_verify_command and
run_command is executed by agent6-jail. The jail:
- Forks a new user, mount, PID, IPC, UTS, and (in `strict`) network namespace.
- Sets up a minimal rootfs of bind mounts under a fresh tmpfs and
  `pivot_root`s into it. The working directory is the only writable mount;
  everything else is `ro,nosuid,nodev`.
- Bind-mounts a curated subset of `/dev`: `null`, `zero`, `urandom`,
  `random`, `full`. `/dev/tty` is not exposed: TTY access lets a child write
  escape sequences to the controlling terminal of the parent.
- Mounts a fresh `proc` (private to the new PID namespace). If that fails on
  the host kernel, `/proc` is left empty inside the jail rather than
  bind-mounting the host `/proc`; the latter would expose host process info
  to the child.
- Applies Landlock (FS + net rules).
- Installs a seccomp filter that allows the syscalls a Linux process actually needs (clone, mmap, futex, …) and blocks the dangerous remainder (kexec, bpf, ptrace, mount, …).
- Drops all capabilities, sets `NO_NEW_PRIVS`, and applies rlimits (CPU, AS,
  NOFILE, NPROC).
- Then `execve`s the requested binary.
The jail's policy is passed as a JSON document on stdin from
agent6.sandbox.jail.run_in_jail. The Rust side validates it against a
strict schema and refuses on any unknown field.
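
The real validation lives on the Rust side; the Python below only illustrates the "refuse on any unknown field" rule. The field names (`argv`, `cwd`, `profile`) are hypothetical and do not describe the actual policy schema.

```python
import json

# Hypothetical schema: field name -> required JSON type.
POLICY_FIELDS = {"argv": list, "cwd": str, "profile": str}


class PolicyError(ValueError):
    """The policy document is rejected as a whole; nothing is executed."""


def parse_policy(raw: str) -> dict:
    """Parse a policy document, rejecting unknown, missing, or mistyped fields."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise PolicyError("policy must be a JSON object")
    unknown = sorted(set(doc) - set(POLICY_FIELDS))
    if unknown:
        raise PolicyError(f"unknown policy field(s): {unknown}")
    for name, expected in POLICY_FIELDS.items():
        if name not in doc:
            raise PolicyError(f"missing policy field: {name}")
        if not isinstance(doc[name], expected):
            raise PolicyError(f"policy field {name} has the wrong type")
    return doc
```

Rejecting unknown fields means a caller that is newer than the jail binary fails loudly instead of having a restriction silently ignored.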
You set the `sandbox.profile` field; it resolves against the host to an
effective profile: what actually runs. There is no `none` value (you cannot
ask for it) and no `auto` effective profile (it is resolved away). No silent
downgrade: an explicit request the host can't satisfy is refused, not weakened.
| `sandbox.profile` | Host | Effective profile |
|---|---|---|
| `auto` (default) | Linux, user namespaces available | strict |
| `auto` | Linux, no user namespaces | hardened |
| `auto` | non-Linux (macOS / Windows) | none |
| `strict` | Linux, user namespaces available | strict |
| `strict` | anything else | ⛔ refuse to run |
| `hardened` | Linux (user namespaces or not) | hardened |
| `hardened` | non-Linux (macOS / Windows) | ⛔ refuse to run |
The three effective profiles:
- strict: full namespaces (user/mount/pid/ipc/uts/net) + `pivot_root` +
  Landlock + seccomp + `capset(0)` + rlimits + `NO_NEW_PRIVS`.
- hardened: Landlock + seccomp + `capset(0)` + rlimits + `NO_NEW_PRIVS`, but
  no namespaces (so it works inside default-seccomp Docker, where the
  container blocks the inner `clone(CLONE_NEW*)`; the container is the blast
  radius).
- none: unsandboxed. Child commands run as plain subprocesses with no
  kernel-enforced confinement; always with a loud warning.
CI should set profile = "strict" to fail loudly if the sandbox is weaker than
expected. "User namespaces available" means unshare -U -r true succeeds.
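
The resolution table can be read as a pure function of the requested profile and two host facts. A minimal sketch, with hypothetical names (`resolve_profile`, `ProfileRefused`); how agent6 actually probes the host is not shown here:

```python
class ProfileRefused(RuntimeError):
    """The host cannot satisfy an explicit request; refuse, never downgrade."""


def resolve_profile(requested: str, is_linux: bool, userns: bool) -> str:
    """Map a requested sandbox.profile to the effective profile."""
    if requested == "auto":
        if not is_linux:
            return "none"
        return "strict" if userns else "hardened"
    if requested == "strict":
        if is_linux and userns:
            return "strict"
        raise ProfileRefused("strict needs Linux with user namespaces")
    if requested == "hardened":
        if is_linux:
            return "hardened"
        raise ProfileRefused("hardened needs Linux")
    # "none" lands here on purpose: it can be an outcome, never a request.
    raise ValueError(f"invalid sandbox.profile: {requested!r}")
```

Only `auto` can ever produce a weaker profile than the strongest one; the explicit values either resolve to themselves or raise.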
The LLM only ever sees the fixed set of tools declared in
src/agent6/tools/schema.py (enumerated in the README): structured
edits, read-only navigation, fixed-argv verify/metric commands, a
terminal finish_run, a curator-backed task notepad, and the
capability-gated run_command. There is no shell, no write_file
(writes go through apply_edit, an in-process rewriter that refuses
paths outside cwd), no web_fetch, and no eval. Adding a tool requires
a security review note in the commit message; see AGENTS.md.
src/agent6/git_ops.py is the only module that invokes git. It
exposes typed wrappers for the safe operations (status, add, commit,
diff, branch creation, checkout) and refuses, by construction, to call:
- `git push` (any form, any remote),
- `git reset --hard`,
- `git commit --amend`,
- `git rebase`,
- `git filter-branch`,
- `git filter-repo`,
- `git branch -D`,
- `git branch --force`,
- anything containing `--force` or `-f` on a destructive verb.
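
The refusal list can also be written down as data, for example to drive a regression test. Note that this is a deny-list check, which is a different technique from the typed-wrapper design described above (where the forbidden calls simply have no wrapper). All names are hypothetical, and the set of "destructive" verbs is an assumption, since the text does not enumerate it.

```python
from typing import Optional

# Verbs refused in every form.
ALWAYS_REFUSED = {"push", "rebase", "filter-branch", "filter-repo"}
# Verb -> flags that make that verb refused.
REFUSED_FLAGS = {
    "reset": {"--hard"},
    "commit": {"--amend"},
    "branch": {"-D", "--force", "-f"},
}
# Assumption: verbs treated as destructive for the --force / -f rule.
DESTRUCTIVE_VERBS = {"checkout", "clean", "reset", "branch"}


def refusal_reason(args: list[str]) -> Optional[str]:
    """Return why `git <args>` is refused, or None if no rule matches.

    Only exact flag tokens are matched; combined short flags such as
    `-fd` are not handled by this sketch.
    """
    if not args:
        return "empty git command"
    verb, flags = args[0], set(args[1:])
    if verb in ALWAYS_REFUSED:
        return f"git {verb} is never allowed"
    hit = flags & REFUSED_FLAGS.get(verb, set())
    if hit:
        return f"git {verb} with {sorted(hit)} is never allowed"
    if verb in DESTRUCTIVE_VERBS and flags & {"--force", "-f"}:
        return f"forced git {verb} is never allowed"
    return None
```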
git.allow_push, git.allow_force, and git.allow_history_rewrite in
the agent6 config exist for forward compatibility but are currently
ignored; they will stay ignored until there is a concrete review of what
a "safe push" would look like.
Every git_ops invocation is also hardened against repo-controlled host
code execution: a cloned/poisoned .git/config can otherwise run a
command on the host (outside the jail) the moment agent6 runs git in it.
core.fsmonitor (fires on index refresh) and diff.external (fires on
git diff) are always overridden off; the repo's .git/hooks/* run only
when git.run_repo_hooks = true (default false; core.hooksPath is
pointed away from the repo so a pre-commit hook can't fire on agent6's
own auto-commit). On strict this complements protect_git, which
RO-binds .git to stop the worker from writing into it. On hardened
the cwd is blanket read-write (no mount namespace to carve, and carving
.git read-only would also deny new top-level entries and break
toolchains like cargo/pytest that create target/ or .pytest_cache/),
so .git is writable by jailed commands there. That is acceptable: it is
gated by run_commands (default ask), recoverable (branch-per-run,
commits go through git_ops), and the surrounding container is the blast
radius.
- Secrets at rest. Provider API keys live in
  `$XDG_CONFIG_HOME/agent6/secrets.toml`, created and enforced `0600` (owner
  read/write only). agent6 refuses to read the file if it is
  group/other-accessible or owned by another user, the same posture as an SSH
  private key. Keys may alternatively come from an environment variable named
  by `[providers.<name>].api_key_env`; the env var takes precedence. Keys are
  never written to transcripts, never printed by `agent6 config show`
  (redacted), and never mounted into the jail; provider calls happen in
  agent6's own process, outside the sandbox.
- `agent6 connect` never executes remote input. The connect flow only prompts
  locally (key via `getpass`: hidden, or masked with `*` on Python 3.14+) and
  writes config/secrets. It does not run any command, URL, or script returned
  by a provider or any remote, by construction. This is a deliberate guard
  against the class of bug where a login flow runs an attacker-supplied shell
  command. agent6 also opens no listening network socket of any kind (MCP is
  stdio; the egress broker is a private Unix socket).
- Root. Running an LLM-driven agent as root is dangerous, so agent6 refuses
  unless the operator explicitly opts in with `--allow-root` (or
  `AGENT6_ALLOW_ROOT=1`), and prints a loud banner. When invoked through
  `sudo`, agent6 resolves the real user from `SUDO_UID`/`SUDO_GID`/`SUDO_USER`,
  reads that user's config + secrets (not root's), and `chown`s anything it
  writes under the per-repo state dir back to them so no root-owned files are
  left behind. agent6 does not drop privileges in-process: under `sudo` the
  worker's verify/run commands are expected to need root and run as root
  inside the jail, so the jail, not the process uid, is the security boundary.
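
The secrets-file posture (create as `0600`, refuse anything looser or foreign-owned) can be sketched as follows. The function names are hypothetical, and the sketch ignores the `sudo` case, where the expected owner is the invoking user and not the effective uid.

```python
import os
import stat


def create_secrets_file(path: str) -> None:
    """Create the file with mode 0600 from the start, avoiding a chmod race."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


def assert_private(path: str) -> None:
    """Refuse a secrets file that another user owns or could access."""
    st = os.stat(path)
    if st.st_uid != os.geteuid():
        raise PermissionError(f"{path}: owned by another user")
    if stat.S_IMODE(st.st_mode) & 0o077:
        raise PermissionError(f"{path}: group/other-accessible, expected 0600")
```

Checking on every read, and not only at creation, catches a file that was later loosened by a backup tool or a stray `chmod`.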
The task graph is written by a separate agent6-curator subprocess. The
main agent process talks to it over a Unix domain socket inside the run
directory and never writes graph data directly, so a bug in the worker /
planner cannot corrupt the on-disk graph; the curator's append-only
graph.jsonl is the durable source of truth. The main process writes the
rest of the run state in-process: the resume snapshot (loop_state.json),
the event log (logs.jsonl), and transcripts.
What keeps the whole run directory safe from jailed commands is its
location, not any single writer. Per-repo state (config + run state)
lives out of the workspace under $XDG_STATE_HOME/agent6/<repo-id>/
(override with [agent6].state_dir or AGENT6_STATE_HOME). Jailed
commands run on the repo cwd, and the state dir is outside it, so they
cannot reach it.
agent6 does not run an HTTP server, gRPC server, or any other accept-side socket. The only sockets it opens are:
- outbound HTTPS to the LLM provider,
- a per-run Unix domain socket under the run directory
  (`<state-dir>/<repo-id>/runs/<run-id>/`) with mode `0600` for talking to
  its own curator.
There is no telemetry, no auto-update, and no remote control plane.
An `agent6 machine run` engine is a thin supervisor that stays in the host
network namespace and makes no network calls itself. Each agent state runs in
its own subprocess that confines its egress per sandbox.agent_network (the
broker, §1b); each tool state is jailed by the engine, so a per-tool
allow_network decides its netns independently of the agent. This is what lets
a machine confine its agents to the provider API while letting one
deterministic tool reach the network: a tool command is fixed and
operator-reviewed (unlike run_command, whose argv the LLM chooses), so a
networked tool is not a free exfiltration channel.
Egress is set by sandbox.agent_network, sandbox.tool_network, and a per-tool
allow_network; the effective profile (§3) decides what is enforceable. The
tables cover every case; "offline" = no egress.
Agent-process egress (the agent's own LLM/provider HTTP), by sandbox.agent_network:
| `sandbox.agent_network` | strict | hardened | none |
|---|---|---|---|
| `providers` (def) | provider endpoints + `allow_urls`, broker-pinned (§1b) | provider ports only (Landlock) | unconfined ⚠ |
| `local` | loopback providers only, broker-pinned (refuse to run if any provider isn't loopback) | ⛔ refuse to run | unconfined ⚠ |
| `open` | unconfined | unconfined | unconfined ⚠ |
Jailed-command egress (run_command and machine tool states), by
sandbox.tool_network (columns; cells are the strict profile):
| jailed command | `block` (def) | `only_explicit_states` | `allow` |
|---|---|---|---|
| `run_command` | offline | offline | host network |
| tool, `allow_network = "auto"` (def) / `"block"` | offline | offline | offline |
| tool, `allow_network = "allow"` | ⛔ refuse to run | host network | host network |
Refusals: these configurations refuse to run (fail-closed):
| Configuration | When |
|---|---|
| `sandbox.tool_network = "allow"` without `sandbox.agent_network = "open"` | config load, any profile ¹ |
| a tool sets `allow_network = "allow"` under `sandbox.tool_network = "block"` | machine start, any profile |
| `sandbox.agent_network = "local"` or `sandbox.tool_network = "only_explicit_states"` | run start, hardened ² |
| a machine with tool states under `sandbox.tool_network = "block"`, or a tool with `allow_network = "block"` | machine start, hardened ² |
- ⚠ `none` (non-Linux) is unsandboxed: nothing above is enforced and nothing
  is refused; the run proceeds with a loud warning.
- ¹ `run_command` runs inside the agent process, so it can't reach the
  network while the agent is confined; hence `sandbox.tool_network = "allow"`
  needs `sandbox.agent_network = "open"`.
- ² `sandbox.tool_network`'s per-command isolation needs a network namespace,
  so it is `strict`-only. On `hardened` (no namespaces) a jailed child
  instead inherits the agent's Landlock and follows `sandbox.agent_network`;
  the cases that would need real per-command isolation are refused rather
  than mis-confined.
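
For the strict profile, the jailed-command table and the first two refusal rows can be encoded as a small decision function. This is a sketch with hypothetical names (`jailed_egress`, `check_network_config`, `ConfigRefused`), not agent6's internal API, and it does not cover the hardened-only refusals.

```python
class ConfigRefused(Exception):
    """Raised instead of running with a contradictory or unenforceable policy."""


def check_network_config(agent_network: str, tool_network: str) -> None:
    """Config-load check from note 1; applies on any profile."""
    if tool_network == "allow" and agent_network != "open":
        raise ConfigRefused('tool_network = "allow" requires agent_network = "open"')


def jailed_egress(tool_network: str, kind: str, allow_network: str = "auto") -> str:
    """Egress of one jailed command on the strict profile."""
    if tool_network not in ("block", "only_explicit_states", "allow"):
        raise ValueError(f"invalid tool_network: {tool_network!r}")
    if kind == "run_command":
        return "host network" if tool_network == "allow" else "offline"
    if kind != "tool":
        raise ValueError(f"invalid command kind: {kind!r}")
    if allow_network in ("auto", "block"):
        return "offline"
    if allow_network != "allow":
        raise ValueError(f"invalid allow_network: {allow_network!r}")
    if tool_network == "block":
        # A tool asks for the network but the operator blocks it: refuse.
        raise ConfigRefused('allow_network = "allow" under tool_network = "block"')
    return "host network"
```

A tool only reaches the host network when both sides agree: the tool declares `"allow"` and the operator's `sandbox.tool_network` permits it.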
Every surface fails closed:
- Operator-gated, machine-declared. `sandbox.agent_network` /
  `sandbox.tool_network` are read only from the operator's global/repo
  config; a machine's `[config]` overlay (possibly LLM-drafted or shared) is
  rejected at load if it declares `[providers.*]` or `[sandbox.*]`. A `tool`
  merely declares `allow_network`; whether `"allow"` is honored is the
  operator's call via `sandbox.tool_network`, and every conflict or
  unenforceable demand is refused at startup naming the state (see the
  rows/notes above), never silently mis-confined.
- Bundle confinement. Helper scripts live in an operator-reviewed `scripts/`
  directory beside the `.asm.toml`. `machine check` validates that every
  entry under `scripts/` resolves inside the bundle (symlinks that escape via
  `..`/absolute are rejected) and that every static `scripts/...` command
  reference exists and stays inside the bundle, so a machine cannot smuggle a
  path that reads or executes outside its own directory. Scripts are drafted
  at authoring time and reviewed/committed by the operator, never fetched or
  generated from untrusted model output at run time. And during a run, the
  machine's own `.asm.toml` + `scripts/` are made read-only in every jail
  (the same mechanism that RO-binds `.git` on strict), so a tool or agent
  state cannot rewrite its own logic, add an `allow_network` flag, or alter a
  bundled script mid-run or for a future run.
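
The symlink-escape part of the bundle check comes down to resolving each entry and comparing it against the resolved bundle root. A minimal sketch, assuming Python 3.9+ for `Path.is_relative_to`; `escaping_entries` is a hypothetical name, not the `machine check` implementation:

```python
from pathlib import Path


def escaping_entries(bundle: Path) -> list[Path]:
    """List entries under bundle/scripts whose real path leaves the bundle."""
    root = bundle.resolve()
    bad = []
    for entry in sorted((bundle / "scripts").rglob("*")):
        # resolve() follows symlinks, so both `..` and absolute targets show up.
        if not entry.resolve().is_relative_to(root):
            bad.append(entry)
    return bad
```

Comparing resolved paths, not path strings, is what catches a link whose name looks local but whose target is elsewhere.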
The test suite under tests/security/test_prompt_injection.py
runs a small corpus of adversarial inputs through the planner, worker,
and reviewer prompts and asserts that the agent does not exfiltrate
file content, does not attempt out-of-policy tool calls, and does not
follow embedded instructions to weaken its own constraints.
This is a smoke test, not a proof. The structural defenses above (sandbox, fixed tool surface, git invariants) are the real mitigation; prompt-injection corpus tests exist to catch regressions in the prompts, not to bound what an attacker can do.
- Landlock TCP rules require Linux ≥ 6.7 (ABI ≥ 4). On older kernels the
  agent process itself is not network-confined. Children are still
  net-isolated in `strict` via the empty network namespace.
- User namespaces must be enabled (`kernel.unprivileged_userns_clone = 1`).
  Some distros disable this by default; agent6 detects that and refuses to
  run `strict`.
- AppArmor userns restriction (Ubuntu 24.04+:
  `kernel.apparmor_restrict_unprivileged_userns = 1`) blocks unprivileged
  userns unless the process has an AppArmor profile granting `userns`. agent6
  ships such a profile, scoped to just the launcher binary, in
  `packaging/apparmor/agent6-jail`, the surgical fix (preferred over
  disabling the sysctl host-wide). agent6's profile detection probes the real
  launcher binary (not `/usr/bin/unshare`), so once the profile is installed
  it correctly selects `strict`; without it, it uses `hardened` and
  `agent6 check sandbox` prints how to enable `strict`.
- seccomp is required by the jail; on rare hardened kernels that block
  seccomp from unprivileged callers, the jail fails closed.
- Devcontainers: the jail's `hardened` profile is what you get inside Docker /
  VS Code dev containers. The container itself becomes the FS blast radius.
  Network restrictions still apply via the agent-process Landlock when the
  kernel supports it. The XDG state base is inside the container and
  ephemeral (lost on rebuild), so to persist run state mount a volume at the
  state dir or set `[agent6].state_dir` / `AGENT6_STATE_HOME` to a persisted
  out-of-cwd path.
- Side channels: agent6 makes no claim about timing, cache, or
  speculative-execution side channels. If your threat model includes
  Spectre-class attacks, do not co-locate agent6 on a host with secrets.
- Supply chain: pin your install. The runtime deps are `pydantic`, `httpx`,
  `argcomplete`, the `tree-sitter` pair (`tree-sitter` +
  `tree-sitter-language-pack`), and `textual` (the live dashboard); build-dep
  is `hatchling`; and the jail pulls a small set of well-known Rust crates
  (`nix`, `libc`, `landlock`, `seccompiler`, `serde`, `serde_json`). Verify
  before upgrading any of them.