MINOR: Add draft threat model + SECURITY.md + AGENTS.md for security-model discoverability #22431

Open
potiuk wants to merge 2 commits into apache:trunk from potiuk:asf-security/threat-model-2026-05-31

@potiuk (Member) commented May 31, 2026

This is a draft proposal for the Kafka PMC to review; please correct,
reject, or discuss as needed. Nothing here is a requirement; the
maintainers are the decision-makers, and this describes Kafka as the
PMC says it is.

This PR adds THREAT_MODEL.md + SECURITY.md + AGENTS.md, wiring
AGENTS.md -> SECURITY.md -> THREAT_MODEL.md.

Framing: Kafka is a configurable platform. It provides mechanisms
(SASL/mTLS auth, an ACL authorizer, TLS, quotas) and the operator
chooses which listeners use them; a broker can run wide open
(PLAINTEXT, no authorizer) or fully locked down. The untrusted network
client is the adversary; the operator and trusted cluster peers /
metadata quorum are out of model.
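
To make the "wide open vs. locked down" distinction concrete, here is a minimal sketch that classifies a broker configuration. It is not how Kafka itself evaluates its posture. The helper names `listener_posture` and `is_wide_open` are hypothetical, the fallback of a single PLAINTEXT listener when no protocol map is given is an assumption of the sketch, and "wide open" is read here as "at least one unauthenticated listener and no authorizer configured".

```python
def listener_posture(props):
    """Map each listener name to (authenticated, encrypted).

    `props` is a dict of broker properties. The PLAINTEXT fallback used
    when no protocol map is present is an assumption of this sketch.
    """
    protocol_map = props.get("listener.security.protocol.map", "PLAINTEXT:PLAINTEXT")
    mtls_required = props.get("ssl.client.auth", "none").strip().lower() == "required"
    posture = {}
    for entry in protocol_map.split(","):
        name, protocol = (part.strip().upper() for part in entry.split(":", 1))
        # SASL_* listeners authenticate clients; plain SSL does so only with mTLS.
        authenticated = protocol.startswith("SASL_") or (protocol == "SSL" and mtls_required)
        # SSL and SASL_SSL encrypt traffic; PLAINTEXT and SASL_PLAINTEXT do not.
        encrypted = protocol.endswith("SSL")
        posture[name] = (authenticated, encrypted)
    return posture


def is_wide_open(props):
    """True if some listener is unauthenticated and no authorizer is set."""
    has_authorizer = bool(props.get("authorizer.class.name", "").strip())
    any_unauthenticated = any(not auth for auth, _ in listener_posture(props).values())
    return any_unauthenticated and not has_authorizer
```

A scan agent could use a check like this to decide whether a report describes a secured deployment or an operator-chosen open one before applying the rulings below.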

Draft-first, mostly inferred (~16 documented / 0 maintainer / ~58
inferred); every *(inferred)* claim routes to a numbered §14
question. The wave-1 rulings decide VALID-vs-misconfiguration:

  • Is running the default PLAINTEXT listener with no authorizer a
    supported posture (network-trust), so an "unauthenticated broker"
    report against defaults is by-design — or should it be VALID?
  • Under StandardAuthorizer, what is the default of
    allow.everyone.if.no.acl.found, and is "no ACL ⇒ deny" the intended behavior?
  • Does the Connect REST API require authentication by default, and
    how should connector-config URL handling (SSRF) be treated?
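
For the second question, the behavior being asked about can be sketched as a toy decision function. This is an illustrative model only, not the StandardAuthorizer implementation: the function name `authorize` and the tuple shape of an ACL entry are hypothetical, and the flag is deliberately a required argument so that the sketch takes no position on its default.

```python
def authorize(principal, operation, acls, allow_everyone_if_no_acl_found):
    """Toy model of an ACL decision for a single resource.

    `acls` is a list of (principal, operation, permission) tuples, where
    permission is "ALLOW" or "DENY". This shape is an assumption of the sketch.
    """
    if not acls:
        # No ACL exists for the resource: the flag alone decides.
        return allow_everyone_if_no_acl_found
    matching = [perm for p, op, perm in acls if p == principal and op == operation]
    if "DENY" in matching:
        # An explicit deny wins over any allow.
        return False
    return "ALLOW" in matching
```

With the flag set to false, `authorize("User:alice", "READ", [], False)` denies, which is the "no ACL ⇒ deny" reading; with it set to true, the same call allows.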

Scope note: this covers the broker + Connect; Kafka Streams is treated
as a client library (in-app trust), and tools/shell/trogdor/tests are
out of the runtime model.

Context: the ASF Security team is preparing the project for an automated
agentic security scan we're piloting. Drafted via the threat-model-producer
rubric. If you'd rather author it yourselves, close this PR and we'll
regroup.

Reviewers: Christo Lolov <lolovc@amazon.com>

…l discoverability

Adds a draft (v0) threat model plus SECURITY.md and AGENTS.md so an automated
scan agent can discover the model via AGENTS.md -> SECURITY.md -> THREAT_MODEL.md.
The model is a proposal for the PMC to review; most claims are (inferred) and
route to open questions in its section 14.

Generated-by: Claude Code (Claude Opus 4.8)

@github-actions (bot) added the docs label May 31, 2026

@clolov (Contributor) left a comment

Thanks for opening this PR. I have answered the questions I can and pulled in others who I think are better suited to answer the rest.

Comment thread THREAT_MODEL.md
## §14 Open questions for the maintainers

**Wave 1 — the default-posture rulings (decide VALID-vs-misconfig; §5a/§8/§9):**
1. Is running a broker with the **default PLAINTEXT listener and no authorizer** a *supported* posture (relying

My opinion is that a "PLAINTEXT listener and no authorizer" is valid only for development.

Comment thread THREAT_MODEL.md
1. Is running a broker with the **default PLAINTEXT listener and no authorizer** a *supported* posture (relying
on network controls), so an "unauthenticated broker" report against defaults is `BY-DESIGN` — or should it
be `VALID`? *Proposed:* operator must secure before exposing; open default is dev-only.
2. With the StandardAuthorizer, what is the default of **`allow.everyone.if.no.acl.found`**, and is "no ACL ⇒

Comment thread THREAT_MODEL.md
**Wave 2 — auth/authz mechanics (§8):**
4. Which **SASL mechanisms** are recommended/discouraged by default, and does the broker enforce TLS for
credential-exposing mechanisms (PLAIN)? *Proposed:* SCRAM/GSSAPI/OAUTHBEARER recommended; PLAIN requires TLS.
5. Are **delegation tokens** and idempotent/transactional state gated by ACLs the same as normal operations?

Idempotent producers and transactions are gated by ACLs (source: https://kafka.apache.org/43/security/authorization-and-acls/#operations-and-resources-on-protocols).

Delegation tokens are gated by a mix of ACLs and additional token-specific checks; for example, a client authenticated with a token cannot create another token. For the purposes of this question, the answer is yes.
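
The token-specific check mentioned above can be illustrated with a small toy model. It is a sketch of the rule as stated in this comment, not Kafka's broker code; the `Session` type and `may_create_delegation_token` are hypothetical names, and the ACL outcome is passed in as a plain boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    principal: str
    authenticated_with_token: bool  # True if the client logged in with a delegation token


def may_create_delegation_token(session, acl_allows):
    """Combine the ACL result with the extra token-specific check."""
    if session.authenticated_with_token:
        # A token-authenticated session cannot create another token,
        # whatever the ACLs say.
        return False
    return acl_allows
```

The point of the sketch is that ACLs are necessary but not sufficient for this operation: the extra check can only narrow what ACLs permit, never widen it.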

Comment thread THREAT_MODEL.md
operator-trusted configs is out of model, but an unauthenticated REST API is the real exposure.

**Wave 2 — auth/authz mechanics (§8):**
4. Which **SASL mechanisms** are recommended/discouraged by default, and does the broker enforce TLS for

Comment thread THREAT_MODEL.md
**Wave 3 — DoS, peers, §11a (§7/§8/§11a):**
6. What **request-size / quota / throttling** guarantees bound RPC DoS, and where is the resource line?
*Proposed:* `socket.request.max.bytes` + quotas bound it; beyond that, operator config.
7. Confirm **cluster peers / the KRaft quorum / ZooKeeper** are trusted (out of §7). *Proposed:* yes.

Let's trust cluster peers and KRaft quorum for now. Let's remove references to Apache ZooKeeper. Technically versions 3.9.x still have ZooKeeper, but it isn't on trunk.

Comment thread THREAT_MODEL.md
6. What **request-size / quota / throttling** guarantees bound RPC DoS, and where is the resource line?
*Proposed:* `socket.request.max.bytes` + quotas bound it; beyond that, operator config.
7. Confirm **cluster peers / the KRaft quorum / ZooKeeper** are trusted (out of §7). *Proposed:* yes.
8. What do scanners most often (re)report that the PMC considers a **non-finding**? (Seeds §11a.)

@mimaison maybe you know some examples? Or @showuon?

Comment thread THREAT_MODEL.md
8. What do scanners most often (re)report that the PMC considers a **non-finding**? (Seeds §11a.)

**Meta:**
9. Confirm this model lives as root `THREAT_MODEL.md` referenced from a new `SECURITY.md`, covering the broker

Ideally THREAT_MODEL.md makes it in as a new page under docs/security, but I am happy to move it in a subsequent step if that formatting would break things. I think it is fair to treat Streams as a client library for now (@mjsax, maybe you can weigh in here?). Given that we expose REST APIs from Connect, I have a feeling we need to go into a bit more depth there (@mimaison, thoughts?).

Comment thread THREAT_MODEL.md
be `VALID`? *Proposed:* operator must secure before exposing; open default is dev-only.
2. With the StandardAuthorizer, what is the default of **`allow.everyone.if.no.acl.found`**, and is "no ACL ⇒
deny" the intended secured behavior? *Proposed:* deny by default under StandardAuthorizer.
3. Does the **Connect REST API** require authentication by default, and is connector-config URL handling

@mimaison, you may be the best person to answer this question. If not, maybe you know who might be?

Comment thread THREAT_MODEL.md
*Proposed:* yes.

**Wave 3 — DoS, peers, §11a (§7/§8/§11a):**
6. What **request-size / quota / throttling** guarantees bound RPC DoS, and where is the resource line?

The configurations that protect against DoS, with their default values, are:
  • socket.request.max.bytes: 100 MiB
  • queued.max.requests: 500
  • connection.failed.authentication.delay.ms: 100 ms

The following are also available but unset by default:
  • queued.max.request.bytes
  • max.connections and max.connections.per.ip
  • max.connection.creation.rate

Quotas are unset by default. They can be set at the produce/consume level, at the generic request level, or on the number of mutations the controller can process.
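
As a rough illustration of how these settings could feed a scan, the sketch below reads a `server.properties`-style text and reports which of the limits listed above were left unset. The default values are the ones quoted in this comment; the helper names `parse_properties` and `dos_report` are hypothetical, and treating "unset" as "key absent from the file" is an assumption of the sketch.

```python
MIB = 1024 * 1024

# Defaults as quoted in the comment above.
DEFAULTS = {
    "socket.request.max.bytes": 100 * MIB,
    "queued.max.requests": 500,
    "connection.failed.authentication.delay.ms": 100,
}

# Limits the comment lists as unset by default.
UNSET_BY_DEFAULT = (
    "queued.max.request.bytes",
    "max.connections",
    "max.connections.per.ip",
    "max.connection.creation.rate",
)


def parse_properties(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def dos_report(props):
    """Return (effective values for defaulted limits, limits left unset)."""
    effective = {key: int(props.get(key, default)) for key, default in DEFAULTS.items()}
    missing = [key for key in UNSET_BY_DEFAULT if key not in props]
    return effective, missing
```

A non-empty second element would tell a reviewer that the deployment relies on the defaults alone for those limits, which is the "beyond that, operator config" line the question proposes.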

Comment thread THREAT_MODEL.md
| Metadata control plane | KRaft quorum (`raft`, `metadata`) / ZooKeeper (legacy) | network | **Yes (peer-trust)** |
| Coordinators | group / transaction / share coordinators | — | **Yes** |
| Storage + tiered storage | log segments; remote-storage plugins | filesystem; remote store | **Yes** |
| Kafka Connect | REST control plane + connector plugins | network egress; plugin code | **Yes (addendum C)** |

What is addendum C?

Comment thread THREAT_MODEL.md
an untrusted REST caller (if the REST API is unauthenticated) is the real finding.
- **Findings in `tools`, `shell`, `trogdor`, `tests`, `docker`, samples** — out of scope (§3).
- **Streams application-level issues** — out of the broker model (§3).
- **Idempotent-producer / replication internals** not reachable from an unauthorized client — out of surface.

What is the reason that an idempotent producer is grouped with the replication internals? Is the idea here that Kafka has some internal state (i.e. for idempotent producer or for replication) which lives on brokers and is not exposed?

Comment thread THREAT_MODEL.md
| Kafka Connect | REST control plane + connector plugins | network egress; plugin code | **Yes (addendum C)** |
| Kafka Streams | client library (runs in the app) | — | Light → §3 |
| Clients library | parses broker responses | — | **Yes (client-side)** |
| tools / shell / trogdor / tests / docker | — | — | No → §3 |

I don't know whether you need an exhaustive list or just a list of examples, but let's also exclude committer-tools.


And bin. It contains the .sh files we use to start a broker, for example, but if we are treating that as out of scope because it is the operator's responsibility, I don't think we need to look into it either.

@potiuk changed the title from "Add draft threat model + SECURITY.md + AGENTS.md for security-model discoverability" to "MINOR: Add draft threat model + SECURITY.md + AGENTS.md for security-model discoverability" on Jun 2, 2026