feat(plugins): add ATR plugin (Agent Threat Rules guardrail) by eeee2345 · Pull Request #4591 · maximhq/bifrost

eeee2345 · 2026-06-21T18:52:04Z

What

Adds plugins/atr — an LLMPlugin that screens requests against Agent Threat Rules (ATR), an open MIT-licensed detection-rule standard for AI-agent / LLM / MCP threats (prompt injection, tool poisoning, credential exfiltration, skill supply-chain).

PreLLMHook flattens the chat prompt, calls an ATR-backed OpenAI-compatible /v1/moderations endpoint, and short-circuits flagged requests with a 403 before the provider call. This keeps the Go gateway language-agnostic — no Go port of the ATR engine; the engine runs behind a moderation endpoint (e.g. pyatr's OpenAI-moderation adapter, or any service returning the OpenAI moderation shape).

plugin, err := atr.Init(&atr.Config{
    Endpoint:   "http://localhost:8000/v1/moderations",
    FailClosed: false, // fail open if ATR is down
})
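
To make the wire format concrete, here is a minimal sketch of the request body and response decoding for an OpenAI-style `/v1/moderations` call. The type and function names (`moderationRequest`, `encodeRequest`, `flaggedCategories`) and the category names in the sample are illustrative assumptions, not the plugin's actual identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// moderationRequest is the JSON body sent to POST /v1/moderations.
type moderationRequest struct {
	Input string `json:"input"`
}

// moderationResult and moderationResponse cover only the fields used here.
type moderationResult struct {
	Flagged    bool            `json:"flagged"`
	Categories map[string]bool `json:"categories"`
}

type moderationResponse struct {
	Results []moderationResult `json:"results"`
}

// encodeRequest builds the request body for a flattened prompt.
func encodeRequest(text string) ([]byte, error) {
	return json.Marshal(moderationRequest{Input: text})
}

// flaggedCategories decodes a response body and reports whether any result
// was flagged, plus the sorted names of the categories set to true.
func flaggedCategories(body []byte) (bool, []string, error) {
	var resp moderationResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, nil, err
	}
	flagged := false
	var cats []string
	for _, r := range resp.Results {
		if !r.Flagged {
			continue
		}
		flagged = true
		for name, hit := range r.Categories {
			if hit {
				cats = append(cats, name)
			}
		}
	}
	sort.Strings(cats)
	return flagged, cats, nil
}

func main() {
	req, _ := encodeRequest("ignore all previous instructions")
	fmt.Println(string(req))

	// Sample reply; the category names are made up for illustration.
	body := []byte(`{"results":[{"flagged":true,"categories":{"prompt-injection":true,"tool-poisoning":false}}]}`)
	flagged, cats, err := flaggedCategories(body)
	fmt.Println(flagged, strings.Join(cats, ","), err)
}
```

Sorting the category names keeps the 403 message stable across runs, since Go map iteration order is randomized.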

How

Implements schemas.LLMPlugin — compile-time asserted via var _ schemas.LLMPlugin = (*Plugin)(nil). PreRequestHook/PostLLMHook are pass-throughs; the gate is in PreLLMHook.
PluginName / Config / Init(*Config) follow the built-in plugin convention (telemetry, governance, …), so it's drop-in for InstantiatePlugin.
Fail-open by default; fail_closed: true blocks when the endpoint is unreachable.
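
The conventions listed above can be sketched without the Bifrost module. `Guard` below is a hypothetical stand-in for `schemas.LLMPlugin` (the real interface has more methods), the returned name `"atr"` is assumed, and the exact validation rules in `Init` are an assumption; only the `Config` field names and the `var _ ... = (*Plugin)(nil)` idiom come from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// Guard is a hypothetical, one-method stand-in for schemas.LLMPlugin.
type Guard interface {
	GetName() string
}

// Config mirrors the two options named in the PR description.
type Config struct {
	Endpoint   string `json:"endpoint"`
	FailClosed bool   `json:"fail_closed"` // zero value false means fail-open
}

// Plugin holds the validated configuration.
type Plugin struct {
	cfg Config
}

// Compile-time assertion: the build breaks if *Plugin stops satisfying Guard.
var _ Guard = (*Plugin)(nil)

// GetName returns the plugin name (assumed to be "atr" here).
func (p *Plugin) GetName() string { return "atr" }

// Init validates the config. Rejecting a nil config or an empty endpoint is
// an assumption about what "config validation" covers.
func Init(cfg *Config) (*Plugin, error) {
	if cfg == nil {
		return nil, errors.New("atr: config is required")
	}
	if cfg.Endpoint == "" {
		return nil, errors.New("atr: endpoint is required")
	}
	return &Plugin{cfg: *cfg}, nil
}

func main() {
	_, err := Init(&Config{})
	fmt.Println(err)

	p, _ := Init(&Config{Endpoint: "http://localhost:8000/v1/moderations"})
	fmt.Println(p.GetName(), p.cfg.FailClosed)
}
```

Because `FailClosed` is a plain `bool`, omitting it from the config yields fail-open behavior without any extra defaulting code.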

Tests

go build, go test ./... (7 tests via httptest — prompt extraction, block-on-flagged, allow-on-benign, fail-open, fail-closed, config validation), go vet, and gofmt all pass against the published github.com/maximhq/bifrost/core v1.5.22.

Note on wiring

I kept this PR to the self-contained plugins/atr/ module. I deliberately did not add the atr case to transports/bifrost-http/server/plugins.go yet, because that would make the transports module require an as-yet-unpublished plugins/atr version and break its build. Happy to follow up with the loader registration and a config example once you assign it a published version (or point me at the workflow you prefer). DCO signed-off.

Adds plugins/atr, an LLMPlugin that screens requests against Agent Threat Rules (ATR), an open detection-rule standard for AI-agent / LLM / MCP threats. PreLLMHook flattens the chat prompt, calls an ATR-backed OpenAI-compatible /v1/moderations endpoint, and short-circuits flagged requests with a 403 before the provider call. Keeps the gateway language-agnostic (no Go port of ATR).

- Implements schemas.LLMPlugin (compile-time asserted); PluginName/Config/Init follow the built-in plugin convention.
- Fail-open by default (configurable fail-closed).
- go build + go test (7 tests, httptest mock) + go vet + gofmt all green against github.com/maximhq/bifrost/core v1.5.22.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>

CLAassistant · 2026-06-21T18:52:10Z

All committers have signed the CLA.

coderabbitai · 2026-06-21T18:52:41Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 65f85fb7-df62-48aa-ac00-178aa2897bb3

📥 Commits

Reviewing files that changed from the base of the PR and between 79df339 and 3529fa6.

📒 Files selected for processing (2)

plugins/atr/atr.go
plugins/atr/atr_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

plugins/atr/atr_test.go
plugins/atr/atr.go

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added an ATR moderation plugin that screens incoming chat text via an OpenAI-compatible /v1/moderations endpoint and can block flagged prompts.
- Supports fail-open/fail-closed behavior when moderation is unreachable or returns non-success responses.
Documentation
- Added a README with setup, configuration options (endpoint, fail-closed), and usage examples.
Tests
- Added coverage for prompt text extraction, blocking/allowing logic, fail-open vs fail-closed behavior, and configuration validation.

Walkthrough

A new plugins/atr Go module is introduced implementing an Agent Threat Rules Bifrost plugin. It calls an OpenAI-compatible POST /v1/moderations endpoint in PreLLMHook, short-circuiting flagged requests with a 403-style error. The module includes config/init, prompt flattening, moderation HTTP logic, tests using httptest, and a README.

Changes

ATR Moderation Plugin

| Layer | File(s) | Summary |
|---|---|---|
| Module, config, and data shapes | `plugins/atr/go.mod`, `plugins/atr/atr.go` | Defines the Go module with `core v1.5.22` dependency, `Config` (`Endpoint`, `FailClosed`), `Plugin` struct with HTTP client, and the `moderationResponse` JSON shape. |
| Prompt extraction and moderation call | `plugins/atr/atr.go` | Implements `promptText` flattener to extract and combine chat message content, `moderate` HTTP POST helper to call `/v1/moderations` endpoint, and `forbidden` short-circuit factory. |
| PreLLMHook short-circuit policy | `plugins/atr/atr.go` | Implements `PreLLMHook` to extract prompt text, call moderate, and enforce policy: blocks flagged requests with a 403 short-circuit (including category names), blocks on moderation errors when `FailClosed` is true, and allows the request otherwise (fail-open or unflagged). `PostLLMHook` is a pass-through. |
| Tests: unit and initialization | `plugins/atr/atr_test.go` | Adds test helpers to construct `BifrostRequest` values, a mock `httptest` moderation server that flags requests containing `"injection"`, and unit tests for `promptText` extraction, `GetName`, and `Init` validation. |
| Tests: PreLLMHook behavior | `plugins/atr/atr_test.go` | Adds comprehensive tests for PreLLMHook short-circuit logic: blocked flagged prompts, allowed benign prompts, fail-open when endpoint unreachable, fail-closed blocking on endpoint unreachable, and fail-closed/fail-open on non-2xx endpoint responses. |
| README documentation | `plugins/atr/README.md` | Documents the ATR plugin purpose, language-agnostic moderation via OpenAI-compatible `/v1/moderations`, configuration options (`endpoint`, `fail_closed`), operational behavior (message flattening, pass-through hooks), and test coverage using `httptest`. |
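
The mock moderation server described in the test rows can be approximated as follows. This is a sketch, not the PR's test code: the handler name `mockModeration`, the helper `call`, and the `prompt-injection` category are assumptions, and the handler is exercised in-process with `httptest.NewRecorder` rather than over a socket.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// mockModeration flags any input that contains the word "injection".
func mockModeration(w http.ResponseWriter, r *http.Request) {
	var in struct {
		Input string `json:"input"`
	}
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	flagged := strings.Contains(in.Input, "injection")
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(map[string]any{
		"results": []map[string]any{{
			"flagged":    flagged,
			"categories": map[string]bool{"prompt-injection": flagged},
		}},
	})
}

// call runs the handler in-process and returns the status code and body.
func call(input string) (int, string) {
	body, _ := json.Marshal(map[string]string{"input": input})
	req := httptest.NewRequest(http.MethodPost, "/v1/moderations", bytes.NewReader(body))
	rec := httptest.NewRecorder()
	mockModeration(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	// httptest.NewServer(http.HandlerFunc(mockModeration)) would expose the
	// same handler at a real URL for end-to-end hook tests.
	for _, in := range []string{"hello", "prompt injection attempt"} {
		code, out := call(in)
		fmt.Println(code, out)
	}
}
```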

Sequence Diagram(s)

sequenceDiagram
  participant Bifrost
  participant ATRPlugin
  participant ModerationEndpoint

  rect rgba(100, 149, 237, 0.5)
    Note over Bifrost,ATRPlugin: PreLLMHook
    Bifrost->>ATRPlugin: PreLLMHook(ctx, req)
    ATRPlugin->>ATRPlugin: promptText — flatten chat messages
  end

  alt text is empty
    ATRPlugin-->>Bifrost: pass-through (nil, nil, nil)
  else text is non-empty
    ATRPlugin->>ModerationEndpoint: POST /v1/moderations {"input": text}
    alt unreachable + FailClosed=true
      ModerationEndpoint--xATRPlugin: error
      ATRPlugin-->>Bifrost: ShortCircuit 403 "endpoint unreachable"
    else unreachable + FailClosed=false
      ModerationEndpoint--xATRPlugin: error
      ATRPlugin-->>Bifrost: pass-through (fail-open)
    else flagged=true
      ModerationEndpoint-->>ATRPlugin: {flagged:true, categories:{...}}
      ATRPlugin-->>Bifrost: ShortCircuit 403 "flagged: <categories>"
    else flagged=false
      ModerationEndpoint-->>ATRPlugin: {flagged:false}
      ATRPlugin-->>Bifrost: pass-through (req, nil, nil)
    end
  end
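
The branches in the diagram above reduce to a small decision function. The sketch below is an illustration under assumptions: `verdict`, `decide`, and `errUnreachable` are hypothetical names, and the real hook returns Bifrost short-circuit values rather than this struct.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable stands in for any failure to get a usable moderation reply.
var errUnreachable = errors.New("moderation endpoint unreachable")

// verdict says whether to short-circuit the request and why.
type verdict struct {
	Block  bool
	Reason string
}

// decide mirrors the diagram: empty text passes through, moderation errors
// follow the fail-closed setting, and otherwise the flagged bit decides.
func decide(text string, flagged bool, modErr error, failClosed bool) verdict {
	switch {
	case text == "":
		return verdict{} // nothing to screen
	case modErr != nil && failClosed:
		return verdict{Block: true, Reason: "endpoint unreachable"}
	case modErr != nil:
		return verdict{} // fail-open: let the request continue
	case flagged:
		return verdict{Block: true, Reason: "flagged"}
	default:
		return verdict{}
	}
}

func main() {
	fmt.Println(decide("hi", false, errUnreachable, true))
	fmt.Println(decide("hi", false, errUnreachable, false))
	fmt.Println(decide("hi", true, nil, false))
	fmt.Println(decide("", true, nil, true))
}
```

Keeping the policy in one pure function makes each branch testable without an HTTP server.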

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

danpiths
Pratham-Mishra04

Poem

🐇 Hoppity-hop through the message queue,
A vigilant rabbit checks each cue!
If "injection" lurks in the chat,
A 403 puts a stop to that.
Fail-open or closed — the choice is yours,
While PostLLMHook just waves from the doors. 🚪

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description covers the purpose, implementation, tests, and notes on wiring; however, the formal template sections (Summary, Changes, Type, Affected areas, Testing, etc.) are not explicitly followed. | Reorganize the description to follow the repository's template structure more closely, explicitly checking off the type of change and affected areas, and including command examples for testing. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding the ATR (Agent Threat Rules) plugin as a guardrail for the Bifrost gateway. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"

greptile-apps · 2026-06-21T18:55:52Z

Confidence Score: 3/5

Not safe to merge until the fallback-bypass issue is resolved — a 403 block issued by this guardrail is silently retried against fallback providers, allowing the flagged prompt to reach an LLM unguarded.

Two issues carried over from the prior review round remain unaddressed: blockShortCircuit still leaves AllowFallbacks unset instead of pointing it at false, so every block can be circumvented by a configured fallback provider, and the response body is still read without a size cap. A new gap also exists: promptText skips ToolCalls[].Function.Arguments, leaving multi-turn tool-call payloads unscanned in a plugin explicitly designed to catch tool-poisoning attacks.

plugins/atr/atr.go — specifically blockShortCircuit (AllowFallbacks), moderate (body size cap), and promptText (tool-call argument coverage).
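
As an illustration of the AllowFallbacks point, the sketch below sets an optional boolean to false through a helper, since Go has no address-of-literal form for `false`. The `blockError` and `shortCircuit` types are hypothetical stand-ins, not Bifrost's real types, and treating a nil field as "fallbacks allowed" is an assumption taken from the review's description of the unset case.

```go
package main

import "fmt"

// blockError and shortCircuit are hypothetical stand-ins for the gateway's
// error and short-circuit types; the field names are assumptions.
type blockError struct {
	StatusCode     int
	Message        string
	AllowFallbacks *bool // nil means "not specified"
}

type shortCircuit struct {
	Error *blockError
}

// boolPtr returns a pointer to b, which is needed because &false is not
// valid Go syntax.
func boolPtr(b bool) *bool { return &b }

// blockShortCircuit builds a 403 that explicitly forbids fallback retries.
func blockShortCircuit(reason string) *shortCircuit {
	return &shortCircuit{Error: &blockError{
		StatusCode:     403,
		Message:        "request blocked by ATR: " + reason,
		AllowFallbacks: boolPtr(false),
	}}
}

// fallbackAllowed shows how a caller might read the tri-state field.
// Assumption: an unset field permits fallbacks.
func fallbackAllowed(sc *shortCircuit) bool {
	if sc == nil || sc.Error == nil || sc.Error.AllowFallbacks == nil {
		return true
	}
	return *sc.Error.AllowFallbacks
}

func main() {
	unset := &shortCircuit{Error: &blockError{StatusCode: 403}}
	fmt.Println(fallbackAllowed(unset))
	fmt.Println(fallbackAllowed(blockShortCircuit("flagged")))
}
```

A test that asserts the pointer is non-nil and false on every block path would cover the gap the review notes in atr_test.go.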

Important Files Changed

| Filename | Overview |
|---|---|
| plugins/atr/atr.go | Core plugin implementation: context propagation and HTTP status check are correctly implemented; `blockShortCircuit` still omits `AllowFallbacks` (allowing flagged requests to reach fallback providers), response body is unbounded, and `promptText` skips tool-call argument content. |
| plugins/atr/atr_test.go | Good coverage of fail-open/fail-closed, block/allow, and non-2xx status; no test for `AllowFallbacks` being set when a block is issued, and the inline `contains()` helper redundantly re-implements `strings.Contains`. |
| plugins/atr/go.mod | Module declaration follows workspace conventions; Go version and core dependency version match other plugin modules. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant Bifrost
    participant ATR as ATR Plugin (PreLLMHook)
    participant Mod as ATR Moderation Endpoint
    participant LLM as LLM Provider

    Client->>Bifrost: Chat request
    Bifrost->>ATR: PreLLMHook(ctx, req)
    ATR->>ATR: promptText(req) — flattens Content fields only
    note over ATR: ToolCalls[].Function.Arguments NOT scanned
    ATR->>Mod: "POST /v1/moderations {"input": text}"
    alt Endpoint unreachable
        Mod-->>ATR: connection error
        alt "fail_closed=true"
            ATR-->>Bifrost: LLMPluginShortCircuit (403)
            Bifrost-->>Client: 403 Blocked
        else "fail_closed=false"
            ATR-->>Bifrost: pass-through
            Bifrost->>LLM: forwarded request
        end
    else 2xx response
        Mod-->>ATR: "{"results":[{"flagged":bool,"categories":{...}}]}"
        alt "flagged=true"
            ATR-->>Bifrost: LLMPluginShortCircuit (403)
            note over Bifrost: AllowFallbacks unset — may retry on fallback providers
            Bifrost-->>Client: 403 Blocked (or retried)
        else "flagged=false"
            ATR-->>Bifrost: pass-through
            Bifrost->>LLM: forwarded request
            LLM-->>Bifrost: response
            Bifrost-->>Client: response
        end
    end
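
One way to close the tool-call gap noted in the diagram is to append each call's argument string while flattening. The message types below are simplified, hypothetical shapes (Bifrost's real chat message schema is richer), and joining parts with a newline is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified, hypothetical message shapes for illustration only.
type functionCall struct {
	Name      string
	Arguments string // raw JSON string as sent by the model
}

type toolCall struct {
	Function functionCall
}

type message struct {
	Role      string
	Content   string
	ToolCalls []toolCall
}

// promptText flattens message content and tool-call arguments into one
// newline-separated string so both are sent to moderation.
func promptText(msgs []message) string {
	var parts []string
	for _, m := range msgs {
		if m.Content != "" {
			parts = append(parts, m.Content)
		}
		for _, tc := range m.ToolCalls {
			if tc.Function.Arguments != "" {
				parts = append(parts, tc.Function.Arguments)
			}
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	msgs := []message{
		{Role: "user", Content: "list my files"},
		{Role: "assistant", ToolCalls: []toolCall{{
			Function: functionCall{Name: "shell", Arguments: `{"cmd":"cat ~/.ssh/id_rsa"}`},
		}}},
	}
	fmt.Println(promptText(msgs))
}
```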

Reviews (2): Last reviewed commit: "fix(plugins/atr): propagate request cont..."

greptile-apps · 2026-06-21T18:55:59Z

+	var out moderationResponse
+	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {


Unbounded response body read

json.NewDecoder(resp.Body) reads from the raw body without a size cap. A misconfigured or adversarial ATR service could stream a very large body and cause unbounded memory growth inside the plugin. Wrap the body with io.LimitReader(resp.Body, maxBodyBytes) (e.g. 1 MB) before passing it to the decoder.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/atr/atr.go`:
- Line 83: The moderation call to p.moderate(text) is not propagating the
request context, causing it to be detached from the incoming request lifecycle
and preventing proper cancellation/deadline handling. Update the p.moderate()
method signature to accept a context parameter as the first argument, and pass
the incoming request context (from the handler/method signature) to all calls to
p.moderate(). This fix applies to the call at line 83 as well as the calls at
lines 153-161, ensuring that all moderation operations are properly tied to the
request lifecycle and can be cancelled when the caller times out or cancels.
- Around line 167-179: The moderation check in the p.client.Do(httpReq) response
handling does not validate the HTTP status code before processing the response,
which means non-2xx responses that still decode (or have empty results) are
treated as benign and allow the request through. Add a status code validation
check immediately after resp.Body.Close() to ensure resp.StatusCode indicates
success (2xx range), and return an error if it does not, enforcing fail-closed
semantics so that unhealthy or misconfigured moderation endpoints block requests
rather than allow them.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 845f325e-20df-485a-81af-a0f361f35866

📥 Commits

Reviewing files that changed from the base of the PR and between 1cdd311 and 79df339.

⛔ Files ignored due to path filters (1)

plugins/atr/go.sum is excluded by !**/*.sum

📒 Files selected for processing (4)

plugins/atr/README.md
plugins/atr/atr.go
plugins/atr/atr_test.go
plugins/atr/go.mod

Addresses CodeRabbit review:

- moderate() now uses the request's context (BifrostContext implements context.Context) instead of context.Background(), so cancellation/deadline propagate to the moderation HTTP call.
- A non-2xx status or a resultless body is returned as an error so the fail-closed policy applies instead of silently allowing the request.
- Added TestPreLLMHookNon2xxFailClosed.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>

eeee2345 · 2026-06-21T20:02:01Z

Thanks @coderabbitai — both addressed in the latest commit:

Context propagation: moderate now takes the request context (*BifrostContext implements context.Context) and passes it to http.NewRequestWithContext, so cancellation/deadline propagate. Falls back to context.Background() only when the hook context is nil.
Fail-closed on unhealthy moderation: a non-2xx status or a resultless body now returns an error, so the configured fail_closed policy applies instead of silently allowing. Added TestPreLLMHookNon2xxFailClosed covering both fail-closed (blocks) and fail-open (proceeds).
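
The second fix can be sketched as a validation step that runs before the flagged bit is trusted. The names `checkStatus`, `evaluate`, and `errNoResults` are hypothetical; the point is that a non-2xx status or an empty result list becomes an error, which the caller then routes through the fail_closed setting.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errNoResults marks a 2xx reply that carries no moderation results.
var errNoResults = errors.New("atr: moderation response has no results")

type result struct {
	Flagged bool `json:"flagged"`
}

type moderationResponse struct {
	Results []result `json:"results"`
}

// checkStatus rejects any status outside the 2xx range.
func checkStatus(code int) error {
	if code < 200 || code > 299 {
		return fmt.Errorf("atr: moderation endpoint returned %d %s", code, http.StatusText(code))
	}
	return nil
}

// evaluate returns the flagged bit only for a healthy, non-empty reply.
func evaluate(code int, resp moderationResponse) (bool, error) {
	if err := checkStatus(code); err != nil {
		return false, err
	}
	if len(resp.Results) == 0 {
		return false, errNoResults
	}
	for _, r := range resp.Results {
		if r.Flagged {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	fmt.Println(evaluate(http.StatusServiceUnavailable, moderationResponse{}))
	fmt.Println(evaluate(http.StatusOK, moderationResponse{}))
	fmt.Println(evaluate(http.StatusOK, moderationResponse{Results: []result{{Flagged: true}}}))
}
```

Without the empty-results check, a 200 reply with `{"results":[]}` would look identical to a benign prompt.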

go build / go test (8 tests) / go vet / gofmt all green.

coderabbitai · 2026-06-21T20:02:29Z

@eeee2345 Both fixes look solid — thank you for the thorough follow-through!

Context propagation: Using http.NewRequestWithContext with the hook's context (falling back to context.Background() only when nil) is exactly the right approach, ensuring cancellations and deadlines flow through to the moderation call.
Fail-closed on unhealthy moderation: Treating non-2xx responses and empty result sets as errors and routing them through the fail_closed policy is the correct semantic — and the new TestPreLLMHookNon2xxFailClosed covering both branches ties it up nicely.

The implementation looks clean and well-tested. 🐇✨

greptile-apps Bot reviewed Jun 21, 2026

coderabbitai Bot requested changes Jun 21, 2026

coderabbitai Bot requested review from Pratham-Mishra04 and danpiths June 21, 2026 20:03

coderabbitai Bot approved these changes Jun 21, 2026
