feat(plugins): add ATR plugin (Agent Threat Rules guardrail)#4591

Open
eeee2345 wants to merge 2 commits into maximhq:dev from eeee2345:feat/atr-plugin

@eeee2345

What

Adds plugins/atr — an LLMPlugin that screens requests against Agent Threat Rules (ATR), an open MIT-licensed detection-rule standard for AI-agent / LLM / MCP threats (prompt injection, tool poisoning, credential exfiltration, skill supply-chain).

PreLLMHook flattens the chat prompt, calls an ATR-backed OpenAI-compatible /v1/moderations endpoint, and short-circuits flagged requests with a 403 before the provider call. This keeps the Go gateway language-agnostic — no Go port of the ATR engine; the engine runs behind a moderation endpoint (e.g. pyatr's OpenAI-moderation adapter, or any service returning the OpenAI moderation shape).

plugin, err := atr.Init(&atr.Config{
    Endpoint:   "http://localhost:8000/v1/moderations",
    FailClosed: false, // fail open if ATR is down
})

How

  • Implements schemas.LLMPlugin — compile-time asserted via var _ schemas.LLMPlugin = (*Plugin)(nil). PreRequestHook/PostLLMHook are pass-throughs; the gate is in PreLLMHook.
  • PluginName / Config / Init(*Config) follow the built-in plugin convention (telemetry, governance, …), so it's drop-in for InstantiatePlugin.
  • Fail-open by default; fail_closed: true blocks when the endpoint is unreachable.
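The fail-open / fail-closed policy in the last bullet can be sketched as a small pure function. This is only an illustration of the decision logic described above, not the plugin's actual code; `Decision`, `decide`, and `errDown` are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision is a hypothetical outcome type used only in this sketch.
type Decision int

const (
	Allow Decision = iota
	Block
)

// errDown stands in for any error returned by the moderation call.
var errDown = errors.New("moderation endpoint unreachable")

// decide applies the described policy to the result of one moderation call:
// a moderation error blocks only when failClosed is set, and a flagged
// result always blocks.
func decide(flagged bool, err error, failClosed bool) Decision {
	if err != nil {
		if failClosed {
			return Block
		}
		return Allow // fail open: the request proceeds unscreened
	}
	if flagged {
		return Block
	}
	return Allow
}

func main() {
	fmt.Println("flagged, healthy endpoint blocks:", decide(true, nil, false) == Block)
	fmt.Println("endpoint down, fail-open allows:", decide(false, errDown, false) == Allow)
	fmt.Println("endpoint down, fail-closed blocks:", decide(false, errDown, true) == Block)
}
```

Keeping the policy separate from the HTTP call makes each branch testable without a server.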

Tests

go build, go test ./... (7 tests via httptest — prompt extraction, block-on-flagged, allow-on-benign, fail-open, fail-closed, config validation), go vet, and gofmt all pass against the published github.com/maximhq/bifrost/core v1.5.22.

Note on wiring

I kept this PR to the self-contained plugins/atr/ module. I deliberately did not add the atr case to transports/bifrost-http/server/plugins.go yet, because that would make the transports module require an as-yet-unpublished plugins/atr version and break its build. Happy to follow up with the loader registration + a config example once you assign it a published version (or point me at the workflow you prefer). DCO signed-off.

Adds plugins/atr — an LLMPlugin that screens requests against Agent Threat
Rules (ATR), an open detection-rule standard for AI-agent / LLM / MCP threats.
PreLLMHook flattens the chat prompt, calls an ATR-backed OpenAI-compatible
/v1/moderations endpoint, and short-circuits flagged requests with a 403 before
the provider call. Keeps the gateway language-agnostic (no Go port of ATR).

- Implements schemas.LLMPlugin (compile-time asserted); PluginName/Config/Init
  follow the built-in plugin convention.
- Fail-open by default (configurable fail-closed).
- go build + go test (7 tests, httptest mock) + go vet + gofmt all green against
  github.com/maximhq/bifrost/core v1.5.22.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@CLAassistant

CLAassistant commented Jun 21, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 65f85fb7-df62-48aa-ac00-178aa2897bb3

📥 Commits

Reviewing files that changed from the base of the PR and between 79df339 and 3529fa6.

📒 Files selected for processing (2)
  • plugins/atr/atr.go
  • plugins/atr/atr_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/atr/atr_test.go
  • plugins/atr/atr.go

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added an ATR moderation plugin that screens incoming chat text via an OpenAI-compatible /v1/moderations endpoint and can block flagged prompts.
    • Supports fail-open/fail-closed behavior when moderation is unreachable or returns non-success responses.
  • Documentation

    • Added a README with setup, configuration options (endpoint, fail-closed), and usage examples.
  • Tests

    • Added coverage for prompt text extraction, blocking/allowing logic, fail-open vs fail-closed behavior, and configuration validation.

Walkthrough

A new plugins/atr Go module is introduced implementing an Agent Threat Rules Bifrost plugin. It calls an OpenAI-compatible POST /v1/moderations endpoint in PreLLMHook, short-circuiting flagged requests with a 403-style error. The module includes config/init, prompt flattening, moderation HTTP logic, tests using httptest, and a README.

Changes

ATR Moderation Plugin

| Layer / File(s) | Summary |
|---|---|
| Module, config, and data shapes (plugins/atr/go.mod, plugins/atr/atr.go) | Defines the Go module with core v1.5.22 dependency, Config (Endpoint, FailClosed), Plugin struct with HTTP client, and the moderationResponse JSON shape. |
| Prompt extraction and moderation call (plugins/atr/atr.go) | Implements promptText flattener to extract and combine chat message content, moderate HTTP POST helper to call /v1/moderations endpoint, and forbidden short-circuit factory. |
| PreLLMHook short-circuit policy (plugins/atr/atr.go) | Implements PreLLMHook to extract prompt text, call moderate, and enforce policy: blocks flagged requests with a 403 short-circuit (including category names), blocks on moderation errors when FailClosed is true, and allows the request otherwise (fail-open or unflagged). PostLLMHook is a pass-through. |
| Tests: unit and initialization (plugins/atr/atr_test.go) | Adds test helpers to construct BifrostRequest values, a mock httptest moderation server that flags requests containing "injection", and unit tests for promptText extraction, GetName, and Init validation. |
| Tests: PreLLMHook behavior (plugins/atr/atr_test.go) | Adds comprehensive tests for PreLLMHook short-circuit logic: blocked flagged prompts, allowed benign prompts, fail-open when endpoint unreachable, fail-closed blocking on endpoint unreachable, and fail-closed/fail-open on non-2xx endpoint responses. |
| README documentation (plugins/atr/README.md) | Documents the ATR plugin purpose, language-agnostic moderation via OpenAI-compatible /v1/moderations, configuration options (endpoint, fail_closed), operational behavior (message flattening, pass-through hooks), and test coverage using httptest. |
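For readers unfamiliar with the testing approach summarized above, a mock moderation server along these lines could be built with the standard library's httptest package. This is a hedged sketch, not the PR's test code: `newMockModeration`, `isFlagged`, and the `prompt-injection` category name are assumptions made for illustration, and only the `results[].flagged` / `categories` part of the moderation shape is modelled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// moderationResponse models only the fields this sketch needs.
type moderationResponse struct {
	Results []struct {
		Flagged    bool            `json:"flagged"`
		Categories map[string]bool `json:"categories"`
	} `json:"results"`
}

// newMockModeration returns a test server that flags any input containing "injection".
func newMockModeration() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in struct {
			Input string `json:"input"`
		}
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		flagged := strings.Contains(in.Input, "injection")
		w.Header().Set("Content-Type", "application/json")
		// "prompt-injection" is a made-up category name for this sketch.
		_ = json.NewEncoder(w).Encode(map[string]any{
			"results": []map[string]any{{
				"flagged":    flagged,
				"categories": map[string]bool{"prompt-injection": flagged},
			}},
		})
	}))
}

// isFlagged posts text to the given URL and reports the first result's flagged bit.
func isFlagged(url, text string) (bool, error) {
	body, err := json.Marshal(map[string]string{"input": text})
	if err != nil {
		return false, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	var out moderationResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return false, err
	}
	if len(out.Results) == 0 {
		return false, errors.New("no results in moderation response")
	}
	return out.Results[0].Flagged, nil
}

func main() {
	srv := newMockModeration()
	defer srv.Close()
	for _, text := range []string{"hello there", "an injection attempt"} {
		flagged, err := isFlagged(srv.URL, text)
		fmt.Printf("%q flagged=%v err=%v\n", text, flagged, err)
	}
}
```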

Sequence Diagram(s)

sequenceDiagram
  participant Bifrost
  participant ATRPlugin
  participant ModerationEndpoint

  rect rgba(100, 149, 237, 0.5)
    Note over Bifrost,ATRPlugin: PreLLMHook
    Bifrost->>ATRPlugin: PreLLMHook(ctx, req)
    ATRPlugin->>ATRPlugin: promptText — flatten chat messages
  end

  alt text is empty
    ATRPlugin-->>Bifrost: pass-through (nil, nil, nil)
  else text is non-empty
    ATRPlugin->>ModerationEndpoint: POST /v1/moderations {"input": text}
    alt unreachable + FailClosed=true
      ModerationEndpoint--xATRPlugin: error
      ATRPlugin-->>Bifrost: ShortCircuit 403 "endpoint unreachable"
    else unreachable + FailClosed=false
      ModerationEndpoint--xATRPlugin: error
      ATRPlugin-->>Bifrost: pass-through (fail-open)
    else flagged=true
      ModerationEndpoint-->>ATRPlugin: {flagged:true, categories:{...}}
      ATRPlugin-->>Bifrost: ShortCircuit 403 "flagged: <categories>"
    else flagged=false
      ModerationEndpoint-->>ATRPlugin: {flagged:false}
      ATRPlugin-->>Bifrost: pass-through (req, nil, nil)
    end
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • danpiths
  • Pratham-Mishra04

Poem

🐇 Hoppity-hop through the message queue,
A vigilant rabbit checks each cue!
If "injection" lurks in the chat,
A 403 puts a stop to that.
Fail-open or closed — the choice is yours,
While PostLLMHook just waves from the doors. 🚪

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description covers the purpose, implementation, tests, and notes on wiring; however, the formal template sections (Summary, Changes, Type, Affected areas, Testing, etc.) are not explicitly followed. | Reorganize the description to follow the repository's template structure more closely, explicitly checking off the type of change and affected areas, and including command examples for testing. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding the ATR (Agent Threat Rules) plugin as a guardrail for the Bifrost gateway. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Contributor

Confidence Score: 3/5

Not safe to merge until the fallback-bypass issue is resolved — a 403 block issued by this guardrail is silently retried against fallback providers, allowing the flagged prompt to reach an LLM unguarded.

Two issues carried over from the prior review round remain unaddressed: blockShortCircuit still omits AllowFallbacks: &false, so every block can be circumvented by a configured fallback provider, and the response body is still read without a size cap. A new gap also exists: promptText skips ToolCalls[].Function.Arguments, leaving multi-turn tool-call payloads unscanned in a plugin explicitly designed to catch tool-poisoning attacks.

plugins/atr/atr.go — specifically blockShortCircuit (AllowFallbacks), moderate (body size cap), and promptText (tool-call argument coverage).
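The two gaps named above (tool-call arguments left unscanned, and blocks that leave fallbacks enabled) can be illustrated with simplified stand-in types. None of the types below are the real Bifrost schemas; `message`, `toolCall`, `blockError`, `flatten`, and `newBlock` are hypothetical names, and the shape of the real short-circuit error should be checked against the core module. Note that Go has no `&false` literal, so the sketch takes the address of a local variable.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for chat message types (not the Bifrost schemas).
type functionCall struct {
	Name      string
	Arguments string // JSON-encoded arguments as sent by the model
}

type toolCall struct {
	Function functionCall
}

type message struct {
	Role      string
	Content   string
	ToolCalls []toolCall
}

// blockError is a stand-in for the error carried by a short-circuit.
type blockError struct {
	StatusCode     int
	Message        string
	AllowFallbacks *bool
}

// flatten joins message content and tool-call arguments so both are screened.
func flatten(msgs []message) string {
	var parts []string
	for _, m := range msgs {
		if m.Content != "" {
			parts = append(parts, m.Content)
		}
		for _, tc := range m.ToolCalls {
			if tc.Function.Arguments != "" {
				parts = append(parts, tc.Function.Arguments)
			}
		}
	}
	return strings.Join(parts, "\n")
}

// newBlock builds a 403 block that explicitly disables fallback providers.
func newBlock(reason string) blockError {
	noFallback := false
	return blockError{StatusCode: 403, Message: reason, AllowFallbacks: &noFallback}
}

func main() {
	msgs := []message{
		{Role: "user", Content: "summarize my notes"},
		{Role: "assistant", ToolCalls: []toolCall{
			{Function: functionCall{Name: "shell", Arguments: `{"cmd":"cat ~/.ssh/id_rsa"}`}},
		}},
	}
	fmt.Println(flatten(msgs))
	b := newBlock("flagged by ATR")
	fmt.Println(b.StatusCode, *b.AllowFallbacks)
}
```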

Important Files Changed

| Filename | Overview |
|---|---|
| plugins/atr/atr.go | Core plugin implementation: context propagation and HTTP status check are correctly implemented; blockShortCircuit still omits AllowFallbacks (allowing flagged requests to reach fallback providers), response body is unbounded, and promptText skips tool-call argument content. |
| plugins/atr/atr_test.go | Good coverage of fail-open/fail-closed, block/allow, and non-2xx status; no test for AllowFallbacks being set when a block is issued, and the inline contains() helper redundantly re-implements strings.Contains. |
| plugins/atr/go.mod | Module declaration follows workspace conventions; Go version and core dependency version match other plugin modules. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant Bifrost
    participant ATR as ATR Plugin (PreLLMHook)
    participant Mod as ATR Moderation Endpoint
    participant LLM as LLM Provider

    Client->>Bifrost: Chat request
    Bifrost->>ATR: PreLLMHook(ctx, req)
    ATR->>ATR: promptText(req) — flattens Content fields only
    note over ATR: ToolCalls[].Function.Arguments NOT scanned
    ATR->>Mod: "POST /v1/moderations {"input": text}"
    alt Endpoint unreachable
        Mod-->>ATR: connection error
        alt "fail_closed=true"
            ATR-->>Bifrost: LLMPluginShortCircuit (403)
            Bifrost-->>Client: 403 Blocked
        else "fail_closed=false"
            ATR-->>Bifrost: pass-through
            Bifrost->>LLM: forwarded request
        end
    else 2xx response
        Mod-->>ATR: "{"results":[{"flagged":bool,"categories":{...}}]}"
        alt "flagged=true"
            ATR-->>Bifrost: LLMPluginShortCircuit (403)
            note over Bifrost: AllowFallbacks unset — may retry on fallback providers
            Bifrost-->>Client: 403 Blocked (or retried)
        else "flagged=false"
            ATR-->>Bifrost: pass-through
            Bifrost->>LLM: forwarded request
            LLM-->>Bifrost: response
            Bifrost-->>Client: response
        end
    end

Reviews (2): Last reviewed commit: "fix(plugins/atr): propagate request cont..."

Comment thread plugins/atr/atr.go
Comment thread plugins/atr/atr.go Outdated
Comment thread plugins/atr/atr.go
Comment thread plugins/atr/atr.go
Comment on lines +173 to +174
var out moderationResponse
if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {

Contributor

P2 Unbounded response body read

json.NewDecoder(resp.Body) reads from the raw body without a size cap. A misconfigured or adversarial ATR service could stream a very large body and cause unbounded memory growth inside the plugin. Wrap the body with io.LimitReader(resp.Body, maxBodyBytes) (e.g. 1 MB) before passing it to the decoder.
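A minimal sketch of the suggested cap, assuming a 1 MiB limit and a trimmed-down response type. `decodeCapped`, `decodeString`, and `maxBodyBytes` are illustrative names, not identifiers from the PR. With `io.LimitReader`, an oversized body is truncated at the limit, so the decoder sees incomplete JSON and returns an error instead of buffering the whole stream.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// maxBodyBytes is an assumed cap of 1 MiB on the moderation response body.
const maxBodyBytes = 1 << 20

// moderationResponse models only the fields this sketch needs.
type moderationResponse struct {
	Results []struct {
		Flagged bool `json:"flagged"`
	} `json:"results"`
}

// decodeCapped decodes at most limit bytes of body into a moderationResponse.
func decodeCapped(body io.Reader, limit int64) (moderationResponse, error) {
	var out moderationResponse
	err := json.NewDecoder(io.LimitReader(body, limit)).Decode(&out)
	return out, err
}

// decodeString is a convenience wrapper for trying the cap on in-memory input.
func decodeString(s string, limit int64) (moderationResponse, error) {
	return decodeCapped(strings.NewReader(s), limit)
}

func main() {
	const payload = `{"results":[{"flagged":true}]}`

	out, err := decodeString(payload, maxBodyBytes)
	fmt.Println("within cap:", len(out.Results), err)

	// A cap smaller than the payload truncates the JSON and surfaces an error.
	_, err = decodeString(payload, 10)
	fmt.Println("over cap error is non-nil:", err != nil)
}
```

In the plugin, such a decode error would then flow through the same fail-open / fail-closed policy as any other moderation failure.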

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/atr/atr.go`:
- Line 83: The moderation call to p.moderate(text) is not propagating the
request context, causing it to be detached from the incoming request lifecycle
and preventing proper cancellation/deadline handling. Update the p.moderate()
method signature to accept a context parameter as the first argument, and pass
the incoming request context (from the handler/method signature) to all calls to
p.moderate(). This fix applies to the call at line 83 as well as the calls at
lines 153-161, ensuring that all moderation operations are properly tied to the
request lifecycle and can be cancelled when the caller times out or cancels.
- Around line 167-179: The moderation check in the p.client.Do(httpReq) response
handling does not validate the HTTP status code before processing the response,
which means non-2xx responses that still decode (or have empty results) are
treated as benign and allow the request through. Add a status code validation
check immediately after resp.Body.Close() to ensure resp.StatusCode indicates
success (2xx range), and return an error if it does not, enforcing fail-closed
semantics so that unhealthy or misconfigured moderation endpoints block requests
rather than allow them.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 845f325e-20df-485a-81af-a0f361f35866

📥 Commits

Reviewing files that changed from the base of the PR and between 1cdd311 and 79df339.

⛔ Files ignored due to path filters (1)
  • plugins/atr/go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • plugins/atr/README.md
  • plugins/atr/atr.go
  • plugins/atr/atr_test.go
  • plugins/atr/go.mod

Comment thread plugins/atr/atr.go Outdated
Comment thread plugins/atr/atr.go
Addresses CodeRabbit review:
- moderate() now uses the request's context (BifrostContext implements
  context.Context) instead of context.Background(), so cancellation/deadline
  propagate to the moderation HTTP call.
- A non-2xx status or a resultless body is returned as an error so the
  fail-closed policy applies instead of silently allowing the request.
- Added TestPreLLMHookNon2xxFailClosed.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@eeee2345

Author

Thanks @coderabbitai — both addressed in the latest commit:

  1. Context propagation: moderate now takes the request context (*BifrostContext implements context.Context) and passes it to http.NewRequestWithContext, so cancellation/deadline propagate. Falls back to context.Background() only when the hook context is nil.
  2. Fail-closed on unhealthy moderation: a non-2xx status or a resultless body now returns an error, so the configured fail_closed policy applies instead of silently allowing. Added TestPreLLMHookNon2xxFailClosed covering both fail-closed (blocks) and fail-open (proceeds).

go build / go test (8 tests) / go vet / gofmt all green.
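The two fixes described in this reply can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the committed code: the `moderate` signature, the `probe` and `cancelled` helpers, and the trimmed response type are all hypothetical, and a plain `context.Context` stands in for the hook's context.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// moderationResponse models only the fields this sketch needs.
type moderationResponse struct {
	Results []struct {
		Flagged bool `json:"flagged"`
	} `json:"results"`
}

// moderate posts text to endpoint under ctx. A non-2xx status or an empty
// result set is returned as an error so the caller's fail-closed policy applies.
func moderate(ctx context.Context, client *http.Client, endpoint, text string) (bool, error) {
	if ctx == nil {
		ctx = context.Background() // fallback only when no hook context exists
	}
	payload, err := json.Marshal(map[string]string{"input": text})
	if err != nil {
		return false, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return false, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return false, fmt.Errorf("moderation endpoint returned status %d", resp.StatusCode)
	}
	var out moderationResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return false, err
	}
	if len(out.Results) == 0 {
		return false, errors.New("moderation endpoint returned no results")
	}
	return out.Results[0].Flagged, nil
}

// probe runs moderate against a throwaway server that replies with status and body.
func probe(ctx context.Context, status int, body string) (bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
	defer srv.Close()
	return moderate(ctx, srv.Client(), srv.URL, "hello")
}

// cancelled returns a context that has already been cancelled.
func cancelled() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	_, err := probe(context.Background(), http.StatusInternalServerError, "")
	fmt.Println("non-2xx is an error:", err != nil)
	_, err = probe(cancelled(), http.StatusOK, `{"results":[{"flagged":false}]}`)
	fmt.Println("cancelled context is an error:", err != nil)
}
```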

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Contributor

@eeee2345 Both fixes look solid — thank you for the thorough follow-through!

  • Context propagation: Using http.NewRequestWithContext with the hook's context (falling back to context.Background() only when nil) is exactly the right approach, ensuring cancellations and deadlines flow through to the moderation call.
  • Fail-closed on unhealthy moderation: Treating non-2xx responses and empty result sets as errors and routing them through the fail_closed policy is the correct semantic — and the new TestPreLLMHookNon2xxFailClosed covering both branches ties it up nicely.

The implementation looks clean and well-tested. 🐇✨

