feat(plugins): add ATR plugin (Agent Threat Rules guardrail)#4591
eeee2345 wants to merge 2 commits into
Adds `plugins/atr`, an `LLMPlugin` that screens requests against Agent Threat Rules (ATR), an open detection-rule standard for AI-agent / LLM / MCP threats. `PreLLMHook` flattens the chat prompt, calls an ATR-backed OpenAI-compatible `/v1/moderations` endpoint, and short-circuits flagged requests with a 403 before the provider call. This keeps the gateway language-agnostic (no Go port of ATR).

- Implements `schemas.LLMPlugin` (compile-time asserted); `PluginName`/`Config`/`Init` follow the built-in plugin convention.
- Fail-open by default (configurable fail-closed).
- `go build`, `go test` (7 tests, `httptest` mock), `go vet`, and `gofmt` all green against `github.com/maximhq/bifrost/core v1.5.22`.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
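As a rough illustration of the moderation round trip described above, the Go sketch below decodes a body in the OpenAI moderation shape (`{"results":[{"flagged":…,"categories":{…}}]}`) and collects the categories that would go into a 403 reason. The type names, the `flaggedCategories` helper, and the category names in the sample body are hypothetical; they are not taken from the plugin's code or from the ATR rule set.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// moderationResult mirrors one entry of the OpenAI moderation "results" array.
type moderationResult struct {
	Flagged    bool            `json:"flagged"`
	Categories map[string]bool `json:"categories"`
}

// moderationResponse mirrors the top-level moderation response body (assumed shape).
type moderationResponse struct {
	Results []moderationResult `json:"results"`
}

// flaggedCategories decodes a moderation body and reports whether any result
// is flagged, plus the sorted names of every category set to true.
func flaggedCategories(body []byte) (bool, []string, error) {
	var out moderationResponse
	if err := json.Unmarshal(body, &out); err != nil {
		return false, nil, err
	}
	// A body with no results is treated as an error, not as "benign".
	if len(out.Results) == 0 {
		return false, nil, fmt.Errorf("moderation response has no results")
	}
	flagged := false
	seen := map[string]bool{}
	for _, r := range out.Results {
		flagged = flagged || r.Flagged
		for name, hit := range r.Categories {
			if hit {
				seen[name] = true
			}
		}
	}
	cats := make([]string, 0, len(seen))
	for name := range seen {
		cats = append(cats, name)
	}
	sort.Strings(cats) // deterministic order for the block message
	return flagged, cats, nil
}

func main() {
	// Hypothetical category names; ATR's real categories may differ.
	body := []byte(`{"results":[{"flagged":true,"categories":{"prompt-injection":true,"credential-exfiltration":false}}]}`)
	flagged, cats, err := flaggedCategories(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("flagged=%v reason=%q\n", flagged, "flagged: "+strings.Join(cats, ", "))
}
```

Treating an empty `results` array as an error rather than as "not flagged" is a deliberate choice in this sketch: it hands the decision to the fail-open/fail-closed policy instead of letting a malformed response silently allow the request.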
No actionable comments were generated in the recent review. 🎉 (Configuration: `.coderabbit.yaml`, review profile CHILL, plan Pro Plus. Both files selected for processing were skipped as similar to previous changes.)
📝 Walkthrough (summary by CodeRabbit): ATR Moderation Plugin
Sequence diagram(s):

```mermaid
sequenceDiagram
    participant Bifrost
    participant ATRPlugin
    participant ModerationEndpoint

    rect rgba(100, 149, 237, 0.5)
        Note over Bifrost,ATRPlugin: PreLLMHook
        Bifrost->>ATRPlugin: PreLLMHook(ctx, req)
        ATRPlugin->>ATRPlugin: promptText (flatten chat messages)
    end

    alt text is empty
        ATRPlugin-->>Bifrost: pass-through (nil, nil, nil)
    else text is non-empty
        ATRPlugin->>ModerationEndpoint: POST /v1/moderations {"input": text}
        alt unreachable + FailClosed=true
            ModerationEndpoint--xATRPlugin: error
            ATRPlugin-->>Bifrost: ShortCircuit 403 "endpoint unreachable"
        else unreachable + FailClosed=false
            ModerationEndpoint--xATRPlugin: error
            ATRPlugin-->>Bifrost: pass-through (fail-open)
        else flagged=true
            ModerationEndpoint-->>ATRPlugin: {flagged:true, categories:{...}}
            ATRPlugin-->>Bifrost: ShortCircuit 403 "flagged: <categories>"
        else flagged=false
            ModerationEndpoint-->>ATRPlugin: {flagged:false}
            ATRPlugin-->>Bifrost: pass-through (req, nil, nil)
        end
    end
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
⚠️ Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2): `level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"`
```go
var out moderationResponse
if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
```
json.NewDecoder(resp.Body) reads from the raw body without a size cap. A misconfigured or adversarial ATR service could stream a very large body and cause unbounded memory growth inside the plugin. Wrap the body with io.LimitReader(resp.Body, maxBodyBytes) (e.g. 1 MB) before passing it to the decoder.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/atr/atr.go`:
- Line 83: The moderation call to p.moderate(text) is not propagating the
request context, causing it to be detached from the incoming request lifecycle
and preventing proper cancellation/deadline handling. Update the p.moderate()
method signature to accept a context parameter as the first argument, and pass
the incoming request context (from the handler/method signature) to all calls to
p.moderate(). This fix applies to the call at line 83 as well as the calls at
lines 153-161, ensuring that all moderation operations are properly tied to the
request lifecycle and can be cancelled when the caller times out or cancels.
- Around lines 167-179: The moderation check in the p.client.Do(httpReq) response
handling does not validate the HTTP status code before processing the response,
which means non-2xx responses that still decode (or have empty results) are
treated as benign and allow the request through. Add a status code validation
check immediately after resp.Body.Close() to ensure resp.StatusCode indicates
success (2xx range), and return an error if it does not, enforcing fail-closed
semantics so that unhealthy or misconfigured moderation endpoints block requests
rather than allow them.
ℹ️ Review info

- Configuration used: `.coderabbit.yaml` (review profile: CHILL, plan: Pro Plus, run ID: 845f325e-20df-485a-81af-a0f361f35866)
- ⛔ Files ignored due to path filters (1): `plugins/atr/go.sum` (excluded by `!**/*.sum`)
- 📒 Files selected for processing (4): `plugins/atr/README.md`, `plugins/atr/atr.go`, `plugins/atr/atr_test.go`, `plugins/atr/go.mod`
Addresses CodeRabbit review:

- `moderate()` now uses the request's context (`BifrostContext` implements `context.Context`) instead of `context.Background()`, so cancellation and deadlines propagate to the moderation HTTP call.
- A non-2xx status or a resultless body is returned as an error, so the fail-closed policy applies instead of silently allowing the request.
- Added `TestPreLLMHookNon2xxFailClosed`.

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
Thanks @coderabbitai — both addressed in the latest commit:
The implementation looks clean and well-tested. 🐇✨
### What

Adds `plugins/atr`, an `LLMPlugin` that screens requests against Agent Threat Rules (ATR), an open MIT-licensed detection-rule standard for AI-agent / LLM / MCP threats (prompt injection, tool poisoning, credential exfiltration, skill supply-chain). `PreLLMHook` flattens the chat prompt, calls an ATR-backed OpenAI-compatible `/v1/moderations` endpoint, and short-circuits flagged requests with a `403` before the provider call. This keeps the Go gateway language-agnostic: there is no Go port of the ATR engine, which instead runs behind a moderation endpoint (e.g. `pyatr`'s OpenAI-moderation adapter, or any service returning the OpenAI moderation shape).

### How
- Implements `schemas.LLMPlugin`, compile-time asserted via `var _ schemas.LLMPlugin = (*Plugin)(nil)`. `PreRequestHook`/`PostLLMHook` are pass-throughs; the gate is in `PreLLMHook`.
- `PluginName`/`Config`/`Init(*Config)` follow the built-in plugin convention (`telemetry`, `governance`, …), so it's drop-in for `InstantiatePlugin`.
- Fail-open by default; `fail_closed: true` blocks when the endpoint is unreachable.

### Tests
`go build`, `go test ./...` (7 tests via `httptest`: prompt extraction, block-on-flagged, allow-on-benign, fail-open, fail-closed, config validation), `go vet`, and `gofmt` all pass against the published `github.com/maximhq/bifrost/core v1.5.22`.

### Note on wiring
I kept this PR to the self-contained `plugins/atr/` module. I deliberately did not add the `atr` case to `transports/bifrost-http/server/plugins.go` yet, because that would make the `transports` module require an as-yet-unpublished `plugins/atr` version and break its build. Happy to follow up with the loader registration and a config example once you assign it a published version (or point me to the workflow you prefer). DCO signed off.