Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ This project is a Ruby SDK for building multi-agent AI workflows. It allows deve
- `lib/agents/tool.rb`: Defines the `Tool` class, the base for creating custom tools for agents.
- `lib/agents/agent_runner.rb`: Thread-safe agent execution manager for multi-agent conversations.
- `lib/agents/runner.rb`: Internal orchestrator that handles individual conversation turns.
- `lib/agents/guard.rb`: Base class for guardrails — stateless input/output validators.
- `lib/agents/guard_result.rb`: Value object for guard outcomes (pass/rewrite/tripwire).
- `lib/agents/guard_runner.rb`: Ordered chain executor for guards with fail-open/closed modes.
- `spec/`: Contains the RSpec tests for the project.
- `examples/`: Includes example implementations of multi-agent systems, such as an ISP customer support demo.
- `Gemfile`: Manages the project's Ruby dependencies.
Expand Down Expand Up @@ -65,7 +68,9 @@ This will start a command-line interface where you can interact with the multi-a
- **Handoff**: The process of transferring a conversation from one agent to another. This is a core feature of the SDK.
- **Runner**: Internal component that manages individual conversation turns (used by AgentRunner).
- **Context**: A shared state object that stores conversation history and agent information, fully serializable for persistence.
- **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, and handoffs.
- **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, handoffs, and guard triggers.
- **Guard**: A stateless validator that intercepts content before (input) or after (output) agent execution. Returns pass, rewrite (modify content), or tripwire (abort run).
- **GuardRunner**: Executes an ordered chain of guards. Supports fail-open (default) and fail-closed (strict) error handling.

## Development Commands

Expand Down Expand Up @@ -118,6 +123,9 @@ ruby examples/isp-support/interactive.rb
- **Agents::Context**: Shared state management across agent interactions
- **Agents::Handoff**: Manages seamless transfers between agents
- **Agents::CallbackManager**: Centralized event handling for real-time monitoring
- **Agents::Guard**: Base class for guardrails (input/output content validation)
- **Agents::GuardResult**: Value object for guard outcomes (pass/rewrite/tripwire)
- **Agents::GuardRunner**: Ordered guard chain executor with fail-open/closed modes

### Key Design Principles

Expand All @@ -143,6 +151,9 @@ lib/agents/
├── tool_context.rb # Tool execution context
├── tool_wrapper.rb # Thread-safe tool wrapping
├── callback_manager.rb # Centralized callback event handling
├── guard.rb # Base class for guardrails (input/output validators)
├── guard_result.rb # Value object for guard outcomes (pass/rewrite/tripwire)
├── guard_runner.rb # Ordered guard chain executor
├── message_extractor.rb # Conversation history processing
└── version.rb # Gem version
```
Expand Down Expand Up @@ -231,6 +242,7 @@ The SDK includes a comprehensive callback system for monitoring agent execution
- `on_tool_start`: Triggered when a tool begins execution
- `on_tool_complete`: Triggered when a tool finishes execution
- `on_agent_handoff`: Triggered when control transfers between agents
- `on_guard_triggered`: Triggered when a guard produces a non-pass result (rewrite or tripwire)

### Callback Integration

Expand Down
205 changes: 205 additions & 0 deletions docs/concepts/guardrails.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
---
layout: default
title: Guardrails
parent: Concepts
nav_order: 8
---

# Guardrails

Guardrails are composable validation layers that intercept content before it reaches an agent (input guards) and before it returns to the caller (output guards). They allow you to enforce policies, redact sensitive data, and abort runs when content violates your rules.

## How Guards Work

A guard is a stateless class that receives content and returns one of three outcomes:

- **Pass** (return `nil` or `GuardResult.pass`): Content is acceptable, continue execution.
- **Rewrite** (`GuardResult.rewrite`): Replace the content with a modified version.
- **Tripwire** (`GuardResult.tripwire`): Abort the run immediately with an error.

```ruby
class PiiRedactor < Agents::Guard
guard_name "pii_redactor"
description "Redacts Social Security numbers from content"

def call(content, context)
redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
GuardResult.rewrite(redacted, message: "SSN redacted") if redacted != content
end
end
```

## Input Guards vs Output Guards

**Input guards** run before the first LLM call. They validate or transform the user's message before the agent sees it. Use them for prompt injection detection, input sanitization, or content filtering.

**Output guards** run on the agent's final response before it returns to the caller. They validate or transform what the agent says back. Use them for PII redaction, topic fencing, or response quality checks.

```ruby
agent = Agents::Agent.new(
name: "Support",
instructions: "You are a helpful support agent.",
input_guards: [PromptInjectionGuard.new],
output_guards: [PiiRedactor.new, TopicFence.new]
)
```

Guards execute in array order. Each guard sees the output of the previous guard's potential rewrite, forming a processing pipeline.

## Writing a Guard

Extend `Agents::Guard` and implement the `call` method:

```ruby
class MaxLengthGuard < Agents::Guard
guard_name "max_length"
description "Tripwires if content exceeds maximum length"

def initialize(max:)
super()
@max = max
end

def call(content, context)
if content.length > @max
GuardResult.tripwire(
message: "Content exceeds #{@max} characters",
metadata: { length: content.length, max: @max }
)
end
end
end
```

Guards follow the same thread-safety principles as Tools:
- No execution state in instance variables (only configuration like `@max` above)
- All shared state flows through the `context` parameter
- Guard instances are immutable after creation

## Tripwires

When a guard tripwires, the run aborts immediately. The result includes structured metadata about what happened:

```ruby
result = runner.run("Tell me a secret")

if result.tripwired?
puts result.guardrail_tripwire[:guard_name] # => "content_policy"
puts result.guardrail_tripwire[:message] # => "Response violates content policy"
puts result.guardrail_tripwire[:metadata] # => { category: "secrets" }
end
```

Tripwires short-circuit the guard chain. If guard 1 tripwires, guards 2 and 3 never run.

## Fail-Open vs Fail-Closed

By default, guards are **fail-open**: if a guard raises an unexpected exception (not a Tripwire), the error is logged and the guard is skipped. This prevents a buggy guard from breaking your entire application.

For high-security contexts, you can configure **fail-closed** (strict) mode on the agent. In strict mode, any unexpected guard exception is converted to a tripwire:

```ruby
# Fail-open (default) — buggy guard is skipped, run continues
agent = Agents::Agent.new(
name: "Support",
input_guards: [PotentiallyBuggyGuard.new]
)

# Fail-closed — any guard error aborts the run
# (configured via GuardRunner strict: true, typically set at the runner level)
```

## Structured Output

When an agent uses `response_schema`, the LLM returns structured data (a Hash). Output guards still receive a String — the SDK automatically serializes the Hash to JSON before the guard chain and deserializes it back after any rewrite. This means your guards always operate on Strings regardless of output format.

```ruby
# This guard works on both plain text and structured output
class ContentFilter < Agents::Guard
guard_name "content_filter"

def call(content, context)
# content is always a String — JSON for structured output
if content.include?("forbidden")
GuardResult.tripwire(message: "Forbidden content detected")
end
end
end
```

## Guards Across Handoffs

Guards are agent-scoped. When agent A hands off to agent B:

- Agent A's **input guards** ran once on the original user input (before the handoff decision).
- Agent A's **output guards** do NOT run — the handoff interrupts before a final response.
- Agent B's **output guards** run on agent B's final response.

This means each agent enforces its own policies independently.

## Callbacks and Instrumentation

Guard activity is observable through the callback system:

```ruby
runner = Agents::Runner.with_agents(agent)
.on_guard_triggered { |guard_name, phase, action, message, ctx|
puts "Guard #{guard_name} (#{phase}): #{action} — #{message}"
}
```

The callback fires for every non-pass result (rewrites and tripwires). It does not fire when guards pass.

If OpenTelemetry instrumentation is installed, guard events produce `agents.run.guard.*` spans with attributes for guard name, phase (input/output), action (rewrite/tripwire), and message.

## Complete Example

```ruby
class PromptInjectionGuard < Agents::Guard
guard_name "prompt_injection"
description "Detects common prompt injection patterns"

def call(content, context)
patterns = [
/ignore\s+(all\s+)?previous\s+instructions/i,
/you\s+are\s+now\s+a/i,
/disregard\s+(all\s+)?prior/i
]

if patterns.any? { |p| content.match?(p) }
GuardResult.tripwire(
message: "Potential prompt injection detected",
metadata: { input_length: content.length }
)
end
end
end

class PiiRedactor < Agents::Guard
guard_name "pii_redactor"
description "Redacts SSNs and email addresses"

def call(content, context)
redacted = content
.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[SSN REDACTED]")
.gsub(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "[EMAIL REDACTED]")

GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content
end
end

agent = Agents::Agent.new(
name: "Support",
instructions: "You are a helpful customer support agent.",
input_guards: [PromptInjectionGuard.new],
output_guards: [PiiRedactor.new]
)

runner = Agents::Runner.with_agents(agent)
.on_guard_triggered { |name, phase, action, msg|
Rails.logger.info("Guard #{name} (#{phase}): #{action}")
}

result = runner.run("What is my email?")
# Output PII is automatically redacted before reaching the user
```
3 changes: 3 additions & 0 deletions lib/agents.rb
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,9 @@ def configured?
require_relative "agents/tool"
require_relative "agents/handoff"
require_relative "agents/helpers"
require_relative "agents/guard_result"
require_relative "agents/guard"
require_relative "agents/guard_runner"
require_relative "agents/agent"

# Execution components
Expand Down
11 changes: 8 additions & 3 deletions lib/agents/agent.rb
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,8 @@
# )
module Agents
class Agent
attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params
attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params,
:input_guards, :output_guards

# Initialize a new Agent instance
#
Expand All @@ -64,7 +65,7 @@ class Agent
# @param headers [Hash, nil] Default HTTP headers applied to LLM requests
# @param params [Hash, nil] Default provider-specific parameters applied to LLM requests (e.g., service_tier)
def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], handoff_agents: [], temperature: 0.7,
response_schema: nil, headers: nil, params: nil)
response_schema: nil, headers: nil, params: nil, input_guards: [], output_guards: [])
@name = name
@instructions = instructions
@model = model
Expand All @@ -74,6 +75,8 @@ def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], hando
@response_schema = response_schema
@headers = Helpers::HashNormalizer.normalize(headers, label: "headers", freeze_result: true)
@params = Helpers::HashNormalizer.normalize(params, label: "params", freeze_result: true)
@input_guards = input_guards.dup.freeze
@output_guards = output_guards.dup.freeze

# Mutex for thread-safe handoff registration
# While agents are typically configured at startup, we want to ensure
Expand Down Expand Up @@ -170,7 +173,9 @@ def clone(**changes)
temperature: changes.fetch(:temperature, @temperature),
response_schema: changes.fetch(:response_schema, @response_schema),
headers: changes.fetch(:headers, @headers),
params: changes.fetch(:params, @params)
params: changes.fetch(:params, @params),
input_guards: changes.fetch(:input_guards, @input_guards),
output_guards: changes.fetch(:output_guards, @output_guards)
)
end

Expand Down
15 changes: 14 additions & 1 deletion lib/agents/agent_runner.rb
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,8 @@ def initialize(agents)
agent_thinking: [],
agent_handoff: [],
llm_call_complete: [],
chat_created: []
chat_created: [],
guard_triggered: []
}
end

Expand Down Expand Up @@ -195,6 +196,18 @@ def on_chat_created(&block)
self
end

# Register a callback for guard triggered events.
# Called when a guardrail produces a non-pass result (rewrite or tripwire).
#
# @param block [Proc] Callback block that receives (guard_name, phase, action, message, context_wrapper)
# @return [self] For method chaining
def on_guard_triggered(&block)
return self unless block

@callbacks_mutex.synchronize { @callbacks[:guard_triggered] << block }
self
end

private

# Build agent registry from provided agents only.
Expand Down
1 change: 1 addition & 0 deletions lib/agents/callback_manager.rb
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ class CallbackManager
agent_handoff
llm_call_complete
chat_created
guard_triggered
].freeze

def initialize(callbacks = {})
Expand Down
Loading
Loading