chatwoot · sergiobayona · Mar 17, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -30,6 +30,9 @@ This project is a Ruby SDK for building multi-agent AI workflows. It allows deve
     -   `lib/agents/tool.rb`: Defines the `Tool` class, the base for creating custom tools for agents.
     -   `lib/agents/agent_runner.rb`: Thread-safe agent execution manager for multi-agent conversations.
     -   `lib/agents/runner.rb`: Internal orchestrator that handles individual conversation turns.
+    -   `lib/agents/guard.rb`: Base class for guardrails — stateless input/output validators.
+    -   `lib/agents/guard_result.rb`: Value object for guard outcomes (pass/rewrite/tripwire).
+    -   `lib/agents/guard_runner.rb`: Ordered chain executor for guards with fail-open/closed modes.
 -   `spec/`: Contains the RSpec tests for the project.
 -   `examples/`: Includes example implementations of multi-agent systems, such as an ISP customer support demo.
 -   `Gemfile`: Manages the project's Ruby dependencies.
@@ -65,7 +68,9 @@ This will start a command-line interface where you can interact with the multi-a
 -   **Handoff**: The process of transferring a conversation from one agent to another. This is a core feature of the SDK.
 -   **Runner**: Internal component that manages individual conversation turns (used by AgentRunner).
 -   **Context**: A shared state object that stores conversation history and agent information, fully serializable for persistence.
--   **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, and handoffs.
+-   **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, handoffs, and guard triggers.
+-   **Guard**: A stateless validator that intercepts content before (input) or after (output) agent execution. Returns pass, rewrite (modify content), or tripwire (abort run).
+-   **GuardRunner**: Executes an ordered chain of guards. Supports fail-open (default) and fail-closed (strict) error handling.
 
 ## Development Commands
 
@@ -118,6 +123,9 @@ ruby examples/isp-support/interactive.rb
 - **Agents::Context**: Shared state management across agent interactions
 - **Agents::Handoff**: Manages seamless transfers between agents
 - **Agents::CallbackManager**: Centralized event handling for real-time monitoring
+- **Agents::Guard**: Base class for guardrails (input/output content validation)
+- **Agents::GuardResult**: Value object for guard outcomes (pass/rewrite/tripwire)
+- **Agents::GuardRunner**: Ordered guard chain executor with fail-open/closed modes
 
 ### Key Design Principles
 
@@ -143,6 +151,9 @@ lib/agents/
 ├── tool_context.rb     # Tool execution context
 ├── tool_wrapper.rb     # Thread-safe tool wrapping
 ├── callback_manager.rb # Centralized callback event handling
+├── guard.rb            # Base class for guardrails (input/output validators)
+├── guard_result.rb     # Value object for guard outcomes (pass/rewrite/tripwire)
+├── guard_runner.rb     # Ordered guard chain executor
 ├── message_extractor.rb # Conversation history processing
 └── version.rb          # Gem version
 ```
@@ -231,6 +242,7 @@ The SDK includes a comprehensive callback system for monitoring agent execution
 - `on_tool_start`: Triggered when a tool begins execution
 - `on_tool_complete`: Triggered when a tool finishes execution
 - `on_agent_handoff`: Triggered when control transfers between agents
+- `on_guard_triggered`: Triggered when a guard produces a non-pass result (rewrite or tripwire)
 
 ### Callback Integration
 

diff --git a/docs/concepts/guardrails.md b/docs/concepts/guardrails.md
@@ -0,0 +1,205 @@
+---
+layout: default
+title: Guardrails
+parent: Concepts
+nav_order: 8
+---
+
+# Guardrails
+
+Guardrails are composable validation layers that intercept content before it reaches an agent (input guards) and before it returns to the caller (output guards). They allow you to enforce policies, redact sensitive data, and abort runs when content violates your rules.
+
+## How Guards Work
+
+A guard is a stateless class that receives content and returns one of three outcomes:
+
+- **Pass** (return `nil` or `GuardResult.pass`): Content is acceptable, continue execution.
+- **Rewrite** (`GuardResult.rewrite`): Replace the content with a modified version.
+- **Tripwire** (`GuardResult.tripwire`): Abort the run immediately with an error.
+
+```ruby
+class PiiRedactor < Agents::Guard
+  guard_name "pii_redactor"
+  description "Redacts Social Security numbers from content"
+
+  def call(content, context)
+    redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
+    Agents::GuardResult.rewrite(redacted, message: "SSN redacted") if redacted != content
+  end
+end
+```
+
+## Input Guards vs Output Guards
+
+**Input guards** run before the first LLM call. They validate or transform the user's message before the agent sees it. Use them for prompt injection detection, input sanitization, or content filtering.
+
+**Output guards** run on the agent's final response before it returns to the caller. They validate or transform what the agent says back. Use them for PII redaction, topic fencing, or response quality checks.
+
+```ruby
+agent = Agents::Agent.new(
+  name: "Support",
+  instructions: "You are a helpful support agent.",
+  input_guards: [PromptInjectionGuard.new],
+  output_guards: [PiiRedactor.new, TopicFence.new]
+)
+```
+
+Guards execute in array order. Each guard sees the output of the previous guard's potential rewrite, forming a processing pipeline.
+
+## Writing a Guard
+
+Extend `Agents::Guard` and implement the `call` method:
+
+```ruby
+class MaxLengthGuard < Agents::Guard
+  guard_name "max_length"
+  description "Tripwires if content exceeds maximum length"
+
+  def initialize(max:)
+    super()
+    @max = max
+  end
+
+  def call(content, context)
+    if content.length > @max
+      Agents::GuardResult.tripwire(
+        message: "Content exceeds #{@max} characters",
+        metadata: { length: content.length, max: @max }
+      )
+    end
+  end
+end
+```
+
+Guards follow the same thread-safety principles as Tools:
+- No execution state in instance variables (only configuration like `@max` above)
+- All shared state flows through the `context` parameter
+- Guard instances are immutable after creation
+
+## Tripwires
+
+When a guard tripwires, the run aborts immediately. The result includes structured metadata about what happened:
+
+```ruby
+result = runner.run("Tell me a secret")
+
+if result.tripwired?
+  puts result.guardrail_tripwire[:guard_name]  # => "content_policy"
+  puts result.guardrail_tripwire[:message]     # => "Response violates content policy"
+  puts result.guardrail_tripwire[:metadata]    # => { category: "secrets" }
+end
+```
+
+Tripwires short-circuit the guard chain. If guard 1 tripwires, guards 2 and 3 never run.
+
+## Fail-Open vs Fail-Closed
+
+By default, guards are **fail-open**: if a guard raises an unexpected exception (not a Tripwire), the error is logged and the guard is skipped. This prevents a buggy guard from breaking your entire application.
+
+For high-security contexts, you can enable **fail-closed** (strict) mode through `GuardRunner`'s `strict: true` option. In strict mode, any unexpected guard exception is converted to a tripwire:
+
+```ruby
+# Fail-open (default) — buggy guard is skipped, run continues
+agent = Agents::Agent.new(
+  name: "Support",
+  input_guards: [PotentiallyBuggyGuard.new]
+)
+
+# Fail-closed — any guard error aborts the run
+# (configured via GuardRunner strict: true, typically set at the runner level)
+```
+
+## Structured Output
+
+When an agent uses `response_schema`, the LLM returns structured data (a Hash). Output guards still receive a String — the SDK automatically serializes the Hash to JSON before the guard chain and deserializes it back after any rewrite. This means your guards always operate on Strings regardless of output format.
+
+```ruby
+# This guard works on both plain text and structured output
+class ContentFilter < Agents::Guard
+  guard_name "content_filter"
+
+  def call(content, context)
+    # content is always a String — JSON for structured output
+    if content.include?("forbidden")
+      Agents::GuardResult.tripwire(message: "Forbidden content detected")
+    end
+  end
+end
+```
+
+## Guards Across Handoffs
+
+Guards are agent-scoped. When agent A hands off to agent B:
+
+- Agent A's **input guards** run once on the original user input (before the handoff decision).
+- Agent A's **output guards** do NOT run — the handoff interrupts before a final response.
+- Agent B's **output guards** run on agent B's final response.
+
+This means each agent enforces its own policies independently.
+
+## Callbacks and Instrumentation
+
+Guard activity is observable through the callback system:
+
+```ruby
+runner = Agents::Runner.with_agents(agent)
+  .on_guard_triggered { |guard_name, phase, action, message, ctx|
+    puts "Guard #{guard_name} (#{phase}): #{action} — #{message}"
+  }
+```
+
+The callback fires for every non-pass result (rewrites and tripwires). It does not fire when guards pass.
+
+If OpenTelemetry instrumentation is installed, guard events produce `agents.run.guard.*` spans with attributes for guard name, phase (input/output), action (rewrite/tripwire), and message.
+
+## Complete Example
+
+```ruby
+class PromptInjectionGuard < Agents::Guard
+  guard_name "prompt_injection"
+  description "Detects common prompt injection patterns"
+
+  def call(content, context)
+    patterns = [
+      /ignore\s+(all\s+)?previous\s+instructions/i,
+      /you\s+are\s+now\s+a/i,
+      /disregard\s+(all\s+)?prior/i
+    ]
+
+    if patterns.any? { |p| content.match?(p) }
+      Agents::GuardResult.tripwire(
+        message: "Potential prompt injection detected",
+        metadata: { input_length: content.length }
+      )
+    end
+  end
+end
+
+class PiiRedactor < Agents::Guard
+  guard_name "pii_redactor"
+  description "Redacts SSNs and email addresses"
+
+  def call(content, context)
+    redacted = content
+      .gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[SSN REDACTED]")
+      .gsub(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "[EMAIL REDACTED]")
+
+    Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content
+  end
+end
+
+agent = Agents::Agent.new(
+  name: "Support",
+  instructions: "You are a helpful customer support agent.",
+  input_guards: [PromptInjectionGuard.new],
+  output_guards: [PiiRedactor.new]
+)
+
+runner = Agents::Runner.with_agents(agent)
+  .on_guard_triggered { |name, phase, action, msg|
+    Rails.logger.info("Guard #{name} (#{phase}): #{action}")
+  }
+
+result = runner.run("What is my email?")
+# Output PII is automatically redacted before reaching the user
+```
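Note on the chain semantics documented in `docs/concepts/guardrails.md`: the page describes ordered execution, rewrites feeding the next guard, tripwire short-circuiting, and fail-open versus fail-closed handling, but `guard_runner.rb` is not part of this excerpt. The following is a minimal standalone sketch of those rules, not the SDK's implementation. `SketchResult`, `SketchTripwire`, and `run_guard_chain` are hypothetical names, and guards are modeled as plain lambdas rather than `Agents::Guard` subclasses.

```ruby
# Hypothetical stand-ins; the real Agents::GuardResult and Agents::GuardRunner may differ.
SketchResult = Struct.new(:action, :content, :message, keyword_init: true)

class SketchTripwire < StandardError; end

# Runs guards in array order. Each guard responds to call(content, context)
# and returns nil (pass) or a SketchResult.
def run_guard_chain(guards, content, context = {}, strict: false)
  guards.reduce(content) do |current, guard|
    begin
      result = guard.call(current, context)
    rescue StandardError => e
      # Fail-closed: convert the error into a tripwire. Fail-open: skip this guard.
      raise SketchTripwire, "guard error: #{e.message}" if strict

      next current
    end

    case result&.action
    when :rewrite  then result.content                       # later guards see the rewritten text
    when :tripwire then raise SketchTripwire, result.message # remaining guards never run
    else current                                             # nil or :pass leaves content unchanged
    end
  end
end

redact_ssn = lambda do |content, _ctx|
  redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
  SketchResult.new(action: :rewrite, content: redacted, message: "SSN redacted") if redacted != content
end

max_length = lambda do |content, _ctx|
  SketchResult.new(action: :tripwire, message: "too long") if content.length > 40
end

buggy = ->(_content, _ctx) { raise ArgumentError, "boom" }

# Fail-open: the buggy guard is skipped, the SSN is redacted, the length check passes.
puts run_guard_chain([buggy, redact_ssn, max_length], "SSN 123-45-6789")
```

With `strict: true`, the same chain raises `SketchTripwire` at the first guard instead of skipping it.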
diff --git a/lib/agents.rb b/lib/agents.rb
@@ -112,6 +112,9 @@ def configured?
 require_relative "agents/tool"
 require_relative "agents/handoff"
 require_relative "agents/helpers"
+require_relative "agents/guard_result"
+require_relative "agents/guard"
+require_relative "agents/guard_runner"
 require_relative "agents/agent"
 
 # Execution components

diff --git a/lib/agents/agent.rb b/lib/agents/agent.rb
@@ -50,7 +50,8 @@
 #   )
 module Agents
   class Agent
-    attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params
+    attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params,
+                :input_guards, :output_guards
 
     # Initialize a new Agent instance
     #
@@ -64,7 +65,7 @@ class Agent
     # @param headers [Hash, nil] Default HTTP headers applied to LLM requests
     # @param params [Hash, nil] Default provider-specific parameters applied to LLM requests (e.g., service_tier)
     def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], handoff_agents: [], temperature: 0.7,
-                   response_schema: nil, headers: nil, params: nil)
+                   response_schema: nil, headers: nil, params: nil, input_guards: [], output_guards: [])
       @name = name
       @instructions = instructions
       @model = model
@@ -74,6 +75,8 @@ def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], hando
       @response_schema = response_schema
       @headers = Helpers::HashNormalizer.normalize(headers, label: "headers", freeze_result: true)
       @params = Helpers::HashNormalizer.normalize(params, label: "params", freeze_result: true)
+      @input_guards = input_guards.dup.freeze
+      @output_guards = output_guards.dup.freeze
 
       # Mutex for thread-safe handoff registration
       # While agents are typically configured at startup, we want to ensure
@@ -170,7 +173,9 @@ def clone(**changes)
         temperature: changes.fetch(:temperature, @temperature),
         response_schema: changes.fetch(:response_schema, @response_schema),
         headers: changes.fetch(:headers, @headers),
-        params: changes.fetch(:params, @params)
+        params: changes.fetch(:params, @params),
+        input_guards: changes.fetch(:input_guards, @input_guards),
+        output_guards: changes.fetch(:output_guards, @output_guards)
       )
     end
 

diff --git a/lib/agents/agent_runner.rb b/lib/agents/agent_runner.rb
@@ -54,7 +54,8 @@ def initialize(agents)
         agent_thinking: [],
         agent_handoff: [],
         llm_call_complete: [],
-        chat_created: []
+        chat_created: [],
+        guard_triggered: []
       }
     end
 
@@ -195,6 +196,18 @@ def on_chat_created(&block)
       self
     end
 
+    # Register a callback for guard triggered events.
+    # Called when a guardrail produces a non-pass result (rewrite or tripwire).
+    #
+    # @param block [Proc] Callback block that receives (guard_name, phase, action, message, context_wrapper)
+    # @return [self] For method chaining
+    def on_guard_triggered(&block)
+      return self unless block
+
+      @callbacks_mutex.synchronize { @callbacks[:guard_triggered] << block }
+      self
+    end
+
     private
 
     # Build agent registry from provided agents only.
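
Note on `on_guard_triggered`: the method above appends the block under `@callbacks_mutex` and returns `self` so registrations chain. The sketch below reproduces that pattern in a standalone class to show how chained registration and the five-argument payload behave. `SketchRunner` and `emit_guard_triggered` are hypothetical and only illustrate the shape; how the SDK actually dispatches the event (and whether `phase` and `action` are symbols or strings) is an assumption here.

```ruby
# Hypothetical stand-in for the registration pattern; not the SDK's AgentRunner.
class SketchRunner
  def initialize
    @callbacks = { guard_triggered: [] }
    @callbacks_mutex = Mutex.new
  end

  # Same shape as the method in the diff: ignore a missing block, append under the lock.
  def on_guard_triggered(&block)
    return self unless block

    @callbacks_mutex.synchronize { @callbacks[:guard_triggered] << block }
    self
  end

  # Hypothetical emit helper: snapshot the handlers under the lock, call them outside it.
  def emit_guard_triggered(guard_name, phase, action, message, context)
    handlers = @callbacks_mutex.synchronize { @callbacks[:guard_triggered].dup }
    handlers.each { |handler| handler.call(guard_name, phase, action, message, context) }
  end
end

seen = []
runner = SketchRunner.new
  .on_guard_triggered { |name, phase, action, message, _ctx| seen << [name, phase, action, message] }
  .on_guard_triggered { |name, phase| seen << "#{name}/#{phase}" }

runner.emit_guard_triggered("pii_redactor", :output, :rewrite, "PII redacted", {})
```

The second handler declares only two parameters. Because blocks are procs rather than lambdas, the extra arguments are dropped, which is why a four-parameter block also works with a five-argument callback.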

diff --git a/lib/agents/callback_manager.rb b/lib/agents/callback_manager.rb
@@ -22,6 +22,7 @@ class CallbackManager
       agent_handoff
       llm_call_complete
       chat_created
+      guard_triggered
     ].freeze
 
     def initialize(callbacks = {})