Using Guardrails

Guardrails in Genie Tooling provide a mechanism to enforce policies and safety checks on inputs, outputs, and tool usage attempts. They are implemented as plugins and integrated into the core Genie facade operations.

Core Concepts

GuardrailManager: Orchestrates the execution of different types of guardrail plugins.
Guardrail Plugin Types:
- InputGuardrailPlugin: Checks data provided to the system or an LLM (e.g., user prompts, chat messages).
- OutputGuardrailPlugin: Checks data produced by the system or an LLM (e.g., LLM responses, tool execution results before final output).
- ToolUsageGuardrailPlugin: Checks if a specific tool usage attempt (tool + parameters) is permissible before execution.

GuardrailViolation (TypedDict): The result of a guardrail check:

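```python
from typing import Literal, Optional, Dict, Any, TypedDict

class GuardrailViolation(TypedDict):
    action: Literal["allow", "block", "warn"]  # outcome of the check
    reason: Optional[str]                      # human-readable explanation, if any
    guardrail_id: Optional[str]                # ID of the guardrail that produced the result
    details: Optional[Dict[str, Any]]          # extra, guardrail-specific data
```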

The action field takes one of three values:
- allow: The check passed.
- block: The operation should be prevented.
- warn: The operation can proceed, but a warning should be logged or noted.
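To make the three actions concrete, the sketch below builds one GuardrailViolation per action and reduces several results to the most severe one. The ranking (block outranks warn, which outranks allow) and the helper name most_severe are assumptions for illustration; this section does not describe how the GuardrailManager itself aggregates results.

```python
from typing import Any, Dict, List, Literal, Optional, TypedDict


class GuardrailViolation(TypedDict):
    action: Literal["allow", "block", "warn"]
    reason: Optional[str]
    guardrail_id: Optional[str]
    details: Optional[Dict[str, Any]]


# Hypothetical severity ranking used only by this sketch.
_SEVERITY = {"allow": 0, "warn": 1, "block": 2}


def most_severe(results: List[GuardrailViolation]) -> GuardrailViolation:
    """Return the result with the highest-ranked action (hypothetical helper)."""
    if not results:
        # No guardrails ran, so nothing objected.
        return GuardrailViolation(action="allow", reason=None, guardrail_id=None, details=None)
    return max(results, key=lambda r: _SEVERITY[r["action"]])


passed = GuardrailViolation(action="allow", reason=None, guardrail_id="length_check", details=None)
flagged = GuardrailViolation(
    action="warn", reason="Mentions a sensitive topic", guardrail_id="topic_check", details={"topic": "finance"}
)
blocked = GuardrailViolation(
    action="block", reason="Matched 'banned_phrase'", guardrail_id="keyword_blocklist_guardrail_v1", details=None
)

print(most_severe([passed, flagged])["action"])           # warn
print(most_severe([passed, flagged, blocked])["action"])  # block
```

Because a GuardrailViolation is a plain dictionary at runtime, results from different guardrails can be compared and logged without any extra wrapper type.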

Configuration

Guardrails are configured in MiddlewareConfig: FeatureSettings enables guardrails by listing them per category, and guardrail_configurations holds the settings for individual plugins.

Via FeatureSettings:

```python
from genie_tooling.config.models import MiddlewareConfig
from genie_tooling.config.features import FeatureSettings

app_config = MiddlewareConfig(
    features=FeatureSettings(
        # ... other features ...
        input_guardrails=["keyword_blocklist_guardrail"], # Enable by alias
        output_guardrails=["keyword_blocklist_guardrail"],
        # tool_usage_guardrails=["my_custom_tool_usage_policy_v1"] # Example
    ),
    guardrail_configurations={
        "keyword_blocklist_guardrail_v1": { # Canonical ID
            "blocklist": ["unsafe_topic", "banned_phrase"],
            "case_sensitive": False,
            "action_on_match": "block" # or "warn"
        },
        # "my_custom_tool_usage_policy_v1": { ... }
    }
)
```

features.input_guardrails, features.output_guardrails, features.tool_usage_guardrails: Lists of plugin IDs or aliases identifying the guardrails to activate in each category.
guardrail_configurations: A dictionary where keys are canonical guardrail plugin IDs (or aliases) and values are their specific configuration dictionaries.
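The split between the feature lists and guardrail_configurations can be pictured as a lookup: each enabled entry is normalized to a canonical ID and paired with its settings. The sketch below assumes a hand-written alias table and a function named resolve_guardrails; both are hypothetical, and Genie's own resolution logic may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical alias table; Genie Tooling maintains its own alias mapping internally.
ALIASES: Dict[str, str] = {
    "keyword_blocklist_guardrail": "keyword_blocklist_guardrail_v1",
}


def canonical(plugin_id_or_alias: str) -> str:
    """Map an alias to its canonical ID; unknown names are assumed to be canonical already."""
    return ALIASES.get(plugin_id_or_alias, plugin_id_or_alias)


def resolve_guardrails(
    enabled: List[str], configurations: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each enabled guardrail with its configuration (hypothetical helper)."""
    # Normalize configuration keys too, since they may be aliases or canonical IDs.
    by_id = {canonical(key): cfg for key, cfg in configurations.items()}
    return [(canonical(name), by_id.get(canonical(name), {})) for name in enabled]


configs = {
    "keyword_blocklist_guardrail_v1": {
        "blocklist": ["unsafe_topic", "banned_phrase"],
        "case_sensitive": False,
        "action_on_match": "block",
    }
}
for plugin_id, cfg in resolve_guardrails(["keyword_blocklist_guardrail"], configs):
    print(plugin_id, cfg["action_on_match"])  # keyword_blocklist_guardrail_v1 block
```

An enabled guardrail with no entry in guardrail_configurations simply receives an empty dictionary in this sketch, leaving it to fall back on its own defaults.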

Implicit Integration

Guardrails are automatically invoked by relevant Genie facade methods:

Input Guardrails:
- Checked by genie.llm.chat() and genie.llm.generate() before sending data to the LLM.
- Checked by genie.run_command() on the user's command string before processing.
Output Guardrails:
- Checked by genie.llm.chat() and genie.llm.generate() on the LLM's response before returning it.
- Checked by genie.execute_tool() (via the invocation strategy) on the raw tool result before transformation.
Tool Usage Guardrails:
- Checked by genie.execute_tool() (via the invocation strategy) before the tool's execute() method is called.
- Checked by genie.run_command() after the command processor has determined a tool and its parameters, but before execution and before any human-in-the-loop (HITL) step, if HITL is also active.
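The ordering for tool execution can be summarized as a small pipeline: the tool usage check runs first, then the tool, then the output check on the raw result, and only then the transformation step. The sketch below models this with plain async callables; execute_with_guardrails, the stub tool and policy, and the error dictionary shape are hypothetical and are not Genie's invocation strategy code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Verdict = Dict[str, Any]  # shaped like GuardrailViolation


async def execute_with_guardrails(
    tool_name: str,
    params: Dict[str, Any],
    run_tool: Callable[[Dict[str, Any]], Awaitable[Any]],
    check_tool_usage: Callable[[str, Dict[str, Any]], Awaitable[Verdict]],
    check_output: Callable[[Any], Awaitable[Verdict]],
    transform: Callable[[Any], Any],
) -> Any:
    """Hypothetical pipeline mirroring the order of checks described above."""
    # 1. Tool usage guardrail runs before the tool; "warn" falls through and execution continues.
    usage = await check_tool_usage(tool_name, params)
    if usage["action"] == "block":
        return {"error": f"Tool usage blocked: {usage['reason']}"}
    # 2. Execute the tool.
    raw_result = await run_tool(params)
    # 3. Output guardrail inspects the raw result, before transformation.
    verdict = await check_output(raw_result)
    if verdict["action"] == "block":
        return {"error": f"Output blocked: {verdict['reason']}"}
    # 4. Only results that passed reach the transformation step.
    return transform(raw_result)


calls: List[str] = []  # records which cities actually reached the tool


async def get_weather(params: Dict[str, Any]) -> Dict[str, str]:
    calls.append(params["city"])
    return {"text": f"weather in {params['city']}: sunny"}


async def city_policy(tool_name: str, params: Dict[str, Any]) -> Verdict:
    # Hypothetical policy that refuses one parameter value.
    if params.get("city") == "Atlantis":
        return {"action": "block", "reason": "unknown city"}
    return {"action": "allow", "reason": None}


async def allow_all(data: Any) -> Verdict:
    return {"action": "allow", "reason": None}


def run(city: str) -> Any:
    return asyncio.run(
        execute_with_guardrails(
            "get_weather", {"city": city}, get_weather, city_policy, allow_all, lambda r: r["text"]
        )
    )


print(run("Paris"))     # weather in Paris: sunny
print(run("Atlantis"))  # {'error': 'Tool usage blocked: unknown city'}
print(calls)            # ['Paris']
```

The calls list shows the key property of this ordering: a blocked tool usage never reaches the tool at all.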

Behavior on "block":
- If an input guardrail blocks, the operation (e.g., an LLM call) is prevented, and the Genie facade method typically raises a PermissionError.
- If an output guardrail blocks, the original output is replaced with a message indicating that it was blocked (e.g., "[RESPONSE BLOCKED: Reason]").
- If a tool usage guardrail blocks, the tool execution is prevented, and genie.run_command() or genie.execute_tool() typically returns an error.
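From the caller's side, these block behaviors mean application code should be ready for an exception on input and a placeholder string on output. The sketch below imitates that contract with a fake LLM; guarded_generate, contains_banned, fake_llm, and the exact placeholder wording are assumptions modeled on the example message above, not Genie's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional

BANNED = ("banned_phrase",)


def contains_banned(text: str) -> Optional[str]:
    """Return a block reason if text contains a banned phrase (case-insensitive)."""
    lowered = text.lower()
    for phrase in BANNED:
        if phrase in lowered:
            return f"matched '{phrase}'"
    return None


async def guarded_generate(prompt: str, llm: Callable[[str], Awaitable[str]]) -> str:
    # Input guardrail: a block prevents the LLM call entirely.
    reason = contains_banned(prompt)
    if reason:
        raise PermissionError(f"Input blocked: {reason}")
    response = await llm(prompt)
    # Output guardrail: a block replaces the response instead of raising.
    reason = contains_banned(response)
    if reason:
        return f"[RESPONSE BLOCKED: {reason}]"
    return response


async def fake_llm(prompt: str) -> str:
    # Stand-in model that leaks a banned phrase when asked for a secret.
    if "secret" in prompt:
        return "The secret is banned_phrase."
    return "Hello!"


async def main() -> None:
    print(await guarded_generate("Say hi", fake_llm))              # Hello!
    print(await guarded_generate("Tell me the secret", fake_llm))  # [RESPONSE BLOCKED: matched 'banned_phrase']
    try:
        await guarded_generate("Repeat banned_phrase", fake_llm)
    except PermissionError as exc:
        print(exc)  # Input blocked: matched 'banned_phrase'


asyncio.run(main())
```

The asymmetry is deliberate in this sketch: an input block stops work before any cost is incurred, while an output block still returns a string so that downstream code expecting text keeps working.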

Built-in Guardrails

KeywordBlocklistGuardrailPlugin (alias: keyword_blocklist_guardrail):
- Implements InputGuardrailPlugin and OutputGuardrailPlugin.
- Checks text data against a configurable list of keywords.
- Configuration:
  - blocklist (List[str]): Keywords to block/warn on.
  - case_sensitive (bool, default: False): Whether matching is case-sensitive.
  - action_on_match (Literal["block", "warn"], default: "block"): Action to take if a keyword is found.
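The documented configuration maps onto a short check. The sketch below reimplements the described matching behavior (optional case sensitivity, configurable action) as a standalone class so it runs without Genie installed. The class name, the plain substring-matching rule, the constructor-based configuration, and the details payload are assumptions; the real KeywordBlocklistGuardrailPlugin may differ, for example by matching whole words only.

```python
import asyncio
from typing import Any, Dict, List, Literal, Optional


class KeywordBlocklistSketch:
    """Hypothetical stand-in for KeywordBlocklistGuardrailPlugin."""

    plugin_id = "keyword_blocklist_guardrail_v1"

    def __init__(self, config: Optional[Dict[str, Any]] = None) -> None:
        cfg = config or {}
        self.case_sensitive: bool = cfg.get("case_sensitive", False)
        self.action_on_match: Literal["block", "warn"] = cfg.get("action_on_match", "block")
        blocklist: List[str] = cfg.get("blocklist", [])
        # Normalize keywords once so each check only needs to normalize the data.
        self.blocklist = blocklist if self.case_sensitive else [k.lower() for k in blocklist]

    def _check_text(self, data: Any) -> Dict[str, Any]:
        text = str(data) if self.case_sensitive else str(data).lower()
        for keyword in self.blocklist:
            if keyword in text:  # plain substring match (assumption)
                return {
                    "action": self.action_on_match,
                    "reason": f"Blocked keyword '{keyword}' found.",
                    "guardrail_id": self.plugin_id,
                    "details": {"matched_keyword": keyword},
                }
        return {"action": "allow", "reason": None, "guardrail_id": self.plugin_id, "details": None}

    async def check_input(self, data: Any, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        return self._check_text(data)

    async def check_output(self, data: Any, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        return self._check_text(data)


guard = KeywordBlocklistSketch({"blocklist": ["Unsafe_Topic"], "action_on_match": "warn"})
print(asyncio.run(guard.check_input("Tell me about UNSAFE_TOPIC"))["action"])  # warn
print(asyncio.run(guard.check_output("All clear"))["action"])                  # allow
```

Implementing both check_input and check_output on one class mirrors how the built-in plugin covers both the InputGuardrailPlugin and OutputGuardrailPlugin roles.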

Creating Custom Guardrail Plugins

Choose the Base Protocol:
- InputGuardrailPlugin: Implement async def check_input(self, data: Any, context: Optional[Dict[str, Any]]) -> GuardrailViolation.
- OutputGuardrailPlugin: Implement async def check_output(self, data: Any, context: Optional[Dict[str, Any]]) -> GuardrailViolation.
- ToolUsageGuardrailPlugin: Implement async def check_tool_usage(self, tool: Tool, params: Dict[str, Any], context: Optional[Dict[str, Any]]) -> GuardrailViolation.
A single plugin class can implement several of these protocols if it needs to check more than one kind of data.

Implement the Check Logic: Your method should analyze the data (together with the context, or the tool and its parameters) and return a GuardrailViolation dictionary.
Register Your Plugin: Use entry points in pyproject.toml, or place the plugin in a directory listed in plugin_dev_dirs in MiddlewareConfig.
Configure It: Add its ID to the appropriate list in features (e.g., features.input_guardrails) and provide any necessary configuration in guardrail_configurations.
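Putting these steps together, here is a sketch of a custom tool usage guardrail that refuses file writes outside an allowed directory. To stay runnable without Genie installed, it uses a minimal FakeTool stand-in with an identifier attribute and does not subclass the real ToolUsageGuardrailPlugin protocol; in an actual plugin you would use Genie's Tool type and register the class as described above. The plugin ID reuses the example from the configuration section, and the write_file tool name, its path parameter, and the allowed_dir setting are all hypothetical.

```python
import asyncio
import posixpath
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeTool:
    """Minimal stand-in for a Genie Tool; only the identifier is used here (assumption)."""
    identifier: str


class WriteDirPolicy:
    """Hypothetical tool usage guardrail: confine write_file to one directory."""

    plugin_id = "my_custom_tool_usage_policy_v1"

    def __init__(self, allowed_dir: str = "/tmp/sandbox") -> None:
        self.allowed_dir = posixpath.normpath(allowed_dir)

    def _result(self, action: str, reason: Optional[str], path: Optional[str] = None) -> Dict[str, Any]:
        details = {"path": path} if path is not None else None
        return {"action": action, "reason": reason, "guardrail_id": self.plugin_id, "details": details}

    async def check_tool_usage(
        self, tool: FakeTool, params: Dict[str, Any], context: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        if tool.identifier != "write_file":
            # This policy only has an opinion about one tool.
            return self._result("allow", None)
        # Normalize to collapse "..", so "/tmp/sandbox/../etc" cannot slip through.
        target = posixpath.normpath(str(params.get("path", "")))
        # Relative paths are rejected outright; commonpath compares whole path components.
        inside = posixpath.isabs(target) and (
            posixpath.commonpath([self.allowed_dir, target]) == self.allowed_dir
        )
        if inside:
            return self._result("allow", None, target)
        return self._result("block", f"write_file outside {self.allowed_dir} is not permitted", target)


policy = WriteDirPolicy()
write = FakeTool("write_file")
print(asyncio.run(policy.check_tool_usage(write, {"path": "/tmp/sandbox/notes.txt"}))["action"])      # allow
print(asyncio.run(policy.check_tool_usage(write, {"path": "/tmp/sandbox/../etc/passwd"}))["action"])  # block
print(asyncio.run(policy.check_tool_usage(FakeTool("calculator"), {}))["action"])                   # allow
```

This sketch takes allowed_dir as a constructor argument for simplicity; in Genie the value would come from the plugin's entry in guardrail_configurations. Comparing whole path components with posixpath.commonpath, rather than using a string prefix test, is what keeps a path such as /tmp/sandboxed from passing as if it were inside /tmp/sandbox.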