Using Guardrails
Guardrails in Genie Tooling provide a mechanism to enforce policies and safety checks on inputs, outputs, and tool usage attempts. They are implemented as plugins and integrated into the core Genie facade operations.
Core Concepts
- `GuardrailManager`: Orchestrates the execution of the different types of guardrail plugins.
- Guardrail plugin types:
  - `InputGuardrailPlugin`: Checks data provided to the system or an LLM (e.g., user prompts, chat messages).
  - `OutputGuardrailPlugin`: Checks data produced by the system or an LLM (e.g., LLM responses, tool execution results before final output).
  - `ToolUsageGuardrailPlugin`: Checks whether a specific tool usage attempt (tool + parameters) is permissible before execution.
- `GuardrailViolation` (`TypedDict`): The result of a guardrail check:

  ```python
  from typing import Any, Dict, Literal, Optional, TypedDict

  class GuardrailViolation(TypedDict):
      action: Literal["allow", "block", "warn"]
      reason: Optional[str]
      guardrail_id: Optional[str]
      details: Optional[Dict[str, Any]]
  ```

  - `allow`: The check passed.
  - `block`: The operation should be prevented.
  - `warn`: The operation can proceed, but a warning should be logged or noted.
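Because `GuardrailViolation` is a `TypedDict`, a check result is an ordinary dictionary at runtime. The sketch below builds a few results and shows one way a caller might branch on `action`. The `handle` function and the ID `example_guardrail` are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Literal, Optional, TypedDict


class GuardrailViolation(TypedDict):
    action: Literal["allow", "block", "warn"]
    reason: Optional[str]
    guardrail_id: Optional[str]
    details: Optional[Dict[str, Any]]


def handle(violation: GuardrailViolation) -> str:
    """Hypothetical caller-side interpretation of a check result."""
    if violation["action"] == "block":
        return f"blocked: {violation['reason']}"
    if violation["action"] == "warn":
        # The operation proceeds, but the warning is surfaced.
        return f"proceed with warning: {violation['reason']}"
    return "proceed"


allowed: GuardrailViolation = {
    "action": "allow", "reason": None, "guardrail_id": None, "details": None,
}
blocked: GuardrailViolation = {
    "action": "block",
    "reason": "Matched blocked keyword",
    "guardrail_id": "example_guardrail",  # hypothetical ID
    "details": {"keyword": "unsafe_topic"},
}

print(handle(allowed))            # proceed
print(handle(blocked))            # blocked: Matched blocked keyword
print(isinstance(blocked, dict))  # True: TypedDict adds no runtime class
```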
Configuration
Guardrails are configured in `MiddlewareConfig`. `FeatureSettings` lists which guardrails to enable, and `guardrail_configurations` holds each plugin's specific settings.
Example using `FeatureSettings` and `guardrail_configurations`:
```python
from genie_tooling.config.models import MiddlewareConfig
from genie_tooling.config.features import FeatureSettings

app_config = MiddlewareConfig(
    features=FeatureSettings(
        # ... other features ...
        input_guardrails=["keyword_blocklist_guardrail"],  # Enable by alias
        output_guardrails=["keyword_blocklist_guardrail"],
        # tool_usage_guardrails=["my_custom_tool_usage_policy_v1"]  # Example
    ),
    guardrail_configurations={
        "keyword_blocklist_guardrail_v1": {  # Canonical ID
            "blocklist": ["unsafe_topic", "banned_phrase"],
            "case_sensitive": False,
            "action_on_match": "block",  # or "warn"
        },
        # "my_custom_tool_usage_policy_v1": { ... }
    },
)
```
- `features.input_guardrails`, `features.output_guardrails`, `features.tool_usage_guardrails`: Lists of plugin IDs or aliases of the guardrails to activate for each category.
- `guardrail_configurations`: A dictionary whose keys are canonical guardrail plugin IDs (or aliases) and whose values are the corresponding plugin configuration dictionaries.
Implicit Integration
Guardrails are automatically invoked by relevant Genie facade methods:
- Input guardrails:
  - Checked by `genie.llm.chat()` and `genie.llm.generate()` before sending data to the LLM.
  - Checked by `genie.run_command()` on the user's command string before processing.
- Output guardrails:
  - Checked by `genie.llm.chat()` and `genie.llm.generate()` on the LLM's response before returning it.
  - Checked by `genie.execute_tool()` (via the invocation strategy) on the raw tool result before transformation.
- Tool usage guardrails:
  - Checked by `genie.execute_tool()` (via the invocation strategy) before the tool's `execute()` method is called.
  - Checked by `genie.run_command()` after the command processor has determined a tool and its parameters, but before execution and before human-in-the-loop (HITL) approval, if HITL is also active.
Behavior on "block":
* If an input guardrail blocks, the operation (e.g., LLM call) is prevented, and a PermissionError is typically raised by the Genie facade method.
* If an output guardrail blocks, the original output is replaced with a message indicating it was blocked (e.g., "[RESPONSE BLOCKED: Reason]").
* If a tool usage guardrail blocks, the tool execution is prevented, and an error is typically returned by genie.run_command() or genie.execute_tool().
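The input and output behaviors above can be sketched as a small, self-contained simulation. Everything in it is hypothetical. `run_checks`, `guarded_llm_call`, `no_secrets`, and the fake LLMs are illustrative names, not Genie APIs. The aggregation policy is also an assumption, not documented `GuardrailManager` behavior: the first `block` short-circuits, and otherwise the first `warn` is logged and reported. The sketch mirrors only the two documented outcomes. An input block raises `PermissionError`, and an output block replaces the response with a `[RESPONSE BLOCKED: ...]` message.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, List, Optional

logger = logging.getLogger("guardrail_sketch")

Violation = Dict[str, Any]  # same shape as GuardrailViolation
Check = Callable[[Any, Optional[Dict[str, Any]]], Awaitable[Violation]]
ALLOW: Violation = {"action": "allow", "reason": None, "guardrail_id": None, "details": None}


async def run_checks(checks: List[Check], data: Any,
                     context: Optional[Dict[str, Any]] = None) -> Violation:
    """Assumed policy: first 'block' short-circuits; otherwise first 'warn' wins."""
    warning: Optional[Violation] = None
    for check in checks:
        result = await check(data, context)
        if result["action"] == "block":
            return result
        if result["action"] == "warn" and warning is None:
            warning = result
    if warning is not None:
        # 'warn' lets the operation proceed but records the issue.
        logger.warning("Guardrail warning: %s", warning["reason"])
        return warning
    return ALLOW


async def guarded_llm_call(prompt: str, llm: Callable[[str], Awaitable[str]],
                           input_checks: List[Check],
                           output_checks: List[Check]) -> str:
    verdict = await run_checks(input_checks, prompt)
    if verdict["action"] == "block":
        # Documented input behavior: the LLM call is prevented.
        raise PermissionError(f"Input blocked: {verdict['reason']}")
    response = await llm(prompt)
    verdict = await run_checks(output_checks, response)
    if verdict["action"] == "block":
        # Documented output behavior: replace the response instead of raising.
        return f"[RESPONSE BLOCKED: {verdict['reason']}]"
    return response


# --- Usage with a toy guardrail and two fake LLMs ---
async def no_secrets(data: Any, context: Optional[Dict[str, Any]]) -> Violation:
    if "secret" in str(data).lower():
        return {"action": "block", "reason": "mentions a secret",
                "guardrail_id": "no_secrets", "details": None}
    return ALLOW


async def echo_llm(prompt: str) -> str:
    return f"You said: {prompt}"


async def leaky_llm(prompt: str) -> str:
    return "The secret is 42"


async def main() -> None:
    print(await guarded_llm_call("hello", echo_llm, [no_secrets], [no_secrets]))
    # You said: hello
    print(await guarded_llm_call("hello", leaky_llm, [no_secrets], [no_secrets]))
    # [RESPONSE BLOCKED: mentions a secret]
    try:
        await guarded_llm_call("tell me the secret", echo_llm, [no_secrets], [])
    except PermissionError as exc:
        print(exc)  # Input blocked: mentions a secret


asyncio.run(main())
```

A tool usage check could be slotted into the same pattern, running just before the tool's `execute()` call. How Genie reports that error is described only as "an error is typically returned", so the sketch leaves it out.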
Built-in Guardrails
- `KeywordBlocklistGuardrailPlugin` (alias: `keyword_blocklist_guardrail`):
  - Implements `InputGuardrailPlugin` and `OutputGuardrailPlugin`.
  - Checks text data against a configurable list of keywords.
  - Configuration:
    - `blocklist` (`List[str]`): Keywords to block or warn on.
    - `case_sensitive` (`bool`, default: `False`): Whether matching is case-sensitive.
    - `action_on_match` (`Literal["block", "warn"]`, default: `"block"`): Action to take if a keyword is found.
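To show how the three configuration options might interact, here is a stand-alone sketch of the check. It is not the plugin's actual source. Plain substring matching is an assumption, as are the `reason` wording and the stringifying of non-text data. The function name `keyword_blocklist_check` is hypothetical.

```python
from typing import Any, Dict, List, Literal


def keyword_blocklist_check(
    data: Any,
    blocklist: List[str],
    case_sensitive: bool = False,
    action_on_match: Literal["block", "warn"] = "block",
) -> Dict[str, Any]:
    """Hypothetical stand-in for the built-in check; plain substring matching assumed."""
    # Non-string data (e.g. a message dict) is stringified here; the real plugin
    # may extract text differently.
    text = data if isinstance(data, str) else str(data)
    haystack = text if case_sensitive else text.lower()
    for keyword in blocklist:
        needle = keyword if case_sensitive else keyword.lower()
        if needle in haystack:
            return {
                "action": action_on_match,
                "reason": f"Blocked keyword found: '{keyword}'",
                "guardrail_id": "keyword_blocklist_guardrail_v1",
                "details": {"matched_keyword": keyword},
            }
    return {"action": "allow", "reason": None,
            "guardrail_id": "keyword_blocklist_guardrail_v1", "details": None}


config = {"blocklist": ["unsafe_topic", "banned_phrase"],
          "case_sensitive": False, "action_on_match": "block"}
print(keyword_blocklist_check("Let's discuss UNSAFE_TOPIC", **config)["action"])  # block
print(keyword_blocklist_check("A harmless question", **config)["action"])         # allow
print(keyword_blocklist_check("Let's discuss UNSAFE_TOPIC",
                              blocklist=["unsafe_topic"], case_sensitive=True)["action"])  # allow
```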
Creating Custom Guardrail Plugins
- Choose the base protocol:
  - `InputGuardrailPlugin`: Implement `async def check_input(self, data: Any, context: Optional[Dict[str, Any]]) -> GuardrailViolation`.
  - `OutputGuardrailPlugin`: Implement `async def check_output(self, data: Any, context: Optional[Dict[str, Any]]) -> GuardrailViolation`.
  - `ToolUsageGuardrailPlugin`: Implement `async def check_tool_usage(self, tool: Tool, params: Dict[str, Any], context: Optional[Dict[str, Any]]) -> GuardrailViolation`.

  A single plugin class can implement several of these protocols if it is designed to check different types of data.
- Implement the check logic: Your method should analyze the `data` (plus `context`, or `tool` and `params`) and return a `GuardrailViolation` dictionary.
- Register your plugin: Use entry points in `pyproject.toml`, or place it in a directory listed in `plugin_dev_dirs` in `MiddlewareConfig`.
- Configure it: Add its ID to the appropriate list in `features` (e.g., `features.input_guardrails`) and provide any necessary configuration in `guardrail_configurations`.
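The steps above can be sketched as a single class that implements both `check_input` and `check_tool_usage`, using the method signatures listed for the protocols. Several parts are assumptions. The sketch does not import the real Genie protocol classes, whose module paths are not given here. It defines `GuardrailViolation` locally and substitutes a hypothetical `FakeTool` for Genie's `Tool`, reading only an `identifier` attribute. The `plugin_id` attribute and the constructor-based configuration are also illustrative, since the framework's actual plugin-setup mechanism is not described on this page.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, Literal, Optional, Set, TypedDict


class GuardrailViolation(TypedDict):
    action: Literal["allow", "block", "warn"]
    reason: Optional[str]
    guardrail_id: Optional[str]
    details: Optional[Dict[str, Any]]


@dataclass
class FakeTool:
    """Hypothetical stand-in for a Genie Tool; only `identifier` is used."""
    identifier: str


class LengthAndToolPolicyGuardrail:
    """Hypothetical plugin implementing both the input and tool-usage protocols."""

    plugin_id = "length_and_tool_policy_v1"  # hypothetical ID

    def __init__(self, max_chars: int = 2000,
                 denied_tools: Optional[Set[str]] = None) -> None:
        # Assumption: configuration arrives as constructor args in this sketch.
        self.max_chars = max_chars
        self.denied_tools = set(denied_tools or ())

    def _result(self, action: Literal["allow", "block", "warn"],
                reason: Optional[str] = None,
                details: Optional[Dict[str, Any]] = None) -> GuardrailViolation:
        return {"action": action, "reason": reason,
                "guardrail_id": self.plugin_id, "details": details}

    async def check_input(self, data: Any,
                          context: Optional[Dict[str, Any]]) -> GuardrailViolation:
        # Overlong inputs are allowed through, but with a warning.
        length = len(str(data))
        if length > self.max_chars:
            return self._result("warn", f"Input has {length} chars (limit {self.max_chars})",
                                {"length": length})
        return self._result("allow")

    async def check_tool_usage(self, tool: Any, params: Dict[str, Any],
                               context: Optional[Dict[str, Any]]) -> GuardrailViolation:
        # Deny-listed tools are blocked before their execute() would run.
        if tool.identifier in self.denied_tools:
            return self._result("block", f"Tool '{tool.identifier}' is not permitted",
                                {"params": params})
        return self._result("allow")


async def demo() -> None:
    guard = LengthAndToolPolicyGuardrail(max_chars=10, denied_tools={"shell_exec"})
    print((await guard.check_input("short", None))["action"])   # allow
    print((await guard.check_input("x" * 50, None))["action"])  # warn
    print((await guard.check_tool_usage(FakeTool("shell_exec"), {"cmd": "ls"}, None))["action"])  # block
    print((await guard.check_tool_usage(FakeTool("calculator"), {"a": 1}, None))["action"])       # allow


asyncio.run(demo())
```

Because a class like this implements two protocols, it would presumably be listed in both `features.input_guardrails` and `features.tool_usage_guardrails` once registered.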