Using Guardrails¶
Guardrails in Genie Tooling provide a mechanism to enforce policies and safety checks on inputs, outputs, and tool usage attempts. They are implemented as plugins and integrated into the core Genie facade operations.
Core Concepts¶
- GuardrailManager: Orchestrates the execution of different types of guardrail plugins.
- Guardrail Plugin Types:
  - InputGuardrailPlugin: Checks data provided to the system or an LLM (e.g., user prompts, chat messages).
  - OutputGuardrailPlugin: Checks data produced by the system or an LLM (e.g., LLM responses, tool execution results before final output).
  - ToolUsageGuardrailPlugin: Checks if a specific tool usage attempt (tool + parameters) is permissible before execution.
- GuardrailViolation (TypedDict): The result of a guardrail check:

      from typing import Any, Dict, Literal, Optional, TypedDict

      class GuardrailViolation(TypedDict):
          action: Literal["allow", "block", "warn"]
          reason: Optional[str]
          guardrail_id: Optional[str]
          details: Optional[Dict[str, Any]]

  - allow: The check passed.
  - block: The operation should be prevented.
  - warn: The operation can proceed, but a warning should be logged or noted.
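As a minimal sketch of how calling code might act on the three action values, the snippet below redeclares the TypedDict so that it runs on its own. The handle_violation helper, the logger name, and the example guardrail ID are hypothetical and introduced only for illustration; they are not part of Genie Tooling.

```python
import logging
from typing import Any, Dict, Literal, Optional, TypedDict

logger = logging.getLogger("guardrail_demo")  # hypothetical logger name


class GuardrailViolation(TypedDict):
    action: Literal["allow", "block", "warn"]
    reason: Optional[str]
    guardrail_id: Optional[str]
    details: Optional[Dict[str, Any]]


def handle_violation(violation: GuardrailViolation) -> bool:
    """Return True if the operation may proceed (hypothetical helper)."""
    if violation["action"] == "block":
        return False
    if violation["action"] == "warn":
        # "warn" lets the operation continue but leaves a trace in the logs.
        logger.warning(
            "guardrail %s warned: %s",
            violation["guardrail_id"],
            violation["reason"],
        )
    return True


passed: GuardrailViolation = {
    "action": "allow", "reason": None, "guardrail_id": None, "details": None,
}
warned: GuardrailViolation = {
    "action": "warn", "reason": "borderline phrase",
    "guardrail_id": "example_guardrail_v1", "details": None,
}
blocked: GuardrailViolation = {
    "action": "block", "reason": "banned phrase",
    "guardrail_id": "example_guardrail_v1", "details": {"keyword": "banned_phrase"},
}
```

Here only the blocked result stops the operation; the warned result proceeds after logging.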
Configuration¶
Guardrails are configured in MiddlewareConfig, primarily through FeatureSettings for enabling lists of guardrails, and guardrail_configurations for specific plugin settings.
Via FeatureSettings:
    from genie_tooling.config.models import MiddlewareConfig
    from genie_tooling.config.features import FeatureSettings

    app_config = MiddlewareConfig(
        features=FeatureSettings(
            # ... other features ...
            input_guardrails=["keyword_blocklist_guardrail"],  # Enable by alias
            output_guardrails=["keyword_blocklist_guardrail"],
            # tool_usage_guardrails=["my_custom_tool_usage_policy_v1"]  # Example
        ),
        guardrail_configurations={
            "keyword_blocklist_guardrail_v1": {  # Canonical ID
                "blocklist": ["unsafe_topic", "banned_phrase"],
                "case_sensitive": False,
                "action_on_match": "block",  # or "warn"
            },
            # "my_custom_tool_usage_policy_v1": { ... }
        },
    )
- features.input_guardrails, features.output_guardrails, features.tool_usage_guardrails: Lists of plugin IDs or aliases for guardrails to activate for each category.
- guardrail_configurations: A dictionary where keys are canonical guardrail plugin IDs (or aliases) and values are their specific configuration dictionaries.
Implicit Integration¶
Guardrails are automatically invoked by relevant Genie facade methods:
- Input Guardrails:
  - Checked by genie.llm.chat() and genie.llm.generate() before sending data to the LLM.
  - Checked by genie.run_command() on the user's command string before processing.
- Output Guardrails:
  - Checked by genie.llm.chat() and genie.llm.generate() on the LLM's response before returning it.
  - Checked by genie.execute_tool() (via the invocation strategy) on the raw tool result before transformation.
- Tool Usage Guardrails:
  - Checked by genie.execute_tool() (via the invocation strategy) before the tool's execute() method is called.
  - Checked by genie.run_command() after a tool and its parameters have been determined by the command processor, but before execution and before HITL (if HITL is also active).
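The ordering described for tool usage (guardrail check first, then HITL approval, then execution) can be sketched as a small standalone pipeline. This is not Genie Tooling's implementation: run_tool_with_checks, the callback signatures, the trace list, and the shape of the returned dictionaries are all assumptions made for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical callback types; the real plugin protocols differ.
UsageCheck = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]
Approver = Callable[[str, Dict[str, Any]], Awaitable[bool]]
Executor = Callable[[str, Dict[str, Any]], Awaitable[Any]]


async def run_tool_with_checks(
    tool_id: str,
    params: Dict[str, Any],
    usage_checks: List[UsageCheck],
    approve: Approver,
    execute: Executor,
    trace: List[str],
) -> Dict[str, Any]:
    # 1. Tool usage guardrails run first, on the chosen tool and parameters.
    for check in usage_checks:
        trace.append("tool_usage_guardrail")
        violation = await check(tool_id, params)
        if violation["action"] == "block":
            return {"error": f"Blocked by guardrail: {violation['reason']}"}
    # 2. Human-in-the-loop approval is only reached if no guardrail blocked.
    trace.append("hitl")
    if not await approve(tool_id, params):
        return {"error": "Denied by human reviewer"}
    # 3. Execution happens last.
    trace.append("execute")
    return {"result": await execute(tool_id, params)}


async def deny_delete(tool_id: str, params: Dict[str, Any]) -> Dict[str, Any]:
    if tool_id == "delete_file":
        return {"action": "block", "reason": "deletion is not permitted"}
    return {"action": "allow", "reason": None}


async def always_approve(tool_id: str, params: Dict[str, Any]) -> bool:
    return True


async def fake_execute(tool_id: str, params: Dict[str, Any]) -> str:
    return f"{tool_id} ran"


allowed_trace: List[str] = []
allowed = asyncio.run(run_tool_with_checks(
    "read_file", {"path": "notes.txt"},
    [deny_delete], always_approve, fake_execute, allowed_trace,
))

blocked_trace: List[str] = []
blocked = asyncio.run(run_tool_with_checks(
    "delete_file", {"path": "notes.txt"},
    [deny_delete], always_approve, fake_execute, blocked_trace,
))
```

In the blocked case the trace stops after the guardrail step, so neither the approver nor the executor is called.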
Behavior on "block":
* If an input guardrail blocks, the operation (e.g., LLM call) is prevented, and a PermissionError is typically raised by the Genie facade method.
* If an output guardrail blocks, the original output is replaced with a message indicating it was blocked (e.g., "[RESPONSE BLOCKED: Reason]").
* If a tool usage guardrail blocks, the tool execution is prevented, and an error is typically returned by genie.run_command() or genie.execute_tool().
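The input and output cases can be illustrated with a standalone sketch. The guarded_chat wrapper, the check callbacks, and the fake LLM functions below are hypothetical stand-ins, not the Genie facade; only the PermissionError and the "[RESPONSE BLOCKED: ...]" message format come from the behavior described here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Check = Callable[[str], Awaitable[Dict[str, Any]]]  # hypothetical check signature


async def guarded_chat(
    prompt: str,
    check_input: Check,
    call_llm: Callable[[str], Awaitable[str]],
    check_output: Check,
) -> str:
    # Input guardrail: a block prevents the LLM call entirely.
    verdict = await check_input(prompt)
    if verdict["action"] == "block":
        raise PermissionError(f"Input blocked: {verdict['reason']}")
    response = await call_llm(prompt)
    # Output guardrail: a block replaces the response instead of raising.
    verdict = await check_output(response)
    if verdict["action"] == "block":
        return f"[RESPONSE BLOCKED: {verdict['reason']}]"
    return response


async def no_banned_phrase(text: str) -> Dict[str, Any]:
    if "banned_phrase" in text:
        return {"action": "block", "reason": "banned_phrase found"}
    return {"action": "allow", "reason": None}


async def allow_all(text: str) -> Dict[str, Any]:
    return {"action": "allow", "reason": None}


async def fake_llm(prompt: str) -> str:
    return f"echo: {prompt}"


async def leaky_llm(prompt: str) -> str:
    return "this contains banned_phrase"


ok_reply = asyncio.run(
    guarded_chat("hello", no_banned_phrase, fake_llm, no_banned_phrase)
)
blocked_reply = asyncio.run(
    guarded_chat("hello", no_banned_phrase, leaky_llm, no_banned_phrase)
)
```

Callers of a method guarded this way would need to catch PermissionError for blocked inputs, while blocked outputs arrive as an ordinary string.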
Built-in Guardrails¶
- KeywordBlocklistGuardrailPlugin (alias: keyword_blocklist_guardrail):
  - Implements InputGuardrailPlugin and OutputGuardrailPlugin.
  - Checks text data against a configurable list of keywords.
  - Configuration:
    - blocklist (List[str]): Keywords to block/warn on.
    - case_sensitive (bool, default: False): Whether matching is case-sensitive.
    - action_on_match (Literal["block", "warn"], default: "block"): Action to take if a keyword is found.
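To make the three configuration options concrete, here is a rough standalone sketch of keyword matching with the same option names and defaults. It is not the plugin's actual code: the check_keywords function is hypothetical, and plain substring matching is an assumption (the real plugin may match differently, for example on word boundaries).

```python
from typing import Any, Dict, List


def check_keywords(
    text: str,
    blocklist: List[str],
    case_sensitive: bool = False,
    action_on_match: str = "block",
) -> Dict[str, Any]:
    """Return a GuardrailViolation-shaped dict (hypothetical helper)."""
    guardrail_id = "keyword_blocklist_guardrail_v1"
    haystack = text if case_sensitive else text.lower()
    for keyword in blocklist:
        needle = keyword if case_sensitive else keyword.lower()
        # Assumption: simple substring matching.
        if needle in haystack:
            return {
                "action": action_on_match,
                "reason": f"Matched blocked keyword '{keyword}'",
                "guardrail_id": guardrail_id,
                "details": {"keyword": keyword},
            }
    return {
        "action": "allow",
        "reason": None,
        "guardrail_id": guardrail_id,
        "details": None,
    }


default_result = check_keywords("An UNSAFE_TOPIC here", ["unsafe_topic"])
strict_result = check_keywords(
    "An UNSAFE_TOPIC here", ["unsafe_topic"], case_sensitive=True
)
warn_result = check_keywords(
    "a banned_phrase appears", ["banned_phrase"], action_on_match="warn"
)
```

With the defaults, the upper-case text still matches; turning on case_sensitive lets it through, and action_on_match="warn" downgrades a match to a warning.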
Creating Custom Guardrail Plugins¶
- Choose the Base Protocol:
  - InputGuardrailPlugin: Implement async def check_input(self, data: Any, context: Optional[Dict[str, Any]]) -> GuardrailViolation.
  - OutputGuardrailPlugin: Implement async def check_output(self, data: Any, context: Optional[Dict[str, Any]]) -> GuardrailViolation.
  - ToolUsageGuardrailPlugin: Implement async def check_tool_usage(self, tool: Tool, params: Dict[str, Any], context: Optional[Dict[str, Any]]) -> GuardrailViolation.

  A single plugin class can implement multiple of these protocols if it's designed to check different types of data.
- Implement the Check Logic: Your method should analyze the data (and context or tool/params) and return a GuardrailViolation dictionary.
- Register Your Plugin: Use entry points in pyproject.toml or place it in a directory specified by plugin_dev_dirs in MiddlewareConfig.
- Configure It: Add its ID to the appropriate list in features (e.g., features.input_guardrails) and provide any necessary configuration in guardrail_configurations.
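The check logic of a custom input guardrail might look like the sketch below. It deliberately avoids importing Genie Tooling so that it runs on its own: the TypedDict is redeclared, and the class name, the plugin_id value, and the max_chars constructor argument are all hypothetical. A real plugin would follow the InputGuardrailPlugin protocol, would most likely receive its settings through guardrail_configurations instead of a constructor, and may need setup or registration details that are not covered here.

```python
import asyncio
from typing import Any, Dict, Literal, Optional, TypedDict


class GuardrailViolation(TypedDict):
    action: Literal["allow", "block", "warn"]
    reason: Optional[str]
    guardrail_id: Optional[str]
    details: Optional[Dict[str, Any]]


class MaxLengthInputGuardrail:
    """Hypothetical input guardrail that blocks overly long text."""

    plugin_id = "max_length_input_guardrail_v1"  # hypothetical ID

    def __init__(self, max_chars: int = 2000) -> None:
        # Assumption: passed directly here to keep the sketch standalone.
        self.max_chars = max_chars

    async def check_input(
        self, data: Any, context: Optional[Dict[str, Any]]
    ) -> GuardrailViolation:
        text = data if isinstance(data, str) else str(data)
        if len(text) > self.max_chars:
            return {
                "action": "block",
                "reason": f"Input has {len(text)} characters; limit is {self.max_chars}",
                "guardrail_id": self.plugin_id,
                "details": {"length": len(text)},
            }
        return {
            "action": "allow",
            "reason": None,
            "guardrail_id": self.plugin_id,
            "details": None,
        }


guardrail = MaxLengthInputGuardrail(max_chars=10)
short_result = asyncio.run(guardrail.check_input("hi", None))
long_result = asyncio.run(guardrail.check_input("x" * 11, None))
```

The method signature matches the check_input signature listed for InputGuardrailPlugin, so the same body could be moved into a real plugin class once the registration and configuration steps are in place.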