Token Usage Tracking (genie.usage)¶
Genie Tooling provides an interface for tracking token usage by LLM providers, accessible via `genie.usage`. This helps in monitoring costs and understanding LLM consumption patterns.
Core Concepts¶
- `UsageTrackingInterface` (`genie.usage`): The facade interface for recording and summarizing token usage.
- `TokenUsageRecorderPlugin`: A plugin responsible for storing or exporting token usage records. Built-in:
    - `InMemoryTokenUsageRecorderPlugin` (alias: `in_memory_token_recorder`): Stores records in memory. Useful for simple summaries and testing.
    - `OpenTelemetryMetricsTokenRecorderPlugin` (alias: `otel_metrics_recorder`): Recommended for production. Emits token counts as standard OpenTelemetry metrics, which can be scraped by systems like Prometheus.
- `TokenUsageRecord` (TypedDict): The structure for a single token usage event. A rough sketch of its shape follows this list.
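The authoritative definition of `TokenUsageRecord` lives in the library itself; the sketch below only illustrates the shape implied by the summary output and metric attributes later on this page (the field names are assumptions, not the actual API):
from typing import Optional, TypedDict

class TokenUsageRecordSketch(TypedDict, total=False):
    # Illustrative only -- consult genie_tooling for the real TokenUsageRecord fields.
    provider_id: str                  # e.g. the LLM provider plugin ID
    model_name: str                   # e.g. "mistral:latest"
    prompt_tokens: Optional[int]      # tokens in the prompt, if reported
    completion_tokens: Optional[int]  # tokens in the completion, if reported
    total_tokens: Optional[int]       # prompt + completion, if reported
    call_type: str                    # e.g. "chat" or "generate"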
Configuration¶
Configure the default token usage recorder via `FeatureSettings`.
Example 1: In-Memory Recorder (for Development)¶
from genie_tooling.config.models import MiddlewareConfig
from genie_tooling.config.features import FeatureSettings

app_config = MiddlewareConfig(
    features=FeatureSettings(
        # ... other features ...
        token_usage_recorder="in_memory_token_recorder"
    )
)
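With this config in hand, initialize Genie as usual. A minimal bootstrap sketch (the `Genie` import path and `create()` call are assumed from the rest of these docs):
import asyncio

from genie_tooling import Genie  # import path assumed

async def main() -> None:
    genie = await Genie.create(config=app_config)
    # From here on, genie.llm calls feed the in-memory recorder,
    # and genie.usage can summarize them (see below).

asyncio.run(main())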
Example 2: OpenTelemetry Metrics Recorder (for Production)¶
This is the recommended approach for production monitoring. It requires an OpenTelemetry Collector to be set up.
from genie_tooling.config.models import MiddlewareConfig
from genie_tooling.config.features import FeatureSettings

# Prerequisite: an OTel Collector set up to receive these metrics
# (e.g., exposing them via a Prometheus exporter).
app_config = MiddlewareConfig(
    features=FeatureSettings(
        # ... other features ...
        token_usage_recorder="otel_metrics_recorder",
        # The OTel SDK must be initialized, which is done by the OTel tracer,
        # so enable the tracer even if you only care about metrics.
        observability_tracer="otel_tracer",
    ),
    # Configure the OTel tracer to initialize the SDK.
    observability_tracer_configurations={
        "otel_tracer_plugin_v1": {
            "otel_service_name": "my-app-with-token-metrics",
            "exporter_type": "console"  # Or your preferred OTel trace exporter
        }
    }
)
How It Works¶
Token usage is automatically recorded by `genie.llm.chat()` and `genie.llm.generate()` whenever the underlying LLM provider returns usage information.
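No extra code is needed per call. For example (a sketch, assuming an LLM provider is also configured and using the message shape accepted by `genie.llm.chat()`):
# Each call like this produces one usage record, provided the
# provider's response includes token counts.
response = await genie.llm.chat(
    [{"role": "user", "content": "Explain token usage tracking in one sentence."}]
)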
Using the In-Memory Recorder¶
If you use `"in_memory_token_recorder"`, you can get a simple summary:
# Assuming genie is initialized with the in-memory recorder
summary = await genie.usage.get_summary()
# Example output:
# {
# "in_memory_token_usage_recorder_v1": {
# "total_records": 5,
# "total_prompt_tokens": 1234,
# "total_completion_tokens": 567,
# "total_tokens_overall": 1801,
# "by_model": {
# "mistral:latest": { "prompt": 1234, "completion": 567, "total": 1801, "count": 5 }
# }
# }
# }
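Since the point of tracking is cost monitoring, a summary like this can be turned into a rough spend estimate. A hedged sketch (the per-1K-token prices are placeholders, not real rates; the recorder key matches the example output above):
# Hypothetical prices per 1K tokens; substitute your provider's actual rates.
PRICES_PER_1K = {"mistral:latest": {"prompt": 0.0002, "completion": 0.0006}}

summary = await genie.usage.get_summary()  # inside an async context, as above
recorder_summary = summary["in_memory_token_usage_recorder_v1"]

estimated_cost = 0.0
for model, stats in recorder_summary["by_model"].items():
    rates = PRICES_PER_1K.get(model)
    if rates:
        estimated_cost += (stats["prompt"] / 1000) * rates["prompt"]
        estimated_cost += (stats["completion"] / 1000) * rates["completion"]

print(f"Estimated LLM spend: ${estimated_cost:.4f}")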
Using the OpenTelemetry Metrics Recorder¶
When `token_usage_recorder="otel_metrics_recorder"` is configured, this plugin emits the following OTel metrics:
* `llm.request.tokens.prompt` (Counter, unit: `{token}`)
* `llm.request.tokens.completion` (Counter, unit: `{token}`)
* `llm.request.tokens.total` (Counter, unit: `{token}`)
These metrics carry attributes (labels) such as `llm.provider.id`, `llm.model.name`, and `llm.call_type`. Configure an OTel Collector (e.g., with a Prometheus exporter) to expose them for scraping, then visualize them in a dashboarding tool like Grafana.
Example PromQL Queries:

* Prompt token rate per model (5m window): `sum(rate(llm_request_tokens_prompt_total[5m])) by (llm_model_name)`
* Completion token rate per provider (5m window): `sum(rate(llm_request_tokens_completion_total[5m])) by (llm_provider_id)`