guarded-llm¶
Strict JSON LLM calls — schema validation, budget guard, retry policy, multi-provider support.
Quick start (class-based API)¶
    from pydantic import BaseModel
    from guarded_llm import GuardedLLM, Budget, RetryPolicy

    class Verdict(BaseModel):
        verdict: str
        confidence: float

    llm = GuardedLLM(
        provider="deepseek",
        model="deepseek-v4-flash",
        schema=Verdict,
        budget=Budget(usd_total=0.50, usd_per_call=0.05),
        retry=RetryPolicy(max_attempts=3, backoff_seconds=1.0),
    )
    out = llm.call("Is gravity a self-organized criticality system?")
    print(out.verdict, out.confidence)
High-level class API¶
GuardedLLM ¶
GuardedLLM(provider: str, model: str, schema: Any, budget: Budget | None = None, retry: RetryPolicy | None = None, max_tokens: int = 2048, **provider_kwargs: Any)
Reusable strict-JSON LLM caller with budget + retry + multi-vendor support.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | registered provider name (…) | required |
| model | str | vendor-specific model id (e.g. …) | required |
| schema | Any | how to validate LLM output. One of: … | required |
| budget | Budget \| None | optional | None |
| retry | RetryPolicy \| None | optional | None |
| max_tokens | int | max tokens per LLM call (default 2048). | 2048 |
| provider_kwargs | Any | extra kwargs forwarded to the provider on every call (e.g. …) | {} |
Public API::
    llm = GuardedLLM(provider, model, schema, budget=..., retry=...)
    out = llm.call("prompt")      # returns validated instance
    llm.last_stats.cost_usd       # cost of the last call
    llm.budget.spent_usd          # cumulative spend
Source code in packages/guarded-llm/src/guarded_llm/core.py
call ¶
call(prompt: str, *, system: str | None = None, messages: list[dict] | None = None, **kwargs: Any) -> Any
Run an LLM call and return the validated instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | str | user prompt string (ignored if …) | required |
| system | str \| None | optional system prompt prepended as the first message. | None |
| messages | list[dict] \| None | optional fully-formed messages list (overrides …) | None |
| **kwargs | Any | forwarded to the provider (e.g. …) | {} |
Returns:
| Type | Description |
|---|---|
| Any | The validated instance (Pydantic model instance, dict, or legacy dataclass instance, whatever the schema returns). |
Raises:
| Type | Description |
|---|---|
| BudgetExceededError | if cumulative cost exceeds the Budget cap. |
| RetryExhausted | if all attempts fail validation. |
| LLMCallError | if the provider itself fails (network, auth, etc.) and all retries are exhausted. |
Source code in packages/guarded-llm/src/guarded_llm/core.py
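The interaction between prompt, system, and messages described in the parameter table can be sketched as follows. This is a minimal illustration, not the library's code: `build_messages` is a hypothetical helper, and it assumes that a supplied `messages` list is used verbatim in place of `prompt` and `system` (the source descriptions are truncated at that point).

```python
from typing import Optional


def build_messages(
    prompt: str,
    system: Optional[str] = None,
    messages: Optional[list] = None,
) -> list:
    """Hypothetical helper showing how a chat message list might be assembled."""
    if messages is not None:
        # Assumption: a fully formed list wins over prompt/system.
        return list(messages)
    built = []
    if system is not None:
        # The system prompt is prepended as the first message.
        built.append({"role": "system", "content": system})
    built.append({"role": "user", "content": prompt})
    return built


print(build_messages("Classify this.", system="Answer in JSON."))
```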
call_as_result ¶
call_as_result(prompt: str, *, system: str | None = None, messages: list[dict] | None = None, **kwargs: Any) -> GuardrailResult
Like .call() but returns a GuardrailResult (never raises on validation failure — errors are accumulated in the result).
Source code in packages/guarded-llm/src/guarded_llm/core.py
GuardedCallStats dataclass ¶
GuardedCallStats(attempts: int = 0, cost_usd: float = 0.0, errors: list[str] = list(), raw_outputs: list[str] = list())
Per-call metadata returned alongside the parsed instance via .last_stats.
Useful for cost dashboards / debugging without changing the return type of .call() (which by default returns the parsed instance directly).
Budget dataclass ¶
Track and cap LLM spend across one or more GuardedLLM.call(...) runs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| usd_total | float | total budget cap in USD across the lifetime of this Budget. | required |
| usd_per_call | float | max spend allowed for any single … | inf |
Example::
    b = Budget(usd_total=0.50, usd_per_call=0.10)
    b.consume(0.03)    # OK
    b.consume(0.50)    # raises BudgetExceeded (over per-call cap)
    b.spent_usd        # 0.03
    b.remaining_usd    # 0.47
consume ¶
Record a charge. Raise BudgetExceeded if it would exceed any cap.
The charge is NOT recorded when an exception is raised — so a caller can catch BudgetExceeded and the Budget state stays consistent.
Source code in packages/guarded-llm/src/guarded_llm/budget.py
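The check-then-record behaviour of consume can be illustrated with a standalone stand-in. `SketchBudget` and `SketchBudgetExceeded` are hypothetical names, and the strict `>` comparisons are an assumption; the point is only that both caps are checked before any state changes, so a rejected charge leaves `spent_usd` untouched.

```python
import math
from dataclasses import dataclass


class SketchBudgetExceeded(Exception):
    """Hypothetical stand-in for the library's budget exception."""


@dataclass
class SketchBudget:
    usd_total: float
    usd_per_call: float = math.inf
    spent_usd: float = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.usd_total - self.spent_usd

    def consume(self, usd: float) -> None:
        # Validate against both caps first; mutate only if both pass.
        if usd > self.usd_per_call:
            raise SketchBudgetExceeded(f"{usd} exceeds per-call cap {self.usd_per_call}")
        if self.spent_usd + usd > self.usd_total:
            raise SketchBudgetExceeded(f"{usd} would exceed total cap {self.usd_total}")
        self.spent_usd += usd


b = SketchBudget(usd_total=0.50, usd_per_call=0.10)
b.consume(0.03)
try:
    b.consume(0.50)  # rejected: over the per-call cap
except SketchBudgetExceeded as exc:
    print("rejected:", exc)
print(round(b.spent_usd, 2), round(b.remaining_usd, 2))
```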
RetryPolicy dataclass ¶
Backoff configuration for retry loops.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_attempts | int | total number of LLM calls to make before giving up (>= 1). | 3 |
backoff_seconds | float | base sleep between attempts (linear * attempt#). | 1.0 |
jitter | bool | if True, multiply sleep by uniform(0.5, 1.5) to avoid thundering-herd retry storms when many parallel clients share a single backend. | True |
Example::
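    import time

    policy = RetryPolicy(max_attempts=5, backoff_seconds=2.0)

    def call_with_retry():
        # _try_call and RetryableError are placeholders for your own call and error type.
        for attempt in range(policy.max_attempts):
            try:
                return _try_call()
            except RetryableError:
                # sleep_seconds(0) is 0, so the first retry is immediate.
                time.sleep(policy.sleep_seconds(attempt))
        raise RetryExhausted("...")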
sleep_seconds ¶
Compute backoff for the given attempt number (0-indexed).
attempt 0 → 0 sec (don't sleep before first call),
attempt N → N * backoff_seconds (* jitter, if enabled).
Source code in packages/guarded-llm/src/guarded_llm/retry.py
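The backoff rule above amounts to a few lines of arithmetic. The function below is a standalone sketch of that rule, not the library's method: it takes the policy fields as plain arguments, which is an assumption made to keep the example self-contained.

```python
import random


def sleep_seconds(attempt: int, backoff_seconds: float = 1.0, jitter: bool = True) -> float:
    """Linear backoff: 0 for attempt 0, attempt * backoff_seconds otherwise."""
    if attempt <= 0:
        return 0.0
    delay = attempt * backoff_seconds
    if jitter:
        # Spread retries from parallel clients over 50%..150% of the base delay.
        delay *= random.uniform(0.5, 1.5)
    return delay


for n in range(4):
    print(n, sleep_seconds(n, backoff_seconds=2.0, jitter=False))
```

With jitter disabled and backoff_seconds=2.0, attempts 0 through 3 give 0.0, 2.0, 4.0, and 6.0 seconds.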
RetryExhausted ¶
Bases: SchemaValidationError
Raised when all retry attempts fail.
Carries the per-attempt error list and the final raw LLM output so callers can inspect what went wrong without re-running the loop.
Source code in packages/guarded-llm/src/guarded_llm/exceptions.py
SchemaValidator ¶
Wrap a pydantic.BaseModel to fit guarded-llm's (ok, err, instance) API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Any | a Pydantic v2 BaseModel subclass. | required |
Example::
    class Out(BaseModel):
        verdict: str
        confidence: float

    v = SchemaValidator(Out)
    v.validate({"verdict": "KEEP", "confidence": 0.9})  # -> (True, None, Out(...))
Source code in packages/guarded-llm/src/guarded_llm/validator.py
validate ¶
Validate d against the Pydantic model.
Returns (ok, error_message_or_none, model_instance_or_none).
Source code in packages/guarded-llm/src/guarded_llm/validator.py
Functional / legacy API¶
guardrailed_llm_call ¶
guardrailed_llm_call(prompt_fn: Callable[[str | None], str] | None = None, llm_caller: Callable[[str], str] | None = None, schema_cls: Any = None, max_retries: int = 3, *, provider: str | None = None, model: str | None = None, messages: list[dict] | None = None, schema: Any = None, max_tokens: int = 2048, budget_cap_usd: float | None = None, retry_backoff_s: float = 0.0, **kwargs: Any) -> Any
Run an LLM call wrapped in the full guardrail stack.
Two call styles supported:
Legacy (positional, kept for backwards compat with v4/lib):
    parsed, errors = guardrailed_llm_call(prompt_fn, llm_caller, MySchema, max_retries=3)
Provider (keyword, new public API):
    result = guardrailed_llm_call(
        provider="deepseek",
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": "..."}],
        schema=my_schema,
        max_retries=3,
        budget_cap_usd=0.05,
    )
    if result.ok:
        print(result.parsed)
The provider style returns a GuardrailResult with cost / attempts / raw_outputs metadata. The legacy style returns the (parsed, errors) tuple unchanged from v4/lib/llm_guardrail.py.
Source code in packages/guarded-llm/src/guarded_llm/guardrail.py
GuardrailResult dataclass ¶
GuardrailResult(parsed: Any, errors: list[str] = list(), attempts: int = 0, cost_usd: float = 0.0, raw_outputs: list[str] = list())
Outcome of a guarded LLM call.
Attributes:
| Name | Type | Description |
|---|---|---|
| parsed | Any | validated instance (dict for LLMSchema, dataclass for legacy schemas) or None if all retries failed |
| errors | list[str] | per-attempt error strings (empty if first try succeeded) |
| attempts | int | number of LLM calls actually made |
| cost_usd | float | estimated cumulative cost in USD (provider-reported, may be 0) |
| raw_outputs | list[str] | raw text returned by each attempt (for debugging) |
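To show how these fields might be filled in over a retry loop, here is a self-contained sketch. `SketchResult` and `run_guarded` are hypothetical stand-ins rather than the library's implementation, and the `ok` property (true when `parsed` is not None) is an assumption based on the `result.ok` usage shown earlier.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchResult:
    parsed: Any = None
    errors: list = field(default_factory=list)
    attempts: int = 0
    cost_usd: float = 0.0
    raw_outputs: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Assumption: success means a validated instance is present.
        return self.parsed is not None


def run_guarded(call_llm: Callable[[], dict], validate: Callable[[str], tuple],
                max_retries: int = 3) -> SketchResult:
    """Call, validate, and accumulate metadata; never raises on validation failure."""
    result = SketchResult()
    for _ in range(max_retries):
        reply = call_llm()  # expected shape: {"text": str, "cost_usd": float}
        result.attempts += 1
        result.cost_usd += reply["cost_usd"]
        result.raw_outputs.append(reply["text"])
        ok, err, instance = validate(reply["text"])
        if ok:
            result.parsed = instance
            break
        result.errors.append(err)
    return result


def parse_json(text: str) -> tuple:
    try:
        return True, None, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, str(exc), None


# A fake caller that fails once, then returns valid JSON.
replies = iter([
    {"text": "not json", "cost_usd": 0.001},
    {"text": '{"verdict": "KEEP"}', "cost_usd": 0.001},
])
result = run_guarded(lambda: next(replies), parse_json)
print(result.ok, result.attempts, result.parsed)
```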
state_machine_fix ¶
Best-effort repair of common LLM JSON drift bugs.
Applies, in order:
1. fence strip + JSON envelope locate
2. comment strip
3. NaN / Infinity -> null
4. single-quote -> double-quote
5. unescaped interior quote escape
6. trailing comma removal
Never raises; hands back its best guess.
Source code in packages/guarded-llm/src/guarded_llm/guardrail.py
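Three of the repairs listed above (fence strip with envelope locate, NaN / Infinity -> null, trailing comma removal) can be approximated with plain regular expressions. `naive_json_repair` is a hypothetical, deliberately simplified sketch: unlike a real state machine it does not track whether it is inside a string literal, so it can alter string contents that happen to look like these patterns.

```python
import json
import re


def naive_json_repair(text: str) -> str:
    """Best-effort cleanup of common LLM JSON drift; returns its best guess."""
    # Step 1a: drop Markdown code-fence markers (three backticks plus optional language tag).
    text = re.sub(r"`{3}[A-Za-z]*", "", text)
    # Step 1b: keep only the outermost {...} or [...] envelope, if one exists.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    ends = [i for i in (text.rfind("}"), text.rfind("]")) if i != -1]
    if starts and ends and min(starts) < max(ends):
        text = text[min(starts): max(ends) + 1]
    # Step 3: NaN / Infinity / -Infinity are not valid JSON; map them to null.
    text = re.sub(r"-?\b(?:NaN|Infinity)\b", "null", text)
    # Step 6: remove commas that directly precede a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text


fence = "`" * 3
raw = fence + "json\n" + '{"score": NaN, "tags": ["a", "b",],}' + "\n" + fence
print(json.loads(naive_json_repair(raw)))
```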
validate_json ¶
Parse + schema-validate.
Accepts a raw string (json.loads first) or an already-parsed dict/list. Schema can be either a dataclass schema class or an LLMSchema instance. Returns (success, error_or_none, instance_or_none).
Source code in packages/guarded-llm/src/guarded_llm/guardrail.py
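The parse-then-validate flow and its (success, error_or_none, instance_or_none) return shape can be sketched with the standard library alone. `validate_json_sketch` and the `Verdict` dataclass are hypothetical; this version only checks that the keys match the dataclass fields and does not check value types, which the real validator may well do.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical dataclass schema used only for this example."""
    verdict: str
    confidence: float


def validate_json_sketch(raw: Any, schema_cls: type) -> Tuple[bool, Optional[str], Any]:
    # Accept either raw text (parsed here) or an already-parsed object.
    if isinstance(raw, str):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            return False, f"invalid JSON: {exc}", None
    else:
        data = raw
    if not isinstance(data, dict):
        return False, "expected a JSON object", None
    try:
        # Missing or unexpected keys surface as TypeError from the constructor.
        return True, None, schema_cls(**data)
    except TypeError as exc:
        return False, str(exc), None


print(validate_json_sketch('{"verdict": "KEEP", "confidence": 0.9}', Verdict))
print(validate_json_sketch('{"verdict": "KEEP"}', Verdict)[0])
```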
Schemas¶
LLMSchema ¶
Generic JSON Schema wrapper compatible with guardrailed_llm_call.
Example
    >>> schema = LLMSchema({
    ...     "type": "object",
    ...     "properties": {
    ...         "verdict": {"type": "string", "enum": ["KEEP", "REJECT"]},
    ...         "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    ...     },
    ...     "required": ["verdict", "confidence"],
    ... })
    >>> ok, err, inst = schema.validate({"verdict": "KEEP", "confidence": 0.9})
    >>> assert ok and inst == {"verdict": "KEEP", "confidence": 0.9}
Notes
- The returned instance is just the validated dict (no class wrapping).
- Requires jsonschema>=4.0; if not installed the constructor raises.
Source code in packages/guarded-llm/src/guarded_llm/schemas.py
validate ¶
Validate d against this schema.
Returns (ok, error_message_or_none, validated_instance_or_none).
Source code in packages/guarded-llm/src/guarded_llm/schemas.py
validate_response ¶
Validate a parsed dict against either an LLMSchema or a dataclass schema class.
Source code in packages/guarded-llm/src/guarded_llm/schemas.py
Layer3CriticVerdict dataclass ¶
Layer3CriticVerdict(class_id: str, review_verdict: str, confidence: str, flagged_count: int, reasoning: str)
B1 critic pass output per class.
review_verdict: "KEEP" | "SPLIT" | "REJECT" | "MERGE_WITH(
Layer4Prediction dataclass ¶
Layer4Prediction(class_id: str, target_system: str, physical_quantity: str, predicted_band: list[float], evidence_url: str | None = None, journal_target: str | None = None)
A predicted observation in a target system.
B3EnsembleReview dataclass ¶
One model's verdict on one class in an N-model ensemble vote.
Providers¶
BaseProvider ¶
Bases: ABC
Interface every provider adapter implements.
call abstractmethod ¶
Send messages to the LLM and return {"text": str, "cost_usd": float}.
Source code in packages/guarded-llm/src/guarded_llm/providers/__init__.py
get_provider ¶
Instantiate and return the provider named name.
Raises ValueError if the provider isn't registered.
Source code in packages/guarded-llm/src/guarded_llm/providers/__init__.py
list_providers ¶
register_provider ¶
Add (or override) a provider in the registry.
Source code in packages/guarded-llm/src/guarded_llm/providers/__init__.py
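The registry pattern behind these three functions might look roughly like the sketch below. Everything here is a local stand-in: the argument lists of `register_provider` and `get_provider`, the `SketchProvider` base class, and the `EchoProvider` adapter are assumptions for illustration, since the source does not show the real signatures.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class SketchProvider(ABC):
    """Stand-in for the provider interface described above."""

    @abstractmethod
    def call(self, messages: List[dict], **kwargs) -> dict:
        """Return {"text": str, "cost_usd": float}."""


_REGISTRY: Dict[str, Type[SketchProvider]] = {}


def register_provider(name: str, cls: Type[SketchProvider]) -> None:
    # Re-registering an existing name overrides the previous entry.
    _REGISTRY[name] = cls


def get_provider(name: str, **kwargs) -> SketchProvider:
    if name not in _REGISTRY:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)


def list_providers() -> List[str]:
    return sorted(_REGISTRY)


class EchoProvider(SketchProvider):
    """Hypothetical offline provider: echoes the last message at zero cost."""

    def call(self, messages: List[dict], **kwargs) -> dict:
        return {"text": messages[-1]["content"], "cost_usd": 0.0}


register_provider("echo", EchoProvider)
print(list_providers())
print(get_provider("echo").call([{"role": "user", "content": "hi"}]))
```

An offline adapter like this can be handy in tests, where real network calls and spend are undesirable.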
Exceptions¶
GuardrailError ¶
Bases: Exception
Base class for all guarded-llm errors.
SchemaValidationError ¶
Bases: GuardrailError
Raised when LLM output fails schema validation after all retries.
Attributes:
| Name | Type | Description |
|---|---|---|
| attempts | | list of per-attempt error strings (length == max_retries) |
| last_raw | | the raw LLM text from the final attempt (may aid debugging) |
Source code in packages/guarded-llm/src/guarded_llm/exceptions.py
LLMCallError ¶
Bases: GuardrailError
Raised when the underlying LLM HTTP/SDK call fails (network, auth, etc.).
BudgetExceededError ¶
Bases: GuardrailError
Raised when cumulative cost in a single call exceeds the user-supplied cap.
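The documented base classes imply a small hierarchy, which affects how `except` clauses should be ordered. The classes below are local stand-ins that mirror the documented bases; the constructor arguments are an assumption, and `describe` is a hypothetical helper.

```python
class GuardrailError(Exception):
    """Stand-in for the documented base class."""


class SchemaValidationError(GuardrailError):
    def __init__(self, message, attempts=None, last_raw=None):
        super().__init__(message)
        self.attempts = list(attempts or [])  # per-attempt error strings
        self.last_raw = last_raw              # raw text from the final attempt


class RetryExhausted(SchemaValidationError):
    pass


class LLMCallError(GuardrailError):
    pass


class BudgetExceededError(GuardrailError):
    pass


def describe(exc: GuardrailError) -> str:
    # Check subclasses before the base class; RetryExhausted is also a
    # SchemaValidationError, so the first branch covers both.
    if isinstance(exc, SchemaValidationError):
        return f"validation failed after {len(exc.attempts)} attempt(s)"
    if isinstance(exc, BudgetExceededError):
        return "budget cap hit"
    if isinstance(exc, LLMCallError):
        return "provider failure"
    return "other guardrail error"


try:
    raise RetryExhausted("gave up", attempts=["bad json", "missing key"], last_raw="{")
except GuardrailError as exc:  # one handler catches every error in the family
    print(describe(exc))
```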