Configuring Agent Validation
Validator types, gradient scoring, threshold configuration, inner-loop tool-call judging, and chaining multiple validators.
AEGIS uses a gradient validation system. Instead of binary pass/fail, each validator produces a `ValidationScore` (0.0-1.0) and a `Confidence` (0.0-1.0). The execution loop compares the score against a configured threshold to decide whether to proceed to the next iteration or accept the output. AEGIS also supports a separate pre-dispatch semantic judge for selected tool calls via `execution.tool_validation`; see Tool-Call Judging for the tool-specific runtime contract.
How Validation Works
At the end of each iteration:
- Each validator in `spec.validation` is evaluated in order.
- If all validators' scores meet their thresholds → `IterationStatus::Success` → execution completes.
- If any validator's score falls below its threshold and iterations remain → `IterationStatus::Refining` → error context is injected and the next iteration begins.
- If retries are exhausted → `IterationStatus::Failed`.
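A minimal Python sketch of this decision (illustrative only; `classify_iteration` and the enum mirror `IterationStatus` but are not the orchestrator's actual types):

```python
from enum import Enum

class IterationStatus(Enum):
    SUCCESS = "success"
    REFINING = "refining"
    FAILED = "failed"

def classify_iteration(scores, thresholds, iterations_remaining):
    """Decide the iteration outcome from validator scores (sketch)."""
    # All validators must meet their thresholds for the iteration to succeed.
    if all(s >= t for s, t in zip(scores, thresholds)):
        return IterationStatus.SUCCESS
    # Otherwise refine while budget remains; fail once retries are exhausted.
    if iterations_remaining > 0:
        return IterationStatus.REFINING
    return IterationStatus.FAILED
```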
Tool-call judging uses the same gradient vocabulary, but it runs inside the inner loop before a tool is dispatched. That means a bad `cmd.run`, `fs.write`, or external tool call can be rejected without burning the full iteration budget.
Validator Types
exit_code
Checks the container's process exit code. Deterministic; ValidationScore is always 1.0 (pass) or 0.0 (fail).
```yaml
validation:
  - type: exit_code
    expected: 0   # any non-zero exit code fails this validator
```

Use this as the first validator to catch hard failures (e.g., uncaught exceptions, build failures) cheaply before running more expensive validators.
json_schema
Validates a file in the agent's workspace against a JSON Schema. Deterministic.
```yaml
validation:
  - type: json_schema
    schema_path: /agent/output_schema.json   # path inside container
    target_path: /workspace/result.json      # file to validate
    min_score: 1.0                           # must fully pass schema
```

The schema file is baked into the container image at `schema_path`. The `target_path` is the file the agent is expected to produce in its workspace volume.
regex
Validates that stdout matches a regular expression. Deterministic.
```yaml
validation:
  - type: regex
    pattern: "^\\{.*\"status\":\\s*\"success\".*\\}$"
    target: stdout   # "stdout" or a file path
    min_score: 1.0
```

semantic
A single LLM-as-Judge agent evaluates the output and produces a gradient score.
```yaml
validation:
  - type: semantic
    judge_agent: code-quality-judge   # must be a deployed agent
    criteria: |
      Evaluate the submitted Python code on:
      1. Correctness: Does it solve the stated problem?
      2. Code quality: Is it idiomatic Python?
      3. Error handling: Does it handle edge cases?
      Score 0.0 for fundamentally broken code, 1.0 for production-ready code.
    min_score: 0.75
    min_confidence: 0.70
```

The judge agent receives the iteration's output and the criteria text, then returns a JSON object:

```json
{ "score": 0.82, "confidence": 0.91, "reasoning": "..." }
```

multi_judge
Runs multiple judge agents and aggregates their scores via consensus. Useful for high-stakes validation where a single judge's bias could skew results.
```yaml
validation:
  - type: multi_judge
    judges:
      - code-quality-judge
      - security-reviewer-judge
      - test-coverage-judge
    consensus: weighted_average   # "weighted_average" | "majority" | "unanimous" | "best_of_n"
    min_judges_required: 2        # at least 2 judges must respond; otherwise the validator fails
    criteria: |
      Score the output from 0.0 to 1.0 on overall production readiness.
    min_score: 0.80
    min_confidence: 0.65
```

| Consensus Mode | Description |
|---|---|
| `weighted_average` | Weighted average of all judges' scores. Weights can be tuned per judge; confidence weighting is applied when `confidence_weighting` is configured. |
| `majority` | The result of the majority of judges determines the outcome. |
| `unanimous` | All judges must exceed the threshold. Most conservative: a single dissenting judge fails the validator. |
| `best_of_n` | Highest score among all judges. Most permissive: any judge's approval suffices. |
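Assuming equal default weights, the consensus modes above can be sketched as follows (`aggregate` is a hypothetical helper; AEGIS's actual weighting and majority arithmetic may differ):

```python
def aggregate(scores, mode, min_score, weights=None):
    """Combine judge scores under one of the consensus modes (sketch)."""
    if mode == "weighted_average":
        w = weights or [1.0] * len(scores)
        return sum(s * wi for s, wi in zip(scores, w)) / sum(w)
    if mode == "majority":
        # The majority side (passing vs. failing) determines the outcome;
        # here the reported score is the mean of that side.
        passing = [s for s in scores if s >= min_score]
        failing = [s for s in scores if s < min_score]
        side = passing if len(passing) > len(failing) else failing
        return sum(side) / len(side)
    if mode == "unanimous":
        # Most conservative: one dissenting judge drags the result down.
        return min(scores)
    if mode == "best_of_n":
        # Most permissive: any judge's approval suffices.
        return max(scores)
    raise ValueError(f"unknown consensus mode: {mode}")
```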
Inner-Loop Tool Validation (tool_validation)
While the validation blocks above operate on the outer execution iteration (end of run), agents can also be checked during execution just before they invoke a tool. This prevents agents from calling potentially unsafe tools or wasting time on hallucinations.
Unlike outer validators, the inner `tool_validation` gate runs synchronously, before the tool executes:

- The agent proposes a tool call, such as `cmd.run`.
- The orchestrator intercepts the call and evaluates the configured semantic judge before dispatch.
- The judge receives the proposed call, the available tools, the current criteria, and the worker mount context.
- If the judge returns `score < min_score` or `confidence < min_confidence`, the tool call is rejected, the judge's reasoning is appended to the inner-loop prompt, and the agent tries again without terminating the full iteration.
- If the judge passes, the tool call proceeds into the normal routing and policy pipeline.
Operators can explicitly bypass this semantic gate with `skip_judge` for approved tools. That removes only the judge step; the call still remains subject to the normal policy and authorization path. For the complete gate semantics and bypass rules, see Tool-Call Judging.
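The gate can be sketched as follows (a minimal sketch; `gate_tool_call` and its signature are hypothetical, and the real orchestrator also runs the policy pipeline after a pass):

```python
def gate_tool_call(call, judge, min_score, min_confidence, skip_judge=()):
    """Pre-dispatch semantic gate for a proposed tool call (sketch).

    `judge` is any callable that returns a verdict dict with "score",
    "confidence", and "reasoning" keys.
    """
    # skip_judge bypasses only the judge step; policy checks still apply later.
    if call["tool"] in skip_judge:
        return True, None
    verdict = judge(call)
    # Both thresholds must be met before the call is dispatched.
    if verdict["score"] < min_score or verdict["confidence"] < min_confidence:
        # Rejected: the reasoning is fed back into the inner-loop prompt.
        return False, verdict["reasoning"]
    return True, None
```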
```yaml
execution:
  tool_validation:
    - type: semantic
      judge_agent: security-judge
      criteria: |
        Determine if the tool call and arguments are safe and align with the agent's goal.
      min_score: 0.85
      min_confidence: 0.8
      timeout_seconds: 300
```

Tool Judge Contract
The tool judge sees a semantic payload rather than raw output:
| Field | Purpose |
|---|---|
| `task` | The task or instruction currently being executed. |
| `proposed_tool_call` | The tool name and arguments the agent wants to run. |
| `available_tools` | The tool list visible to the worker at that point in the inner loop. |
| `worker_mounts` | The inherited mount context available to the judge execution. |
| `criteria` | The rubric text configured by the operator or agent author. |
| `validation_context` | A marker that identifies the inner-loop pre-execution judge path. |
| `policy_violations` | List of tool names blocked by platform policy earlier in this iteration. Empty if none were blocked. |
The judge returns a JSON verdict with `score`, `confidence`, and `reasoning`, plus optional `signals` and `metadata`. The orchestrator compares both thresholds before allowing the tool to execute.
Judges can use policy_violations to reason about what the agent attempted versus what the platform permitted. For example, if the agent's workflow required calling a tool that was policy-blocked, the judge can reflect that in its reasoning and score the attempt charitably rather than penalizing the agent for a tool gap caused by platform constraints.
Gradient Scoring vs. Binary Validation
Traditional validators return pass/fail. AEGIS validators return a score and confidence, enabling:
- Threshold tuning: Set `min_score: 0.6` for fast iteration during development; tighten to `0.9` for production agents.
- Multi-criteria ranking: Compare two executions by their aggregate score to pick the better output.
- Confidence gating: Set `min_confidence: 0.7` to reject verdicts from judges that are uncertain. When a judge's self-reported confidence falls below `min_confidence`, the score is treated as failing the threshold: the iteration moves to `Refining` and the low-confidence reasoning is injected as error context for the next attempt. The judge is not re-run within the same iteration.
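Confidence gating can be modeled as follows (a sketch; mapping a low-confidence verdict to 0.0 is an assumption, since the documented behavior is only that the score fails the threshold):

```python
def effective_score(score, confidence, min_confidence):
    """Gate a judge's score on its self-reported confidence (sketch)."""
    # A verdict below the confidence floor is treated as failing,
    # modeled here as a zero score.
    return score if confidence >= min_confidence else 0.0
```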
Chaining Validators
Validators run in the declared order, and each must pass for the iteration to succeed. The execution loop reports the lowest validator score as the iteration's score.
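Sketched in Python (illustrative; `evaluate_chain` is not part of the AEGIS API):

```python
def evaluate_chain(validators, context):
    """Run validators in declared order and report the lowest score (sketch).

    `validators` is a list of (run, min_score) pairs, where run(context)
    returns a score in [0.0, 1.0]. Evaluation stops at the first failure,
    so expensive validators never run once a cheap one has failed.
    """
    lowest = 1.0
    for run, min_score in validators:
        score = run(context)
        lowest = min(lowest, score)
        if score < min_score:
            return False, lowest  # short-circuit: later validators are skipped
    return True, lowest
```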
A typical chain orders validators cheapest-first:
```yaml
validation:
  # 1. Cheapest: deterministic exit code check
  - type: exit_code
    expected: 0
  # 2. Deterministic: JSON schema check
  - type: json_schema
    schema_path: /agent/schema.json
    target_path: /workspace/output.json
    min_score: 1.0
  # 3. Expensive: LLM judge (only runs if the above pass)
  - type: semantic
    judge_agent: quality-judge
    criteria: "Is the output correct and complete?"
    min_score: 0.80
    min_confidence: 0.70
```

This avoids running the LLM judge (slow and costly) when the deterministic checks fail.
Agent-as-Judge Pattern
The judge agent specified in semantic or multi_judge validators is a regular AEGIS agent defined with its own manifest. This means judges can:
- Be updated independently of the agent they evaluate.
- Run in an isolated container with their own resource limits and security policy.
- Be specialized for specific domains (e.g., a judge trained to evaluate security code reviews).
- Run as child executions in the execution tree — visible in execution APIs and event streams for the parent execution.
- Receive the worker execution's inherited mount context through `worker_mounts`, so file-based checks can inspect the same artifacts the worker can reach when the judge manifest permits access.
Judges always use `execution.mode: one-shot`. Judgment is a single-shot decision by design; the judge does not retry its own verdict through the iteration loop.
Example judge agent manifest:
```yaml
apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: code-quality-judge
  version: "1.0.0"
  labels:
    role: judge
spec:
  runtime:
    language: python
    version: "3.11"
  task:
    instruction: |
      You are a code quality judge. Evaluate the provided Python code and return a JSON verdict:
      {"score": 0.0-1.0, "confidence": 0.0-1.0, "reasoning": "...", "verdict": "pass|fail|warning"}
  security:
    network:
      mode: none
  resources:
    timeout: "60s"
    memory: "512Mi"
  execution:
    mode: one-shot
  validation:
    system:
      must_succeed: true
  output:
    format: json
    schema:
      type: object
      required: ["score", "confidence", "reasoning"]
      properties:
        score:
          type: number
          minimum: 0
          maximum: 1
        confidence:
          type: number
          minimum: 0
          maximum: 1
        reasoning:
          type: string
```

The judge runtime writes its final verdict as text output, and the orchestrator parses that output as JSON. The judge should emit a JSON object containing the verdict fields on its final turn; the runtime does not require a verdict file path. `worker_mounts` is still the source of truth for filesystem context available to the judge, but verdict transport is text-based.
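On the orchestrator side, text-based verdict transport implies a parse step that might look like this (a sketch; `parse_verdict` and the fallback verdict shape are hypothetical, though treating an invalid or missing verdict as a failing score matches the documented behavior):

```python
import json

REQUIRED_FIELDS = ("score", "confidence", "reasoning")

def parse_verdict(text):
    """Parse a judge's final text output as a JSON verdict (sketch)."""
    try:
        verdict = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        verdict = None
    if not isinstance(verdict, dict) or any(k not in verdict for k in REQUIRED_FIELDS):
        # Invalid or incomplete verdicts are treated as failing scores.
        return {"score": 0.0, "confidence": 0.0, "reasoning": "invalid verdict"}
    return verdict
```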
Judge input payloads expose mounted filesystem context through `worker_mounts` (an array of inherited mount paths). Treat this as the source of truth for artifact discovery.
Judge Input Schema
Judges should declare spec.input_schema with the property names they expect to receive. The orchestrator maps ValidationRequest fields to these property names at runtime, so the property names in the schema determine which evaluation data reaches the judge's prompt.
Canonical property names for judge agents:
| Property | Purpose |
|---|---|
| `generated_manifest` or `output` | The content being evaluated: agent output, generated YAML, etc. |
| `generated_workflow` | The workflow definition being evaluated (workflow judges). |
| `user_objective` or `task` | The original objective or task that the worker was fulfilling. |
| `criteria` | The evaluation rubric text passed from the validator configuration. |
| `deployment_result` | Deployment outcome, when relevant (optional). |
| `tool_call_history` | Tool calls made during execution (optional). |
| `worker_mounts` | Inherited filesystem mount paths available to the judge (optional). |
| `validation_context` | Automatically set to the judge agent's name by the orchestrator. |
If a judge does not declare spec.input_schema, the orchestrator falls back to passing the raw content as a string with the criteria field as the intent. Declaring the schema explicitly gives the judge structured access to all evaluation data and enables precise prompt construction.
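The mapping might be sketched like this (hypothetical helper; the fallback key names `content` and `intent` are assumptions, not confirmed field names):

```python
def build_judge_input(request_fields, input_schema=None):
    """Map ValidationRequest data to a judge's declared properties (sketch).

    `request_fields` holds the available evaluation data keyed by canonical
    property name (e.g. "output", "criteria", "worker_mounts").
    """
    if input_schema is None:
        # Fallback: pass the raw content as a string with the criteria as intent.
        return {
            "content": str(request_fields.get("output", "")),
            "intent": request_fields.get("criteria", ""),
        }
    # Only properties the judge declares in its schema reach its prompt.
    declared = input_schema.get("properties", {})
    return {name: request_fields[name] for name in declared if name in request_fields}
```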
`input_schema` must be declared under `spec:` in the judge manifest, not under `execution` or any nested block.
Example judge manifest with input_schema:
```yaml
apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: manifest-quality-judge
  version: "1.0.0"
  labels:
    role: judge
spec:
  input_schema:
    type: object
    required:
      - generated_manifest
      - user_objective
      - criteria
    properties:
      generated_manifest:
        type: string
        description: "The generated YAML manifest to evaluate."
      user_objective:
        type: string
        description: "The original objective the manifest was created to fulfill."
      criteria:
        type: string
        description: "Evaluation rubric."
      deployment_result:
        type: string
        description: "Outcome of the deployment attempt, if available."
      validation_context:
        type: string
        description: "Automatically set to this judge agent's name."
  runtime:
    language: python
    version: "3.11"
  task:
    instruction: |
      You are a manifest quality judge. Evaluate the provided YAML manifest against the
      stated objective and return a JSON verdict:
      {"score": 0.0-1.0, "confidence": 0.0-1.0, "reasoning": "...", "verdict": "pass|fail|warning"}
  security:
    network:
      mode: none
  resources:
    timeout: "60s"
    memory: "512Mi"
  execution:
    mode: one-shot
  validation:
  output:
    format: json
    schema:
      type: object
      required: ["score", "confidence", "reasoning"]
      properties:
        score:
          type: number
          minimum: 0
          maximum: 1
        confidence:
          type: number
          minimum: 0
          maximum: 1
        reasoning:
          type: string
```

Security Guidance for Judge Manifests
Because judge outputs influence whether an iteration succeeds, a compromised judge is a high-value attack surface. A few practices reduce the risk:
- Disable network access when possible. A judge evaluating code output rarely needs to make outbound calls. Use `security.network.mode: none` to prevent a prompt-injected payload from exfiltrating data or calling arbitrary APIs from inside the judge container. If the judge must call an LLM provider, use `mode: allow` and restrict `allowlist` to that provider's domain only (e.g., `api.openai.com`).
- Set a short timeout. Judgment should be fast. Use `resources.timeout: "60s"` or tighter to prevent a misbehaving judge from blocking iteration indefinitely.
- Validate judge output strictly. The `output.schema` in the judge manifest should require `score`, `confidence`, and `reasoning` as non-nullable fields. An invalid or missing verdict is treated as a failing score by the orchestrator.
Validation Configuration Reference
| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | — | Validator type: `exit_code`, `json_schema`, `regex`, `semantic`, `multi_judge`. |
| `min_score` | float | 1.0 | Minimum `ValidationScore` to consider this validator passed. |
| `min_confidence` | float | 0.0 | Minimum `Confidence` to accept the score. If confidence is below this, the score is treated as failing. |
| `judge_agent` | string | — | (`semantic` only) Name of the judge agent to invoke. |
| `judges` | string[] | — | (`multi_judge` only) List of judge agent names. |
| `consensus` | string | `weighted_average` | (`multi_judge` only) Score aggregation strategy: `weighted_average`, `majority`, `unanimous`, or `best_of_n`. |
| `min_judges_required` | integer | 1 | (`multi_judge` only) Minimum number of judges that must return a result. If fewer respond, the validator fails regardless of scores. |
| `min_agreement_confidence` | float | — | (`multi_judge` only) Minimum inter-judge agreement factor required before the consensus score is accepted. |
| `criteria` | string | — | (`semantic`, `multi_judge`) Instructions to the judge about what to evaluate. |
| `expected` | integer | 0 | (`exit_code` only) Expected process exit code. |
| `schema_path` | string | — | (`json_schema` only) Path to the JSON Schema file inside the container. |
| `target_path` | string | — | (`json_schema` only) Path to the file to validate. |
| `pattern` | string | — | (`regex` only) Regular expression pattern. |
| `target` | string | `stdout` | (`regex` only) `stdout` or an absolute file path. |