Aegis Orchestrator
Guides

Configuring Agent Validation

Validator types, gradient scoring, threshold configuration, inner-loop tool-call judging, and chaining multiple validators.


AEGIS uses a gradient validation system. Instead of binary pass/fail, each validator produces a ValidationScore (0.0-1.0) and a Confidence (0.0-1.0). The execution loop compares the score against a configured threshold to decide whether to proceed to the next iteration or accept the output. AEGIS also supports a separate pre-dispatch semantic judge for selected tool calls via execution.tool_validation; see Tool-Call Judging for the tool-specific runtime contract.


How Validation Works

At the end of each iteration:

  1. Each validator in spec.validation is evaluated in order.
  2. If all validators' scores meet their thresholds → IterationStatus::Success → execution completes.
  3. If any validator's score falls below its threshold and iterations remain → IterationStatus::Refining → error context is injected and the next iteration begins.
  4. If retries are exhausted → IterationStatus::Failed.
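
The decision logic above can be sketched in Python. This is a minimal illustration, not the orchestrator's actual API; names like `evaluate_iteration` and `Verdict` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class IterationStatus(Enum):
    SUCCESS = "success"
    REFINING = "refining"
    FAILED = "failed"

@dataclass
class Verdict:
    score: float       # ValidationScore, 0.0-1.0
    min_score: float   # configured threshold for this validator
    reasoning: str = ""

def evaluate_iteration(verdicts: list[Verdict], iterations_left: int) -> IterationStatus:
    # 1-2. All validators meet their thresholds -> Success.
    if all(v.score >= v.min_score for v in verdicts):
        return IterationStatus.SUCCESS
    # 3. Any failure with retries remaining -> Refining (error context injected).
    if iterations_left > 0:
        return IterationStatus.REFINING
    # 4. Retries exhausted -> Failed.
    return IterationStatus.FAILED
```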

Tool-call judging uses the same gradient vocabulary, but it runs inside the inner loop before a tool is dispatched. That means a bad cmd.run, fs.write, or external tool call can be rejected without burning the full iteration budget.


Validator Types

exit_code

Checks the container's process exit code. Deterministic; ValidationScore is always 1.0 (pass) or 0.0 (fail).

validation:
  - type: exit_code
    expected: 0          # any non-zero exit code fails this validator

Use this as the first validator to catch hard failures (e.g., uncaught exceptions, build failures) cheaply before running more expensive validators.

json_schema

Validates a file in the agent's workspace against a JSON Schema. Deterministic.

validation:
  - type: json_schema
    schema_path: /agent/output_schema.json   # path inside container
    target_path: /workspace/result.json      # file to validate
    min_score: 1.0                           # must fully pass schema

The schema file is baked into the container image at schema_path. The target_path is the file the agent is expected to produce in its workspace volume.

regex

Validates that stdout matches a regular expression. Deterministic.

validation:
  - type: regex
    pattern: "^\\{.*\"status\":\\s*\"success\".*\\}$"
    target: stdout        # "stdout" or a file path
    min_score: 1.0
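
For intuition, the check behaves like a plain Python `re` match against the captured stdout. This is a sketch of the semantics, not the validator's implementation:

```python
import re

# The pattern from the configuration above.
pattern = r"^\{.*\"status\":\s*\"success\".*\}$"

sample_stdout = '{"result": 42, "status": "success"}'

# Deterministic: a match scores 1.0, anything else 0.0.
score = 1.0 if re.search(pattern, sample_stdout) else 0.0
```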

semantic

A single LLM-as-Judge agent evaluates the output and produces a gradient score.

validation:
  - type: semantic
    judge_agent: code-quality-judge    # must be a deployed agent
    criteria: |
      Evaluate the submitted Python code on:
      1. Correctness: Does it solve the stated problem?
      2. Code quality: Is it idiomatic Python?
      3. Error handling: Does it handle edge cases?
      Score 0.0 for fundamentally broken code, 1.0 for production-ready code.
    min_score: 0.75
    min_confidence: 0.70

The judge agent receives the iteration's output and the criteria text, then returns a JSON object:

{ "score": 0.82, "confidence": 0.91, "reasoning": "..." }
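
A sketch of how such a verdict can be parsed and compared against both thresholds (`accept_verdict` is a hypothetical helper; treating a malformed verdict as failing matches the orchestrator's documented behavior):

```python
import json

def accept_verdict(raw: str, min_score: float, min_confidence: float) -> bool:
    """Parse a judge's JSON verdict and apply both thresholds."""
    try:
        verdict = json.loads(raw)
        score = float(verdict["score"])
        confidence = float(verdict["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return False  # invalid or missing verdicts are treated as failing
    return score >= min_score and confidence >= min_confidence
```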

multi_judge

Runs multiple judge agents and aggregates their scores via consensus. Useful for high-stakes validation where a single judge's bias could skew results.

validation:
  - type: multi_judge
    judges:
      - code-quality-judge
      - security-reviewer-judge
      - test-coverage-judge
    consensus: weighted_average  # "weighted_average" | "majority" | "unanimous" | "best_of_n"
    min_judges_required: 2       # at least 2 judges must respond; otherwise the validator fails
    criteria: |
      Score the output from 0.0 to 1.0 on overall production readiness.
    min_score: 0.80
    min_confidence: 0.65

Consensus modes:

  • weighted_average: Weighted average of all judges' scores. Weights can be tuned per judge; confidence weighting is applied when confidence_weighting is configured.
  • majority: The result of the majority of judges determines the outcome.
  • unanimous: All judges must exceed the threshold. Most conservative: a single dissenting judge fails the validator.
  • best_of_n: Highest score among all judges. Most permissive: any judge's approval suffices.
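
Under equal judge weights, the four modes reduce to simple aggregations. A sketch (`aggregate` is illustrative and omits per-judge weights and confidence weighting):

```python
from statistics import mean

def aggregate(scores: list[float], mode: str, min_score: float) -> float:
    """Combine judge scores under one consensus mode (equal weights)."""
    if mode == "weighted_average":
        return mean(scores)                 # equal weights shown here
    if mode == "majority":
        passing = sum(s >= min_score for s in scores)
        return 1.0 if passing > len(scores) / 2 else 0.0
    if mode == "unanimous":
        return min(scores)                  # one dissent fails the validator
    if mode == "best_of_n":
        return max(scores)                  # any judge's approval suffices
    raise ValueError(f"unknown consensus mode: {mode}")
```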

Inner-Loop Tool Validation (tool_validation)

While the validation blocks above operate on the outer execution loop (evaluated at the end of each iteration), agents can also be checked during execution, just before they invoke a tool. This prevents agents from calling potentially unsafe tools or wasting iterations on hallucinated tool calls.

Unlike outer validators, the inner tool_validation gate runs synchronously, before the tool is dispatched:

  1. The agent proposes a tool call, such as cmd.run.
  2. The orchestrator intercepts the call and evaluates the configured semantic judge before dispatch.
  3. The judge receives the proposed call, the available tools, the current criteria, and the worker mount context.
  4. If the judge returns score < min_score or confidence < min_confidence, the tool call is rejected, the judge reasoning is appended to the inner-loop prompt, and the agent tries again without terminating the full iteration.
  5. If the judge passes, the tool call proceeds into the normal routing and policy pipeline.
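
The accept/reject decision reduces to the same dual-threshold comparison. A sketch (`gate_tool_call` is illustrative, not the orchestrator's API):

```python
def gate_tool_call(score: float, confidence: float,
                   min_score: float, min_confidence: float,
                   skip_judge: bool = False) -> str:
    """Decide whether a proposed tool call may proceed (illustrative sketch)."""
    if skip_judge:
        # Judge bypassed; the call still goes through policy and authorization.
        return "dispatch"
    if score < min_score or confidence < min_confidence:
        # Rejected: judge reasoning is fed back into the inner-loop prompt.
        return "reject"
    return "dispatch"
```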

Operators can explicitly bypass this semantic gate with skip_judge for approved tools. That removes only the judge step; the call still remains subject to the normal policy and authorization path. For the complete gate semantics and bypass rules, see Tool-Call Judging.

execution:
  tool_validation:
    - type: semantic
      judge_agent: security-judge
      criteria: |
        Determine if the tool call and arguments are safe and align with the agent's goal.
      min_score: 0.85
      min_confidence: 0.8
      timeout_seconds: 300

Tool Judge Contract

The tool judge sees a semantic payload rather than raw output:

  • task: The task or instruction currently being executed.
  • proposed_tool_call: The tool name and arguments the agent wants to run.
  • available_tools: The tool list visible to the worker at that point in the inner loop.
  • worker_mounts: The inherited mount context available to the judge execution.
  • criteria: The rubric text configured by the operator or agent author.
  • validation_context: A marker that identifies the inner-loop pre-execution judge path.
  • policy_violations: List of tool names blocked by platform policy earlier in this iteration. Empty if none were blocked.

The judge returns a JSON verdict with score, confidence, and reasoning, plus optional signals and metadata. The orchestrator compares both thresholds before allowing the tool to execute.

Judges can use policy_violations to reason about what the agent attempted versus what the platform permitted. For example, if the agent's workflow required calling a tool that was policy-blocked, the judge can reflect that in its reasoning and score the attempt charitably rather than penalizing the agent for a tool gap caused by platform constraints.


Gradient Scoring vs. Binary Validation

Traditional validators return pass/fail. AEGIS validators return a score and confidence, enabling:

  • Threshold tuning: Set min_score: 0.6 for fast iteration during development; tighten to 0.9 for production agents.
  • Multi-criteria ranking: Compare two executions by their aggregate score to pick the better output.
  • Confidence gating: Set min_confidence: 0.7 to reject verdicts from judges that are uncertain. When a judge's self-reported confidence falls below min_confidence, the score is treated as failing the threshold — the iteration moves to Refining and the low-confidence reasoning is injected as error context for the next attempt. The judge is not re-run within the same iteration.

Chaining Validators

Validators run in the declared order. Each must pass for the iteration to succeed. The execution loop uses the lowest-scoring validator as the reported score for the iteration.

A typical chain orders validators cheapest-first:

validation:
  # 1. Cheapest: deterministic exit code check
  - type: exit_code
    expected: 0

  # 2. Deterministic: JSON schema check
  - type: json_schema
    schema_path: /agent/schema.json
    target_path: /workspace/output.json
    min_score: 1.0

  # 3. Expensive: LLM judge (only runs if the above pass)
  - type: semantic
    judge_agent: quality-judge
    criteria: "Is the output correct and complete?"
    min_score: 0.80
    min_confidence: 0.70

This avoids running the LLM judge (slow and costly) when the deterministic checks fail.
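
The chaining semantics, including short-circuiting and lowest-score reporting, can be sketched as follows (`run_chain` and the tuple convention are illustrative assumptions):

```python
def run_chain(validators, context):
    """Run validators in declared order (sketch).

    Each validator is a callable returning (score, min_score).
    Returns (passed, reported_score), where the reported score is the
    lowest score seen; a failing validator stops the chain early.
    """
    scores = []
    for validate in validators:
        score, min_score = validate(context)
        scores.append(score)
        if score < min_score:
            return False, min(scores)   # later (expensive) validators never run
    return True, min(scores)
```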


Agent-as-Judge Pattern

The judge agent specified in semantic or multi_judge validators is a regular AEGIS agent defined with its own manifest. This means judges can:

  • Be updated independently of the agent they evaluate.
  • Run in an isolated container with their own resource limits and security policy.
  • Be specialized for specific domains (e.g., a judge trained to evaluate security code reviews).
  • Run as child executions in the execution tree — visible in execution APIs and event streams for the parent execution.
  • Receive the worker execution's inherited mount context through worker_mounts, so file-based checks can inspect the same artifacts the worker can reach when the judge manifest permits access.

Judges always use execution.mode: one-shot. Judgment is a single-shot decision by design; the judge does not retry its own verdict through the iteration loop.

Example judge agent manifest:

apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: code-quality-judge
  version: "1.0.0"
  labels:
    role: judge
spec:
  runtime:
    language: python
    version: "3.11"

  task:
    instruction: |
      You are a code quality judge. Evaluate the provided Python code and return a JSON verdict:
      {"score": 0.0-1.0, "confidence": 0.0-1.0, "reasoning": "...", "verdict": "pass|fail|warning"}

  security:
    network:
      mode: none
    resources:
      timeout: "60s"
      memory: "512Mi"

  execution:
    mode: one-shot
    validation:
      system:
        must_succeed: true
      output:
        format: json
        schema:
          type: object
          required: ["score", "confidence", "reasoning"]
          properties:
            score:
              type: number
              minimum: 0
              maximum: 1
            confidence:
              type: number
              minimum: 0
              maximum: 1
            reasoning:
              type: string

The judge runtime writes its final verdict as text output, and the orchestrator parses that output as JSON. The judge should emit a JSON object containing the verdict fields on its final turn; the runtime does not require a verdict file path. worker_mounts is still the source of truth for filesystem context available to the judge, but verdict transport is text-based.

Judge input payloads expose mounted filesystem context through worker_mounts (array of inherited mount paths). Treat this as the source of truth for artifact discovery.

Judge Input Schema

Judges should declare spec.input_schema with the property names they expect to receive. The orchestrator maps ValidationRequest fields to these property names at runtime, so the property names in the schema determine which evaluation data reaches the judge's prompt.

Canonical property names for judge agents:

  • generated_manifest or output: The content being evaluated (agent output, generated YAML, etc.).
  • generated_workflow: The workflow definition being evaluated (workflow judges).
  • user_objective or task: The original objective or task that the worker was fulfilling.
  • criteria: The evaluation rubric text passed from the validator configuration.
  • deployment_result: Deployment outcome, when relevant (optional).
  • tool_call_history: Tool calls made during execution (optional).
  • worker_mounts: Inherited filesystem mount paths available to the judge (optional).
  • validation_context: Automatically set to the judge agent's name by the orchestrator.

If a judge does not declare spec.input_schema, the orchestrator falls back to passing the raw content as a string with the criteria field as the intent. Declaring the schema explicitly gives the judge structured access to all evaluation data and enables precise prompt construction.
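
The mapping can be pictured as a simple projection. In this sketch, `build_judge_input` and the fallback field names `content` and `intent` are assumptions for illustration, not the orchestrator's actual identifiers:

```python
def build_judge_input(request, input_schema):
    """Project ValidationRequest fields onto the judge's declared properties (sketch)."""
    if input_schema is None:
        # Fallback: raw content as a string, with criteria as the intent.
        return {"content": request.get("output", ""),
                "intent": request.get("criteria", "")}
    props = input_schema.get("properties", {})
    # Only property names declared in the schema reach the judge's prompt.
    return {name: request[name] for name in props if name in request}
```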

input_schema must be declared under spec: in the judge manifest, not under execution or any nested block.

Example judge manifest with input_schema:

apiVersion: 100monkeys.ai/v1
kind: Agent
metadata:
  name: manifest-quality-judge
  version: "1.0.0"
  labels:
    role: judge
spec:
  input_schema:
    type: object
    required:
      - generated_manifest
      - user_objective
      - criteria
    properties:
      generated_manifest:
        type: string
        description: "The generated YAML manifest to evaluate."
      user_objective:
        type: string
        description: "The original objective the manifest was created to fulfill."
      criteria:
        type: string
        description: "Evaluation rubric."
      deployment_result:
        type: string
        description: "Outcome of the deployment attempt, if available."
      validation_context:
        type: string
        description: "Automatically set to this judge agent's name."

  runtime:
    language: python
    version: "3.11"

  task:
    instruction: |
      You are a manifest quality judge. Evaluate the provided YAML manifest against the
      stated objective and return a JSON verdict:
      {"score": 0.0-1.0, "confidence": 0.0-1.0, "reasoning": "...", "verdict": "pass|fail|warning"}

  security:
    network:
      mode: none
    resources:
      timeout: "60s"
      memory: "512Mi"

  execution:
    mode: one-shot
    validation:
      output:
        format: json
        schema:
          type: object
          required: ["score", "confidence", "reasoning"]
          properties:
            score:
              type: number
              minimum: 0
              maximum: 1
            confidence:
              type: number
              minimum: 0
              maximum: 1
            reasoning:
              type: string

Security Guidance for Judge Manifests

Because judge outputs influence whether an iteration succeeds, a compromised judge is a high-value attack surface. A few practices reduce the risk:

  • Disable network access when possible. A judge evaluating code output rarely needs to make outbound calls. Use security.network.mode: none to prevent a prompt-injected payload from exfiltrating data or calling arbitrary APIs from inside the judge container. If the judge must call an LLM provider, use mode: allow and restrict allowlist to that provider's domain only (e.g., api.openai.com).
  • Set a short timeout. Judgment should be fast. Use resources.timeout: "60s" or tighter to prevent a misbehaving judge from blocking iteration indefinitely.
  • Validate judge output strictly. The output.schema in the judge manifest should require score, confidence, and reasoning as non-nullable fields. An invalid or missing verdict is treated as a failing score by the orchestrator.

Validation Configuration Reference

  • type (string): Validator type: exit_code, json_schema, regex, semantic, multi_judge.
  • min_score (float, default 1.0): Minimum ValidationScore to consider this validator passed.
  • min_confidence (float, default 0.0): Minimum Confidence to accept the score. If confidence is below this, the score is treated as failing.
  • judge_agent (string; semantic only): Name of the judge agent to invoke.
  • judges (string[]; multi_judge only): List of judge agent names.
  • consensus (string, default weighted_average; multi_judge only): Score aggregation strategy: weighted_average, majority, unanimous, or best_of_n.
  • min_judges_required (integer, default 1; multi_judge only): Minimum number of judges that must return a result. If fewer respond, the validator fails regardless of scores.
  • min_agreement_confidence (float; multi_judge only): Minimum inter-judge agreement factor required before the consensus score is accepted.
  • criteria (string; semantic, multi_judge): Instructions to the judge about what to evaluate.
  • expected (integer, default 0; exit_code only): Expected process exit code.
  • schema_path (string; json_schema only): Path to the JSON Schema file inside the container.
  • target_path (string; json_schema only): Path to the file to validate.
  • pattern (string; regex only): Regular expression pattern.
  • target (string, default stdout; regex only): stdout or an absolute file path.
