Aegis Orchestrator
Core Concepts

The Execution Loop

How the 100monkeys outer loop, inner loop, and Dispatch Protocol work together to run an agent execution.

AEGIS executes agents in a two-loop architecture. The outer loop (ExecutionSupervisor) manages retries across fresh container instances. The inner loop (InnerLoopService) handles transparent LLM tool call interception within a single container run. Understanding both is essential for reasoning about agent behavior, debugging failures, and configuring validation correctly.


Why Iterative Execution?

Standard autonomous agents operate on a single-shot basis: generate output, execute it, return the result. When the LLM hallucinates an API, makes an off-by-one error, or produces structurally valid but semantically wrong code, the task fails — with no recovery path.

AEGIS treats LLM output as a candidate solution rather than a final product. Every iteration result is verified against declared success criteria before being accepted. This shifts the execution model from probabilistic generation to verified correctness: the orchestrator accepts an output only when it can demonstrate, deterministically, that the output meets the task requirements.

The practical effect: agents that would succeed roughly 60% of the time on a single attempt routinely reach >95% success rates through iteration, because the specific failure reason is fed back to the model as compiler-style feedback for the next attempt.


Execution Modes

An agent's spec.execution.mode controls whether the orchestrator runs one attempt or the full iterative refinement loop.

spec:
  execution:
    mode: iterative     # "iterative" (default) | "single"
    max_iterations: 10  # 1–10; default: 10
    iteration_timeout: "300s"  # per-iteration wall-clock limit; default: 300s

Mode        Behaviour
iterative   Run up to max_iterations attempts. Inject failure context between attempts. Accept output only when all validators pass.
single      Run exactly one attempt. Skip validation-driven retry logic. Useful for judge agents and short-lived classification tasks.

The Outer Loop: ExecutionSupervisor

The ExecutionSupervisor drives the 100monkeys iterative refinement strategy. When an execution is started, the supervisor:

  1. Spawns a fresh, isolated container from the agent's image using the configured container runtime (Docker or Podman). Each iteration gets a clean environment: no shared process state carries over from previous iterations. In rootless Podman deployments, workspace volumes are mounted via FUSE bind mounts from the host-side FUSE daemon rather than NFS; the agent code is identical in both cases.
  2. Injects context into the container: the task input, any prior iteration errors (on retries), and tool definitions filtered to the agent's declared capabilities.
  3. Monitors execution until the inner loop reports completion or timeout.
  4. Runs validators against the iteration output. Validators produce a ValidationScore (0.0–1.0) and Confidence (0.0–1.0).
  5. Decides the next action:
    • Score ≥ threshold → IterationStatus::Success → execution completes.
    • Score < threshold, iterations remaining → IterationStatus::Refining → error context injected, next container spawned.
    • Score < threshold, max iterations reached → IterationStatus::Failed → execution fails.
    • Hard timeout hit → execution cancelled.
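
The step-5 decision table can be sketched as a small function. This is an illustrative sketch, not the orchestrator's actual Rust code; only the three `IterationStatus` outcomes are taken from the source.

```python
from enum import Enum

class IterationStatus(Enum):
    SUCCESS = "success"
    REFINING = "refining"
    FAILED = "failed"

def decide_next_action(score: float, threshold: float,
                       iteration: int, max_iterations: int) -> IterationStatus:
    """Mirror of the supervisor's step-5 decision table (sketch)."""
    if score >= threshold:
        return IterationStatus.SUCCESS    # all validators passed; execution completes
    if iteration < max_iterations:
        return IterationStatus.REFINING   # inject error context, spawn next container
    return IterationStatus.FAILED         # iteration budget exhausted
```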

Iteration State Machine

          ┌────────────┐
          │  Running   │
          └─────┬──────┘
                │ inner loop completes
                ▼
          ┌────────────┐  score ≥ threshold   ┌─────────┐
          │ Validating │ ───────────────────▶ │ Success │
          └─────┬──────┘                      └─────────┘
                │ score < threshold
                │ iterations < max
                ▼
          ┌────────────┐
          │  Refining  │──── inject error, spawn next container
          └─────┬──────┘
                │ iterations == max
                ▼
          ┌────────────┐
          │   Failed   │
          └────────────┘

Execution Aggregate

The Execution aggregate root enforces these invariants at the domain level:

  • At most one iteration is running at a time.
  • Iteration numbers are sequential with no gaps.
  • max_iterations cannot be exceeded (default: 10).
  • A completed or failed execution is immutable.
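
These four invariants could be enforced roughly as follows. A minimal sketch, assuming guard checks on iteration start; the real aggregate is a Rust domain type and its method names are not shown in the source.

```python
class Execution:
    """Sketch of the Execution aggregate's invariants (illustrative only)."""

    def __init__(self, max_iterations: int = 10):
        self.max_iterations = max_iterations
        self.iterations = []   # completed iteration numbers, in order
        self.running = None    # at most one running iteration
        self.terminal = False  # completed/failed executions are immutable

    def start_iteration(self, number: int) -> None:
        if self.terminal:
            raise ValueError("execution is immutable once completed or failed")
        if self.running is not None:
            raise ValueError("at most one iteration may run at a time")
        if number != len(self.iterations) + 1:
            raise ValueError("iteration numbers must be sequential with no gaps")
        if number > self.max_iterations:
            raise ValueError("max_iterations cannot be exceeded")
        self.running = number

    def finish_iteration(self) -> None:
        self.iterations.append(self.running)
        self.running = None
```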

The ExecutionHierarchy value object tracks an execution's position within multi-agent structures. It includes an optional swarm_id field that links the execution to its parent swarm when spawned as a child agent. This swarm_id is propagated to all child executions for end-to-end tracing correlation across the swarm.


The Inner Loop: InnerLoopService

Within a single container run, InnerLoopService implements transparent LLM tool call interception. The agent's bootstrap.py never receives raw tool call JSON from the LLM — the orchestrator handles it completely.

Communication Channel

bootstrap.py communicates with the orchestrator over a single long-running HTTP POST to /v1/dispatch-gateway. This channel carries a discriminated union of message types in both directions.

Agent → Orchestrator (AgentMessage):

// Start an LLM inference pass
{
  "type": "generate",
  "agent_id": "<agent-uuid>",
  "execution_id": "<execution-uuid>",
  "iteration_number": 1,
  "prompt": "Write a function that reverses a string.",
  "model_alias": "default",
  "messages": []
}

// Return result of a dispatched subprocess
{
  "type": "dispatch_result",
  "dispatch_id": "<uuid>",
  "exit_code": 0,
  "stdout": "...",
  "stderr": ""
}

Field             Required   Description
type              yes        "generate" to start an LLM pass; "dispatch_result" to return a subprocess result.
agent_id          yes        UUID of the agent. Injected by the orchestrator as AEGIS_AGENT_ID.
execution_id      yes        UUID of the current execution. Injected as AEGIS_EXECUTION_ID.
iteration_number  yes        1-based iteration counter. Injected as AEGIS_ITERATION.
prompt            yes        The fully-rendered task prompt. Passed to bootstrap.py as argv[1].
model_alias       no         LLM alias to route through. Defaults to "default" (the orchestrator's configured default provider). Maps from spec.runtime.model in the agent manifest.
messages          no         Prior conversation turns to prepend. Empty array for a fresh iteration start.
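
Putting the field table together, a helper for assembling a valid generate message might look like this. A sketch only: the helper name is illustrative, and the real bootstrap.py may build the payload differently, but the env var names and field semantics come from the table above.

```python
import json
import os

def build_generate_message(prompt, model_alias="default", messages=None):
    """Assemble an AgentMessage 'generate' payload from orchestrator-injected env vars."""
    msg = {
        "type": "generate",
        "agent_id": os.environ["AEGIS_AGENT_ID"],
        "execution_id": os.environ["AEGIS_EXECUTION_ID"],
        "iteration_number": int(os.environ["AEGIS_ITERATION"]),
        "prompt": prompt,
        "model_alias": model_alias,
        "messages": messages or [],  # empty array for a fresh iteration start
    }
    return json.dumps(msg)
```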

Orchestrator → Agent (OrchestratorMessage):

// LLM produced a final text response — iteration complete
{ "type": "final", "content": "I have written the solution to /workspace/solution.py" }

// LLM requested a subprocess execution — run it and report back
{ "type": "dispatch", "dispatch_id": "<uuid>", "action": "exec", "command": "python", "args": ["/workspace/test.py"] }

Tool Call Flow

When the LLM emits a tool call in its response:

  1. The orchestrator intercepts and routes the tool call via the Tool Router (see MCP Tool Routing).
  2. The tool result is appended to the conversation as a tool message.
  3. The orchestrator makes another LLM inference call with the updated conversation.
  4. Steps 1–3 repeat until the LLM emits a response with no tool calls.
  5. The orchestrator returns OrchestratorMessage { type: "final" } to bootstrap.py.

From bootstrap.py's perspective, it sends one generate message and receives one final response. All tool calls happen inside the orchestrator.
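
Steps 1-5 reduce to a repeat-until-no-tool-calls loop. A language-agnostic sketch (the real InnerLoopService is part of the Rust orchestrator; `call_llm` and `route_tool_call` stand in for the provider client and the Tool Router, and the response shape is assumed):

```python
MAX_INNER_LOOP_ITERATIONS = 50  # loop-bail constant from the timeout table

def run_inner_loop(messages, call_llm, route_tool_call):
    """Repeat LLM inference until a response carries no tool calls (sketch)."""
    for _ in range(MAX_INNER_LOOP_ITERATIONS):
        response = call_llm(messages)
        if not response.get("tool_calls"):
            # no tool calls: the iteration is complete
            return {"type": "final", "content": response["content"]}
        for call in response["tool_calls"]:
            result = route_tool_call(call)  # Tool Router decides the path
            messages.append({"role": "tool", "content": result})
    raise RuntimeError("inner loop exceeded MAX_INNER_LOOP_ITERATIONS")
```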


The Dispatch Protocol

The Dispatch Protocol handles tool calls that require running a subprocess inside the agent container — for example, compiling code, running tests, or executing an interpreter. This is Path 3 of the three-path tool router.

Why Not Container Exec?

The orchestrator cannot call the container runtime's exec API directly because:

  • It would be runtime-specific (breaks Firecracker support and ties the orchestrator to a single container engine).
  • It bypasses the bootstrap script's process supervision and output capture.

Instead, the orchestrator sends a dispatch message over the existing HTTP channel to bootstrap.py, which runs the subprocess and sends back a dispatch_result.

Wire Format

Orchestrator                              bootstrap.py (in container)
     │                                           │
     │  POST /v1/dispatch-gateway                │
     │  { type: "generate", messages: [...] }    │
     │ ─────────────────────────────────────────▶│
     │                                           │  (LLM calls cmd.run)
     │  { type: "dispatch",                      │
     │    dispatch_id: "abc-123",                │
     │    action: "exec",                        │
     │    command: "python",                     │
     │    args: ["/workspace/test.py"] }         │
     │ ◀─────────────────────────────────────────│
     │                                           │  (runs subprocess)
     │  POST /v1/dispatch-gateway                │
     │  { type: "dispatch_result",               │
     │    dispatch_id: "abc-123",                │
     │    exit_code: 0,                          │
     │    stdout: "All tests passed",            │
     │    stderr: "" }                           │
     │ ─────────────────────────────────────────▶│
     │                                           │  (LLM continues)
     │  { type: "final", content: "..." }        │
     │ ◀─────────────────────────────────────────│

SubcommandAllowlist

Not every cmd.run invocation is permitted. The node config's SubcommandAllowlist is a HashMap<String, Vec<String>> mapping base commands to their allowed first positional arguments:

tools:
  subcommand_allowlist:
    python: ["test.py", "-m", "-c"]
    node: ["index.js", "test.js"]
    cargo: ["build", "test", "run"]
    npm: ["install", "test", "run"]
    git: ["status", "diff", "log"]

If an agent attempts cmd.run with a command or subcommand not in the allowlist, the orchestrator rejects the dispatch and returns a policy violation event.
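
The allowlist check itself is a two-level lookup. A sketch of the policy, assuming a first positional argument is required (the real implementation may treat bare commands differently):

```python
def is_dispatch_allowed(allowlist: dict, command: str, args: list) -> bool:
    """Check a cmd.run request against the SubcommandAllowlist (sketch)."""
    allowed_subcommands = allowlist.get(command)
    if allowed_subcommands is None:
        return False  # base command not listed at all
    if not args:
        return False  # sketch assumption: a first positional argument is required
    return args[0] in allowed_subcommands
```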


Validation: Gradient Scoring

AEGIS does not use binary pass/fail validation. Every validator produces:

  • ValidationScore (0.0–1.0): how well the output meets the criteria.
  • Confidence (0.0–1.0): certainty of the score.

This allows nuanced threshold configuration. An agent can succeed with a score of 0.8 if min_score: 0.8 is set, or be required to reach 0.95 for high-stakes tasks.

Multiple validators are applied in order. The execution proceeds to the next iteration if any validator's score falls below its threshold.

Validator types:

  • exit_code — deterministic; checks process exit code.
  • json_schema — deterministic; validates stdout against a JSON Schema.
  • regex — deterministic; matches stdout against a pattern.
  • semantic — LLM-as-Judge; single judge agent produces a score.
  • multi_judge — consensus across N judge agents.
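
A sketch of how gradient scores gate iteration advance, assuming each validator carries its own min_score threshold (validator internals and the real result types are omitted):

```python
from dataclasses import dataclass

@dataclass
class ValidatorResult:
    score: float       # ValidationScore, 0.0-1.0
    confidence: float  # Confidence, 0.0-1.0
    min_score: float   # threshold this validator must reach

def iteration_passes(results: list) -> bool:
    """All validators must reach their threshold; any shortfall triggers the next iteration."""
    return all(r.score >= r.min_score for r in results)
```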

See Configuring Agent Validation for full configuration details.


Timeout Hierarchy

Four independent timeout boundaries apply to every execution. They nest from outermost to innermost:

Level                  Config                              Default          Enforced by
Overall execution      spec.resources.timeout_seconds      1800 s (30 min)  ExecutionSupervisor outer tokio::time::timeout
Per iteration          spec.execution.iteration_timeout    300 s            ExecutionSupervisor per-iteration tokio::time::timeout
LLM HTTP request       AEGIS_LLM_TIMEOUT_SECONDS env       300 s            bootstrap.py socket timeout
Inner loop tool calls  MAX_INNER_LOOP_ITERATIONS constant  50 calls         InnerLoopService loop bail

The outer timeout is a hard wall: if the total execution time exceeds timeout_seconds, the orchestrator cancels the execution regardless of which iteration is in progress. The per-iteration timeout is reset for each new iteration, so a long-running first iteration does not consume the per-iteration budget of subsequent attempts.
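
The nesting of the two supervisor-level boundaries can be sketched with asyncio (the orchestrator actually uses tokio::time::timeout; `run_iteration` is a stand-in for one container run, and the function name is illustrative):

```python
import asyncio

async def run_with_timeouts(run_iteration, overall_s=1800, per_iteration_s=300,
                            max_iterations=10):
    """Nest the overall and per-iteration timeout boundaries (sketch)."""
    async def all_iterations():
        for i in range(1, max_iterations + 1):
            # the per-iteration budget resets for every new attempt
            done = await asyncio.wait_for(run_iteration(i), timeout=per_iteration_s)
            if done:
                return i
        return None
    # hard wall: cancels whichever iteration is in progress
    return await asyncio.wait_for(all_iterations(), timeout=overall_s)
```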