Writing Your First Agent

Step-by-step guide to writing an agent.yaml manifest and bootstrap.py from scratch.

This guide walks through creating a working agent from scratch: an agent.yaml manifest and the bootstrap.py script that runs inside the container. By the end, you will have an agent that accepts a task prompt, writes code to a workspace volume, runs it, and validates the output.


Prerequisites

  • AEGIS daemon running locally (see Getting Started)
  • Python 3.11+ in your target container image
  • The AEGIS Python SDK installed in the container image (the Dockerfile in Step 2 installs it)

Step 1: Create the Project Structure

my-agent/
├── agent.yaml
├── bootstrap.py
├── Dockerfile
└── output_schema.json

Step 2: Write the Dockerfile

Your agent runs inside the container image specified in the manifest. Build a minimal image with Python and the AEGIS SDK:

FROM python:3.11-slim

RUN pip install --no-cache-dir aegis-sdk

WORKDIR /agent
COPY bootstrap.py .
COPY output_schema.json .

CMD ["python", "/agent/bootstrap.py"]

Build and push the image:

docker build -t myregistry/my-agent:latest .
docker push myregistry/my-agent:latest

Step 3: Write bootstrap.py

bootstrap.py is the entrypoint for your agent. It uses the AEGIS Python SDK, which handles the /v1/llm/generate communication loop automatically.

from aegis import AegisClient, TaskInput

def main():
    client = AegisClient()

    # Receive the task input injected by the orchestrator
    task: TaskInput = client.get_task()
    user_request = task.input.get("task", "")

    # Build the initial conversation
    messages = [
        {
            "role": "system",
            "content": (
                "You are a Python developer. Write code to solve the given task. "
                "Save your solution to /workspace/solution.py. "
                "Run the solution with cmd.run to verify it works. "
                "When done, write a JSON summary to /workspace/result.json with "
                "fields: 'solution_path' and 'output'."
            )
        },
        {
            "role": "user",
            "content": user_request
        }
    ]

    # Run the inner loop — the SDK handles tool call interception transparently
    response = client.generate(messages=messages)

    # The response is the final text from the LLM after all tool calls are resolved
    print(response.content)

if __name__ == "__main__":
    main()

The SDK's client.generate() blocks until the LLM produces a non-tool-call response. All fs.*, cmd.run, and other capability calls are intercepted and executed by the orchestrator transparently.
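
To make the blocking behaviour concrete, here is a minimal, self-contained sketch of a tool-call loop of the kind described above. It does not use the AEGIS SDK and is not its implementation: fake_llm, run_tool, and the message shapes are hypothetical stand-ins, chosen only to show how a generate call can keep resolving tool calls until the model returns plain text.

```python
from typing import Callable

# Hypothetical stand-in for the orchestrator executing a capability call.
def run_tool(name: str, args: dict) -> str:
    if name == "fs.write":
        return f"wrote {len(args['content'])} bytes to {args['path']}"
    if name == "cmd.run":
        return f"ran: {args['command']}"
    return f"unknown tool: {name}"

# Hypothetical scripted model: two tool calls, then a final text answer.
def fake_llm(messages: list[dict]) -> dict:
    tool_results = sum(1 for m in messages if m["role"] == "tool")
    if tool_results == 0:
        return {"tool_call": {"name": "fs.write",
                              "args": {"path": "/workspace/solution.py",
                                       "content": "print(2 + 2)"}}}
    if tool_results == 1:
        return {"tool_call": {"name": "cmd.run",
                              "args": {"command": "python /workspace/solution.py"}}}
    return {"content": "Solution written and verified."}

def generate(messages: list[dict], llm: Callable[[list[dict]], dict]) -> str:
    """Loop until the model returns a reply that is not a tool call."""
    history = list(messages)
    while True:
        reply = llm(history)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        # Resolve the tool call and feed the result back to the model.
        result = run_tool(call["name"], call["args"])
        history.append({"role": "tool", "content": result})

final = generate([{"role": "user", "content": "Add 2 and 2."}], fake_llm)
print(final)
```

From the caller's point of view only the final text is visible, which is why bootstrap.py can treat client.generate() as a single blocking call.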


Step 4: Write the Output Schema

Define what a valid output looks like. The json_schema validator checks /workspace/result.json against this schema.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["solution_path", "output"],
  "properties": {
    "solution_path": {
      "type": "string",
      "pattern": "^/workspace/"
    },
    "output": {
      "type": "string",
      "minLength": 1
    }
  },
  "additionalProperties": false
}
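
Before building the image, it can help to check a sample result against these rules locally. The sketch below hand-codes the constraints from the schema above using only the Python standard library. It is not the orchestrator's json_schema validator, and check_result is a hypothetical helper name.

```python
import json

REQUIRED = ("solution_path", "output")

def check_result(raw: str) -> list[str]:
    """Return the violations of the output schema's rules (empty list if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    # "required" and "type": "string" for both fields
    for key in REQUIRED:
        if key not in data:
            errors.append(f"missing required field: {key}")
        elif not isinstance(data[key], str):
            errors.append(f"{key} must be a string")
    # "additionalProperties": false
    for key in data:
        if key not in REQUIRED:
            errors.append(f"unexpected field: {key}")
    # "pattern": "^/workspace/"
    path = data.get("solution_path")
    if isinstance(path, str) and not path.startswith("/workspace/"):
        errors.append("solution_path must start with /workspace/")
    # "minLength": 1
    output = data.get("output")
    if isinstance(output, str) and len(output) < 1:
        errors.append("output must not be empty")
    return errors

good = json.dumps({"solution_path": "/workspace/solution.py", "output": "True"})
bad = json.dumps({"solution_path": "/tmp/solution.py", "output": "", "extra": 1})
print(check_result(good))
print(check_result(bad))
```

The first call prints an empty list; the second reports the extra field, the path outside /workspace/, and the empty output.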

Step 5: Write the Agent Manifest

apiVersion: 100monkeys.ai/v1
kind: AgentManifest
metadata:
  name: python-coder
  version: "1.0.0"
  description: "Writes Python solutions to programming tasks."
  labels:
    role: worker
    team: platform
spec:
  runtime:
    language: python
    version: "3.11"
    isolation: docker

  task:
    instruction: |
      You are a Python developer. Write code to solve the given task.
      Save your solution to /workspace/solution.py.
      Run the solution to verify it works.
      Write a JSON summary to /workspace/result.json with fields: solution_path and output.

  security:
    network:
      mode: allow
      allowlist:
        - pypi.org
    filesystem:
      read:
        - /workspace
        - /agent
      write:
        - /workspace
    resources:
      cpu: 1000
      memory: "1Gi"
      timeout: "300s"

  volumes:
    - name: workspace
      storage_class: ephemeral
      mount_point: /workspace
      access_mode: read-write
      ttl_hours: 1
      size_limit_mb: 5000

  execution:
    mode: iterative
    max_iterations: 10
    validation:
      system:
        must_succeed: true
      output:
        format: json
        schema:
          type: object
          required: ["solution_path", "output"]
          properties:
            solution_path:
              type: string
            output:
              type: string

  tools:
    - name: filesystem
      server: "mcp:filesystem"
      config:
        allowed_paths: ["/workspace", "/agent"]
        access_mode: read-write

  env:
    PYTHONUNBUFFERED: "1"
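
The manifest names the same paths in three places: security.filesystem, volumes, and the filesystem tool's allowed_paths. The following sketch checks that they agree before you deploy. It works on a Python dict that mirrors an excerpt of the manifest; uncovered_paths and is_within are hypothetical helpers, and whether the orchestrator itself enforces this relationship is an assumption, not something this guide states.

```python
from pathlib import PurePosixPath

# Excerpt of the manifest above, written as a Python dict for illustration.
spec = {
    "security": {"filesystem": {"read": ["/workspace", "/agent"],
                                "write": ["/workspace"]}},
    "volumes": [{"name": "workspace", "mount_point": "/workspace",
                 "access_mode": "read-write"}],
    "tools": [{"name": "filesystem",
               "config": {"allowed_paths": ["/workspace", "/agent"]}}],
}

def is_within(path: str, roots: list[str]) -> bool:
    """True if path equals one of roots or lies beneath it."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents
               for r in roots)

def uncovered_paths(spec: dict) -> list[str]:
    """List tool and volume paths that security.filesystem.read does not cover."""
    readable = spec["security"]["filesystem"]["read"]
    wanted = [p for t in spec["tools"] for p in t["config"]["allowed_paths"]]
    wanted += [v["mount_point"] for v in spec["volumes"]]
    return [p for p in wanted if not is_within(p, readable)]

print(uncovered_paths(spec))
```

An empty list means every path the tool or volume refers to is also readable under security.filesystem.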

Step 6: Deploy and Test

Deploy the agent:

aegis agent deploy ./my-agent/agent.yaml

Confirm it's registered:

aegis agent get python-coder

Run an execution:

aegis execute \
  --agent python-coder \
  --input '{"task": "Write a function that checks if a number is prime."}' \
  --watch
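
If you launch executions from a script, quoting the JSON payload by hand is easy to get wrong. The sketch below assembles the same command as an argument list, using only the flags shown above (--agent, --input, --watch). build_execute_command is a hypothetical helper, not part of the AEGIS CLI or SDK.

```python
import json
import shlex

def build_execute_command(agent: str, task: str, watch: bool = True) -> list[str]:
    """Assemble the argument list for the `aegis execute` call shown above."""
    # json.dumps handles escaping of quotes and newlines inside the task text.
    payload = json.dumps({"task": task})
    cmd = ["aegis", "execute", "--agent", agent, "--input", payload]
    if watch:
        cmd.append("--watch")
    return cmd

cmd = build_execute_command(
    "python-coder",
    "Write a function that checks if a number is prime.",
)
# shlex.join renders the list as a shell-safe string for logging or copy-paste.
print(shlex.join(cmd))
```

To run it rather than print it, pass the list to subprocess.run(cmd, check=True); passing a list avoids shell quoting entirely.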

If an iteration fails validation, the orchestrator injects the validation error into the next iteration's context and retries automatically, up to the execution.max_iterations limit (10 in this manifest).
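
The retry behaviour can be pictured as a validate-and-feed-back loop. This is a simplified sketch of that pattern, not the orchestrator's code: run_iterations, attempt, and validate are hypothetical names, and the real orchestrator's context handling is certainly richer than a single error string.

```python
from typing import Callable, Optional

def run_iterations(attempt: Callable[[Optional[str]], str],
                   validate: Callable[[str], Optional[str]],
                   max_iterations: int = 10) -> tuple[bool, int, str]:
    """Retry attempt() until validate() returns None or the limit is hit.

    Returns (succeeded, iterations_used, last_output).
    """
    feedback: Optional[str] = None
    output = ""
    for i in range(1, max_iterations + 1):
        output = attempt(feedback)    # the previous error, if any, is passed in
        feedback = validate(output)   # None means the output is valid
        if feedback is None:
            return True, i, output
    return False, max_iterations, output

# Hypothetical agent that only produces valid output once it has seen an error.
def attempt(feedback: Optional[str]) -> str:
    return "result: 42" if feedback else "oops"

def validate(output: str) -> Optional[str]:
    return None if output.startswith("result:") else "output must start with 'result:'"

print(run_iterations(attempt, validate))  # (True, 2, 'result: 42')
```

The first attempt fails, its error is fed into the second attempt, and the loop stops as soon as validation passes.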


Common Issues

Symptom | Cause | Fix
Container fails to start | Image not found or not pullable | Verify spec.runtime.isolation and registry credentials in node config
Tool call rejected | Tool not declared in spec.tools | Add the tool to spec.tools in the manifest
Validation always fails | Schema path wrong | Double-check the spec.execution.validation.output.schema definition
Timeout on first iteration | Task too complex | Increase spec.security.resources.timeout or spec.execution.validation.system.timeout_seconds
Network call rejected | Domain not in allowlist | Add the domain to spec.security.network.allowlist
