Docker Deployment
Docker Engine setup, agent container lifecycle, bollard integration, NFS volume mounting, and systemd service configuration.
Docker is suitable for local development and single-node setups. For production platform deployment using Podman pods, see Podman Deployment. Both Docker and Podman are supported as agent container runtimes.
Docker is a supported container runtime for AEGIS agent execution. The orchestrator communicates with Docker via the bollard Rust library (Docker API over the Unix socket). This page covers Docker Engine setup for local development, agent container lifecycle, and the NFS volume mounting model.
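The socket-based API access described above can be exercised with a short script. The sketch below is illustrative Python using only the standard library, not AEGIS code; it issues a `GET /version` request to the Docker Engine API over the default Unix socket path (Podman exposes a compatible socket):

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # Docker Engine default socket path

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks HTTP over a Unix domain socket,
    the same transport bollard uses by default."""
    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def docker_version(socket_path=DOCKER_SOCKET):
    """Return the Docker Engine /version response as a dict."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

Running `docker_version()` on a host with the daemon up returns the engine version payload; a permission error here indicates the socket-access problem covered in the next section.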
Prerequisites
- Docker Engine 24.0+ (or Podman 4.0+ with Docker-compatible socket)
- The AEGIS daemon process has read/write access to `/var/run/docker.sock` (or the Podman socket)
- Agent container images are accessible from the daemon host (either locally present or pullable)
- NFS traffic (TCP port 2049) is routable between agent containers and the daemon host
Docker Socket Access
The AEGIS daemon must be able to connect to the Docker socket. In production, run the daemon as a user in the docker group, or configure a dedicated socket permission:
```bash
# Add the aegis service user to the docker group
sudo usermod -aG docker aegis

# Verify
sudo -u aegis docker ps
```

Never run the AEGIS daemon as root.
Container Lifecycle
When an execution starts, the orchestrator:
1. Pulls the image (respecting `spec.runtime.image_pull_policy` from the agent manifest; see Container Registry & Image Management).
2. Creates the container with:
   - CPU quota and memory limit from `spec.resources`
   - NFS volume mounts (described below)
   - Network configuration from `spec.security.network_policy`
   - Environment variables from `spec.environment`
   - The container UID/GID stored in the `Execution` metadata for UID/GID squashing
3. Starts the container; `bootstrap.py` begins executing.
4. Monitors the container for the duration of the iteration.
5. Stops and removes the container after the iteration completes or times out.
Containers are removed immediately after each iteration. A fresh container is created for each iteration in the 100monkeys loop.
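Step 2 of the lifecycle amounts to assembling a container-create request body from manifest fields. A minimal sketch follows (illustrative Python, not the orchestrator's Rust code; the `HostConfig`/`Labels` field names follow the Docker Engine API, while the manifest dict shape is an assumption for the example):

```python
def build_create_request(manifest, execution_id, uid, gid):
    """Assemble a Docker container-create body from an agent manifest.

    cpu_quota maps to NanoCpus (1.0 CPU = 1e9 nano-CPUs);
    memory_bytes maps directly to HostConfig.Memory.
    """
    spec = manifest["spec"]
    return {
        "Image": spec["runtime"]["image"],
        "User": f"{uid}:{gid}",  # stored in Execution metadata for UID/GID squashing
        "Env": [f"{k}={v}" for k, v in spec.get("environment", {}).items()],
        "Labels": {
            "aegis.managed": "true",           # lets the background reaper find orphans
            "aegis.execution_id": execution_id,
        },
        "HostConfig": {
            "NanoCpus": int(spec["resources"]["cpu_quota"] * 1_000_000_000),
            "Memory": spec["resources"]["memory_bytes"],
        },
    }

req = build_create_request(
    {"spec": {"runtime": {"image": "agent:latest"},
              "environment": {"MODE": "dev"},
              "resources": {"cpu_quota": 1.0, "memory_bytes": 1073741824}}},
    execution_id="exec-123", uid=1000, gid=1000)
```

The two `aegis.*` labels are what the cleanup machinery in the next section keys on.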
Container Cleanup (Defense-in-Depth)
Agent containers are cleaned up through three independent mechanisms:
| Layer | Trigger | Mechanism |
|---|---|---|
| Explicit termination | Normal exit paths (success, failure, timeout, cancellation) | runtime.terminate() → docker rm -f |
| RAII guard | Panic or unexpected error between spawn() and terminate() | ContainerGuard Drop impl spawns async cleanup task |
| Background reaper | Orphaned containers from process crashes or Docker API failures | Daemon task runs every 5 min, cross-references containers against DB |
The reaper identifies orphans by listing all containers with the `aegis.managed=true` label and checking their `aegis.execution_id` against the execution repository. Containers are reaped when:

- The execution record is missing
- The execution status is not `Running`
- The container is not running but the execution is still marked `Running`
Containers with `aegis.keep_container_on_failure=true` are skipped by the reaper, allowing manual debugging via `docker exec`.
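The reaper's decision rules above reduce to a small pure function. This is an illustrative sketch of that logic (the real reaper is part of the Rust daemon; the dict shapes here are assumptions):

```python
def should_reap(execution, container_running, labels):
    """Decide whether the reaper should remove a labeled container.

    `execution` is the matching execution record (dict) or None if missing;
    `labels` is the container's label map.
    """
    if labels.get("aegis.keep_container_on_failure") == "true":
        return False  # kept for manual debugging via `docker exec`
    if execution is None:
        return True   # execution record is missing
    if execution["status"] != "Running":
        return True   # execution finished but the container lingers
    if not container_running:
        return True   # container dead but execution still marked Running
    return False      # healthy: execution Running and container running
```

Only the healthy combination (execution `Running`, container running) survives a reaper pass.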
Resource Limits
Manifest resource limits are translated to Docker container constraints:
```yaml
spec:
  resources:
    cpu_quota: 1.0            # → Docker --cpus=1.0
    memory_bytes: 1073741824  # → Docker --memory=1073741824
    timeout_secs: 300
```

`timeout_secs` is enforced by the ExecutionSupervisor. If the inner loop has not produced a final response within `timeout_secs`, the container is force-killed and the iteration is failed.
NFS Volume Mounting
Docker deployments use the NFS transport for agent container volume mounts. Because the Docker daemon runs as root, the kernel NFS client is available without any special configuration. The FUSE daemon is not required for Docker deployments; it is designed for rootless Podman, where kernel NFS mounts are unavailable. See Podman Deployment for the FUSE-based volume mounting model.
Agent containers mount their volumes via the kernel NFS client to the orchestrator's NFS server gateway (port 2049):
Example Docker API mount configuration produced by AEGIS for a volume named "workspace":

```json
{
  "Target": "/workspace",
  "Type": "volume",
  "VolumeOptions": {
    "DriverConfig": {
      "Name": "local",
      "Options": {
        "type": "nfs",
        "o": "addr=<orchestrator-host>,nfsvers=3,proto=tcp,soft,timeo=10,nolock",
        "device": ":/<tenant_id>/<volume_id>"
      }
    }
  }
}
```

The agent container does not require `CAP_SYS_ADMIN` or any elevated capabilities for NFS mounts: the mount is performed on the host by the Docker daemon (which runs as root) through the `local` volume driver's kernel NFS client, so no privileged operations take place inside the container.
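Generating this mount entry is straightforward string assembly over the tenant and volume identifiers. A hedged sketch (illustrative Python mirroring the JSON shape above; the parameter names are assumptions, not AEGIS API):

```python
def nfs_mount(target, orchestrator_host, tenant_id, volume_id):
    """Build a Docker API mount entry for an AEGIS NFS-backed volume."""
    nfs_opts = ",".join([
        f"addr={orchestrator_host}",  # NFS server = orchestrator gateway
        "nfsvers=3", "proto=tcp",
        "soft", "timeo=10",           # fail fast rather than hang forever
        "nolock",
    ])
    return {
        "Target": target,
        "Type": "volume",
        "VolumeOptions": {
            "DriverConfig": {
                "Name": "local",
                "Options": {
                    "type": "nfs",
                    "o": nfs_opts,
                    "device": f":/{tenant_id}/{volume_id}",  # export path on the gateway
                },
            }
        },
    }

m = nfs_mount("/workspace", "172.17.0.1", "tenant-a", "vol-1")
```

Note the `soft,timeo=10` options: a hard mount would block agent processes indefinitely if the orchestrator's NFS gateway became unreachable.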
Network Reachability of NFS
Agent containers must be able to reach the orchestrator host on TCP port 2049. In single-host deployments using the Docker bridge network, the orchestrator host is typically reachable at 172.17.0.1 (Docker default bridge gateway).
Configure the NFS listen address in `aegis-config.yaml`:

```yaml
storage:
  nfs_listen_addr: "0.0.0.0:2049"
```

In multi-host deployments, use the orchestrator host's external IP or hostname.
Network Isolation
Each agent container is placed on a user-defined Docker bridge network. Network egress is controlled by the manifest `network_policy`:

```yaml
spec:
  security:
    network_policy:
      mode: allow
      allowlist:
        - pypi.org
        - api.github.com
```

The AEGIS daemon enforces network policy at the SEAL layer (per tool call), not via Docker network rules. In high-security environments, you may additionally configure Docker network rules or firewall rules to enforce the allowlist at the kernel level.
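Per-tool-call enforcement reduces to checking the destination host of each outbound call against the policy. A minimal sketch, assuming exact hostname matching and the two modes shown above (the real SEAL evaluation may handle subdomains, ports, or IP literals differently):

```python
from urllib.parse import urlparse

def host_allowed(url, mode, hosts):
    """Check a tool call's destination against a network_policy.

    mode "allow": only listed hosts may be contacted;
    mode "deny":  listed hosts are blocked, everything else passes.
    Exact hostname matching only; subdomains are not implied.
    """
    host = urlparse(url).hostname
    listed = host in hosts
    return listed if mode == "allow" else not listed
```

Because the check runs per tool call, a compromised agent that bypasses the tool layer is not constrained by it, which is exactly why the text recommends kernel-level rules as a second layer in high-security environments.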
systemd Service
For production deployments, run the AEGIS daemon as a systemd service:
```ini
# /etc/systemd/system/aegis.service
[Unit]
Description=AEGIS Orchestrator Daemon
# For Podman, replace docker.service with podman.socket in After= and Requires=
After=network-online.target docker.service
Requires=docker.service

[Service]
User=aegis
Group=docker
WorkingDirectory=/opt/aegis
ExecStart=/usr/local/bin/aegis --daemon --config /etc/aegis/config.yaml
Restart=on-failure
RestartSec=10s
LimitNOFILE=65535
# Environment variables for secrets (avoid plaintext in config)
EnvironmentFile=/etc/aegis/env

[Install]
WantedBy=multi-user.target
```

```bash
# /etc/aegis/env (chmod 600)
DATABASE_URL=postgresql://aegis:password@localhost:5432/aegis
OPENAI_API_KEY=sk-...
OPENBAO_ROLE_ID=...
OPENBAO_SECRET_ID=...
```

```bash
# Enable and start
sudo systemctl enable aegis
sudo systemctl start aegis

# Check status
sudo systemctl status aegis

# Follow logs
sudo journalctl -u aegis -f
```

Health Checks
The AEGIS daemon exposes health endpoints:
```bash
# Liveness (daemon process alive)
curl http://localhost:8080/health/live

# Readiness (daemon ready to accept requests; all dependencies connected)
curl http://localhost:8080/health/ready
```

Use these in load balancer health check configuration or container orchestration readiness probes.
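Scripts that submit work right after startup should wait for readiness with a bounded retry loop rather than a fixed sleep. A generic sketch (the probe is injected as a callable so it can be any HTTP check against `/health/ready`; the helper itself is illustrative, not part of AEGIS):

```python
import time

def wait_until_ready(probe, attempts=30, interval_secs=1.0, sleep=time.sleep):
    """Poll `probe` (a callable returning True once /health/ready succeeds)
    until it reports ready or the attempt budget is exhausted."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(interval_secs)
    return False

# Example: a fake probe that becomes ready on its third call.
responses = iter([False, False, True])
ready = wait_until_ready(lambda: next(responses), attempts=5,
                         interval_secs=0, sleep=lambda s: None)
```

In production the probe would wrap an HTTP GET to `/health/ready` and treat only a 200 response as ready, since readiness (dependencies connected) is stricter than liveness.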