# Configuring Storage Volumes

Declaring ephemeral and persistent volumes, volume mounts, access modes, quotas, and managing volumes via CLI.
AEGIS provides agents with filesystem storage via Volumes. Volumes are first-class domain entities managed by the orchestrator and backed by SeaweedFS. Agents access their volumes as a standard POSIX filesystem — the NFS mount is transparent to agent code.
All agent volume mount paths must be rooted at /workspace. Valid examples: /workspace, /workspace/datasets, /workspace/cache.
## Volume Types
| Storage Class | Description | TTL |
|---|---|---|
| Ephemeral | Temporary workspace; auto-deleted after execution or when TTL expires. | Required; e.g., 1h, 30m, 2h |
| Persistent | Survives across executions; must be explicitly deleted. | Not applicable |
Use ephemeral volumes for scratch space, build artifacts, and intermediate results that do not need to outlive the execution.
Use persistent volumes for data that must be shared across multiple executions, stored long-term, or readable by other agents.
## Declaring Volumes in the Manifest

```yaml
spec:
  volumes:
    # Ephemeral workspace scratch volume — deleted after 1 hour
    - name: workspace
      mount_path: /workspace
      access_mode: read-write
      storage_class: ephemeral
      ttl_hours: 1

    # Persistent output volume — survives across executions
    - name: output-store
      mount_path: /workspace/output
      access_mode: read-write
      storage_class: persistent

    # Read-only access to a shared reference dataset
    - name: reference-data
      mount_path: /workspace/reference
      access_mode: read-only
      storage_class: persistent
      source:
        volume_id: "vol-a1b2c3d4-..."  # reference an existing volume by ID
```

## Volume Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | ✓ | Local identifier used to reference this mount within the manifest. |
| `mount_path` | string | ✓ | Absolute path inside the container, rooted at `/workspace`. |
| `access_mode` | `read-write` \| `read-only` | ✓ | Access mode enforced by the storage gateway (AegisFSAL). |
| `storage_class` | `ephemeral` \| `persistent` | ✓ | Lifetime of the volume. |
| `ttl_hours` | integer | Required for ephemeral | Hours until auto-deletion (e.g., 1, 24, 48). |
| `size_limit` | string | Optional | Maximum volume size as a Kubernetes resource string (e.g., "500Mi", "5Gi"). Writes that exceed this emit a VolumeQuotaExceeded event and return ENOSPC. |
| `source.volume_id` | string | Required when mounting an existing volume | Pins to a specific existing volume by UUID. Supports Handlebars: `{{input.volume_id}}`. |
`mount_path` values must be unique and non-overlapping, except that mounts may nest directly under the canonical root `/workspace` (for example, `/workspace` plus `/workspace/datasets` is valid).
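These rules can be checked before a manifest is submitted. Below is a minimal validation sketch; the helper name is hypothetical and not part of AEGIS:

```python
def validate_mount_paths(paths):
    """Validate manifest mount_path values: each rooted at /workspace,
    all unique, and non-overlapping except for nesting under the root."""
    for p in paths:
        if p != "/workspace" and not p.startswith("/workspace/"):
            raise ValueError(f"{p}: mount_path must be rooted at /workspace")
    if len(set(paths)) != len(paths):
        raise ValueError("mount_path values must be unique")
    for child in paths:
        for parent in paths:
            # Nesting under the canonical root /workspace is allowed;
            # nesting under any other mount is an overlap error.
            if child != parent and parent != "/workspace" \
                    and child.startswith(parent + "/"):
                raise ValueError(f"{child} overlaps non-root mount {parent}")

validate_mount_paths(["/workspace", "/workspace/datasets"])  # valid
```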
## How Volumes Are Mounted

When an execution starts, the orchestrator:

- Creates the declared volumes in SeaweedFS (or resolves existing ones via `volume_id`).
- Starts the NFS server gateway for each volume.
- Mounts the volumes into the agent container via the kernel NFS client (`nfsvers=3`).

The agent then sees the mount path as a standard filesystem directory.
No special code is required in `bootstrap.py`. The agent can use ordinary Python `open()`, `os.path`, `shutil`, etc. to access volume contents.
```python
# Agents read and write volumes like any filesystem
with open("/workspace/solution.py", "w") as f:
    f.write(code_content)

# Files written are immediately visible to the orchestrator
# for FSAL tool operations (fs.read, fs.write, etc.)
```

## Storage Gateway Security
The AEGIS storage gateway (AegisFSAL) intercepts every POSIX operation on the volume:
- **Authorization:** Verifies the requesting execution holds the volume's `FileHandle` (encoding `execution_id` + `volume_id`).
- **Path canonicalization:** All paths are normalized server-side. Any `..` component is rejected before reaching SeaweedFS.
- **Filesystem policy enforcement:** Read/write path allowlists from `spec.security.filesystem` are enforced per-operation.
- **Quota enforcement:** Writes that would exceed `size_limit` are blocked and emit `VolumeQuotaExceeded`.
- **Audit logging:** Every operation (open, read, write, create, delete, readdir) is published as a `StorageEvent` domain event to the event bus.
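The canonicalization step can be illustrated with a short sketch. This is illustrative only; the real checks live server-side in AegisFSAL, and the function below is hypothetical:

```python
import posixpath

def canonicalize(volume_root, client_path):
    """Reject any '..' component up front, then normalize the path
    relative to the volume root — mirroring the server-side order:
    traversal is blocked before the path ever reaches the backend."""
    if ".." in client_path.split("/"):
        raise PermissionError(f"path traversal blocked: {client_path}")
    normalized = posixpath.normpath("/" + client_path.lstrip("/"))
    return volume_root.rstrip("/") + normalized
```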
## Quota Configuration

Set a maximum volume size with `size_limit` (a Kubernetes-style resource string):
```yaml
volumes:
  - name: workspace
    mount_path: /workspace
    access_mode: read-write
    storage_class: ephemeral
    ttl_hours: 2
    size_limit: "5Gi"
```

If the agent writes data that would exceed the quota, the write fails with `ENOSPC` inside the container and a `VolumeQuotaExceeded` event is emitted.
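From agent code, a quota breach looks exactly like a full disk. A hedged sketch of handling it (the helper is hypothetical):

```python
import errno

def write_artifact(path, data):
    """Write to a volume, treating a quota breach like a full disk.
    A write past size_limit surfaces as OSError with errno ENOSPC;
    the orchestrator separately emits a VolumeQuotaExceeded event."""
    try:
        with open(path, "wb") as f:
            f.write(data)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            # Quota reached: clean up or report, don't retry blindly.
            return False
        raise
```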
## Phase 1 Constraints
**Single-writer constraint:** In Phase 1, a persistent volume with read-write access can only be mounted by one execution at a time. Attempting to mount an already-in-use read-write persistent volume causes the second execution to fail at startup with a `VolumeAlreadyMounted` error.

**File locking:** NFS mounts use the `nolock` option. POSIX advisory locks (`flock`, `fcntl`) are not coordinated between agents. For multi-agent coordination on shared files, use the `ResourceLock` mechanism instead.
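The single-writer rule behaves like an exclusive mount registry. The sketch below illustrates the semantics only; it is not the orchestrator's actual implementation:

```python
class VolumeAlreadyMounted(Exception):
    pass

class MountRegistry:
    """Phase 1 semantics: at most one read-write mount per persistent
    volume at a time; read-only mounts are unlimited and concurrent."""

    def __init__(self):
        self._writers = {}  # volume_id -> execution_id holding read-write

    def mount(self, volume_id, execution_id, access_mode):
        if access_mode == "read-write":
            holder = self._writers.get(volume_id)
            if holder is not None and holder != execution_id:
                raise VolumeAlreadyMounted(
                    f"{volume_id} held read-write by {holder}")
            self._writers[volume_id] = execution_id
        # read-only mounts always succeed

    def unmount(self, volume_id, execution_id):
        if self._writers.get(volume_id) == execution_id:
            del self._writers[volume_id]
```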
## Volume Ownership
Every volume has an owner that determines its lifetime and access rules:
| Ownership | Created By | Lifetime |
|---|---|---|
| Execution | Orchestrator at execution start (one per spec.volumes entry) | Ephemeral volumes are deleted when the execution ends; persistent volumes survive and require manual deletion |
| WorkflowExecution | Orchestrator at workflow start (one per spec.storage entry) | Ephemeral volumes are deleted when the workflow execution completes; persistent volumes survive |
| Persistent | API | No automatic cleanup; must be explicitly deleted via volume management APIs |
When a volume is owned by an execution, only that execution can perform write operations through AegisFSAL. Other executions may mount the same persistent volume read-only by referencing it via source.volume_id — this does not change ownership.
## TTL and Garbage Collection
The TTL clock starts at volume creation, not at execution start. If provisioning pauses between volume creation and container startup, those seconds count against the TTL.
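Because the clock starts at creation, the TTL remaining when the agent actually starts can be less than `ttl_hours`. A small sketch of the arithmetic (the helper is illustrative):

```python
from datetime import datetime, timedelta, timezone

def ttl_remaining(created_at, ttl_hours, now=None):
    """TTL counts from volume creation, not container start, so any
    provisioning delay between the two eats into the budget."""
    if now is None:
        now = datetime.now(timezone.utc)
    return created_at + timedelta(hours=ttl_hours) - now
```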
The background garbage collector runs every `gc_interval_minutes` (default: 60 minutes, configurable in the node configuration under `spec.storage.seaweedfs.gc_interval_minutes`).
| Scenario | What happens |
|---|---|
| Execution completes normally and volume is ephemeral | Volume deleted immediately on execution teardown, not waiting for GC |
| Execution is cancelled mid-run | Volume marked expired; deleted on next GC pass |
| Orchestrator restarts unexpectedly | GC picks up all orphaned expired volumes on the next scheduled pass |
| `ttl_hours` not set on an ephemeral volume | `default_ttl_hours` from node configuration applies (default: 24) |
Cleanup is a two-phase soft-delete: the orchestrator transitions the volume to Deleting, removes the SeaweedFS directory, then transitions to Deleted. If the SeaweedFS deletion fails, the volume stays in Deleting state and the next GC pass retries.
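The two-phase soft-delete can be sketched as a small state transition, where the backend call stands in for the SeaweedFS directory removal (illustrative only):

```python
def gc_delete(volume, remove_dir):
    """Two-phase soft delete: mark Deleting, attempt backend removal,
    then mark Deleted.  On failure the volume stays in Deleting so the
    next GC pass retries the removal."""
    volume["state"] = "Deleting"
    try:
        remove_dir(volume["id"])
    except OSError:
        return volume["state"]  # still "Deleting"; retried next pass
    volume["state"] = "Deleted"
    return volume["state"]
```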
## Managing Volumes

Volume lifecycle is managed by orchestrator APIs and execution/workflow manifests in the current release. Dedicated `aegis volume ...` CLI commands are not exposed yet.

Volumes used by an active execution cannot be deleted: a delete request returns an error if the volume is currently mounted.
## Pinning to an Existing Volume
To pass data between executions via a persistent volume:
```shell
# Get execution metadata and copy the referenced volume ID
EXEC_ID=<execution-id>
curl http://localhost:8088/v1/executions/${EXEC_ID}
```

In the next agent manifest:
```yaml
volumes:
  - name: previous-output
    mount_path: /workspace/previous
    access_mode: read-only
    storage_class: persistent
    source:
      volume_id: "vol-a1b2c3d4-..."
```

The previous execution's output is now readable at `/workspace/previous` in the new agent container.
## Passing Volumes Between Agents in a Workflow
In a workflow, the standard pattern is for a writer agent to populate a persistent volume and declare its volume_id on the blackboard, so that subsequent agents can mount it read-only.
Writer agent manifest (first workflow state):
```yaml
spec:
  volumes:
    - name: output
      mount_path: /workspace/output
      access_mode: read-write
      storage_class: persistent
```

After the writer completes, a system state in the workflow writes the volume ID to the blackboard:
```yaml
states:
  - name: write-output-id
    kind: system
    action: blackboard.set
    params:
      key: output_volume_id
      value: "{{executions.writer.volumes.output.id}}"
```

Reader agent manifest (subsequent workflow state):
```yaml
spec:
  volumes:
    - name: previous-output
      mount_path: /workspace/input
      access_mode: read-only
      storage_class: persistent
      source:
        volume_id: "{{blackboard.output_volume_id}}"
```

The reader agent sees the writer's files at `/workspace/input`. Because it mounts read-only, it is compatible with the single-writer constraint and can run concurrently with other readers on the same volume.
## Inspecting Volume Artifacts
When seaweedfs.s3_endpoint is configured in the node configuration, all volume contents are also accessible via any S3-compatible client. This is useful for inspecting execution artifacts, debugging failed agents, or archiving outputs without running a new execution.
Volumes are stored at the following path in the SeaweedFS S3 namespace:

```
/<tenant_id>/<volume_id>/
```

Using the AWS CLI:
```shell
# Retrieve the volume ID from a previous execution
TENANT_ID="00000000-0000-0000-0000-000000000001"  # default single-tenant ID
VOL_ID="<volume-id-from-execution-metadata>"

# List files in the volume
aws s3 ls s3://${TENANT_ID}/${VOL_ID}/ \
  --endpoint-url http://seaweedfs-s3:8333

# Download all artifacts locally
aws s3 cp s3://${TENANT_ID}/${VOL_ID}/ ./artifacts/ --recursive \
  --endpoint-url http://seaweedfs-s3:8333
```

The S3 gateway does not enforce manifest FilesystemPolicy rules. Restrict access to the S3 gateway endpoint at the network level in production environments — it should not be reachable from agent containers or public networks.
## Troubleshooting

### Permission denied (EACCES) on file operations
The agent's code is attempting to read or write a path that is not covered by the manifest's `spec.security.filesystem` allowlists, or the path contains a `..` component.

Check the `FilesystemPolicyViolation` or `PathTraversalBlocked` events in the execution log to identify the exact path. Then update the manifest's read or write allowlist to include it:
```yaml
spec:
  security:
    filesystem:
      write:
        - /workspace
        - /tmp  # add paths your agent needs
```

### No space left on device (ENOSPC) during a write
The volume's `size_limit` quota has been reached. A `VolumeQuotaExceeded` event is emitted with the volume ID and the byte counts. Either increase `size_limit` or clean up files before writing more data.

Note that quota accounting tracks cumulative bytes written, not net storage used. Deleting files does not reduce the recorded quota usage in Phase 1.
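This distinction matters in practice: a long-running agent that writes and deletes repeatedly can hit quota even while the files on disk stay small. A sketch of cumulative accounting (illustrative, not the gateway's implementation):

```python
class QuotaMeter:
    """Phase 1 accounting: every write adds to the tally; deletes do
    not subtract.  A volume can hit quota even when the files currently
    on disk are well under size_limit."""

    def __init__(self, size_limit):
        self.size_limit = size_limit
        self.bytes_written = 0  # cumulative, monotonically increasing

    def write(self, n):
        if self.bytes_written + n > self.size_limit:
            raise OSError(28, "No space left on device")  # ENOSPC
        self.bytes_written += n

    def delete(self, n):
        pass  # net usage shrinks, recorded usage does not
```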
### VolumeAlreadyMounted at execution startup
A persistent volume declared with read-write access is already mounted by another running execution. Persistent read-write volumes can only be held by one execution at a time.
Options:

- Wait for the existing execution to complete before starting a new one that needs write access.
- Switch the new execution to `read-only` access if it only needs to read the volume.
- Use execution/workflow metadata APIs to identify which execution currently owns the volume.
### Mount hangs or times out
NFS mounts use `soft` mode with a 1-second timeout and 2 retries. If the orchestrator NFS server is unreachable, the mount fails with `ETIMEDOUT` rather than hanging indefinitely. Check that the AEGIS orchestrator process is healthy and that port 2049 is reachable from the agent container's network.
For current CLI vs API storage operation coverage, see the CLI Capability Matrix.