Edge Fleet Operations
Run tools across many edge hosts — rolling restarts, ad-hoc commands, redeploys — read streamed per-node results, and cancel runs in flight.
This guide is the operational counterpart to the fleet operations concept page. Where that page describes the model, this page walks through the actual commands you run, the output you read, and the patterns that work in practice.
By the end you'll know how to:
- Run an ad-hoc command across many hosts.
- Roll a configuration change in waves with halt-on-failure.
- Read streaming per-node results.
- Cancel a runaway run.
- Inspect run history.
Prerequisites
- At least two enrolled edge hosts in your tenant. See edge host setup.
- Tags or groups defined for the targeting you want. See tag and group management.
- A user with permission to call the system-tier aegis.edge.fleet.* MCP tools (operator or tenant-admin).
The shape of a fleet run command
aegis edge fleet run \
--target <expr> \
--tool <tool-name> [--arg key=value]... \
[--mode parallel|sequential|rolling=N] \
[--max-concurrency N] \
[--on-error fail-fast|continue|stop-after=N] \
[--require-min N] \
[--deadline 60s]
Per-node results stream live; when every per-node call has terminated, a final summary is printed.
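If you drive fleet runs from a script, the command shape above can be assembled as an argument list. The following is a minimal sketch: `fleet_run_argv` is a hypothetical helper, not part of any aegis SDK, and it emits only the flags listed above.

```python
from typing import Optional


def fleet_run_argv(
    target: str,
    tool: str,
    args: Optional[dict] = None,
    mode: Optional[str] = None,
    max_concurrency: Optional[int] = None,
    on_error: Optional[str] = None,
    require_min: Optional[int] = None,
    deadline: Optional[str] = None,
) -> list:
    """Build an argv list for `aegis edge fleet run` (hypothetical helper)."""
    argv = ["aegis", "edge", "fleet", "run", "--target", target, "--tool", tool]
    # Each tool argument becomes its own `--arg key=value` pair.
    for key, value in (args or {}).items():
        argv += ["--arg", f"{key}={value}"]
    optional = [
        ("--mode", mode),
        ("--max-concurrency", max_concurrency),
        ("--on-error", on_error),
        ("--require-min", require_min),
        ("--deadline", deadline),
    ]
    # Optional flags are emitted only when a value was supplied.
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv


argv = fleet_run_argv(
    target="tags=linux",
    tool="cmd.run",
    args={"cmd": "uname -r"},
    mode="parallel",
    on_error="continue",
    deadline="10s",
)
print(argv)
```

Passing the list to `subprocess.run(argv)` avoids shell quoting problems for values that contain spaces, such as `uname -r`.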
<expr> — target shorthand
| Form | Meaning |
|---|---|
| @<node-id> | Single node. |
| group:<name> | Saved group. |
| tags=a,b labels=k=v tools=docker | Ad-hoc selector. |
| all | Every Connected edge of your tenant. |
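To make the four forms concrete, here is a rough sketch of how a script might classify a target expression before passing it on. `parse_target` is a hypothetical helper. It assumes the ad-hoc selector is a space-separated list of `axis=value` terms with comma-separated values, as in the table, and it says nothing about how the real resolver combines terms.

```python
def parse_target(expr: str) -> dict:
    """Classify a target expression into one of the four documented forms."""
    expr = expr.strip()
    if expr == "all":
        return {"kind": "all"}
    if expr.startswith("@"):
        return {"kind": "node", "node_id": expr[1:]}
    if expr.startswith("group:"):
        return {"kind": "group", "name": expr[len("group:"):]}

    # Anything else is treated as an ad-hoc selector (assumed syntax).
    selector = {}
    for term in expr.split():
        # Split on the first "=" only, so labels=k=v keeps "k=v" as its value.
        axis, sep, value = term.partition("=")
        if not sep or axis not in ("tags", "labels", "tools"):
            raise ValueError(f"unrecognized selector term: {term!r}")
        selector.setdefault(axis, []).extend(value.split(","))
    if not selector:
        raise ValueError("empty target expression")
    return {"kind": "selector", "selector": selector}


print(parse_target("tags=a,b labels=k=v tools=docker"))
```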
Recipe 1: ad-hoc parallel command
You want to know which kernel every Linux host in your fleet is running. Read-only, idempotent — parallel and continue-on-error are the right defaults.
aegis edge fleet run \
--target tags=linux \
--tool cmd.run --arg cmd="uname -r" \
--mode parallel \
--on-error continue \
--deadline 10s
Streamed output (each line tagged with the originating node):
[n-7a3b2f workstation-east] 6.8.0-31-generic
[n-1c8d4e workstation-west] 6.8.0-31-generic
[n-9f2a31 db-mirror-1] 6.5.0-15-generic
[n-4e7c80 db-mirror-2] 6.5.0-15-generic
[n-3b1d9f bastion] 6.1.0-18-amd64
✔ Fleet run a3f4...c8d2 complete
ok=5 err=0 timed_out=0
Recipe 2: rolling restart with halt-on-first-failure
You want to restart nginx across the web tier, in waves of five, and stop immediately if any host reports a failure.
aegis edge fleet run \
--target group:web-tier \
--tool service.restart --arg name=nginx \
--mode rolling=5 \
--on-error stop-after=1 \
--deadline 30s
Streamed output:
[wave 1] starting 5 of 12 nodes
[n-7a3b2f web-east-1] restarting nginx... ok (exit=0, 1.2s)
[n-1c8d4e web-east-2] restarting nginx... ok (exit=0, 1.4s)
[n-9f2a31 web-east-3] restarting nginx... ok (exit=0, 1.1s)
[n-4e7c80 web-east-4] restarting nginx... ok (exit=0, 1.5s)
[n-3b1d9f web-east-5] restarting nginx... ok (exit=0, 1.3s)
[wave 2] starting 5 of 12 nodes
[n-2a8c14 web-west-1] restarting nginx... ok (exit=0, 1.2s)
[n-5e1f93 web-west-2] restarting nginx... ERR (exit=3, 0.4s) "service file not found"
✖ stop-after threshold reached (1/1); cancelling in-flight, halting waves
✔ Fleet run b8e1...d4a7 halted
ok=6 err=1 cancelled=2 not_started=3
The summary states the halt reason and the per-node breakdown explicitly. cancelled=2 accounts for the wave 2 calls that were still in flight when the threshold tripped.
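When a script consumes this output, the summary line is the easiest part to check mechanically. The sketch below assumes the summary is always a run of `key=count` pairs, as in the two samples in this guide; `parse_summary` and `is_clean` are hypothetical helper names.

```python
import re


def parse_summary(line: str) -> dict:
    """Turn 'ok=6 err=1 cancelled=2 not_started=3' into a dict of integer counts."""
    return {key: int(count) for key, count in re.findall(r"(\w+)=(\d+)", line)}


def is_clean(summary: dict) -> bool:
    """True when every counter other than 'ok' is zero."""
    return all(count == 0 for key, count in summary.items() if key != "ok")


halted = parse_summary("ok=6 err=1 cancelled=2 not_started=3")
total = sum(halted.values())
print(halted, total, is_clean(halted))
```

For the halted run above the counters add up to 12, which matches the "of 12 nodes" figure in the wave banners, so every resolved node is accounted for.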
Recipe 3: refuse to dispatch unless N hosts match
For safety-critical operations you may want a hard floor: "if fewer than 3 hosts match, don't run at all."
aegis edge fleet run \
--target tags=db,prod \
--tool cmd.run --arg cmd="systemctl status postgres" \
--require-min 3
If only 2 hosts match, the dispatch is refused upfront and no per-node call is made:
✖ require-min not satisfied: matched 2, required 3
Resolved nodes:
n-9f2a31 db-mirror-1 tags=[prod,db] Connected
n-4e7c80 db-mirror-2 tags=[prod,db] Connected
Recipe 4: preview before destructive fan-out
Before running anything destructive, verify the resolved target set:
aegis edge fleet preview --target tags=prod
Resolved 8 nodes (skipped: 1)
✓ n-7a3b2f web-east-1 linux/x86_64 Connected tags=[prod,web]
✓ n-1c8d4e web-east-2 linux/x86_64 Connected tags=[prod,web]
...
⊗ n-9f2a31 db-mirror-1 linux/x86_64 Disconnected tags=[prod,db]
Disconnected hosts are listed under skipped with their reason — they're visible to the operator but won't receive the call.
The preview is also available in Zaru's fleet launcher modal as the Selector Preview Panel (it counts hosts as you build the selector) and as the system-tier MCP tool aegis.edge.fleet.list.
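If you want a script to gate a destructive run on the preview, one option is to parse the preview text and apply your own floor before dispatching. This is a sketch under assumptions: it relies on the line layout shown in the sample output (marker, node id, name, platform, state), which may not be a stable format, and `parse_preview` and `meets_floor` are hypothetical helpers.

```python
SAMPLE = """\
Resolved 8 nodes (skipped: 1)
✓ n-7a3b2f web-east-1 linux/x86_64 Connected tags=[prod,web]
✓ n-1c8d4e web-east-2 linux/x86_64 Connected tags=[prod,web]
...
⊗ n-9f2a31 db-mirror-1 linux/x86_64 Disconnected tags=[prod,db]
"""


def parse_preview(text: str):
    """Split preview lines into nodes that will receive the call and skipped nodes."""
    targets, skipped = [], []
    for line in text.splitlines():
        parts = line.split()
        # Ignore the header, elided lines, and anything without a known marker.
        if len(parts) < 5 or parts[0] not in ("✓", "⊗"):
            continue
        marker, node_id, name, platform, state = parts[:5]
        node = {"id": node_id, "name": name, "platform": platform, "state": state}
        (targets if marker == "✓" else skipped).append(node)
    return targets, skipped


def meets_floor(targets: list, minimum: int) -> bool:
    """Local check in the spirit of --require-min: enough nodes will be targeted."""
    return len(targets) >= minimum


targets, skipped = parse_preview(SAMPLE)
print(len(targets), [node["name"] for node in skipped], meets_floor(targets, 3))
```

The sample here contains only the lines printed in this guide, so it yields two targets rather than the full resolved set.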
Recipe 5: cancel a runaway run
If a fleet run is taking too long or you realize it's misconfigured, cancel it by fleet_command_id:
aegis edge fleet cancel a3f4...c8d2
The dispatcher broadcasts Cancel to every in-flight per-node command. Any wrapped tool that respects context cancellation halts; native external processes get a SIGTERM. Already-completed nodes are unaffected.
The same operation is exposed as the system-tier MCP tool aegis.edge.fleet.cancel and as a Cancel button in Zaru's live run view.
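The effect of a cancel on per-node state can be pictured as a simple mapping. This is an illustrative model only, not the dispatcher's implementation: the status strings are borrowed from the sample output in this guide, and `apply_cancel` is a hypothetical name.

```python
def apply_cancel(states: dict) -> dict:
    """Model a cancel: in-flight nodes become cancelled, everything else is untouched."""
    return {
        node: ("cancelled" if status == "running" else status)
        for node, status in states.items()
    }


before = {
    "web-east-1": "ok",
    "web-east-2": "ok",
    "web-east-3": "ok",
    "web-east-4": "running",
    "web-east-5": "running",
}
after = apply_cancel(before)
print(after)
```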
Recipe 6: redeploy a binary in waves
You've built a new binary, copied it to a known location on every host, and want to swap it in:
# Phase 1 (canary): validate the new binary on a smoke-test host.
aegis edge fleet run \
--target @n-smoke-test-host \
--tool cmd.run --arg cmd="/opt/myapp/bin/new --version" \
--deadline 5s
# Phase 2 (fleet): roll across the fleet, 3 at a time, halt on first failure.
aegis edge fleet run \
--target group:myapp-fleet \
--tool myapp.swap --arg version=2.4.0 \
--mode rolling=3 \
--on-error stop-after=1 \
--deadline 30s
Two-phase rollouts (canary → fleet) become a habit, not a special case.
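To keep the fleet phase from running after a failed canary, a wrapper script can gate the second command on the first. The sketch below assumes that `aegis edge fleet run` exits non-zero when a run fails, which this guide does not confirm, so verify it against the CLI reference first. `two_phase` and `run_cli` are hypothetical helpers.

```python
import subprocess
from typing import Callable, List, Sequence

CANARY = [
    "aegis", "edge", "fleet", "run",
    "--target", "@n-smoke-test-host",
    "--tool", "cmd.run", "--arg", "cmd=/opt/myapp/bin/new --version",
    "--deadline", "5s",
]
FLEET = [
    "aegis", "edge", "fleet", "run",
    "--target", "group:myapp-fleet",
    "--tool", "myapp.swap", "--arg", "version=2.4.0",
    "--mode", "rolling=3",
    "--on-error", "stop-after=1",
    "--deadline", "30s",
]


def run_cli(argv: Sequence[str]) -> int:
    """Run the real CLI and return its exit code."""
    return subprocess.run(list(argv)).returncode


def two_phase(run: Callable[[Sequence[str]], int]) -> List[str]:
    """Run the canary, then the fleet phase only if the canary exited 0 (assumption)."""
    executed = ["canary"]
    if run(CANARY) != 0:
        return executed  # Canary failed: never touch the fleet.
    executed.append("fleet")
    run(FLEET)
    return executed


# Dry run with a stand-in runner that reports failure for every command.
print(two_phase(lambda argv: 1))
```

In real use you would call `two_phase(run_cli)`; injecting the runner keeps the gating logic testable without the CLI installed.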
Reading the live run view in Zaru
When a fleet run starts in Zaru, you get a per-node grid:
┌─────────────────────────────────────────────────────────────┐
│ Fleet Run a3f4...c8d2 │
│ Tool: service.restart Args: name=nginx │
│ Mode: rolling=5 On error: stop-after=1 │
│ Status: running [Cancel] │
├─────────────────────────────────────────────────────────────┤
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ web-east-1 │ │ web-east-2 │ │ web-east-3 │ │
│ │ ✔ ok │ │ ✔ ok │ │ ✔ ok │ │
│ │ exit=0 1.2s │ │ exit=0 1.4s │ │ exit=0 1.1s │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ web-east-4 │ │ web-east-5 │ │
│ │ ⏳ running │ │ ⏳ running │ │
│ │ ┃ stdout... │ │ ┃ stdout... │ │
│ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────┘
Each cell shows status, exit code, runtime, and a tail of stdout/stderr. Click a cell to expand its full output stream.
Inspecting run history
aegis edge fleet runs --output table
ID STARTED TOOL TARGET MODE STATUS ok/err/skipped
a3f4...c8 2026-04-28 14:32:11Z service.restart group:web-tier rolling=5 halted 6/1/3
b8e1...d4 2026-04-28 11:08:43Z cmd.run tags=linux parallel complete 12/0/0
In Zaru, Vault → Edge Hosts → Fleet Runs lists every run with the same fields and links into the live (or archived) per-node view.
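For quick triage you may want to pull the runs that did not complete out of the table output. The sketch below assumes the whitespace-separated column layout of the sample rows above, which may not be a stable format; `parse_history_row` is a hypothetical helper.

```python
ROWS = [
    "a3f4...c8 2026-04-28 14:32:11Z service.restart group:web-tier rolling=5 halted 6/1/3",
    "b8e1...d4 2026-04-28 11:08:43Z cmd.run tags=linux parallel complete 12/0/0",
]


def parse_history_row(row: str) -> dict:
    """Parse one data row of the run history table (layout assumed from the sample)."""
    parts = row.split()
    if len(parts) < 8:
        raise ValueError(f"unexpected row: {row!r}")
    ok, err, skipped = (int(n) for n in parts[-1].split("/"))
    return {
        "id": parts[0],
        "started": f"{parts[1]} {parts[2]}",
        "tool": parts[3],
        # Read from both ends so a target that contains spaces stays intact.
        "target": " ".join(parts[4:-3]),
        "mode": parts[-3],
        "status": parts[-2],
        "ok": ok,
        "err": err,
        "skipped": skipped,
    }


runs = [parse_history_row(row) for row in ROWS]
needs_attention = [run["id"] for run in runs if run["status"] != "complete"]
print(needs_attention)
```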
Anti-patterns
| ❌ Don't | ✅ Do |
|---|---|
| --target all --on-error continue for state-mutating tools | Use a more specific target, and prefer --on-error fail-fast or --on-error stop-after=N. |
| Skip --require-min for safety-critical operations | Set a floor that matches your contract. |
| Run destructive fan-outs without preview | aegis edge fleet preview first; check the resolved set. |
| Long deadlines on rolling deploys | Short per-target deadlines force fast failures and tighter waves. |
| Re-using a single tag for unrelated meanings | Pick tag axes — see tag conventions. |
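Some of these rules can be checked before a run is dispatched. The following is a rough pre-flight lint, assuming the caller knows whether the tool mutates state and whether a preview was already reviewed. `lint_run` and its parameters are hypothetical and not part of the aegis CLI.

```python
from typing import List, Optional


def lint_run(
    target: str,
    on_error: Optional[str] = None,
    require_min: Optional[int] = None,
    mutating: bool = False,
    previewed: bool = False,
) -> List[str]:
    """Return warnings for the anti-patterns in the table that can be checked statically."""
    warnings = []
    if not mutating:
        return warnings  # Read-only runs are not covered by these rules.
    if target == "all" and on_error == "continue":
        warnings.append("state-mutating tool with --target all and --on-error continue")
    if require_min is None:
        warnings.append("no --require-min floor set")
    if not previewed:
        warnings.append("target set was not previewed")
    return warnings


print(lint_run("all", on_error="continue", mutating=True))
print(lint_run("group:web-tier", on_error="stop-after=1",
               require_min=3, mutating=True, previewed=True))
```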
What's next
- Fleet Operations (concept) — the full model behind these commands.
- Edge Tag and Group Management — for richer targeting.
- Edge Key Rotation — fleet-wide key rotation as a special case of fleet operations.
- Edge CLI Reference — every flag.
- Edge REST API — /v1/edge/fleet/* endpoints if you're driving fleet runs programmatically.