This is a deep integration manual for integrating MachineID into Swarm-style agent orchestration. It is intentionally extensive so it can serve as a single reference for: (1) designing enforceable validation boundaries in multi-agent handoff flows, (2) modeling devices correctly across execution surfaces, and (3) applying remote stop control across tool-heavy and side-effect-heavy runs.
- Hard gating at run entry, tool execution, and handoff boundaries
- Device identity schemes that scale across agents and replicas
- Remote stop: revoke/restore, bulk controls, and org-wide disable
- Operational semantics: fail-closed, short timeouts, predictable stop points
- You want enforcement, not “best effort”
- You will define explicit stop points in the runtime
- You want authority to live outside the process
Swarm is commonly used as a learning/prototyping reference for agent handoffs and tool calling. The enforcement patterns here generalize to production agent runtimes that share the same primitives: run entry, tool execution, and handoffs.
Everything reduces to one invariant:
Register. Validate. Work.
If validation fails, work does not begin.
In Swarm, “work begins” at recognizable boundaries: starting a run, executing a tool function, performing a side effect, and transferring control during a handoff.
The fastest way to verify the control plane is to add a single hard gate to your runner, then revoke that device in the dashboard and confirm execution stops on the next validate.
- Generate a free org key (supports up to 3 devices): machineid.io
- Confirm you can revoke/restore from Dashboard
- Use deterministic device IDs (readable and stable)
Swarm-style systems center on a small set of primitives:
- Agents that produce messages and call tools
- Tools (functions) where external requests and side effects occur
- Handoffs where control transfers between agents
- Loops / re-entry where execution continues across steps
- Run entry: immediately before the orchestration run starts
- Tool execution: immediately before each tool function is executed
- Handoff boundary: immediately before transferring control to a different agent
- Side-effect boundary: immediately before irreversible actions (writes, sends, purchases)
- Loop re-entry: at the top of high-cost cycles (fan-out, recursion, repeated planning)
Handoffs are where responsibility moves from one agent to another. Treat that transfer as an enforcement point.
- Validate immediately before performing the handoff
- If denied, do not transfer; stop the run (fail closed)
- If you model one device per agent role, validate using the target agent’s device ID before the transfer
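The handoff rules above can be sketched as follows. This is a minimal illustration, assuming `validate` is any callable that takes a device ID and returns a dict with an `allowed` key (for example the SDK's `m.validate`). `ROLE_DEVICE_IDS`, `HandoffDenied`, and `gated_handoff` are hypothetical names chosen for this example, not part of MachineID or Swarm.

```python
from typing import Callable, Dict

# Hypothetical mapping from agent role to its device ID (one device per role).
ROLE_DEVICE_IDS: Dict[str, str] = {
    "planner": "swarm:prod:agent-planner:01",
    "tool-runner": "swarm:prod:tool-runner:03",
}


class HandoffDenied(RuntimeError):
    """Raised when validation fails at a handoff boundary."""


def gated_handoff(
    validate: Callable[[str], dict],
    target_role: str,
    do_handoff: Callable[[str], object],
):
    """Validate the target agent's device, then transfer control."""
    device_id = ROLE_DEVICE_IDS.get(target_role)
    if device_id is None:
        raise HandoffDenied(f"no device modeled for role {target_role!r}")
    try:
        decision = validate(device_id)
    except Exception as exc:  # fail closed: no decision means no transfer
        raise HandoffDenied(f"validate failed for {device_id}") from exc
    if not decision.get("allowed", False):
        raise HandoffDenied(
            f"denied before_handoff:{target_role} code={decision.get('code')}"
        )
    return do_handoff(target_role)
```

Raising an exception instead of returning a flag makes it harder for calling code to ignore a denial and transfer control anyway.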
Tools are where costs and external effects begin. If you add only one in-run boundary beyond run entry, make it a tool-call gate.
- Validate before every tool call
- Add an additional validate immediately before high-risk tools (payments, email, writes)
- Fail closed on timeout/network errors
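A decorator is one way to make the tool-call gate hard to forget. The sketch below assumes `validate` is any callable that returns a dict with an `allowed` key (such as the SDK's `m.validate`). `gated_tool` and `ToolDenied` are illustrative names, and `lookup_weather` is a made-up tool.

```python
import functools
from typing import Callable


class ToolDenied(RuntimeError):
    """Raised when validation fails before a tool call."""


def gated_tool(validate: Callable[[str], dict], device_id: str):
    """Decorator factory: validate immediately before each call to the tool."""

    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(*args, **kwargs):
            try:
                decision = validate(device_id)
            except Exception as exc:  # timeout/network error: fail closed
                raise ToolDenied(f"validate unavailable for {device_id}") from exc
            if not decision.get("allowed", False):
                raise ToolDenied(f"denied before {tool_fn.__name__}")
            return tool_fn(*args, **kwargs)

        return wrapper

    return decorator


# Usage with a stand-in validator; a real runner would pass the SDK's validate.
@gated_tool(lambda device_id: {"allowed": True}, "swarm:dev:tool-runner:01")
def lookup_weather(city: str) -> str:
    return f"sunny in {city}"
```

For a high-risk tool, the same gate can be applied again immediately before the irreversible step inside the tool body.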
Many Swarm-style runs iterate: plan → call tools → update state → handoff → continue. If you validate only once at startup, a run can continue deep into a loop even after revoke/disable.
- Validate at the start of each step/iteration
- Validate before every tool call
- Validate before every side effect
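A loop re-entry gate might look like the sketch below. It assumes each step is a zero-argument callable and `validate` returns a dict with an `allowed` key. `run_steps` is a hypothetical helper, not a Swarm or MachineID function.

```python
from typing import Callable, Iterable, List


def run_steps(
    validate: Callable[[str], dict],
    device_id: str,
    steps: Iterable[Callable[[], object]],
) -> List[object]:
    """Run steps in order, validating at the top of every iteration.

    Returns the results of the steps that completed before a denial.
    """
    results = []
    for step in steps:
        try:
            allowed = bool(validate(device_id).get("allowed", False))
        except Exception:
            allowed = False  # fail closed on timeout/network errors
        if not allowed:
            break  # stop point: the step does not begin
        results.append(step())
    return results
```

With this shape, a revoke issued while step N is running takes effect before step N+1 begins.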
Streaming improves responsiveness. It is not an enforcement boundary. A streamed run can still incur tool costs and side effects. Put validation boundaries where costs and effects occur — tool execution, handoffs, and side effects — not where output is displayed.
The Python SDK is the simplest integration surface: machineid-io/python-sdk.
    pip install machineid-io

Then register and validate at run entry:

    import os
    from machineid import MachineID

    m = MachineID.from_env()
    device_id = os.getenv("MACHINEID_DEVICE_ID", "swarm:dev:runner:01")

    m.register(device_id)
    decision = m.validate(device_id)
    if not decision["allowed"]:
        print("Execution denied:", decision.get("code"), decision.get("request_id"))
        raise SystemExit(1)

- register(device_id), validate(device_id)
- list_devices(), usage()
- revoke(device_id), unrevoke(device_id) (alias: restore)
- remove(device_id)

Treat allowed:false as the stop condition.
If you want minimal dependencies, call the canonical endpoints directly.
Send your org key via x-org-key and a deterministic deviceId.
    POST https://machineid.io/api/v1/devices/register
    Headers:
      x-org-key: org_...
    Body:
      {"deviceId":"swarm:prod:runner:01"}

    POST https://machineid.io/api/v1/devices/validate
    Headers:
      x-org-key: org_...
    Body:
      {"deviceId":"swarm:prod:runner:01"}

Wrappers let you add enforcement without rewriting your orchestration logic. The goal is consistent: validate immediately before the boundary where work begins or side effects commit.
    def must_be_allowed(m, device_id, boundary):
        # Validate and stop the process if the decision is not allowed.
        d = m.validate(device_id)
        if not d["allowed"]:
            print(f"Denied at {boundary}:", d.get("code"), d.get("request_id"))
            raise SystemExit(1)
        return d

    # The wrappers below use the m and device_id created at run entry.
    def gated_tool_call(tool_fn, *args, **kwargs):
        must_be_allowed(m, device_id, "before_tool_call")
        return tool_fn(*args, **kwargs)

    def gated_handoff(next_agent_name):
        must_be_allowed(m, device_id, f"before_handoff:{next_agent_name}")
        # perform handoff here

    def commit_side_effect():
        must_be_allowed(m, device_id, "before_side_effect")
        perform_irreversible_action()  # placeholder for your own action

Device IDs should be stable enough to audit, but specific enough to revoke. A practical pattern:
    swarm:{env}:{role}:{instance}

Examples that map cleanly to Swarm execution surfaces:
- swarm:dev:runner:01 - your orchestrator process
- swarm:prod:agent-planner:01 - planner role (if you model per-agent roles)
- swarm:prod:tool-runner:03 - tool-heavy surface
- swarm:prod:event-consumer:06 - event-triggered runs
- swarm:prod:cron-nightly:01 - scheduled run identity
- Do not embed secrets in device IDs (IDs are identifiers, not credentials)
- Avoid “one device for the whole fleet” unless you intentionally want coarse control
- If you want surgical stop control, model devices at the boundary you want to stop (tool-runner and side-effect surfaces)
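A small helper can keep IDs deterministic and parseable. The sketch below follows the `swarm:{env}:{role}:{instance}` pattern. The segment rule (lowercase letters, digits, hyphens) and the two-digit zero padding are assumptions inferred from the examples, not documented MachineID requirements, and `make_device_id` / `parse_device_id` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed convention for env and role: lowercase letters, digits, hyphens.
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


class DeviceId(NamedTuple):
    env: str
    role: str
    instance: int


def make_device_id(env: str, role: str, instance: int) -> str:
    """Build a deterministic ID; reject segments that would break parsing."""
    for part in (env, role):
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid segment: {part!r}")
    if instance < 1:
        raise ValueError("instance must be >= 1")
    return f"swarm:{env}:{role}:{instance:02d}"


def parse_device_id(device_id: str) -> DeviceId:
    """Split an ID back into its parts for auditing or bulk selection."""
    prefix, env, role, instance = device_id.split(":")
    if prefix != "swarm":
        raise ValueError(f"unexpected prefix: {prefix!r}")
    return DeviceId(env, role, int(instance))
```

Rejecting colons inside a segment keeps every ID splittable into exactly four parts, which makes it straightforward to select, say, all `tool-runner` devices in `prod` for a bulk revoke.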
Treat validation like a safety-critical call. Use short timeouts and fail closed. If permission cannot be confirmed, work should not proceed.
- Client timeout: short (for example, 1–3 seconds)
- Timeout/network failure: treat as allowed:false
- Stop the run/worker loop and surface it via logs/alerts
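The fail-closed policy above can be sketched against the validate endpoint using only the standard library. The URL, the `x-org-key` header, and the `deviceId` body come from this guide. The `Content-Type` header, the assumption that the response is a JSON object with an `allowed` field, and the local `VALIDATE_UNAVAILABLE` / `VALIDATE_MALFORMED` labels are illustrative and not confirmed MachineID behavior.

```python
import json
import urllib.request

VALIDATE_URL = "https://machineid.io/api/v1/devices/validate"


def validate_fail_closed(device_id, org_key, timeout=2.0,
                         opener=urllib.request.urlopen):
    """POST to the validate endpoint; any failure becomes allowed: False.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(
        VALIDATE_URL,
        data=json.dumps({"deviceId": device_id}).encode("utf-8"),
        headers={"x-org-key": org_key, "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            decision = json.loads(response.read().decode("utf-8"))
    except Exception as exc:  # timeout, DNS, HTTP error, bad JSON: fail closed
        # "code" here is a locally chosen label, not a MachineID response code.
        return {"allowed": False, "code": "VALIDATE_UNAVAILABLE", "error": repr(exc)}
    if isinstance(decision, dict) and decision.get("allowed") is True:
        return decision
    # Anything other than an explicit allowed: true is treated as a denial.
    denied = dict(decision) if isinstance(decision, dict) else {"code": "VALIDATE_MALFORMED"}
    denied["allowed"] = False
    return denied
```

The function never raises, so callers have a single condition to check: proceed only when `allowed` is true.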
The free tier is enough to prove control end-to-end. The objective is to confirm that revoke/disable stops execution at predictable checkpoints.
- swarm:dev:runner:01 - orchestrator run entry
- swarm:dev:tool-runner:01 - tool-heavy surface (best stop point)
- swarm:dev:cron-nightly:01 - scheduled execution identity
- Run a workflow that executes at least one tool call
- Revoke swarm:dev:tool-runner:01 in the Dashboard
- Observe: stop occurs at the next tool-call validate boundary
This tier (25 devices) supports multiple concurrent runners, consumers, and tool-heavy surfaces while keeping identity manageable.
- swarm:prod:runner:01…:08 (8 runners)
- swarm:prod:tool-runner:01…:08 (8 tool-heavy surfaces)
- swarm:prod:event-consumer:01…:06 (6 triggers)
- swarm:prod:cron-nightly:01…:03 (3 schedules)
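The ranges above can be expanded into a concrete inventory with a few lines of Python. This is a sketch: `FLEET` and `expand_fleet` are hypothetical names, and the role counts are copied from the list above.

```python
# Role -> replica count, taken from the device list above.
FLEET = {
    "runner": 8,
    "tool-runner": 8,
    "event-consumer": 6,
    "cron-nightly": 3,
}


def expand_fleet(env: str, fleet: dict) -> list:
    """Expand role counts into one deterministic device ID per replica."""
    return [
        f"swarm:{env}:{role}:{n:02d}"
        for role, count in fleet.items()
        for n in range(1, count + 1)
    ]


device_ids = expand_fleet("prod", FLEET)
print(len(device_ids))  # 8 + 8 + 6 + 3 = 25
print(device_ids[0], device_ids[-1])
```

Generating the list from counts rather than typing it by hand makes it easy to check the total against your device cap before registering anything.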
- Run entry: validate
- Before every tool call: validate
- Before every handoff: validate
- Before irreversible side effects: validate
At the next scale step (on the order of 250 devices), execution surfaces multiply under load: autoscaling, event storms, retries, and fan-out workflows. The dominant requirement is consistent enforcement across replicas and over time.
- Autoscaling runners / consumers
- Per-workflow or per-tenant identities
- Tool fan-out (one decision triggers many external calls)
- Long-running loops that execute tools repeatedly
Near the upper end of standard caps, the dominant failure mode is systemic multiplication: retries, recursive workflows, and many simultaneous execution surfaces.
- Prefer per-replica identities (avoid “one device for the whole fleet”)
- Validate at boundaries that occur frequently during real work (tools + side effects + handoffs)
- Avoid fallback authority paths (no degraded enforcement)
- Keep device IDs deterministic and auditable
If you need device limits beyond standard tiers, MachineID supports custom device limits. Your implementation pattern does not change: identity per execution surface and explicit validation boundaries.
MachineID provides a console at machineid.io/dashboard. The console exists outside your runtime so control does not depend on the process cooperating.
- Revoke / restore devices (including bulk)
- Remove devices
- Register devices
- Rotate keys
- Org-wide disable (hard stop across devices)
In addition to revoking individual devices, MachineID supports an org-wide disable control. This changes validate outcomes across the org (allowed becomes false).
- Org-wide disable does not change device revoked/restored state
- It takes effect at the next validation boundary you defined
- To make it operationally useful, validate during runs (tools/handoffs/side effects)
Remote controls become effective at the next validate. Stop latency is determined by your boundary placement: if you only validate at run entry, revocation will not stop a long run already deep in tool loops.
- Validate before tool calls
- Validate before side effects
- Validate before handoffs
- Validate at loop re-entry points
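The effect of boundary placement on stop latency can be illustrated with a small simulation. `FakeControlPlane` is a stand-in written for this example, not the MachineID API. It only models a revoke flag that flips while a run is in progress.

```python
class FakeControlPlane:
    """Stand-in for the remote authority; not the MachineID API."""

    def __init__(self):
        self.revoked = False

    def validate(self, device_id: str) -> dict:
        return {"allowed": not self.revoked}


def run(plane, total_calls: int, revoke_after: int, gate_each_call: bool) -> int:
    """Return how many tool calls execute when a revoke lands mid-run."""
    device_id = "swarm:dev:runner:01"
    if not plane.validate(device_id)["allowed"]:
        return 0  # run-entry gate
    executed = 0
    for _ in range(total_calls):
        if gate_each_call and not plane.validate(device_id)["allowed"]:
            break  # stop point before the tool call
        executed += 1  # the tool call "runs"
        if executed == revoke_after:
            plane.revoked = True  # operator revokes from outside the process
    return executed


entry_only = run(FakeControlPlane(), total_calls=10, revoke_after=3, gate_each_call=False)
per_call = run(FakeControlPlane(), total_calls=10, revoke_after=3, gate_each_call=True)
print(entry_only, per_call)  # 10 3
```

With only a run-entry gate, all ten tool calls execute even though the device was revoked after the third. With a gate before each call, the run stops at the fourth boundary.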
These patterns defeat the purpose of external enforcement:
- Proceed anyway on validation timeout or error
- Continue for a fixed grace window while enforcement is unavailable
- Fallback to internal flags as an alternate authority path
- Validate only at run entry for long-running, tool-heavy runs
- This almost always means validate boundaries are too far apart
- Add validate before tool calls and before side effects
- Add validate before handoffs (control transfer)
- If one tool call runs for minutes, gate it before it begins
- Check code and request_id for the decision
- Confirm the device is not revoked in the dashboard
- Confirm org-wide disable is not enabled
- Confirm you have not exceeded your device cap (new unique IDs)
- Use short client timeouts (1–3s) and fail closed
- Treat inability to validate as not allowed and stop
- Surface denial via logs and stop the run
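Surfacing a denial as a structured log event might look like the sketch below. It assumes the decision dict carries `code` and `request_id`, as in the SDK example in this guide. The logger name, the event field names, and `log_denial` itself are hypothetical choices.

```python
import json
import logging

logger = logging.getLogger("machineid.enforcement")


def log_denial(decision: dict, device_id: str, boundary: str) -> str:
    """Emit one structured log line for a denial and return it."""
    event = {
        "event": "machineid_denied",
        "device_id": device_id,
        "boundary": boundary,
        "code": decision.get("code"),
        "request_id": decision.get("request_id"),
    }
    line = json.dumps(event, sort_keys=True)
    logger.error(line)
    return line
```

Keeping `request_id` in every denial event lets an operator match a stopped run to a specific validation decision.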
The prompts below are designed to produce practical integration plans with minimal guesswork. Replace bracketed placeholders and paste into your LLM of choice.
I have a Python Swarm-style orchestrator (agents + tools + handoffs). I want hard enforcement using MachineID.
Context:
- My org key: [PASTE ORG KEY]
- Base URL: https://machineid.io
- Device ID pattern: swarm:{env}:{role}:{instance}
- Fail-closed policy, short timeout (1–3s)
- Validation boundaries required:
1) Run entry (register + validate, fail closed)
2) Before every tool execution (highest leverage)
3) Before every handoff (control transfer)
4) Before irreversible side effects (writes, sends, payments)
5) At loop re-entry points (high-cost cycles)
Please provide:
1) Exact files/locations to change
2) Copy/paste code blocks using the MachineID Python SDK (pip install machineid-io)
3) A test plan:
- revoke a device from dashboard
- restore it
- use org-wide disable
- verify stops occur at the next validate boundary

Help me model MachineID devices for my Swarm-style system.
Inputs:
- Agents: [list agent roles]
- Tools: [list tools, note which ones are high-risk side effects]
- Triggers: [API requests / cron / queue / event consumer]
- Expected scale: [3 / 25 / 250 / 1000]
- I need readable device IDs and surgical revoke control.
Output:
- A proposed device ID scheme
- A concrete list of device IDs for the target tier
- Where to validate (run entry / tools / handoffs / side effects / loop re-entry)
- A minimal runbook for revoke + org-wide disable

I have a Swarm-style run that loops and can call tools repeatedly.
Goal:
- Add MachineID validate boundaries inside the loop so I can stop it remotely.
- Fail closed on validation timeout or error.
Please provide:
- Exactly where to place validation calls
- A wrapper pattern that is hard to forget
- A test plan using dashboard revoke and org-wide disable

- Stable device identity per execution surface
- Run-entry gating (register + validate, fail closed)
- At least one in-run stop point (tool-call or side-effect boundary)
- Handoff gating (control transfer boundary)
- Short timeout and consistent failure policy
- Denials logged as operational events (include request_id)
- A runbook to revoke/restore and use org-wide disable
- Python SDK: github.com/machineid-io/python-sdk
- MachineID GitHub org: github.com/machineid-io
- Dashboard: machineid.io/dashboard
- Core enforcement guidance: Implementation Guide and Operational Guarantees