Implementation Guide

Practical patterns for placing enforcement boundaries, handling failures, and deploying in production.

Companion to Operational Guarantees
What this is

This page explains how to implement runtime enforcement in real systems: where to place validation boundaries, how frequently to validate, and how to handle failures without turning enforcement into best-effort policy. It is operational guidance that assumes the guarantees are already understood.

Design principle: enforcement over cooperation
Engineering goal: boundaries that are explicit and testable
Core invariant

Everything reduces to a single invariant:

Register. Validate. Work.

If validation fails, work does not begin.

Your only job as an implementer is to choose where validation boundaries sit and to enforce them there. If a boundary is vague, enforcement becomes vague.
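As a minimal sketch, the invariant fits in one Python helper. It assumes register and validate are callables you supply. They are hypothetical stand-ins for whatever client your deployment uses, not MachineID's actual SDK. The only assumption about the validation result is that it has an allowed attribute.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Decision:
    # Hypothetical result shape: only `allowed` comes from this guide.
    allowed: bool
    reason: str = ""


def register_validate_work(
    device_id: str,
    register: Callable[[str], None],
    validate: Callable[[str], Decision],
    work: Callable[[], T],
) -> T:
    """Register, validate, then run `work` only if validation allowed it."""
    register(device_id)
    decision = validate(device_id)
    if not decision.allowed:
        # Work never begins; SystemExit(1) mirrors exit(1) in the patterns below.
        raise SystemExit(1)
    return work()
```

Because the helper only ever calls work after an allowed decision, it is easy to test: a validator that denies must leave work untouched.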

Where to validate

Validation belongs at execution boundaries—points where work begins or commits side effects. Most systems already have these boundaries; they are simply not treated as enforcement points.

  • Startup boundary: before an instance begins processing work
  • Task boundary: before each unit of work (job, message, task, tool call)
  • Side-effect boundary: before committing external actions (writes, payments, emails)
If your system can multiply execution surfaces (replicas, workers, consumers), startup validation is non-negotiable. If your system can trigger real-world effects, side-effect boundaries are where you get the most leverage.
Boundary patterns

Pattern A: Startup gate (required)

Validate once at startup. If the result is not allowed, do not start the work loop; exit immediately.

register(device_id)
val = validate(device_id)
if not val.allowed:
    exit(1)

start_work_loop()

Pattern B: Per-unit-of-work gate (recommended)

Validate before each job/message/task. This makes revocation effective at predictable points.

while True:
    job = dequeue()
    val = validate(device_id)
    if not val.allowed:
        exit(1)

    process(job)

Pattern C: Side-effect gate (high leverage)

Validate immediately before irreversible actions: writes, transfers, external API calls, outbound messages.

val = validate(device_id)
if not val.allowed:
    exit(1)

commit_side_effect()
Reality: You can combine patterns. Most production systems do Startup + Per-unit-of-work, and add Side-effect gates for the highest-risk actions.
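One way to make side-effect gates hard to forget is to attach the gate to the function that performs the effect. The following is a minimal sketch that assumes a hypothetical validate(device_id) callable returning an object with an allowed attribute. The decorator name side_effect_gate is illustrative and is not part of MachineID.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable


def side_effect_gate(device_id: str, validate: Callable[[str], Any]):
    """Decorator: validate immediately before every call to the wrapped effect."""

    def decorator(effect: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(effect)
        def gated(*args: Any, **kwargs: Any) -> Any:
            # Fresh check on every call (no caching), so a revocation
            # takes effect at the very next side effect.
            if not validate(device_id).allowed:
                raise SystemExit(1)
            return effect(*args, **kwargs)

        return gated

    return decorator


# Example wiring with a stand-in validator (hypothetical).
def always_allow(device_id: str) -> SimpleNamespace:
    return SimpleNamespace(allowed=True)


@side_effect_gate("agent:prod:planner:01", always_allow)
def send_payment(amount_cents: int) -> str:
    return f"sent {amount_cents}"
```

With this approach, the gate travels with the effect. Every call site inherits the check without having to remember it.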
Long-running loops

MachineID does not introspect internal loops. If a process runs for hours, you must define enforcement boundaries inside the loop.

Good boundaries are:

  • Before each iteration that triggers a tool call
  • Before each external request that can incur cost
  • Before each write or irreversible side effect
  • Before entering a high-cost sub-loop (batch runs, fan-out, recursion)
Rule of thumb: if an action can cost money or change state, put a validation boundary immediately before it.
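For long-running processes, it can help to define the checkpoint once and call it at each boundary inside the loop. This sketch makes the same assumption as above, a hypothetical validate callable. Checkpoint, run_agent, and call_tool are illustrative names.

```python
from typing import Any, Callable, Iterable, List


class Checkpoint:
    """Validates before each costly or state-changing action in a loop."""

    def __init__(self, device_id: str, validate: Callable[[str], Any]) -> None:
        self.device_id = device_id
        self.validate = validate
        self.checks = 0            # boundaries crossed so far
        self.last_boundary = ""    # which boundary was checked most recently

    def __call__(self, boundary: str) -> None:
        self.checks += 1
        self.last_boundary = boundary
        if not self.validate(self.device_id).allowed:
            raise SystemExit(1)    # same effect as exit(1) in the patterns


def run_agent(plan: Iterable[List[str]], check: Checkpoint,
              call_tool: Callable[[str], None]) -> None:
    for step in plan:
        check("fan-out")           # before entering a high-cost sub-loop
        for tool in step:
            check("tool_call")     # before each external, cost-incurring call
            call_tool(tool)
```

Each check records which boundary it guarded, so a denial in logs or tests points at the exact checkpoint where work stopped.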
Timeouts

Treat validation like a safety-critical call: it must be fast, and it must fail safely.

Recommended default: short client timeout (for example, 1–3 seconds) and fail closed (do not proceed when validation cannot be confirmed).
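If your client library exposes a request timeout, set it there. Otherwise, you can enforce a caller-side deadline generically. Here is a minimal sketch using Python's concurrent.futures. It assumes a hypothetical blocking validate callable. Note that a timed-out call keeps running in its worker thread, so a client-level timeout is still preferable when one is available.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str = ""


# Shared pool, so a hung call never blocks the caller on executor shutdown.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def validate_with_deadline(validate: Callable[[str], Any], device_id: str,
                           timeout_s: float = 2.0) -> Decision:
    """Return the validator's answer, or a denial if it is slow or fails."""
    future = _POOL.submit(validate, device_id)
    try:
        result = future.result(timeout=timeout_s)
        # Fail closed on anything other than an explicit True.
        return Decision(result.allowed is True, getattr(result, "reason", ""))
    except concurrent.futures.TimeoutError:
        return Decision(False, f"validation timed out after {timeout_s}s")
    except Exception as exc:  # network errors, bad responses, client bugs
        return Decision(False, f"validation failed: {exc!r}")
```

A slow or unreachable validator therefore produces the same outcome as an explicit denial, which is what failing closed means.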

If your runtime cannot tolerate fail-closed behavior for certain workloads, you are describing a different system: one that accepts best-effort enforcement. That system is explicitly outside the guarantees.

Network failures

Decide your failure policy up front. Do not allow “it depends” behavior per team, per service, or per environment. Enforcement must be consistent.

Recommended policy: fail closed

  • If validate fails (timeout/network): treat as allowed:false
  • Exit the process or stop the worker loop
  • Surface the failure via logs/alerts
This policy makes your system controllable under uncertainty. If you cannot confirm permission, you do not execute.
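You can reduce this policy to a single function that every boundary calls, so failure handling cannot drift between teams, services, or environments. This is a sketch. It assumes the decision object carries allowed and, optionally, reason. The reason field and the enforce name are assumptions made for illustration.

```python
import logging
import sys
from typing import Any

log = logging.getLogger("enforcement")


def enforce(decision: Any, device_id: str, boundary: str) -> None:
    """Stop the process unless the decision explicitly allows work."""
    if getattr(decision, "allowed", False) is True:
        return
    # Timeouts and network errors should arrive here as denials (fail closed).
    log.error("enforcement denied device=%s boundary=%s reason=%s",
              device_id, boundary, getattr(decision, "reason", "unknown"))
    sys.exit(1)
```

Alerting then hangs off the error log line rather than off per-service logic.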
No degraded mode

The guarantee is binary. Avoid patterns like:

  • “Proceed anyway but log a warning”
  • “Continue for 10 minutes until checks recover”
  • “Fall back to internal flags when validation is down”

Those patterns create a second authority path inside the runtime. That is precisely what an external control plane avoids.

Identity model

MachineID enforces permission on an identity (a “device”) representing an execution surface. This identity should map to a specific runtime instance or logical worker identity—not a whole service.

  • Good: one agent instance = one device
  • Good: one worker replica = one device
  • Risky: one entire cluster = one device (loses surgical control)
Device ID strategy

Device IDs should be stable enough to audit, but specific enough to revoke. A common pattern is:

{service}:{env}:{role}:{instance}

Examples:

  • agent:prod:planner:01
  • worker:prod:queue-consumer:07
  • job:prod:nightly-reindex:01
Important: Do not embed secrets in device IDs. Treat IDs as identifiers, not credentials.
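A small helper can keep device IDs in the {service}:{env}:{role}:{instance} shape and reject components that would make them ambiguous. This is a sketch. The REPLICA_INDEX environment variable is a hypothetical source for the instance number; use whatever ordinal or worker index your orchestrator assigns.

```python
import os
import re

# Lowercase letters, digits, and hyphens only, so ':' stays an unambiguous separator.
_COMPONENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def device_id(service: str, env: str, role: str, instance: int) -> str:
    """Build '{service}:{env}:{role}:{instance}' with a two-digit instance."""
    for part in (service, env, role):
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"invalid device ID component: {part!r}")
    if instance < 0:
        raise ValueError("instance must be non-negative")
    # The result is an identifier, not a credential: never derive it from secrets.
    return f"{service}:{env}:{role}:{instance:02d}"


def device_id_from_env(service: str, env: str, role: str) -> str:
    # REPLICA_INDEX is a hypothetical variable set per replica at deploy time.
    return device_id(service, env, role, int(os.environ.get("REPLICA_INDEX", "0")))
```

Because the instance number comes from the deployment slot rather than from a random value, the same replica keeps the same ID across restarts. That stability keeps audits readable, while per-replica granularity keeps revocation surgical.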
Rate and frequency

Choose validation frequency based on risk:

  • Low risk: validate at startup, then at task boundaries
  • Medium risk: validate at startup and per unit of work
  • High risk: validate at startup + per unit of work + before side effects
If you are running an agent that can spend money, write production data, or trigger external actions, validate at every side-effect boundary.
Logging and audit

Enforcement without observability is operationally painful. At minimum, log:

  • Device ID
  • Validation result (allowed)
  • Reason/error (when denied)
  • Boundary type (startup / task / side-effect)

Keep logs structured. Treat enforcement denials as first-class operational events.
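Here is a sketch that emits one structured record per enforcement decision, using Python's standard json and logging modules. The four fields come from the list above. The exact key names and the event label are assumptions.

```python
import json
import logging
from typing import Optional

log = logging.getLogger("enforcement")


def enforcement_record(device_id: str, boundary: str, allowed: bool,
                       reason: Optional[str] = None) -> str:
    """Serialize one decision as a single JSON line."""
    record = {
        "event": "enforcement.validate",
        "device_id": device_id,
        "boundary": boundary,        # "startup", "task", or "side-effect"
        "allowed": allowed,
    }
    if not allowed:
        record["reason"] = reason or "unspecified"
    return json.dumps(record, sort_keys=True)


def log_decision(device_id: str, boundary: str, allowed: bool,
                 reason: Optional[str] = None) -> None:
    line = enforcement_record(device_id, boundary, allowed, reason)
    # Denials go out at WARNING so they surface as operational events.
    log.log(logging.INFO if allowed else logging.WARNING, line)
```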

Example: agents

Validate at startup, then before each tool call or task step. If denied, exit immediately.

register(agent_id)

if not validate(agent_id).allowed:
    exit(1)

for step in plan:
    if not validate(agent_id).allowed:
        exit(1)

    run_step(step)
Example: workers

Validate before pulling work. If denied, stop consuming.

register(worker_id)

if not validate(worker_id).allowed:
    exit(1)

while True:
    if not validate(worker_id).allowed:
        exit(1)

    job = dequeue()
    process(job)
Example: scheduled jobs

Validate at job start. If denied, exit before doing any work.

register(job_id)

if not validate(job_id).allowed:
    exit(1)

run_job()
Example: event consumers

Validate before consuming or before handling each message, depending on risk and throughput.

register(consumer_id)

if not validate(consumer_id).allowed:
    exit(1)

while True:
    msg = read_message()

    if not validate(consumer_id).allowed:
        exit(1)

    handle(msg)
Example: webhooks

Validate before triggering any downstream side effects.

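register(handler_id)  # once, when the handler process starts

def handle_webhook(request):
    # Validate on every request, before any downstream side effect
    if not validate(handler_id).allowed:
        return 403

    commit_side_effect()
    return 200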
Checklist
  • Every execution surface has a stable device identity
  • Startup validation is enforced and fails closed
  • Boundaries are explicit (task / side-effect)
  • Revocation becomes effective at predictable checkpoints
  • Timeouts are short and consistent
  • Denials are logged and actionable
If your system can multiply execution, and you cannot reliably stop it, you do not have control.
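Explicit, testable boundaries can be verified directly. Drive a worker loop with a validator that revokes after a known number of checks, then assert that work stops at the next checkpoint. In this sketch, the loop returns instead of calling exit(1) only to keep the test in-process. The names consume and scripted_validator are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Decision:
    allowed: bool


def scripted_validator(answers: Iterable[bool]) -> Callable[[str], Decision]:
    """Stand-in validator that replays a fixed sequence of decisions."""
    it: Iterator[bool] = iter(answers)
    # Running out of scripted answers counts as a denial (fail closed).
    return lambda device_id: Decision(next(it, False))


_END = object()  # sentinel marking an exhausted queue


def consume(device_id: str, validate: Callable[[str], Decision],
            jobs: Iterable[str], process: Callable[[str], None]) -> int:
    """Validate before pulling each job; stop at the first denial."""
    done = 0
    queue = iter(jobs)
    while True:
        if not validate(device_id).allowed:
            return done          # production code would exit(1) here
        job = next(queue, _END)
        if job is _END:
            return done
        process(job)
        done += 1


processed: List[str] = []
count = consume("worker:prod:queue-consumer:07",
                scripted_validator([True, True, False]),
                ["a", "b", "c", "d"], processed.append)
```

In this run, revocation arrives on the third check, so exactly two jobs are processed. That is the predictable checkpoint behavior the checklist asks for.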