A single-agent system has one trust boundary: user to model. A multi-agent system has many: user to orchestrator, orchestrator to subagents, subagents to tools, agents to shared state. Each boundary is a potential attack surface and a potential failure mode. This playbook addresses each boundary systematically.
Boundary 1: User to Orchestrator
The orchestrator is the entry point for user input. It must enforce all the same controls as a single-agent system — input validation, rate limiting, authentication, content policy — before decomposing the task and dispatching to subagents.
The orchestrator should be the only component that ever processes raw user input. Subagents should receive pre-validated, sanitized inputs from the orchestrator — never raw user strings.
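As a rough illustration of this gate, the sketch below normalizes and bounds raw user text before wrapping it in a typed task for a subagent. The names `sanitize_user_input`, `build_task`, `SubagentTask`, `ALLOWED_TASK_TYPES`, and the 4000-character limit are all assumptions made for this example, not part of any particular framework. This kind of sanitization reduces malformed input; it does not by itself defend against prompt injection.

```python
import unicodedata
from dataclasses import dataclass

MAX_INPUT_CHARS = 4000                         # assumed limit for illustration
ALLOWED_TASK_TYPES = {"research", "summarize"}  # hypothetical task types


@dataclass(frozen=True)
class SubagentTask:
    """The only shape in which user-derived text reaches a subagent."""
    task_type: str
    payload: str


def sanitize_user_input(raw: str) -> str:
    """Normalize, strip control characters, and bound the length of raw input."""
    if not isinstance(raw, str):
        raise TypeError("user input must be a string")
    text = unicodedata.normalize("NFKC", raw)
    # Drop control and format characters (Unicode categories starting with "C"),
    # keeping newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = text.strip()
    if not text:
        raise ValueError("empty input")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input too long")
    return text


def build_task(task_type: str, raw_user_input: str) -> SubagentTask:
    """Orchestrator-side helper: validate first, then construct the task."""
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    return SubagentTask(task_type, sanitize_user_input(raw_user_input))
```

Subagents in this sketch accept only `SubagentTask` instances, so any code path that tries to hand them a raw string is easy to spot in review.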
Boundary 2: Orchestrator to Subagents
- Authenticate agent-to-agent calls: subagents should verify that each call comes from the authorized orchestrator, not from a compromised or spoofed agent
- Scope task decomposition: each subagent should receive only the information and permissions required for its specific subtask, not the full task context
- Enforce output schemas: the orchestrator should validate every subagent output against a strict schema before using it; free-form text from a subagent should never be executed as code or treated as instructions
- Log every orchestrator-to-subagent call with full arguments: this is essential for reconstructing attack chains in post-incident analysis
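One possible way to combine call authentication with output validation is sketched below, using an HMAC over the serialized call and a small hand-rolled schema check. The shared key, the envelope layout, and the fields in `RESULT_SCHEMA` are assumptions for illustration; a real deployment might instead use mTLS or signed tokens, and would load the key from a secret store.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice, load it from a secret store.
SHARED_KEY = b"demo-key-do-not-use"

# Assumed result shape for this example: field name -> required type.
RESULT_SCHEMA = {"status": str, "summary": str, "sources": list}


def sign_call(key: bytes, subagent: str, task: dict) -> dict:
    """Orchestrator side: serialize the call deterministically and sign it."""
    body = json.dumps(
        {"subagent": subagent, "task": task},
        sort_keys=True,
        separators=(",", ":"),
    )
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_call(key: bytes, envelope: dict) -> dict:
    """Subagent side: reject any call whose signature does not match."""
    expected = hmac.new(
        key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    supplied = str(envelope["sig"])
    # compare_digest avoids leaking match position through timing.
    if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
        raise PermissionError("call signature mismatch")
    return json.loads(envelope["body"])


def validate_result(result: object) -> dict:
    """Orchestrator side: accept only results that match the schema exactly."""
    if not isinstance(result, dict):
        raise ValueError("result must be an object")
    if set(result) != set(RESULT_SCHEMA):
        raise ValueError("missing or unexpected fields")
    for field, expected_type in RESULT_SCHEMA.items():
        if not isinstance(result[field], expected_type):
            raise ValueError(f"field {field!r} has the wrong type")
    if result["status"] not in {"ok", "error"}:
        raise ValueError("status must be 'ok' or 'error'")
    return result
```

This sketch omits replay protection; adding a timestamp or nonce inside the signed body would be a natural extension.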
Boundary 3: Agents to Tools
- Separate tool permissions by agent: the research agent should not hold the write permissions that the action agent needs
- Validate tool arguments before dispatch: allow-list argument values for high-risk tools, and never pass agent-generated strings directly as shell commands
- Implement tool call rate limits per agent: this prevents a compromised agent from triggering tool calls at abusive rates
- Monitor for tool call anomalies: unusual tool combinations, unexpected argument patterns, and high-frequency calls are all signals worth alerting on
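The first three controls above can be sketched as a single gateway that every tool call passes through. The agent names, tool names, allow-listed values, and the `ToolGateway` class are hypothetical and chosen only for this example; the rate limiter is a simple sliding window, one of several reasonable choices.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-agent tool permissions.
TOOL_PERMISSIONS = {
    "research_agent": {"web_search"},
    "action_agent": {"web_search", "restart_service"},
}

# Hypothetical allow-lists for high-risk tools: tool -> argument -> allowed values.
ARGUMENT_ALLOW_LIST = {
    "restart_service": {"service": {"api", "worker"}},
}


class ToolGateway:
    """Checks permission, argument allow-lists, and a per-agent rate limit."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._calls = defaultdict(deque)  # agent -> timestamps of recent calls

    def authorize(self, agent: str, tool: str, args: dict) -> None:
        """Raise if the call is not allowed; return None if it may proceed."""
        if tool not in TOOL_PERMISSIONS.get(agent, set()):
            raise PermissionError(f"{agent} may not call {tool}")

        # Tools without an allow-list entry are not argument-checked here.
        for name, allowed in ARGUMENT_ALLOW_LIST.get(tool, {}).items():
            value = args.get(name)
            if not isinstance(value, str) or value not in allowed:
                raise ValueError(f"argument {name!r} is not allow-listed")

        # Sliding window: forget calls older than the window, then count.
        now = self.clock()
        recent = self._calls[agent]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            raise RuntimeError(f"rate limit exceeded for {agent}")
        recent.append(now)
```

Rejected calls raise distinct exception types here, which makes it straightforward to count each kind of rejection per agent and alert on unusual patterns.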
Boundary 4: Agents to Shared State
When multiple agents share a state store, a compromised agent can corrupt state that other agents depend on. Design shared state access around explicit ownership: each piece of state has a designated owner agent, and other agents can only read (not write) state they do not own.
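A minimal in-memory sketch of this ownership rule follows. The `OwnedStateStore` class and the key and agent names are assumptions for illustration; a production store would enforce the same rule at the database or service layer rather than in process.

```python
import copy


class OwnedStateStore:
    """Each key has exactly one owner agent; only the owner may write it."""

    def __init__(self, owners: dict):
        # owners maps state key -> owning agent name, fixed at construction.
        self._owners = dict(owners)
        self._data = {}

    def write(self, agent: str, key: str, value) -> None:
        owner = self._owners.get(key)
        if owner is None:
            raise KeyError(f"no owner registered for {key!r}")
        if owner != agent:
            raise PermissionError(f"{agent} does not own {key!r}")
        # Store a copy so later mutation by the writer does not alter state.
        self._data[key] = copy.deepcopy(value)

    def read(self, agent: str, key: str):
        # Any agent may read. Readers get a copy, so mutating the returned
        # object cannot corrupt the stored value.
        return copy.deepcopy(self._data[key])
```

Returning copies on read matters as much as the write check: handing out a reference to a mutable object would let a non-owner change shared state without ever calling `write`.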
Monitoring Multi-Agent Systems
- End-to-end request tracing: trace every user request through all agents and tools with a shared correlation ID
- Anomaly detection on agent output distributions: a subagent whose outputs shift significantly from its established baseline is a signal worth investigating
- Human review queues for high-stakes decisions: some decisions should always require human review before the action agent executes them
- Rollback capability: design for the ability to undo multi-agent actions, which is especially important for workflows that modify shared state
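Two of these items, correlation-ID tracing and the human review queue, can be sketched together as follows. The in-memory `TRACE` list, the `HIGH_STAKES_ACTIONS` set, and the function names are hypothetical stand-ins for a real tracing backend and review workflow.

```python
import contextvars
import time
import uuid

# Carries the current request's correlation ID through nested calls.
_correlation_id = contextvars.ContextVar("correlation_id", default=None)

TRACE = []  # stand-in for a real log or tracing backend

# Hypothetical actions that always require human sign-off.
HIGH_STAKES_ACTIONS = {"delete_records", "send_payment"}


def start_request() -> str:
    """Called once by the orchestrator when a user request arrives."""
    cid = uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def record(component: str, event: str, **fields) -> dict:
    """Append a trace entry tagged with the current correlation ID."""
    entry = {
        "correlation_id": _correlation_id.get(),
        "component": component,
        "event": event,
        "ts": time.time(),
        **fields,
    }
    TRACE.append(entry)
    return entry


def submit_action(action: str, args: dict, review_queue: list, execute) -> str:
    """Queue high-stakes actions for human review; execute the rest."""
    if action in HIGH_STAKES_ACTIONS:
        review_queue.append(
            {"correlation_id": _correlation_id.get(), "action": action, "args": args}
        )
        record("action_agent", "queued_for_review", action=action)
        return "queued"
    execute(action, args)
    record("action_agent", "executed", action=action)
    return "executed"
```

Because queued items carry the same correlation ID as the trace entries, a reviewer can pull up the full chain of agent and tool calls that led to the proposed action.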
