The obvious design choice for an AI agent safety layer would be another AI. An LLM that reviews proposed actions, checks them against intent, flags anomalies. Smarter, more flexible, context-aware. We went the other way — and it wasn't a close call.
Your agent operates over a context window. Instructions, tool outputs, user requests, prior results — all of it accumulates. At 128k tokens, your agent has processed more text than most humans read in a week. And somewhere in that context, something drifted. A misinterpreted instruction. A hallucinated tool output. A user request that conflicts with something from 80k tokens ago.
The agent is still confident. That's the problem. LLMs don't have a confidence alarm that fires when their context gets noisy. They produce outputs that look exactly the same whether they're correct or badly wrong.
Now imagine your safety layer is also an LLM. It has its own context window. Its own drift. Its own hallucination surface. You've added a second probabilistic system to review a first probabilistic system and declared the result "safe."
You haven't added safety. You've added complexity and a second point of failure.
What you want from a safety layer is a guarantee: given the same input, it produces the same output. Every time. Forever. That's determinism — and it's the one property LLMs fundamentally cannot provide.
Gate's evaluation is stateless. No context window. No prior conversation. No accumulated state that can drift. A proposal comes in, Gate checks it against a YAML policy, and returns a result. The policy was written by a human when they were thinking clearly, not by an LLM at inference time under token pressure.
```yaml
id: support-reply
destinations: [zendesk.reply, intercom.reply]
block_if_terms:
  - "refund guaranteed"
  - "legal action"
require_approval: false
auto_approve_under: 1
expiry_minutes: 30
```
This rule doesn't hallucinate. It doesn't forget it was written three days ago. It doesn't get confused by what happened 80k tokens back in the agent's session. It evaluates `payloadContent.includes("refund guaranteed")` and returns blocked, or it doesn't. Same answer every time.
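A minimal sketch of what that evaluation looks like as a pure function. The `Policy` shape and field names mirror the YAML above, but the interface itself is an assumption for illustration, not Gate's actual API:

```typescript
// Illustrative policy shape, modeled on the YAML rule above (assumption,
// not Gate's real schema).
interface Policy {
  id: string;
  blockIfTerms: string[];
  requireApproval: boolean;
  autoApproveUnder: number;
}

type Verdict =
  | { result: "blocked"; reason: string }
  | { result: "approved" }
  | { result: "needs_approval" };

// Pure function: same inputs, same output. No context window, no prior
// conversation, no accumulated state that can drift.
function evaluate(policy: Policy, payloadContent: string, amount: number): Verdict {
  const content = payloadContent.toLowerCase();
  for (const term of policy.blockIfTerms) {
    if (content.includes(term.toLowerCase())) {
      return { result: "blocked", reason: `Payload contains blocked term: ${term}` };
    }
  }
  if (!policy.requireApproval && amount < policy.autoApproveUnder) {
    return { result: "approved" };
  }
  return { result: "needs_approval" };
}
```

Because the function is stateless, calling it twice with the same proposal is guaranteed to produce the same verdict, which is exactly the property an LLM reviewer can't offer.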
The obvious objection: YAML is rigid. An LLM could catch nuanced violations that term matching misses — synonyms, paraphrasing, context-dependent harm.
True. It could also catch things that aren't violations, fail to catch things that are, and behave differently on Tuesday than it did on Monday for reasons you can't audit. Flexibility and determinism are in tension. For a safety layer, we'd rather have a system that's predictably correct about the things it covers than one that's unpredictably flexible about everything.
If your policy needs to block "refund guaranteed," write that. If it needs to block synonyms, write those too. The policy is code. It's reviewable, versionable, and testable. An LLM's reasoning is none of those things.
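In practice, "write the synonyms too" is a one-line diff. A hypothetical extension of the policy above, with the added terms chosen here purely for illustration:

```yaml
block_if_terms:
  - "refund guaranteed"
  - "guaranteed refund"
  - "full refund promised"
  - "legal action"
  - "lawsuit"
```

That diff goes through code review like any other change, which is the point: the blocklist's coverage is visible, versioned, and arguable in a pull request rather than latent in a model's weights.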
The honest trade-off: Gate's term scan stops accidental violations and basic obfuscation. It doesn't stop a maximally adversarial agent using creative synonyms. For high-risk destinations, pair term scanning with `require_approval: always` — human review covers the long tail that rules can't anticipate.
Some decisions genuinely require judgment — semantic understanding, context, edge cases that no finite rule set anticipated. Gate handles this with a human approval step, not another AI layer. If the policy can't auto-approve, the proposal goes to a human reviewer in the dashboard.
This is a deliberate architectural choice. When the system doesn't know, it asks a person. Not a smarter model. A person — with full context, full accountability, and the ability to say "this is fine" or "this is not fine" in ways that get written to the audit trail permanently.
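The routing is correspondingly simple. A sketch of the three-way dispatch, where every name (`Proposal`, `approvalQueue`, the in-memory logs) is an illustrative assumption, not Gate's implementation:

```typescript
type Verdict = "approved" | "blocked" | "needs_approval";

interface Proposal {
  id: string;
  destination: string;
  payload: string;
}

// In-memory stand-ins for the real sinks (executor, audit log, dashboard queue).
const executed: string[] = [];
const blockedLog: string[] = [];
const approvalQueue: Proposal[] = [];

function dispatch(verdict: Verdict, proposal: Proposal): void {
  switch (verdict) {
    case "approved":
      executed.push(proposal.id); // the rule said yes; send it
      break;
    case "blocked":
      blockedLog.push(proposal.id); // the rule said no; record it
      break;
    case "needs_approval":
      approvalQueue.push(proposal); // a person decides, not a smarter model
      break;
  }
}
```

There is no fourth branch. Nothing in the pipeline escalates to another model; ambiguity always terminates at a human.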
When Gate blocks a proposal, the reason is a string: "Payload contains blocked term: refund guaranteed". When it approves one automatically, the reason is traceable to a specific policy rule. When it requires human approval, the approver's name is in the audit trail.
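A sketch of what an entry like that might look like as a record. The field names are assumptions modeled on the reasons quoted above, not Gate's actual schema:

```typescript
// Hypothetical audit entry shape (assumption, for illustration).
interface AuditEntry {
  proposalId: string;
  decision: "approved" | "blocked" | "needs_approval";
  reason: string;     // e.g. "Payload contains blocked term: refund guaranteed"
  policyId: string;   // the specific rule that produced the decision
  approver?: string;  // present only when a human approved
  timestamp: string;
}

// Every blocked proposal gets the same mechanical, reproducible reason string.
function blockEntry(proposalId: string, policyId: string, term: string): AuditEntry {
  return {
    proposalId,
    decision: "blocked",
    reason: `Payload contains blocked term: ${term}`,
    policyId,
    timestamp: new Date().toISOString(),
  };
}
```

Every field in that record points at something concrete: a rule, a term, a person. Nothing in it requires interpreting a model's reasoning after the fact.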
Try getting that from an LLM reviewer. "Why did you approve this?" — "Based on my assessment of the semantic content and contextual appropriateness..." — is not an audit trail. It's a post-hoc rationalization.
You can't audit a vibe. You can audit a rule.
There's a spectrum of AI safety approaches: full autonomy on one end, full human review on the other. Most teams live somewhere in the middle — some things are fine to auto-approve, some things need eyes, and some things should never happen regardless.
Gate is infrastructure for that middle ground. It's not trying to be smarter than your agent. It's trying to be the one thing in your pipeline that doesn't reason — so that when something goes wrong, you know exactly where to look and exactly what happened.
YAML policies aren't a limitation. They're the point.