Engineering | March 8, 2026 | 4 min read

847 Records Overwritten, No Audit Trail: An AI Agent Production Incident

This is the AI agent production incident nobody talks about until it happens to them. Monday morning. The sales team opens Salesforce. Account names are wrong. Phone numbers are truncated. A field called lead_score, the one the whole pipeline depends on, is blank across hundreds of accounts. Preventing mistakes like this takes a checkpoint layer, and the enrichment agent never had one.

The enrichment agent ran Friday night. It had a job: pull company data from an external API, fill in missing fields. It did the job. All 847 records. The API returned partial data. The agent wrote it anyway, no questions asked, overwriting clean values with garbage.

No approval step. No diff. No rollback. No audit trail.

That was the incident. The fix took three days. Some data was unrecoverable.

Why This Keeps Happening

AI agents are write-capable. That's the point. You give them tools — CRM updates, email sends, webhook calls, database mutations — and they use them. The problem isn't that they use write tools. The problem is there's nothing between "agent decides to write" and "write executes."

In every other engineering context, writes have checkpoints. Migrations have review. Deploys have CI gates. Code has pull requests. But when an AI agent writes to production, there's often nothing. Just: agent runs, data changes, done.

What's missing is a commit layer — a point where proposed writes are visible, policy-checked, and either approved or blocked before they touch anything real.

This gap is what causes AI agent production incidents at companies that think they're moving fast. They are. They're also flying blind.

The architecture behind that commit layer — what it is, how it works — is covered in detail in Why AI Agents Need an AI Agent Commit Checkpoint. The short version: every write needs a submit intent → policy check → approve → execute cycle before it touches production.
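A minimal sketch of that cycle as a state machine, assuming the statuses below (they are illustrative names, not Gate's). The one property it encodes: execution is unreachable unless an intent was either cleared by policy or approved by a human.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"   # agent submitted an intent; nothing has run yet
    CLEARED = "cleared"     # policy passed, no human needed
    PENDING = "pending"     # held in the review queue
    APPROVED = "approved"   # a human signed off
    REJECTED = "rejected"
    EXECUTED = "executed"


# Legal transitions. EXECUTED is reachable only from CLEARED or APPROVED,
# so no code path can run a write that skipped the policy check.
TRANSITIONS = {
    Status.PROPOSED: {Status.CLEARED, Status.PENDING, Status.REJECTED},
    Status.PENDING: {Status.APPROVED, Status.REJECTED},
    Status.CLEARED: {Status.EXECUTED},
    Status.APPROVED: {Status.EXECUTED},
    Status.REJECTED: set(),
    Status.EXECUTED: set(),
}


@dataclass
class Intent:
    resource: str
    payload: dict
    status: Status = Status.PROPOSED
    history: list = field(default_factory=list)

    def move_to(self, new: Status) -> None:
        """Advance the intent, refusing any transition not listed above."""
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition: {self.status.value} -> {new.value}")
        self.history.append((self.status, new))
        self.status = new
```

For the Friday-night batch, the 847-record intent goes PROPOSED → PENDING, and calling move_to(Status.EXECUTED) from there raises until a reviewer approves it.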

Three Incident Patterns (and What Stops Them)

1. The CRM Overwrite

An enrichment agent is told to fill missing data fields. The upstream API is flaky — returns partial payloads. The agent doesn't know the difference between "no data" and "bad data." It writes what it gets.

847 records. Partial data. Overwrites clean values. No warning.

The fix is a bulk write threshold. If an agent proposes touching more than N records in a single pass, stop and ask. That's it. The agent isn't wrong — it's doing what you told it to do. You just never told it when to pause.

gates:
  - name: crm-enrichment-guard
    trigger: write
    resource: crm.contacts
    policies:
      - require_approval_over: 100

With this policy active, the agent proposes 847 writes. Gate intercepts. A human sees the diff before anything touches the database. Friday night, nothing bad happens.
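The threshold itself is one comparison; the useful part is what the reviewer sees. Below is a minimal sketch, not Gate's implementation, of a threshold check plus a field-level diff that flags the two failure modes from Monday morning: clean values blanked out and values cut short. The function names and the record layout (a dict of record IDs to field dicts) are assumptions for illustration.

```python
from typing import Any


def require_approval_over(proposed: dict, threshold: int) -> bool:
    """True when a batch touches more records than the threshold allows."""
    return len(proposed) > threshold


def field_diff(current: dict, proposed: dict) -> dict:
    """Map each field the write would change to an (old, new) pair."""
    return {k: (current.get(k), v) for k, v in proposed.items() if current.get(k) != v}


def is_destructive(old: Any, new: Any) -> bool:
    """True when a clean value would be blanked or truncated."""
    if old in (None, ""):
        return False  # filling a gap is what enrichment is for
    if new in (None, ""):
        return True   # partial payload: the existing value would be erased
    # A shorter string that is a prefix of the old one looks like truncation.
    return isinstance(old, str) and isinstance(new, str) and len(new) < len(old) and old.startswith(new)


def review_diff(current: dict, proposed: dict) -> dict:
    """Per record, the destructive changes a reviewer should see first."""
    flagged = {}
    for record_id, new_fields in proposed.items():
        diff = field_diff(current.get(record_id, {}), new_fields)
        bad = {k: pair for k, pair in diff.items() if is_destructive(*pair)}
        if bad:
            flagged[record_id] = bad
    return flagged
```

Filling an empty industry field passes quietly; replacing a phone number with a truncated one, or a lead_score with nothing, lands at the top of the review.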

2. The Duplicate Call

A lead comes in. The agent fires a webhook to trigger an outbound call. The API times out. The agent retries — standard behavior. The webhook fires again. The customer's phone rings twice in 30 seconds.

Same lead. Two calls. Different reps. Customer is confused and annoyed.

The agent didn't malfunction. Retry logic is correct. But no one told the system that this specific action — triggering an outbound call for this specific lead — already happened.

gates:
  - name: outbound-call-dedup
    trigger: webhook
    resource: leads.outbound_call
    policies:
      - idempotency_key: # set per-intent to block duplicates
          window_seconds: 300
          key: lead_id

The policy keys each intent by lead_id. A second call with the same idempotency key inside the 300-second window (5 minutes) gets blocked. The customer gets one call. The retry logic keeps working. Everyone's fine.
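A minimal in-memory sketch of the same idea, assuming a single process and a SHA-256 hash of resource plus lead_id as the key. IdempotencyGuard is a hypothetical name, not part of Gate. The window is measured from the first accepted call, so retries inside those 300 seconds are blocked rather than extending the window.

```python
import hashlib
import time
from typing import Callable


class IdempotencyGuard:
    """Blocks a repeat of the same action for the same key inside a time window."""

    def __init__(self, window_seconds: int = 300, clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock   # injectable so the window can be tested
        self._seen = {}      # key -> time of the first accepted call

    @staticmethod
    def make_key(resource: str, lead_id: str) -> str:
        # Same resource + lead_id always hashes to the same key, across retries.
        return hashlib.sha256(f"{resource}:{lead_id}".encode()).hexdigest()

    def allow(self, resource: str, lead_id: str) -> bool:
        key = self.make_key(resource, lead_id)
        now = self.clock()
        first_seen = self._seen.get(key)
        if first_seen is not None and now - first_seen < self.window_seconds:
            return False     # duplicate inside the window: block it
        self._seen[key] = now
        return True
```

A production version would keep the keys in a shared store, so retries from different workers see the same history.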

3. The Template That Wasn't Ready

A campaign agent assembles an email batch. The template has a variable: {{customer_name}}. Somewhere in the pipeline, the variable substitution fails — wrong key, encoding issue, doesn't matter. The agent sends anyway.

Hundreds of customers get "Hello {{customer_name}}" in their inbox.

The agent had no way to know the template was broken. There was no check between "build the email" and "send the email."

gates:
  - name: email-template-guard
    trigger: send
    resource: email.batch
    policies:
      - require_approval_if_template_error:
          patterns:
            - "{{*}}"
            - "{%*%}"

Gate scans the rendered output before delivery. Finds an unresolved template token. Holds the batch. Someone gets a Slack notification instead of customers getting an embarrassing email.
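The scan itself can be a pair of regular expressions. The sketch below assumes the glob patterns "{{*}}" and "{%*%}" mean "any text between those delimiters" and translates them to non-greedy regexes; unresolved_tokens and hold_batch are hypothetical helpers, not Gate's API.

```python
import re

# Assumed regex equivalents of the glob patterns in email-template-guard.
UNRESOLVED = [
    re.compile(r"\{\{.*?\}\}"),   # {{ variable }}
    re.compile(r"\{%.*?%\}"),     # {% tag %}
]


def unresolved_tokens(rendered: str) -> list:
    """Return every template token that survived rendering."""
    found = []
    for pattern in UNRESOLVED:
        found.extend(pattern.findall(rendered))
    return found


def hold_batch(messages: list) -> bool:
    """Hold the whole batch if any single message still contains a raw token."""
    return any(unresolved_tokens(m) for m in messages)
```

Holding the whole batch, rather than dropping only the broken messages, matches the failure mode here: if substitution broke for one customer, it likely broke for many.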

The Structural Fix

These aren't three separate bugs. They're the same architectural gap — agents that can write to production without a checkpoint.

The pattern that fixes all three:

Agent submits intent → Policy evaluates → Human approves (if needed) → Gate executes

The agent still has full write access. Nothing changes about what it can do. What changes is that every write is an intent first — not an execution. The policy layer runs instantly. Most writes pass through without interruption. The ones that don't get caught before they matter.
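One way to picture that policy layer, as a sketch rather than Gate's internals: each policy is a function that returns CLEAR, REVIEW, or BLOCK, and the strictest result wins. The Decision enum and the policy factories below are hypothetical; the 100-record limit mirrors crm-enrichment-guard.

```python
from enum import IntEnum
from typing import Callable


class Decision(IntEnum):
    # Ordered by severity so max() returns the strictest decision.
    CLEAR = 0    # execute immediately
    REVIEW = 1   # send to the review queue
    BLOCK = 2    # reject outright


Policy = Callable[[dict], Decision]


def over_threshold(limit: int) -> Policy:
    """Hypothetical counterpart of require_approval_over."""
    def check(intent: dict) -> Decision:
        return Decision.REVIEW if intent.get("record_count", 0) > limit else Decision.CLEAR
    return check


def not_seen_before(seen_keys: set) -> Policy:
    """Hypothetical duplicate check: block intents whose key already ran."""
    def check(intent: dict) -> Decision:
        return Decision.BLOCK if intent.get("key") in seen_keys else Decision.CLEAR
    return check


def evaluate(intent: dict, policies: list) -> Decision:
    """Run every policy; the strictest result wins. No policies means CLEAR."""
    return max((policy(intent) for policy in policies), default=Decision.CLEAR)
```

Taking the maximum keeps policies independent: adding a rule can only make a decision stricter, never silently loosen one. Most intents come back CLEAR and execute without anyone noticing.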

This is what Zehrava Gate does. It sits between your agents and your production systems as a commit checkpoint. Agents submit intents. Policies evaluate. Writes either clear automatically or go to a review queue. Everything that happens is logged — intent, policy result, approval or rejection, execution outcome.

That's the audit trail that was missing on Friday night when 847 records got overwritten. For teams managing multiple agents with different risk profiles, Policy-as-YAML: AI Agent Policy Enforcement That Scales Across Teams covers how to define those rules once and share them everywhere.
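A minimal sketch of such a trail, assuming an append-only JSON Lines file with one entry per lifecycle event (submitted, policy result, approval or rejection, execution outcome). AuditLog, the event names, and the file layout are illustrative, not Gate's log format.

```python
import json
import time


class AuditLog:
    """Append-only JSON Lines file: one line per lifecycle event of an intent."""

    def __init__(self, path: str):
        self.path = path

    def record(self, intent_id: str, event: str, detail: dict) -> None:
        entry = {"ts": time.time(), "intent_id": intent_id, "event": event, "detail": detail}
        # Mode "a" only appends, so earlier entries are never rewritten in place.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def trail(self, intent_id: str) -> list:
        """Every event for one intent, in the order it was written."""
        with open(self.path, encoding="utf-8") as f:
            return [e for e in map(json.loads, f) if e["intent_id"] == intent_id]
```

Append mode alone does not stop tampering; in practice the log belongs somewhere the agents themselves cannot write.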

Getting Started

Gate works with any agent framework. It doesn't care what's calling it — LangChain, CrewAI, AutoGen, a custom Python script. It intercepts at the write layer, not the agent layer.
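Intercepting at the write layer can be as simple as wrapping the tool function itself, so whatever framework calls it goes through the check. The gated decorator and HeldForReview exception below are hypothetical and not part of Gate, LangChain, CrewAI, or AutoGen; a real integration would queue the held intent for review rather than just raise.

```python
import functools
from typing import Any, Callable


class HeldForReview(Exception):
    """Raised instead of executing when a write needs human approval."""


def gated(resource: str, check: Callable[[dict], bool]):
    """Wrap a write function so each call is evaluated as an intent first.

    check returns True when the intent may execute immediately.
    """
    def decorator(write_fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(write_fn)
        def wrapper(*args, **kwargs):
            intent = {"resource": resource, "args": args, "kwargs": kwargs}
            if not check(intent):
                raise HeldForReview(f"{resource}: held for review")
            return write_fn(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical tool: the 100-record limit mirrors crm-enrichment-guard.
@gated("crm.contacts", check=lambda intent: len(intent["args"][0]) <= 100)
def update_contacts(records: list) -> int:
    # Stand-in for the real CRM client call.
    return len(records)
```

The agent code does not change: it still calls update_contacts, and the decorator decides whether the call runs now or waits.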

A basic configuration looks like this:

gate:
  endpoint: https://api.zehrava.com/v1/propose
  api_key: ${ZEHRAVA_API_KEY}
  default_policy:
    require_approval_over: 50

resources:
  crm.contacts:
    policies:
      - require_approval_over: 100

  email.batch:
    policies:
      - require_approval_if_template_error

  leads.outbound_call:
    policies:
      - idempotency_key: # set per-intent to block duplicates
          window_seconds: 300
          key: lead_id

The first time an agent proposes a bulk write, you see it before it runs. That's the whole idea.

You don't need to rewrite your agents. You need a layer between them and the systems they touch. Gate is that layer.

Add a commit checkpoint to your agent stack.

MIT license. Self-hostable. Framework-agnostic. Takes under an hour to wrap your first agent.
