Run Ledger — When the run breaks, it doesn't start over

One ugly path

Failure happens at the worst boundary.

Not at the beginning. Right after approval. Right after execution request. Right after the side effect. That is where replay becomes dangerous.

1. Intent proposed

2. Policy checked

3. Approval granted

4. Execution requested

5. Worker interrupted

6. Checkpoint sealed

7. Run resumed

8. Prior side effect skipped

9. Execution completes

This is the entire point of Run Ledger: continue from the last safe boundary instead of rebuilding from transcript and hoping nothing fires twice.
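
The nine steps above can be read as an ordered event history with a "last safe boundary" lookup. The sketch below is illustrative only: the event names and the choice of which boundaries count as safe are assumptions, not the shipped schema.

```typescript
// Hypothetical event names mirroring the nine steps above; not the shipped schema.
type StepEvent =
  | "intent_proposed"
  | "policy_checked"
  | "approval_granted"
  | "execution_requested"
  | "worker_interrupted"
  | "checkpoint_sealed"
  | "run_resumed"
  | "side_effect_skipped"
  | "execution_completed";

// Assumption: these boundaries are safe to resume from because nothing is in flight.
const SAFE_BOUNDARIES: ReadonlySet<StepEvent> = new Set<StepEvent>([
  "intent_proposed",
  "policy_checked",
  "approval_granted",
  "checkpoint_sealed",
]);

// Walk the history backwards and return the index of the last safe boundary, or -1.
function lastSafeBoundary(history: StepEvent[]): number {
  for (let i = history.length - 1; i >= 0; i--) {
    if (SAFE_BOUNDARIES.has(history[i])) return i;
  }
  return -1;
}

const history: StepEvent[] = [
  "intent_proposed",
  "policy_checked",
  "approval_granted",
  "execution_requested",
  "worker_interrupted",
  "checkpoint_sealed",
];
const resumeFrom = lastSafeBoundary(history);
console.log(history[resumeFrom]); // "checkpoint_sealed"
```

The run continues from that index instead of from index 0, which is the difference between resuming and replaying.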

Why this exists

Starting over is fake recovery.

Agent runs do not fail cleanly. They fail mid-approval, mid-execution, mid-retry, or after the side effect but before the receipt. Starting over from the top is not recovery. It is replay with worse odds.

Network interruption during API call

Agent enriches 847 leads and calls Salesforce. Connection drops halfway. Agent restarts. No checkpoint. Tries again — same 847 records, some already imported.

Replayed write

Approval wait stalls mid-flight

Agent submits high-risk write. Waits for human approval. Worker crashes. Approval comes through. No record of where the run was. Agent restarts from the top.

Lost run state

Worker crashes after execution request

Execution order issued. Worker fetches it. Calls Stripe. Payment succeeds. Worker crashes before reporting. Restart → no receipt. Did it execute? Try again?

Unknown execution state

Deploy restarts process

Agent halfway through a multi-step CRM sync. Deploy rolls out. Process terminates. Agent restarts. No saved state. Starts over. Some records already written.

No checkpoint

Duplicated side effects after retry

Agent calls webhook. Times out. No response. Retries. The first call was not lost; it just took 90 seconds. The webhook fired twice. Now two notifications sent.

Duplicate side effect

Rebuilding context from transcript is unreliable

Agent crashes. Transcript has 200 messages. Replay from the top? Which tool calls already executed? Which approvals already granted? Transcript doesn't know.

Transcript ≠ execution history

Transcript replay is not execution continuity.
A transcript can tell you what the agent said. It cannot prove what executed, what was approved, what artifact was created, or which side effect must not run again. Run Ledger records receipts. That is the difference.
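
To make the distinction concrete, here is a minimal sketch contrasting a transcript message with a receipt. Both shapes, the `alreadyExecuted` helper, and the `stripe.charge:order-1042` key are hypothetical and exist only for illustration.

```typescript
// Hypothetical shapes for illustration only.
interface TranscriptMessage {
  role: "agent" | "tool" | "user";
  text: string; // what was said, not what happened
}

interface Receipt {
  sideEffectKey: string; // which side effect this receipt covers
  status: "succeeded" | "failed";
  recordedAt: string; // ISO timestamp
}

// A receipt list can answer "did this already execute?" directly.
function alreadyExecuted(receipts: Receipt[], sideEffectKey: string): boolean {
  return receipts.some(
    (r) => r.sideEffectKey === sideEffectKey && r.status === "succeeded",
  );
}

const transcript: TranscriptMessage[] = [
  { role: "agent", text: "Calling Stripe to charge the customer." },
  // The worker crashed here: the transcript never says whether the charge went through.
];

const receipts: Receipt[] = [
  {
    sideEffectKey: "stripe.charge:order-1042",
    status: "succeeded",
    recordedAt: "2026-03-22T10:15:00Z",
  },
];

console.log(transcript.length); // 1 message, no proof of execution
console.log(alreadyExecuted(receipts, "stripe.charge:order-1042")); // true
```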

How it works

Ledger. Checkpoint. Resume. Dedupe.

Four components: an ordered event history, sealed resumable state, continuation from the last safe point, and blocking of replayed side effects. That's it.

Ledger

Ordered event history for a tracked run. Every meaningful boundary is recorded: plan locked, tool finished, intent proposed, policy checked, approval received, execution requested, execution succeeded, interruption detected.
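
As a rough mental model, the ledger behaves like an append-only log with a monotonically increasing sequence number. The sketch below is an in-memory stand-in, not the shipped storage layer. The snake_case event names are inferred from the boundaries listed above, and `InMemoryLedger` is a hypothetical class.

```typescript
// Event names inferred from the boundaries listed above; the real schema may differ.
type EventType =
  | "plan_locked"
  | "tool_finished"
  | "intent_proposed"
  | "policy_checked"
  | "approval_received"
  | "execution_requested"
  | "execution_succeeded"
  | "interruption_detected";

interface LedgerEvent {
  seq: number; // ledger-wide position, assigned on append
  runId: string;
  eventType: EventType;
  recordedAt: string;
}

// Hypothetical in-memory ledger: events are only ever appended, never edited.
class InMemoryLedger {
  private events: LedgerEvent[] = [];

  append(runId: string, eventType: EventType): LedgerEvent {
    const event: LedgerEvent = {
      seq: this.events.length + 1,
      runId,
      eventType,
      recordedAt: new Date().toISOString(),
    };
    this.events.push(event);
    return event;
  }

  // Events for one run, in the order they were recorded.
  forRun(runId: string): LedgerEvent[] {
    return this.events.filter((e) => e.runId === runId);
  }
}

const ledger = new InMemoryLedger();
ledger.append("run_abc123", "plan_locked");
ledger.append("run_abc123", "intent_proposed");
console.log(ledger.forRun("run_abc123").map((e) => e.eventType));
```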

Checkpoint

Sealed resumable state captured at a safe boundary. Includes receipts, artifacts, unresolved approvals, blocked side-effect keys, and suggested next action.
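
One common way to seal state is to store a digest next to it and recompute the digest on load. The sketch below assumes a SHA-256 digest over a fixed field order. The field names follow the description above, but the real checkpoint format and sealing scheme are not specified in this document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical checkpoint shape based on the fields described above.
interface CheckpointState {
  runId: string;
  receipts: string[];
  artifacts: string[];
  unresolvedApprovals: string[];
  blockedSideEffectKeys: string[];
  suggestedNextAction: string;
}

interface SealedCheckpoint {
  state: CheckpointState;
  seal: string; // hex SHA-256 of the canonical state
}

function digest(state: CheckpointState): string {
  // Serialize fields in a fixed order so the same state always hashes the same way.
  const canonical = JSON.stringify([
    state.runId,
    state.receipts,
    state.artifacts,
    state.unresolvedApprovals,
    state.blockedSideEffectKeys,
    state.suggestedNextAction,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

function sealCheckpoint(state: CheckpointState): SealedCheckpoint {
  return { state, seal: digest(state) };
}

function verifyCheckpoint(cp: SealedCheckpoint): boolean {
  return digest(cp.state) === cp.seal;
}

const cp = sealCheckpoint({
  runId: "run_abc123",
  receipts: [],
  artifacts: ["./leads.csv"],
  unresolvedApprovals: [],
  blockedSideEffectKeys: ["salesforce.import:batch-2026-03-22"],
  suggestedNextAction: "resume_then_execute",
});
console.log(verifyCheckpoint(cp)); // true

cp.state.blockedSideEffectKeys = []; // simulate tampering
console.log(verifyCheckpoint(cp)); // false
```

A bare hash like this only catches corruption or careless edits. Anyone who can rewrite the state can also recompute the hash, so resisting a deliberate attacker would need a keyed MAC or a signature.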

Resume

Load the latest valid checkpoint, verify integrity, rebuild resume context, and continue from the current step instead of from the beginning.
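
The resume step can be sketched as "newest checkpoint that passes verification wins". Everything here is an assumption for illustration: the `Checkpoint` and `ResumeContext` shapes, the `buildResumeContext` helper, and the idea of passing the integrity check in as a callback.

```typescript
// Hypothetical shapes; the real resume context may carry more fields.
interface Checkpoint {
  id: string;
  seq: number; // ledger position at which the checkpoint was sealed
  blockedSideEffectKeys: string[];
  suggestedNextAction: string;
}

interface ResumeContext {
  checkpointId: string;
  blockedSideEffectKeys: Set<string>;
  nextAction: string;
}

function buildResumeContext(
  checkpoints: Checkpoint[],
  isValid: (cp: Checkpoint) => boolean,
): ResumeContext | null {
  // Newest first, so the first checkpoint that verifies is the latest valid one.
  const newestFirst = [...checkpoints].sort((a, b) => b.seq - a.seq);
  const latest = newestFirst.find(isValid);
  if (!latest) return null; // nothing trustworthy to resume from

  return {
    checkpointId: latest.id,
    blockedSideEffectKeys: new Set(latest.blockedSideEffectKeys),
    nextAction: latest.suggestedNextAction,
  };
}

const resumeCtx = buildResumeContext(
  [
    { id: "ckpt_1", seq: 4, blockedSideEffectKeys: [], suggestedNextAction: "execute" },
    {
      id: "ckpt_2",
      seq: 9,
      blockedSideEffectKeys: ["salesforce.import:batch-2026-03-22"],
      suggestedNextAction: "resume_then_execute",
    },
  ],
  () => true, // stand-in for a real integrity check
);
console.log(resumeCtx?.checkpointId); // "ckpt_2"
```

Returning `null` rather than guessing keeps the failure visible, so a run with no verifiable checkpoint is handed to an operator instead of silently restarting.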

Dedupe

External mutations, payments, and notifications get side-effect keys. If a key already exists in the ledger, the replay is skipped.
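
The dedupe rule reduces to a set-membership check before every non-replayable call. A minimal sketch, assuming keys are plain strings held in memory; `SideEffectKeys` and `claim` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical key registry: a side effect may run only if its key is claimed first.
class SideEffectKeys {
  private keys: Set<string>;

  constructor(existing: string[] = []) {
    this.keys = new Set(existing);
  }

  // Returns true if the caller may run the side effect, false if it must be skipped.
  claim(key: string): boolean {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// Keys already present in the ledger when the run resumes.
const sideEffectKeys = new SideEffectKeys(["salesforce.import:batch-2026-03-22"]);

for (const key of ["salesforce.import:batch-2026-03-22", "slack.notify:batch-2026-03-22"]) {
  if (sideEffectKeys.claim(key)) {
    console.log(`run ${key}`);
  } else {
    console.log(`skip ${key}`);
  }
}
```

Claiming the key before the call trades one risk for another. A crash between claim and call leaves a key with no completed effect, which is safer than a duplicate payment but still needs a receipt or operator review to resolve.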

No workflow engine. No transcript replay. No UI-heavy abstraction layer. Just continuity.

Where Run Ledger sits

Temporal, LangGraph, Restate, and Inngest all solve adjacent durability problems.

Run Ledger is narrower than workflow orchestration. It lives at the governed write boundary inside Gate: approvals, execution boundaries, checkpoint integrity, and non-replayable side effects.

Feature             Temporal  LangGraph    Restate  Inngest  Zehrava
Durable execution   ✓         ✓            ✓        ✓        ✓
Agent-aware         —         ✓            —        —        ✓
Approval gates      —         —            —        —        ✓
Side-effect dedupe  —         Graph nodes  —        Steps    Receipts
Governed writes     —         —            —        —        ✓

Release v1

What ships in v1.

Execution continuity. Sealed checkpoints. Side-effect deduplication. CLI inspection. Opt-in integration — backward compatible with existing Gate deployments.

Execution continuity

Resume interrupted tracked runs from the last safe checkpoint.

Checkpoint integrity

Sealed checkpoints fail verification if tampered with.

Side-effect dedupe

Skip non-replayable side effects that already happened.

CLI inspection

Inspect, verify, checkpoint, and resume from terminal.

Backward compatible

Opt-in per run. Existing Gate flows still work.

Governed run history

Intent and execution lifecycle recorded in one run ledger.

SDK usage

Start run. Work happens. Interruption. Resume.

Run Ledger is live in the package. Start a tracked run, record execution boundaries, seal a checkpoint, and resume from the last safe point.

JavaScript / TypeScript — full flow

import { Gate } from "zehrava-gate"

// Connect to your Gate deployment.
const gate = new Gate({ endpoint: "https://gate.yourco.com", apiKey: "gate_sk_..." })

// Start a tracked run; later calls reference run.runId.
const run = await gate.startRun({
  agentId: "enrichment-agent",
  intentSummary: "Enrich leads and sync approved changes to Salesforce"
})

// Record a boundary in the ledger: the plan is locked, no side effect yet.
await gate.recordEvent({
  runId: run.runId,
  eventType: "plan_locked",
  actorId: "enrichment-agent",
  stepName: "enrich",
  payload: { steps: ["fetch", "enrich", "review", "sync"] },
  sideEffectClass: "none"
})

// Propose the governed write. The idempotency key identifies this batch.
const intent = await gate.propose({
  runId: run.runId,
  payload: "./leads.csv",
  destination: "salesforce.import",
  policy: "crm-low-risk",
  recordCount: 847,
  idempotency_key: "batch-2026-03-22"
})

// interruption happens here

// Seal a checkpoint so the run can be resumed from this boundary.
await gate.createCheckpoint({
  runId: run.runId,
  reason: "interruption",
  suggestedNextAction: "resume_then_execute"
})

// Resume: rebuild context, including the blocked side-effect keys.
const ctx = await gate.resumeRun({ runId: run.runId })

// Execute only if this side effect has not already happened.
if (!ctx.blockedSideEffectKeys.has("salesforce.import:batch-2026-03-22")) {
  await gate.execute({ intentId: intent.intentId })
}

Tracked runs attach continuity to the real Gate lifecycle: propose, approve, execute, report.
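
The guard at the end of the flow above can be factored into a small helper. This sketch assumes the side-effect key is the destination and the idempotency key joined by a colon, which is inferred from the single example above and not confirmed as a general rule. `sideEffectKey`, `shouldExecute`, and `ResumeContextLike` are hypothetical names, not part of the SDK.

```typescript
// Assumed key format: "<destination>:<idempotency key>", inferred from the example above.
function sideEffectKey(destination: string, idempotencyKey: string): string {
  return `${destination}:${idempotencyKey}`;
}

// Only the part of the resume context this helper needs.
interface ResumeContextLike {
  blockedSideEffectKeys: Set<string>;
}

// True when the write has not been recorded as already executed.
function shouldExecute(
  ctx: ResumeContextLike,
  destination: string,
  idempotencyKey: string,
): boolean {
  return !ctx.blockedSideEffectKeys.has(sideEffectKey(destination, idempotencyKey));
}

const resumed: ResumeContextLike = {
  blockedSideEffectKeys: new Set(["salesforce.import:batch-2026-03-22"]),
};

console.log(shouldExecute(resumed, "salesforce.import", "batch-2026-03-22")); // false
console.log(shouldExecute(resumed, "salesforce.import", "batch-2026-03-23")); // true
```

Building the key in one place avoids the typo risk of repeating the string literal at every call site.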

CLI inspection

Inspect. Verify. Resume. Terminal-first.

No dashboard in v1. Everything via CLI. Inspect run status, list events, verify checkpoint integrity, resume from last safe point.

bash

$ zehrava-gate runs inspect run_abc123
Status: interrupted
Checkpoints: 1
Resumable: Yes

$ zehrava-gate runs verify run_abc123
Checkpoint Integrity: Valid

$ zehrava-gate runs resume run_abc123
Loaded checkpoint ckpt_xyz789
Blocked side effects: 1
Resume context built

The CLI is the proof surface in v1. If a run cannot be inspected, verified, and resumed from terminal, the system is not trustworthy yet.

Web UI

Dashboard

View pending approvals, audit trails, and run history in the web UI. No terminal required.

The Dashboard provides a visual interface for monitoring tracked runs, inspecting checkpoints, and reviewing execution history. Approved intents, blocked side effects, and resumable runs — all visible without touching the CLI.


What v1 doesn't do

Honest limitations.

These are not bugs. These are conscious decisions about what ships in v1 versus what waits for v2. If you need one of these, self-host and build it — or wait for the roadmap.

No cross-runtime portability

Resume packets are Zehrava-specific.

No distributed consensus

Checkpoints are local SQLite. Multi-node needs external coordination.

Manual review fallback only

Failed verification requires operator investigation. No auto-repair.

No automated pruning

Old runs accumulate. Operator cleans up manually.
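
Since cleanup is manual, an operator script might start by selecting which runs are safe to delete. The sketch below is one possible selection rule, not a shipped tool. The `RunSummary` shape and every status value other than `interrupted` are assumptions.

```typescript
// Hypothetical run summary; only "interrupted" appears in the CLI output above.
interface RunSummary {
  runId: string;
  status: "completed" | "failed" | "interrupted" | "running";
  updatedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Select finished runs older than maxAgeDays. Interrupted and running runs are
// kept because they may still need to be resumed.
function pruneCandidates(runs: RunSummary[], now: Date, maxAgeDays: number): string[] {
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  return runs
    .filter(
      (r) =>
        (r.status === "completed" || r.status === "failed") &&
        r.updatedAt.getTime() < cutoff,
    )
    .map((r) => r.runId);
}

const now = new Date("2026-03-22T00:00:00Z");
const candidates = pruneCandidates(
  [
    { runId: "run_old", status: "completed", updatedAt: new Date("2026-01-01T00:00:00Z") },
    { runId: "run_stuck", status: "interrupted", updatedAt: new Date("2026-01-01T00:00:00Z") },
    { runId: "run_recent", status: "completed", updatedAt: new Date("2026-03-20T00:00:00Z") },
  ],
  now,
  30,
);
console.log(candidates); // [ "run_old" ]
```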

No transactional writes

Events recorded individually. Rare partial state possible.
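
Because events are written one at a time, a crash can leave an execution request with no matching outcome. An operator could scan for that shape with something like the sketch below. The event names are inferred from the boundaries listed earlier, and the `intentId` field on events is an assumption.

```typescript
// Hypothetical event shape; intentId on events is an assumption.
interface RecordedEvent {
  seq: number;
  eventType: string;
  intentId?: string;
}

// Return intents that were requested but never recorded as succeeded.
function unresolvedExecutions(events: RecordedEvent[]): string[] {
  const open = new Set<string>();
  const ordered = [...events].sort((a, b) => a.seq - b.seq);
  for (const e of ordered) {
    if (!e.intentId) continue;
    if (e.eventType === "execution_requested") open.add(e.intentId);
    if (e.eventType === "execution_succeeded") open.delete(e.intentId);
  }
  return Array.from(open);
}

const recorded: RecordedEvent[] = [
  { seq: 1, eventType: "execution_requested", intentId: "int_1" },
  { seq: 2, eventType: "execution_succeeded", intentId: "int_1" },
  { seq: 3, eventType: "execution_requested", intentId: "int_2" },
  // The process died here: int_2 has a request but no outcome.
];
console.log(unresolvedExecutions(recorded)); // [ "int_2" ]
```

An unresolved intent does not mean the side effect failed. It means the ledger cannot tell, which is the case that needs a receipt lookup or operator review before any retry.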

No UI in v1

CLI-only inspect/resume. Dashboard ships in v2.

No policy-driven checkpointing

Checkpoint triggers are manual or event-based.
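
An event-based trigger can be as small as a lookup table consulted after each recorded event. The sketch below is a guess at the shape of such a rule. The trigger set and the `shouldCheckpoint` helper are hypothetical, and the event names are inferred from the boundaries listed earlier.

```typescript
// Hypothetical trigger set: event types after which a checkpoint is sealed.
const CHECKPOINT_TRIGGERS: ReadonlySet<string> = new Set([
  "approval_received",
  "execution_requested",
  "interruption_detected",
]);

// A checkpoint is taken when the caller asks for one or the event type is a trigger.
function shouldCheckpoint(eventType: string, manual: boolean = false): boolean {
  return manual || CHECKPOINT_TRIGGERS.has(eventType);
}

console.log(shouldCheckpoint("approval_received")); // true
console.log(shouldCheckpoint("tool_finished")); // false
console.log(shouldCheckpoint("tool_finished", true)); // true
```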

No lineage verification

Parent/child integrity checks are roadmap.

Run Ledger is done when interrupted runs don't start over.

Self-host free. MIT license. Full docs, SDK, CLI, and integration guide available now.

