Available now in zehrava-gate@0.3.0
Run Ledger
Run Ledger adds execution continuity to Gate. When an agent run breaks — crash, interruption, approval wait — it can resume from the last valid checkpoint without replaying side effects or losing progress.
What is Run Ledger?
Run Ledger is an execution continuity system for AI agents. It records every meaningful event in an agent's run, creates sealed checkpoints at safe boundaries, and allows interrupted runs to resume without data loss or duplicate side effects.
This is not a new workflow engine, memory system, or transcript replay mechanism. It is a boring, reliable execution ledger with checkpointing, resume, and side-effect deduplication.
Why it matters
Agent runs fail for reasons outside the agent's control:
- Network interruptions during API calls
- Server restarts or deploys
- Human approval required mid-execution
- Rate limit backoff that exceeds timeout
- Process crashes or OOM kills
Without Run Ledger, the agent must start over from the beginning. Work already completed is lost. Side effects already executed — emails sent, database writes, API calls — get replayed, causing duplicates.
Run Ledger solves this. Interrupted runs resume from the last checkpoint. Side effects are deduplicated. Progress is preserved.
How it works
Three components work together:
1. Ledger — ordered log of events
Every meaningful event in a run is recorded with a sequence number, event type, actor, payload, and side effect class. Events are immutable once recorded. The ledger is the source of truth for what happened during a run.
2. Checkpoint — sealed resumable state
At safe boundaries (interruption, approval request, failure), the system seals a checkpoint. The checkpoint contains everything needed to continue: which events were completed, which artifacts were created, which approvals are pending, and which side effects must not be replayed.
3. Resume — restore from checkpoint
When the run resumes, the system loads the latest checkpoint, verifies its integrity, and builds a resume context. The agent sees receipts (what happened), artifacts (what was created), and blocked side effects (what must be skipped). Execution continues from the current step.
Event model
Run Ledger tracks 18 event types. Events fall into two categories:
Progress events
These indicate meaningful work completed. Only progress events update last_safe_event_id:
| Event | Meaning |
|---|---|
| plan_locked | Execution plan finalized |
| tool_call_finished | Tool invocation completed |
| artifact_created | File or output created |
| policy_checked | Policy evaluation complete |
| approval_received | Human approved |
| execution_succeeded | Execution completed |
| delegation_finished | Sub-agent completed |
| run_completed | Run finished successfully |
Non-progress events
These record state transitions but do not advance last_safe_event_id:
| Event | Meaning |
|---|---|
| run_started | Run begins |
| tool_call_started | Tool invocation begins |
| intent_proposed | Gate intent submitted |
| approval_requested | Human approval needed |
| execution_requested | Execution order issued |
| delegation_started | Work delegated to sub-agent |
| checkpoint_sealed | Checkpoint created |
| interruption_detected | Run interrupted |
| run_resumed | Run resumed from checkpoint |
| run_failed | Run hard failed |
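The progress rule can be sketched in a few lines of JavaScript. The helper below derives last_safe_event_id from a list of recorded events. The field names (eventId, seq, eventType) follow the API responses later in this document. The helper itself is hypothetical and is not part of the zehrava-gate SDK.

```javascript
// Progress event types, copied from the table above. Only these advance
// last_safe_event_id; every other type is recorded but leaves it unchanged.
const PROGRESS_EVENTS = new Set([
  'plan_locked',
  'tool_call_finished',
  'artifact_created',
  'policy_checked',
  'approval_received',
  'execution_succeeded',
  'delegation_finished',
  'run_completed',
]);

// Hypothetical helper: walk events in seq order and return the id of the
// last progress event, or null if the run has not made progress yet.
function lastSafeEventId(events) {
  const ordered = [...events].sort((a, b) => a.seq - b.seq);
  let lastSafe = null;
  for (const event of ordered) {
    if (PROGRESS_EVENTS.has(event.eventType)) {
      lastSafe = event.eventId;
    }
  }
  return lastSafe;
}
```

Take the twelve events listed under "CLI — events". For that run, this returns the id of seq 9 (policy_checked), because every later event in that listing is a non-progress event.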
Side effect classes
Each event declares its side effect class. This determines whether the event can be safely replayed on resume:
| Class | Replayable | Examples |
|---|---|---|
| none | ✅ Yes | Pure computation, logging |
| read | ✅ Yes | Database reads, API GETs |
| write | ⚠️ Maybe | Local file writes (idempotent) |
| external_mutation | ❌ No | Database writes, API POSTs |
| payment | ❌ No | Stripe charges, refunds |
| notification | ❌ No | Emails, SMS, webhooks |
| delegation | ⚠️ Maybe | Sub-agent spawns (if idempotent) |
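The table can be turned into a replay decision, shown here as a hypothetical JavaScript sketch. The table marks write and delegation as "Maybe". This sketch assumes the caller knows whether the specific operation is idempotent and passes that in as an idempotent flag. That flag is not a documented Gate field.

```javascript
// Replay policy per side effect class, transcribed from the table above.
const REPLAY_POLICY = {
  none: 'yes',
  read: 'yes',
  write: 'maybe',
  external_mutation: 'no',
  payment: 'no',
  notification: 'no',
  delegation: 'maybe',
};

// Returns true if an event of this class may simply be re-run on resume.
// `idempotent` is an assumption of this sketch: the caller asserts that this
// particular write or delegation is safe to repeat.
function canReplay(sideEffectClass, { idempotent = false } = {}) {
  const policy = REPLAY_POLICY[sideEffectClass];
  if (policy === undefined) {
    throw new Error(`Unknown side effect class: ${sideEffectClass}`);
  }
  if (policy === 'yes') return true;
  if (policy === 'maybe') return idempotent;
  return false; // 'no': must be deduplicated, never blindly replayed
}
```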
Side effect deduplication
For events marked external_mutation, payment, or notification, the system computes a deduplication key:
side_effect_key = SHA-256({ action, target, payloadHash })
Before re-executing on resume, the system checks whether this key was already recorded. If found, the event is skipped and marked status = 'skipped'. This prevents double-sends, double-charges, and duplicate database writes.
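Here is a minimal Node.js sketch of how such a key could be derived. It assumes two things:

- payloadHash is the SHA-256 of the payload's canonical JSON.
- Both hashes are hex-encoded.

Gate's own hash.sideEffectKey helper may encode things differently, so treat this as an illustration rather than a compatible re-implementation.

```javascript
const crypto = require('crypto');

// Canonical JSON: object keys sorted recursively, undefined-valued keys dropped.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return '[' + value.map((v) => (v === undefined ? 'null' : canonicalJson(v))).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).filter((k) => value[k] !== undefined).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalJson(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

const sha256Hex = (text) => crypto.createHash('sha256').update(text, 'utf8').digest('hex');

// Hypothetical re-implementation of
// side_effect_key = SHA-256({ action, target, payloadHash }).
function sideEffectKey(action, target, payload) {
  const payloadHash = sha256Hex(canonicalJson(payload));
  return sha256Hex(canonicalJson({ action, target, payloadHash }));
}
```

Because the payload is canonicalized before hashing, { batch: 'batch-001', retry: 1 } and { retry: 1, batch: 'batch-001' } yield the same key. That is what lets a resumed run recognize a side effect it already performed.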
Checkpoints
A checkpoint is a sealed snapshot of a run's state at a specific event. Checkpoints are created when:
- Interruption detected (crash, timeout)
- Approval requested (waiting for human review)
- Explicit handoff to another agent
- Manual checkpoint via CLI
Checkpoint structure
Each checkpoint contains a resume packet (JSON) and a sealed hash. The hash is computed over:
sealed_hash = SHA-256({
checkpointId,
ledgerId,
eventId,
canonicalize(resumePacket),
eventHashes
})
Tampering with the checkpoint or any of its events changes the recomputed hash, so verification fails.
Resume packet
The resume packet is everything needed to continue:
{
"runId": "run_abc123",
"ledgerId": "ledger_xyz789",
"checkpointEventId": "evt_...",
"currentStep": "review",
"receipts": [ ... ], // what happened
"artifacts": [ ... ], // what was created
"unresolvedApprovals": [ ... ], // what's blocked
"nonReplayableSideEffects": [ ... ], // what must not repeat
"suggestedNextAction": "await_approval_then_execute",
"schemaVersion": 1
}
Resume model
Resume flow in five steps:
- Load the latest checkpoint marked is_resumable = 1
- Verify sealed_hash integrity (fails if tampered)
- Parse resume_packet_json
- Build the set of blocked side effect keys for fast lookup
- Emit a run_resumed event and update status to active
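The five steps map onto code roughly as follows. This is an in-memory JavaScript sketch, not Gate's implementation:

- Storage is replaced by plain arrays.
- verify stands in for the sealed_hash check.
- emitEvent stands in for recording run_resumed in the ledger.

The checkpoint row shape and the sideEffectKey field on nonReplayableSideEffects entries are assumptions.

```javascript
// Assumptions: checkpoints are in creation order, each looks like
// { checkpointId, isResumable, resumePacketJson }, and every entry in
// nonReplayableSideEffects carries a sideEffectKey field.
function resumeRun(run, checkpoints, verify, emitEvent) {
  // 1. Load the latest checkpoint marked is_resumable = 1
  const resumable = checkpoints.filter((c) => c.isResumable === 1);
  if (resumable.length === 0) {
    throw new Error(`No resumable checkpoint for ${run.runId}`);
  }
  const latest = resumable[resumable.length - 1];

  // 2. Verify sealed_hash integrity; a failed check blocks the resume
  if (!verify(latest)) {
    run.status = 'manual_review_required';
    throw new Error(`Checkpoint ${latest.checkpointId} failed verification`);
  }

  // 3. Parse resume_packet_json
  const packet = JSON.parse(latest.resumePacketJson);

  // 4. Build the blocked side effect key set for O(1) lookups
  const blockedSideEffectKeys = new Set(
    (packet.nonReplayableSideEffects || []).map((s) => s.sideEffectKey)
  );

  // 5. Emit run_resumed and set the run back to active
  emitEvent({ runId: run.runId, eventType: 'run_resumed', checkpointId: latest.checkpointId });
  run.status = 'active';

  return {
    ...packet,
    checkpointId: latest.checkpointId,
    blockedSideEffectKeys,
    resumedAt: new Date().toISOString(),
  };
}
```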
The agent receives a resume context object:
{
runId,
ledgerId,
checkpointId,
currentStep,
receipts, // Array of completed events
artifacts, // Array of created files/outputs
unresolvedApprovals, // Pending human approvals
remainingPermissions, // What's still allowed
blockedSideEffectKeys, // Set of keys to skip
suggestedNextAction,
resumedAt,
schemaVersion
}
Before executing any side effect, check the blocked set:
if (resumeContext.blockedSideEffectKeys.has(sideEffectKey)) {
  // Skip: already executed before interruption
  return;
}
// Safe to execute
execute();
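The same check can be wrapped so every non-replayable call goes through one place. runSideEffectOnce and recordKey are hypothetical names. recordKey stands in for however your integration records the key, for example by passing sideEffectKey to recordEvent.

```javascript
// Hypothetical wrapper: run a non-replayable side effect at most once per key.
// resumeContext.blockedSideEffectKeys is the Set from the resume context;
// recordKey is a caller-supplied async function that persists the key.
async function runSideEffectOnce(resumeContext, sideEffectKey, execute, recordKey) {
  if (resumeContext.blockedSideEffectKeys.has(sideEffectKey)) {
    return { skipped: true };
  }
  const result = await execute();
  // Record only after the side effect succeeded.
  await recordKey(sideEffectKey);
  resumeContext.blockedSideEffectKeys.add(sideEffectKey);
  return { skipped: false, result };
}
```

The key is recorded only after execute() resolves. If the process dies between the side effect and the record call, the key is missing and the effect would run again on resume. The blocked-set check cannot cover that narrow window.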
Integration — Start a run
Before your agent begins work, start a run:
JavaScript
// Run inside an async function: CommonJS modules do not allow top-level await.
const { Gate } = require('zehrava-gate');

const gate = new Gate({
  endpoint: 'http://localhost:4000',
  apiKey: 'gate_sk_...'
});

const run = await gate.startRun({
  agentId: 'my-agent',
  intentSummary: 'Enrich leads and sync to Salesforce',
  runtime: 'zehrava-gate',
  permissions: { allowed_tools: ['fetch', 'enrich', 'sync'] }
});
// run.runId → "run_abc123"
Python
from zehrava_gate import Gate

gate = Gate(endpoint="http://localhost:4000", api_key="gate_sk_...")

run = gate.start_run(
    agent_id="my-agent",
    intent_summary="Enrich leads and sync to Salesforce",
    runtime="zehrava-gate",
    permissions={"allowed_tools": ["fetch", "enrich", "sync"]}
)
# run["runId"] → "run_abc123"
Integration — Record events
As your agent works, record progress events:
JavaScript
const { EVENT_TYPES, SIDE_EFFECT_CLASS } = require('zehrava-gate/lib/runs');
// `hash` (used below for sideEffectKey) is a helper whose import is not shown in this example.

// Tool call finished
await gate.recordEvent({
  runId: run.runId,
  eventType: EVENT_TYPES.TOOL_CALL_FINISHED,
  actorId: 'my-agent',
  stepName: 'fetch',
  payload: { tool: 'fetch_data', recordsFetched: 847 },
  sideEffectClass: SIDE_EFFECT_CLASS.READ
});

// Execution succeeded (with deduplication key)
await gate.recordEvent({
  runId: run.runId,
  eventType: EVENT_TYPES.EXECUTION_SUCCEEDED,
  actorId: 'salesforce-worker',
  stepName: 'sync',
  payload: { executionId: 'exe_...', recordsSynced: 847 },
  sideEffectClass: SIDE_EFFECT_CLASS.EXTERNAL_MUTATION,
  sideEffectKey: hash.sideEffectKey('salesforce.import', 'salesforce', { batch: 'batch-001' })
});
Python
from zehrava_gate.runs import EVENT_TYPES, SIDE_EFFECT_CLASS

# Tool call finished
gate.record_event(
    run_id=run["runId"],
    event_type=EVENT_TYPES.TOOL_CALL_FINISHED,
    actor_id="my-agent",
    step_name="fetch",
    payload={"tool": "fetch_data", "recordsFetched": 847},
    side_effect_class=SIDE_EFFECT_CLASS.READ
)
Integration — Create checkpoint
Create a checkpoint manually or on interruption:
JavaScript
const checkpoint = await gate.createCheckpoint({
  runId: run.runId,
  reason: 'approval_requested',
  suggestedNextAction: 'await_approval_then_execute'
});
// checkpoint.checkpointId → "ckpt_xyz789"
// checkpoint.isResumable → true
// checkpoint.sealedHash → "abc123..."
Python
checkpoint = gate.create_checkpoint(
run_id=run["runId"],
reason="approval_requested",
suggested_next_action="await_approval_then_execute"
)
Integration — Resume run
Resume from the latest checkpoint:
JavaScript
const ctx = await gate.resumeRun({ runId: run.runId });
// ctx.receipts → what happened before interruption
// ctx.artifacts → files created
// ctx.unresolvedApprovals → pending approvals
// ctx.blockedSideEffectKeys → Set of keys to skip
// ctx.suggestedNextAction → where to continue

if (ctx.blockedSideEffectKeys.has(mySideEffectKey)) {
  console.log('Already executed, skipping');
} else {
  execute();
}
Python
ctx = gate.resume_run(run_id=run["runId"])

if my_side_effect_key in ctx["blockedSideEffectKeys"]:
    print("Already executed, skipping")
else:
    execute()
Code examples
Full example showing interruption and resume:
// See examples/interrupted_intent_run_resume.js in the repo
const run = await gate.startRun({ ... });

// Execute work
await gate.recordEvent({ eventType: EVENT_TYPES.TOOL_CALL_FINISHED, ... });
await gate.recordEvent({ eventType: EVENT_TYPES.ARTIFACT_CREATED, ... });

// Propose intent
const intent = await gate.propose({ ... });

// Simulate crash
await gate.recordEvent({ eventType: EVENT_TYPES.INTERRUPTION_DETECTED, ... });
const checkpoint = await gate.createCheckpoint({ runId: run.runId, reason: 'interruption' });

// ... time passes, system restarts ...

// Resume
const ctx = await gate.resumeRun({ runId: run.runId });

// Continue from where we left off (waitForApproval is your own helper, not shown)
if (ctx.unresolvedApprovals.length > 0) {
  await waitForApproval(ctx.unresolvedApprovals[0].intentId);
}

// Execute: side effects are deduplicated automatically
await gate.execute({ intentId: intent.intentId });
Dashboard
View run history, checkpoints, and audit trails in the web UI.
The Dashboard provides a visual interface for monitoring tracked runs without CLI commands. View pending approvals, inspect checkpoints, review execution history, and track resumable runs — all in your browser.
CLI — inspect
Show run details including events, checkpoints, and resumability:
zehrava-gate runs inspect run_abc123
Run: run_abc123
────────────────────────────────────────────
Status: active
Intent: Enrich leads and sync to Salesforce
Current Step: review
Agent: lead-enrichment-agent
Runtime: zehrava-gate
Events: 12
Checkpoints: 1
Artifacts: 1
Unresolved Approvals: 1
Blocked Side Effects: 2
Last Safe Event: evt_xyz789
Latest Checkpoint: ckpt_abc123
Resumable: Yes
Lineage Valid: Yes
Created: 2026-03-22T10:00:00.000Z
Updated: 2026-03-22T10:05:30.000Z
CLI — events
List all events for a run in sequence:
zehrava-gate runs events run_abc123
Events for run: run_abc123
────────────────────────────────────────────────────────────────────────────────────
Seq  Event Type              Actor      Status     Side Effect
────────────────────────────────────────────────────────────────────────────────────
1    run_started             my-agent   recorded   none
2    plan_locked             my-agent   recorded   none
3    tool_call_started       my-agent   recorded   read
4    tool_call_finished      my-agent   recorded   read
5    tool_call_started       my-agent   recorded   write
6    tool_call_finished      my-agent   recorded   write
7    artifact_created        my-agent   recorded   none
8    intent_proposed         my-agent   recorded   none
9    policy_checked          gate       recorded   none
10   approval_requested      my-agent   recorded   none
11   interruption_detected   system     recorded   none
12   checkpoint_sealed       system     recorded   none
CLI — checkpoint
Create a checkpoint manually:
zehrava-gate runs checkpoint run_abc123
Checkpoint created: ckpt_xyz789
Sealed Hash: abc123def456...
Resumable: Yes
Reason: manual
CLI — resume
Resume from the latest checkpoint:
zehrava-gate runs resume run_abc123
Resumed run: run_abc123
────────────────────────────────────────────
Checkpoint: ckpt_xyz789
Current Step: review
Receipts: 7
Artifacts: 1
Unresolved Approvals: 1
Blocked Side Effects: 2
Suggested Next: await_approval_then_execute
CLI — verify
Verify run and checkpoint integrity:
zehrava-gate runs verify run_abc123
Verification for run: run_abc123
────────────────────────────────────────────
Ledger Integrity: Valid
Checkpoint Integrity: Valid
Lineage Continuity: Valid
Total Checkpoints: 1
Valid Checkpoints: 1
API — POST /internal/runs/start
Start a new run. Returns run ID and ledger ID.
POST /internal/runs/start
Content-Type: application/json

{
  "agentId": "my-agent",
  "intentSummary": "Enrich leads and sync to Salesforce",
  "runtime": "zehrava-gate",
  "parentRunId": "run_parent", // optional
  "permissions": { "allowed_tools": ["..."] }
}
Response:
{
"runId": "run_abc123",
"ledgerId": "ledger_xyz789",
"status": "active",
"createdAt": "2026-03-22T10:00:00.000Z"
}
API — POST /internal/runs/:runId/events
Record an event for a run.
POST /internal/runs/run_abc123/events
Content-Type: application/json

{
  "eventType": "tool_call_finished",
  "actorId": "my-agent",
  "stepName": "fetch",
  "payload": { "tool": "fetch_data", "recordsFetched": 847 },
  "sideEffectClass": "read",
  "sideEffectKey": "abc123..." // optional
}
Response:
{
"eventId": "evt_def456",
"seq": 4,
"eventType": "tool_call_finished",
"status": "recorded",
"createdAt": "2026-03-22T10:01:00.000Z"
}
API — POST /internal/runs/:runId/checkpoint
Create a checkpoint for a run.
POST /internal/runs/run_abc123/checkpoint
Content-Type: application/json

{
  "eventId": "evt_xyz789", // optional, defaults to the most recent event
  "reason": "approval_requested",
  "suggestedNextAction": "await_approval_then_execute"
}
Response:
{
"checkpointId": "ckpt_abc123",
"sealedHash": "abc123def456...",
"isResumable": true,
"createdAt": "2026-03-22T10:05:30.000Z"
}
API — POST /internal/runs/:runId/resume
Resume a run from a checkpoint (the latest one unless fromCheckpointId is given).
POST /internal/runs/run_abc123/resume
Content-Type: application/json

{
  "fromCheckpointId": "ckpt_abc123" // optional, defaults to the latest checkpoint
}
Response:
{
"runId": "run_abc123",
"checkpointId": "ckpt_abc123",
"currentStep": "review",
"receipts": [ ... ],
"artifacts": [ ... ],
"unresolvedApprovals": [ ... ],
"blockedSideEffectKeys": [ "abc123...", "def456..." ],
"suggestedNextAction": "await_approval_then_execute",
"resumedAt": "2026-03-22T11:00:00.000Z"
}
API — GET /internal/runs/:runId
Get full run details including events, checkpoints, and artifacts.
GET /internal/runs/run_abc123
Response:
{
"run": {
"runId": "run_abc123",
"ledgerId": "ledger_xyz789",
"agentId": "my-agent",
"intentSummary": "Enrich leads and sync to Salesforce",
"status": "active",
"currentStep": "review",
"lastSafeEventId": "evt_xyz789",
"createdAt": "2026-03-22T10:00:00.000Z",
"updatedAt": "2026-03-22T10:05:30.000Z"
},
"events": [ ... ],
"checkpoints": [ ... ],
"artifacts": [ ... ],
"resumableCheckpoints": [ "ckpt_abc123" ]
}
API — POST /internal/runs/:runId/verify
Verify run and checkpoint integrity.
POST /internal/runs/run_abc123/verify
Response:
{
"runId": "run_abc123",
"ledgerIntegrity": {
"valid": true,
"hash": "abc123..."
},
"checkpointIntegrity": {
"valid": true,
"checkpoints": [
{ "checkpointId": "ckpt_abc123", "valid": true }
]
},
"lineageContinuity": {
"valid": true,
"parentRunId": null
}
}
Limitations
Run Ledger v1 has seven known limitations:
1. No cross-runtime portability
Resume packets are Zehrava-specific. No protocol or spec for other runtimes.
2. No distributed consensus
Checkpoints are stored in local SQLite. Multi-node deployments need external coordination.
3. Manual review fallback only
If checkpoint verification fails, the run is marked manual_review_required. No auto-repair.
4. No automated pruning
Old runs, events, and checkpoints accumulate. Operator must clean up manually.
5. No transactional writes
Events are recorded individually. Interruption mid-checkpoint could leave partial state.
6. No UI
Dashboard not built. CLI-only inspect and resume.
7. No policy-driven checkpointing
Checkpoint triggers are manual or event-based. No automatic "checkpoint every N progress events" policy.
⚠ Run Ledger is opt-in. Agents must call startRun and recordEvent to be tracked. Runs that do not integrate with Run Ledger cannot use execution continuity.
Integrity model
Two types of integrity verification:
Ledger integrity hash
Computed once at run creation:
integrity_hash = SHA-256({
runId,
agentId,
intentSummary,
schemaVersion
})
Stored in run_ledgers.integrity_hash. Verifies run identity.
Checkpoint sealed hash
Computed at checkpoint creation:
sealed_hash = SHA-256({
checkpointId,
ledgerId,
eventId,
canonicalize(resumePacket),
eventHashes: events.map(e => SHA-256({ e.id, e.seq, e.type, e.payload }))
})
Stored in run_checkpoints.sealed_hash. Verifies checkpoint has not been tampered with.
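The following Node.js sketch seals and verifies a checkpoint according to the formula above. It makes several assumptions, so Gate's stored hashes will not necessarily match these byte-for-byte:

- The field names inside the hashed object, in particular resumePacket, which the formula lists without a name.
- Hex encoding of every hash.
- The event shape { id, seq, type, payload }.

```javascript
const crypto = require('crypto');

// Canonical JSON: object keys sorted recursively, undefined-valued keys dropped.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return '[' + value.map((v) => (v === undefined ? 'null' : canonicalJson(v))).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).filter((k) => value[k] !== undefined).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalJson(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

const sha256Hex = (text) => crypto.createHash('sha256').update(text, 'utf8').digest('hex');

// One hash per event over { id, seq, type, payload }.
function eventHash(e) {
  return sha256Hex(canonicalJson({ id: e.id, seq: e.seq, type: e.type, payload: e.payload }));
}

// Assumed layout of the sealed object; canonicalJson sorts the resume
// packet's keys, which plays the role of canonicalize(resumePacket).
function sealCheckpoint({ checkpointId, ledgerId, eventId, resumePacket }, events) {
  return sha256Hex(canonicalJson({
    checkpointId,
    ledgerId,
    eventId,
    resumePacket,
    eventHashes: events.map(eventHash),
  }));
}

// Recompute and compare against the stored sealedHash.
function verifyCheckpoint(checkpoint, events) {
  return sealCheckpoint(checkpoint, events) === checkpoint.sealedHash;
}
```

Changing any event payload, or any field of the resume packet, produces a different hash, so verifyCheckpoint returns false. That is the condition under which the run should be marked manual_review_required.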
Canonical serialization
Before hashing any object:
- Sort keys recursively
- Remove undefined values
- Serialize to JSON
This ensures stable hashes across key-order differences.
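A small JavaScript version of these rules is shown below. It is used here to compute the ledger integrity hash. The function names and the hex encoding are assumptions for illustration, not Gate's internal code.

```javascript
const crypto = require('crypto');

// Canonical JSON following the three rules above: object keys are sorted at
// every depth and keys whose value is undefined are dropped. Array order is
// preserved (an undefined array element becomes null, as with JSON.stringify).
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map((v) => (v === undefined ? 'null' : canonicalize(v))).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).filter((k) => value[k] !== undefined).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

// integrity_hash = SHA-256({ runId, agentId, intentSummary, schemaVersion }),
// hex-encoded in this sketch. Fields outside these four are ignored.
function integrityHash({ runId, agentId, intentSummary, schemaVersion }) {
  const canonical = canonicalize({ runId, agentId, intentSummary, schemaVersion });
  return crypto.createHash('sha256').update(canonical, 'utf8').digest('hex');
}

console.log(canonicalize({ b: 1, a: { d: undefined, c: [2, 1] } }));
// prints {"a":{"c":[2,1]},"b":1}
```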
Verification in production: Run zehrava-gate runs verify <runId> to check integrity. If verification fails, the checkpoint cannot be trusted. Mark the run manual_review_required and investigate.
Run Ledger is internal to Gate. The /internal/runs/* endpoints are not part of the public API surface and may change.