Product: Run Ledger

Part of: Gate

Available now in zehrava-gate@0.3.0

Run Ledger

Run Ledger adds execution continuity to Gate. When an agent run is interrupted by a crash, a restart, or a wait for approval, the run can resume from the last valid checkpoint without replaying side effects or losing progress.

What is Run Ledger?

Run Ledger is an execution continuity system for AI agents. It records every meaningful event in an agent's run, creates sealed checkpoints at safe boundaries, and allows interrupted runs to resume without data loss or duplicate side effects.

This is not a new workflow engine, memory system, or transcript replay mechanism. It is a boring, reliable execution ledger with checkpointing, resume, and side-effect deduplication.

Why it matters

Agent runs fail for reasons outside the agent's control:

Network interruptions during API calls
Server restarts or deploys
Human approval required mid-execution
Rate limit backoff that exceeds timeout
Process crashes or OOM kills

Without Run Ledger, the agent must start over from the beginning. Work already completed is lost. Side effects already executed — emails sent, database writes, API calls — get replayed, causing duplicates.

Run Ledger solves this. Interrupted runs resume from the last checkpoint. Side effects are deduplicated. Progress is preserved.

How it works

Three components work together:

1. Ledger — ordered log of events

Every meaningful event in a run is recorded with a sequence number, event type, actor, payload, and side effect class. Events are immutable once recorded. The ledger is the source of truth for what happened during a run.
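The ledger described above can be pictured as an append-only list of frozen records. The following is a minimal in-memory sketch, not Gate's storage layer: the class names `LedgerEvent` and `Ledger` and the field names are hypothetical, chosen only to mirror the attributes listed in the paragraph.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, List, Mapping, Tuple


@dataclass(frozen=True)
class LedgerEvent:
    """One ledger entry. Field names are illustrative, not Gate's schema."""
    seq: int
    event_type: str
    actor_id: str
    payload: Mapping[str, Any]
    side_effect_class: str


class Ledger:
    """Append-only, in-memory stand-in for a run's event log."""

    def __init__(self) -> None:
        self._events: List[LedgerEvent] = []

    def append(self, event_type: str, actor_id: str,
               payload: Mapping[str, Any],
               side_effect_class: str = "none") -> LedgerEvent:
        event = LedgerEvent(
            seq=len(self._events) + 1,                # next sequence number
            event_type=event_type,
            actor_id=actor_id,
            payload=MappingProxyType(dict(payload)),  # read-only copy
            side_effect_class=side_effect_class,
        )
        self._events.append(event)
        return event

    def events(self) -> Tuple[LedgerEvent, ...]:
        # A tuple, so callers cannot reorder or drop recorded entries.
        return tuple(self._events)


ledger = Ledger()
ledger.append("run_started", "my-agent", {})
fetched = ledger.append("tool_call_finished", "my-agent",
                        {"tool": "fetch_data", "recordsFetched": 847}, "read")
```

Assigning to `fetched.seq` or writing into `fetched.payload` raises an error, which is the in-process analogue of "immutable once recorded".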

2. Checkpoint — sealed resumable state

At safe boundaries (interruption, approval request, failure), the system seals a checkpoint. The checkpoint contains everything needed to continue: which events were completed, which artifacts were created, which approvals are pending, and which side effects must not be replayed.

3. Resume — restore from checkpoint

When the run resumes, the system loads the latest checkpoint, verifies its integrity, and builds a resume context. The agent sees receipts (what happened), artifacts (what was created), and blocked side effects (what must be skipped). Execution continues from the current step.

Event model

Run Ledger tracks 18 event types. Events fall into two categories:

Progress events

These indicate meaningful work completed. Only progress events update last_safe_event_id:

Event	Meaning
plan_locked	Execution plan finalized
tool_call_finished	Tool invocation completes
artifact_created	File or output created
policy_checked	Policy evaluation complete
approval_received	Human approved
execution_succeeded	Execution completed
delegation_finished	Sub-agent completed
run_completed	Run finished successfully

Non-progress events

These record state transitions but do not advance last_safe_event_id:

Event	Meaning
run_started	Run begins
tool_call_started	Tool invocation begins
intent_proposed	Gate intent submitted
approval_requested	Human approval needed
execution_requested	Execution order issued
delegation_started	Work delegated to sub-agent
checkpoint_sealed	Checkpoint created
interruption_detected	Run interrupted
run_resumed	Run resumed from checkpoint
run_failed	Run hard failed
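The rule that only progress events move last_safe_event_id can be sketched as a single pass over the event list. This is an illustration, not Gate's code: the function name is hypothetical, and the `eventId` and `eventType` fields are assumed to match the API responses shown elsewhere on this page.

```python
# The eight progress event types from the table above.
PROGRESS_EVENTS = frozenset({
    "plan_locked", "tool_call_finished", "artifact_created", "policy_checked",
    "approval_received", "execution_succeeded", "delegation_finished",
    "run_completed",
})


def last_safe_event_id(events):
    """Return the id of the most recent progress event, or None."""
    safe_id = None
    for event in events:  # events are assumed to be in sequence order
        if event["eventType"] in PROGRESS_EVENTS:
            safe_id = event["eventId"]
    return safe_id


events = [
    {"eventId": "evt_1", "eventType": "run_started"},
    {"eventId": "evt_2", "eventType": "plan_locked"},
    {"eventId": "evt_3", "eventType": "tool_call_started"},
    {"eventId": "evt_4", "eventType": "tool_call_finished"},
    {"eventId": "evt_5", "eventType": "approval_requested"},
]
print(last_safe_event_id(events))  # evt_4
```

The trailing `approval_requested` event is recorded but leaves the safe pointer at `evt_4`.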

Side effect classes

Each event declares its side effect class. This determines whether the event can be safely replayed on resume:

Class	Replayable	Examples
none	✅ Yes	Pure computation, logging
read	✅ Yes	Database reads, API GETs
write	⚠️ Maybe	Local file writes (if idempotent)
external_mutation	❌ No	Database writes, API POSTs
payment	❌ No	Stripe charges, refunds
notification	❌ No	Emails, SMS, webhooks
delegation	⚠️ Maybe	Sub-agent spawns (if idempotent)
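One way to turn the table into a replay decision is a lookup with a conservative default. The sketch below is an assumption about how a caller might apply the table; the `can_replay` helper and its `idempotent` flag are hypothetical and not part of the Gate SDK.

```python
from enum import Enum


class Replay(Enum):
    YES = "yes"
    MAYBE = "maybe"
    NO = "no"


# Mirrors the side effect class table above.
REPLAY_RULES = {
    "none": Replay.YES,
    "read": Replay.YES,
    "write": Replay.MAYBE,
    "external_mutation": Replay.NO,
    "payment": Replay.NO,
    "notification": Replay.NO,
    "delegation": Replay.MAYBE,
}


def can_replay(side_effect_class, idempotent=False):
    """Decide whether an event may be re-executed on resume."""
    # Assumption: an unknown class is treated as non-replayable.
    rule = REPLAY_RULES.get(side_effect_class, Replay.NO)
    if rule is Replay.MAYBE:
        return idempotent  # "Maybe" classes replay only if known idempotent
    return rule is Replay.YES
```

Defaulting unknown classes to "no" errs on the side of skipping work rather than duplicating a side effect.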

Side effect deduplication

For events marked external_mutation, payment, or notification, the system computes a deduplication key:

side_effect_key = SHA-256({ action, target, payloadHash })

Before re-executing on resume, the system checks whether this key was already recorded. If found, the event is skipped and marked status = 'skipped'. This prevents double-sends, double-charges, and duplicate database writes.
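The key computation and the skip check can be sketched as follows. The exact byte encoding Gate hashes is not specified here, so this example assumes key-sorted compact JSON encoded as UTF-8; keys produced by it will not necessarily match Gate's. The helper names and the in-memory `seen_keys` set are hypothetical.

```python
import hashlib
import json


def canonical_json(value):
    # Assumed encoding: sorted keys, no insignificant whitespace.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def side_effect_key(action, target, payload):
    """Hash { action, target, payloadHash } into a deduplication key."""
    payload_hash = sha256_hex(canonical_json(payload))
    return sha256_hex(canonical_json(
        {"action": action, "target": target, "payloadHash": payload_hash}
    ))


seen_keys = set()  # stand-in for the keys already recorded in the ledger


def run_once(key, effect):
    """Execute `effect` unless its key was already recorded."""
    if key in seen_keys:
        return "skipped"
    effect()
    seen_keys.add(key)
    return "executed"
```

Because the payload is canonicalized before hashing, two payloads that differ only in key order map to the same key, so a retried call is recognized as a duplicate.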

Checkpoints

A checkpoint is a sealed snapshot of a run's state at a specific event. Checkpoints are created when:

Interruption detected (crash, timeout)
Approval requested (waiting for human review)
Explicit handoff to another agent
Manual checkpoint via CLI

Checkpoint structure

Each checkpoint contains a resume packet (JSON) and a sealed hash. The hash is computed over:

sealed_hash = SHA-256({
  checkpointId,
  ledgerId,
  eventId,
  canonicalize(resumePacket),
  eventHashes
})

Tampering with the checkpoint or its events breaks the hash. Verification fails.
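The tamper-detection property can be demonstrated with a small sketch. It follows the field list above but assumes a key-sorted JSON encoding and an event shape (`id`, `seq`, `type`, `payload`) for illustration; the `digest` and `seal` helpers are hypothetical names, not Gate functions.

```python
import hashlib
import json


def digest(value):
    """SHA-256 over a key-sorted JSON encoding (the encoding is an assumption)."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def seal(checkpoint_id, ledger_id, event_id, resume_packet, events):
    """Hash the checkpoint identity, its resume packet, and per-event hashes."""
    return digest({
        "checkpointId": checkpoint_id,
        "ledgerId": ledger_id,
        "eventId": event_id,
        "resumePacket": resume_packet,
        "eventHashes": [digest(e) for e in events],
    })


events = [
    {"id": "evt_1", "seq": 1, "type": "run_started", "payload": {}},
    {"id": "evt_2", "seq": 2, "type": "approval_requested", "payload": {}},
]
packet = {"runId": "run_abc123", "currentStep": "review"}
sealed = seal("ckpt_1", "ledger_1", "evt_2", packet, events)

# Changing the packet or any covered event yields a different hash.
tampered_packet = dict(packet, currentStep="execute")
tampered_events = [events[0], dict(events[1], type="approval_received")]
```

Verification amounts to recomputing `seal(...)` from stored data and comparing the result with the stored hash; either tampered input above produces a mismatch.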

Resume packet

The resume packet is everything needed to continue:

{
  "runId": "run_abc123",
  "ledgerId": "ledger_xyz789",
  "checkpointEventId": "evt_...",
  "currentStep": "review",
  "receipts": [ ... ],                    // what happened
  "artifacts": [ ... ],                   // what was created
  "unresolvedApprovals": [ ... ],         // what's blocked
  "nonReplayableSideEffects": [ ... ],    // what must not repeat
  "suggestedNextAction": "await_approval_then_execute",
  "schemaVersion": 1
}

Resume model

Resume flow in five steps:

1. Load latest checkpoint marked is_resumable = 1
2. Verify sealed_hash integrity (fails if tampered)
3. Parse resume_packet_json
4. Build blocked side effect keys set for fast lookup
5. Emit run_resumed event and update status to active
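The five steps can be sketched over plain dictionaries. This is a simplified illustration, not the Gate implementation: the hash here covers only the packet text, the `interrupted` status value and the `sideEffectKey` field inside `nonReplayableSideEffects` entries are assumptions, and `resume_run` is a hypothetical local function.

```python
import hashlib
import json


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def resume_run(run, checkpoints):
    """Walk the five resume steps over plain dicts (all shapes assumed)."""
    # 1. Load the latest checkpoint marked is_resumable = 1.
    resumable = [c for c in checkpoints if c["is_resumable"] == 1]
    if not resumable:
        raise LookupError("no resumable checkpoint for this run")
    checkpoint = resumable[-1]  # list is assumed ordered oldest to newest

    # 2. Verify integrity. Simplification: only the packet text is hashed.
    if sha256_hex(checkpoint["resume_packet_json"]) != checkpoint["sealed_hash"]:
        run["status"] = "manual_review_required"
        raise ValueError("sealed_hash mismatch")

    # 3. Parse the resume packet.
    packet = json.loads(checkpoint["resume_packet_json"])

    # 4. Build the blocked key set for constant-time membership checks.
    blocked = {e["sideEffectKey"] for e in packet["nonReplayableSideEffects"]}

    # 5. Emit run_resumed and mark the run active.
    run["events"].append({"eventType": "run_resumed"})
    run["status"] = "active"
    return {
        "runId": packet["runId"],
        "currentStep": packet["currentStep"],
        "blockedSideEffectKeys": blocked,
        "suggestedNextAction": packet["suggestedNextAction"],
    }


packet_json = json.dumps({
    "runId": "run_abc123",
    "currentStep": "review",
    "nonReplayableSideEffects": [{"sideEffectKey": "key-1"}],
    "suggestedNextAction": "await_approval_then_execute",
})
checkpoints = [{"is_resumable": 1, "resume_packet_json": packet_json,
                "sealed_hash": sha256_hex(packet_json)}]
run = {"status": "interrupted", "events": []}  # status value is assumed
ctx = resume_run(run, checkpoints)
```

Verification runs before parsing so that a tampered packet is never interpreted; on a mismatch the run is flagged for manual review instead of being resumed.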

The agent receives a resume context object:

{
  runId,
  ledgerId,
  checkpointId,
  currentStep,
  receipts,                      // Array of completed events
  artifacts,                     // Array of created files/outputs
  unresolvedApprovals,           // Pending human approvals
  remainingPermissions,          // What's still allowed
  blockedSideEffectKeys,         // Set of keys to skip
  suggestedNextAction,
  resumedAt,
  schemaVersion
}

Before executing any side effect, check the blocked set:

if (resumeContext.blockedSideEffectKeys.has(sideEffectKey)) {
  // Skip — already executed before interruption
  return;
}
// Safe to execute
execute();

Integration — Start a run

Before your agent begins work, start a run:

JavaScript

const { Gate } = require('zehrava-gate');
const gate = new Gate({ endpoint: 'http://localhost:4000', apiKey: 'gate_sk_...' });

const run = await gate.startRun({
  agentId: 'my-agent',
  intentSummary: 'Enrich leads and sync to Salesforce',
  runtime: 'zehrava-gate',
  permissions: { allowed_tools: ['fetch', 'enrich', 'sync'] }
});

// run.runId → "run_abc123"

Python

from zehrava_gate import Gate

gate = Gate(endpoint="http://localhost:4000", api_key="gate_sk_...")

run = gate.start_run(
    agent_id="my-agent",
    intent_summary="Enrich leads and sync to Salesforce",
    runtime="zehrava-gate",
    permissions={"allowed_tools": ["fetch", "enrich", "sync"]}
)

# run["runId"] → "run_abc123"

Integration — Record events

As your agent works, record progress events:

JavaScript

const { EVENT_TYPES, SIDE_EFFECT_CLASS } = require('zehrava-gate/lib/runs');

// Tool call finished
await gate.recordEvent({
  runId: run.runId,
  eventType: EVENT_TYPES.TOOL_CALL_FINISHED,
  actorId: 'my-agent',
  stepName: 'fetch',
  payload: { tool: 'fetch_data', recordsFetched: 847 },
  sideEffectClass: SIDE_EFFECT_CLASS.READ
});

// Execution succeeded (with deduplication key).
// Note: `hash` must already be in scope; this snippet does not show where it is imported from.
await gate.recordEvent({
  runId: run.runId,
  eventType: EVENT_TYPES.EXECUTION_SUCCEEDED,
  actorId: 'salesforce-worker',
  stepName: 'sync',
  payload: { executionId: 'exe_...', recordsSynced: 847 },
  sideEffectClass: SIDE_EFFECT_CLASS.EXTERNAL_MUTATION,
  sideEffectKey: hash.sideEffectKey('salesforce.import', 'salesforce', { batch: 'batch-001' })
});

Python

from zehrava_gate.runs import EVENT_TYPES, SIDE_EFFECT_CLASS

# Tool call finished
gate.record_event(
    run_id=run["runId"],
    event_type=EVENT_TYPES.TOOL_CALL_FINISHED,
    actor_id="my-agent",
    step_name="fetch",
    payload={"tool": "fetch_data", "recordsFetched": 847},
    side_effect_class=SIDE_EFFECT_CLASS.READ
)

Integration — Create checkpoint

Create a checkpoint manually or on interruption:

JavaScript

const checkpoint = await gate.createCheckpoint({
  runId: run.runId,
  reason: 'approval_requested',
  suggestedNextAction: 'await_approval_then_execute'
});

// checkpoint.checkpointId → "ckpt_xyz789"
// checkpoint.isResumable → true
// checkpoint.sealedHash → "abc123..."

Python

checkpoint = gate.create_checkpoint(
    run_id=run["runId"],
    reason="approval_requested",
    suggested_next_action="await_approval_then_execute"
)

Integration — Resume run

Resume from the latest checkpoint:

JavaScript

const ctx = await gate.resumeRun({ runId: run.runId });

// ctx.receipts → what happened before interruption
// ctx.artifacts → files created
// ctx.unresolvedApprovals → pending approvals
// ctx.blockedSideEffectKeys → Set of keys to skip
// ctx.suggestedNextAction → where to continue

if (ctx.blockedSideEffectKeys.has(mySideEffectKey)) {
  console.log('Already executed — skipping');
} else {
  execute();
}

Python

ctx = gate.resume_run(run_id=run["runId"])

if my_side_effect_key in ctx["blockedSideEffectKeys"]:
    print("Already executed — skipping")
else:
    execute()

Code examples

Full example showing interruption and resume:

// See examples/interrupted_intent_run_resume.js in the repo

const run = await gate.startRun({ ... });

// Execute work
await gate.recordEvent({ eventType: EVENT_TYPES.TOOL_CALL_FINISHED, ... });
await gate.recordEvent({ eventType: EVENT_TYPES.ARTIFACT_CREATED, ... });

// Propose intent
const intent = await gate.propose({ ... });

// Simulate crash
await gate.recordEvent({ eventType: EVENT_TYPES.INTERRUPTION_DETECTED, ... });
const checkpoint = await gate.createCheckpoint({ runId: run.runId, reason: 'interruption' });

// ... time passes, system restarts ...

// Resume
const ctx = await gate.resumeRun({ runId: run.runId });

// Continue from where we left off
if (ctx.unresolvedApprovals.length > 0) {
  await waitForApproval(ctx.unresolvedApprovals[0].intentId);
}

// Execute — side effects are deduplicated automatically
await gate.execute({ intentId: intent.intentId });

Dashboard

View run history, checkpoints, and audit trails in the web UI.

The Dashboard provides a visual interface for monitoring tracked runs without CLI commands. View pending approvals, inspect checkpoints, review execution history, and track resumable runs — all in your browser.

CLI — inspect

Show run details including events, checkpoints, and resumability:

zehrava-gate runs inspect run_abc123

Run: run_abc123
────────────────────────────────────────────
Status:              active
Intent:              Enrich leads and sync to Salesforce
Current Step:        review
Agent:               my-agent
Runtime:             zehrava-gate

Events:              12
Checkpoints:         1
Artifacts:           1
Unresolved Approvals: 1
Blocked Side Effects: 2

Last Safe Event:     evt_xyz789
Latest Checkpoint:   ckpt_abc123
Resumable:           Yes
Lineage Valid:       Yes

Created:             2026-03-22T10:00:00.000Z
Updated:             2026-03-22T10:05:30.000Z

CLI — events

List all events for a run in sequence:

zehrava-gate runs events run_abc123

Events for run: run_abc123
────────────────────────────────────────────────────────────────────────────────────
Seq  Event Type                   Actor                    Status      Side Effect
────────────────────────────────────────────────────────────────────────────────────
  1  run_started                  my-agent                 recorded    none
  2  plan_locked                  my-agent                 recorded    none
  3  tool_call_started            my-agent                 recorded    read
  4  tool_call_finished           my-agent                 recorded    read
  5  tool_call_started            my-agent                 recorded    write
  6  tool_call_finished           my-agent                 recorded    write
  7  artifact_created             my-agent                 recorded    none
  8  intent_proposed              my-agent                 recorded    none
  9  policy_checked               gate                     recorded    none
 10  approval_requested           my-agent                 recorded    none
 11  interruption_detected        system                   recorded    none
 12  checkpoint_sealed            system                   recorded    none

CLI — checkpoint

Create a checkpoint manually:

zehrava-gate runs checkpoint run_abc123

Checkpoint created: ckpt_xyz789
Sealed Hash:        abc123def456...
Resumable:          Yes
Reason:             manual

CLI — resume

Resume from the latest checkpoint:

zehrava-gate runs resume run_abc123

Resumed run: run_abc123
────────────────────────────────────────────
Checkpoint:          ckpt_xyz789
Current Step:        review
Receipts:            7
Artifacts:           1
Unresolved Approvals: 1
Blocked Side Effects: 2
Suggested Next:      await_approval_then_execute

CLI — verify

Verify run and checkpoint integrity:

zehrava-gate runs verify run_abc123

Verification for run: run_abc123
────────────────────────────────────────────
Ledger Integrity:     Valid
Checkpoint Integrity: Valid
Lineage Continuity:   Valid

Total Checkpoints:    1
Valid Checkpoints:    1

API — POST /internal/runs/start

Start a new run. Returns run ID and ledger ID.

POST /internal/runs/start
Content-Type: application/json

{
  "agentId": "my-agent",
  "intentSummary": "Enrich leads and sync to Salesforce",
  "runtime": "zehrava-gate",
  "parentRunId": "run_parent",              // optional
  "permissions": { "allowed_tools": ["..."] }
}

Response:

{
  "runId": "run_abc123",
  "ledgerId": "ledger_xyz789",
  "status": "active",
  "createdAt": "2026-03-22T10:00:00.000Z"
}

API — POST /internal/runs/:runId/events

Record an event for a run.

POST /internal/runs/run_abc123/events
Content-Type: application/json

{
  "eventType": "tool_call_finished",
  "actorId": "my-agent",
  "stepName": "fetch",
  "payload": { "tool": "fetch_data", "recordsFetched": 847 },
  "sideEffectClass": "read",
  "sideEffectKey": "abc123..."           // optional
}

Response:

{
  "eventId": "evt_def456",
  "seq": 4,
  "eventType": "tool_call_finished",
  "status": "recorded",
  "createdAt": "2026-03-22T10:01:00.000Z"
}

API — POST /internal/runs/:runId/checkpoint

Create a checkpoint for a run.

POST /internal/runs/run_abc123/checkpoint
Content-Type: application/json

{
  "eventId": "evt_xyz789",                  // optional — defaults to most recent
  "reason": "approval_requested",
  "suggestedNextAction": "await_approval_then_execute"
}

Response:

{
  "checkpointId": "ckpt_abc123",
  "sealedHash": "abc123def456...",
  "isResumable": true,
  "createdAt": "2026-03-22T10:05:30.000Z"
}

API — POST /internal/runs/:runId/resume

Resume from the latest checkpoint.

POST /internal/runs/run_abc123/resume
Content-Type: application/json

{
  "fromCheckpointId": "ckpt_abc123"        // optional — defaults to latest
}

Response:

{
  "runId": "run_abc123",
  "checkpointId": "ckpt_abc123",
  "currentStep": "review",
  "receipts": [ ... ],
  "artifacts": [ ... ],
  "unresolvedApprovals": [ ... ],
  "blockedSideEffectKeys": [ "abc123...", "def456..." ],
  "suggestedNextAction": "await_approval_then_execute",
  "resumedAt": "2026-03-22T11:00:00.000Z"
}
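In the HTTP response, blockedSideEffectKeys arrives as a JSON array rather than a set. A client that calls the endpoint directly may want to convert it before doing membership checks. The helper below is a hypothetical sketch that operates on an already-received response body; it does not perform the request.

```python
import json


def parse_resume_response(body):
    """Decode a resume response and expose the blocked keys as a set."""
    ctx = json.loads(body)
    # JSON has no set type, so the array is converted after decoding.
    ctx["blockedSideEffectKeys"] = frozenset(ctx.get("blockedSideEffectKeys", []))
    return ctx


body = '{"runId": "run_abc123", "blockedSideEffectKeys": ["k1", "k2"]}'
ctx = parse_resume_response(body)
```

A `frozenset` gives constant-time lookups and prevents accidental mutation of the blocked list during the resumed run.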

API — GET /internal/runs/:runId

Get full run details including events, checkpoints, and artifacts.

GET /internal/runs/run_abc123

Response:

{
  "run": {
    "runId": "run_abc123",
    "ledgerId": "ledger_xyz789",
    "agentId": "my-agent",
    "intentSummary": "Enrich leads and sync to Salesforce",
    "status": "active",
    "currentStep": "review",
    "lastSafeEventId": "evt_xyz789",
    "createdAt": "2026-03-22T10:00:00.000Z",
    "updatedAt": "2026-03-22T10:05:30.000Z"
  },
  "events": [ ... ],
  "checkpoints": [ ... ],
  "artifacts": [ ... ],
  "resumableCheckpoints": [ "ckpt_abc123" ]
}

API — POST /internal/runs/:runId/verify

Verify run and checkpoint integrity.

POST /internal/runs/run_abc123/verify

Response:

{
  "runId": "run_abc123",
  "ledgerIntegrity": {
    "valid": true,
    "hash": "abc123..."
  },
  "checkpointIntegrity": {
    "valid": true,
    "checkpoints": [
      { "checkpointId": "ckpt_abc123", "valid": true }
    ]
  },
  "lineageContinuity": {
    "valid": true,
    "parentRunId": null
  }
}
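A caller can reduce this response to a short list of failures. The sketch below assumes the response shape shown above; `failed_checks` and `invalid_checkpoints` are hypothetical helper names.

```python
def failed_checks(report):
    """Return the names of the top-level integrity sections that failed."""
    sections = ("ledgerIntegrity", "checkpointIntegrity", "lineageContinuity")
    return [name for name in sections if not report[name]["valid"]]


def invalid_checkpoints(report):
    """Return the ids of checkpoints whose sealed hash did not verify."""
    checkpoints = report["checkpointIntegrity"]["checkpoints"]
    return [c["checkpointId"] for c in checkpoints if not c["valid"]]


report = {
    "runId": "run_abc123",
    "ledgerIntegrity": {"valid": True, "hash": "abc123..."},
    "checkpointIntegrity": {
        "valid": False,
        "checkpoints": [
            {"checkpointId": "ckpt_abc123", "valid": True},
            {"checkpointId": "ckpt_def456", "valid": False},
        ],
    },
    "lineageContinuity": {"valid": True, "parentRunId": None},
}
print(failed_checks(report))        # ['checkpointIntegrity']
print(invalid_checkpoints(report))  # ['ckpt_def456']
```

An empty list from `failed_checks` means all three sections reported valid.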

Limitations

Run Ledger v1 has seven known limitations:

1. No cross-runtime portability

Resume packets are Zehrava-specific. No protocol or spec for other runtimes.

2. No distributed consensus

Checkpoints are stored in local SQLite. Multi-node deployments need external coordination.

3. Manual review fallback only

If checkpoint verification fails, the run is marked manual_review_required. No auto-repair.

4. No automated pruning

Old runs, events, and checkpoints accumulate. Operator must clean up manually.

5. No transactional writes

Events are recorded individually. Interruption mid-checkpoint could leave partial state.

6. No UI

Dashboard not built. CLI-only inspect and resume.

7. No policy-driven checkpointing

Checkpoint triggers are manual or event-based. No automatic "checkpoint every N progress events" policy.

⚠ Run Ledger is opt-in. Agents must call startRun and recordEvent to be tracked. Runs that do not integrate with Run Ledger cannot use execution continuity.

Integrity model

Two types of integrity verification:

Ledger integrity hash

Computed once at run creation:

integrity_hash = SHA-256({
  runId,
  agentId,
  intentSummary,
  schemaVersion
})

Stored in run_ledgers.integrity_hash. Verifies run identity.

Checkpoint sealed hash

Computed at checkpoint creation:

sealed_hash = SHA-256({
  checkpointId,
  ledgerId,
  eventId,
  canonicalize(resumePacket),
  eventHashes: events.map(e => SHA-256({ e.id, e.seq, e.type, e.payload }))
})

Stored in run_checkpoints.sealed_hash. Verifies checkpoint has not been tampered with.

Canonical serialization

Before hashing any object:

Sort keys recursively
Remove undefined values
Serialize to JSON

This ensures stable hashes across key-order differences.
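The three steps can be sketched in Python as follows. Python has no `undefined`, so this example uses a sentinel object named `UNDEFINED` as a stand-in; that sentinel, the helper names, and the compact UTF-8 encoding are assumptions made for illustration. `None` (JSON `null`) is deliberately kept, since it is a value rather than an absent field.

```python
import hashlib
import json

UNDEFINED = object()  # hypothetical stand-in for JavaScript's `undefined`


def canonicalize(value):
    """Sort keys recursively and drop UNDEFINED-valued entries from objects."""
    if isinstance(value, dict):
        return {
            key: canonicalize(item)
            for key, item in sorted(value.items(), key=lambda kv: kv[0])
            if item is not UNDEFINED
        }
    if isinstance(value, (list, tuple)):
        return [canonicalize(item) for item in value]  # array order is kept
    return value


def stable_hash(value):
    """Serialize the canonical form to JSON, then hash it with SHA-256."""
    text = json.dumps(canonicalize(value), separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = {"b": 1, "a": {"d": 2, "c": UNDEFINED}}
b = {"a": {"d": 2}, "b": 1}
print(stable_hash(a) == stable_hash(b))  # True
```

Array order is preserved because it usually carries meaning (for example, event sequence), while object key order does not.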

Verification in production: Run zehrava-gate runs verify <runId> to check integrity. If verification fails, the checkpoint cannot be trusted. Mark the run manual_review_required and investigate.

Run Ledger is internal to Gate. The /internal/runs/* endpoints are not part of the public API surface and may change.