sentryagent-idp/docs/compliance/audit-log-runbook.md

# Audit Log Chain Verification Runbook — SentryAgent.ai AgentIdP

**Control:** SOC 2 CC7.2 — Audit Log Integrity
**Service:** `src/services/AuditVerificationService.ts`
**Job:** `src/jobs/AuditChainVerificationJob.ts`
**Endpoint:** `GET /api/v1/audit/verify`

---

## Overview

Every audit event in the `audit_events` PostgreSQL table is linked to the previous one
via a SHA-256 hash chain. Each event stores:

- `hash` — SHA-256 of `(eventId + timestamp.toISOString() + action + outcome + agentId + organizationId + previousHash)`
- `previous_hash` — the `hash` of the immediately preceding event (ordered by `timestamp ASC, event_id ASC`)

The first event in the chain uses `previous_hash = ''` (empty string sentinel).

A PostgreSQL trigger (`trg_audit_events_immutable`) prevents UPDATE and DELETE operations
on `audit_events`, making the log tamper-evident at the database level.

---

## Running GET /audit/verify

### Full chain verification (no date range)

```bash
# Requires Bearer token with audit:read scope
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/audit/verify"
```

**Response (chain intact):**
```json
{
  "verified": true,
  "checkedCount": 18504,
  "brokenAtEventId": null
}
```

**Response (chain break detected):**
```json
{
  "verified": false,
  "checkedCount": 1203,
  "brokenAtEventId": "c4d5e6f7-a8b9-0123-cdef-456789012345"
}
```

### Date-ranged verification

```bash
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/audit/verify?fromDate=2026-03-01T00:00:00.000Z&toDate=2026-03-31T23:59:59.999Z"
```

### Interpreting the response

| Field | Meaning |
|---|---|
| `verified: true` | All events in the checked range maintain valid hash chain linkage |
| `verified: false` | At least one chain break detected — see `brokenAtEventId` |
| `checkedCount` | Number of events examined (0 = no events in range) |
| `brokenAtEventId` | UUID of the first event where the chain fails (`null` if verified) |
| `fromDate` / `toDate` | Echo of the date range parameters (only present if supplied) |

---

## AuditChainVerificationJob

The `AuditChainVerificationJob` runs automatically in the background every hour (default).
Configure the interval via `AUDIT_CHAIN_VERIFICATION_INTERVAL_MS` (milliseconds).

On each tick it calls `verifyChain()` and:
- Sets Prometheus gauge `agentidp_audit_chain_integrity` to **1** (passing)
- Updates `ComplianceStatusStore` with `CC7.2 = passing`

If verification fails:
- Sets gauge to **0**
- Updates `ComplianceStatusStore` with `CC7.2 = failing`
- Prometheus alert `AuditChainIntegrityFailed` fires immediately (severity: critical)
- Application logs: `[AuditChainVerificationJob] Chain BROKEN at event <uuid>`

---

## What to Do When `brokenAtEventId` is Returned

### Step 1: Preserve Evidence

Immediately capture the full state of the audit log for forensic analysis:

```sql
-- Export all events around the break point
SELECT event_id, timestamp, action, outcome, agent_id, organization_id, hash, previous_hash
FROM audit_events
WHERE timestamp >= (
  SELECT timestamp - INTERVAL '1 hour'
  FROM audit_events WHERE event_id = '<brokenAtEventId>'
)
ORDER BY timestamp ASC, event_id ASC;
```

Save the output to a secure, immutable location (e.g. S3 with object locking).

### Step 2: Identify the Break Type

Compare the recomputed hash for the broken event with its stored hash:

```bash
# Using Node.js
node -e "
const crypto = require('crypto');
const eventId = '<event_id>';
const timestamp = '<timestamp_from_db>';
const action = '<action>';
const outcome = '<outcome>';
const agentId = '<agent_id>';
const orgId = '<organization_id>';
const prevHash = '<previous_hash_from_db>';
const expected = crypto.createHash('sha256')
  .update(eventId + new Date(timestamp).toISOString() + action + outcome + agentId + orgId + prevHash)
  .digest('hex');
console.log('Expected hash:', expected);
console.log('Stored hash: <hash_from_db>');
console.log('Match:', expected === '<hash_from_db>');
"
```

Possible break types:
- **Hash mismatch only** — event data was modified after insertion
- **previous_hash mismatch** — an event was inserted/deleted before this event in the chain
- **Both mismatched** — multiple modifications or an injection attack

### Step 3: Escalate

A chain break is a **critical security incident**. Immediately:

1. Notify the security team and CISO
2. Engage incident response procedure (`docs/compliance/incident-response.md` — Audit Chain Integrity Failure section)
3. Do NOT attempt to "fix" the hash — preserve the broken state as evidence
4. Consider temporarily suspending API access pending investigation
5. Notify affected customers per data breach notification obligations

### Step 4: Forensic Investigation

Using PostgreSQL audit logs, Vault audit logs, and application logs:
- Identify which application process or database connection modified the row
- Correlate with access logs and authentication events
- Determine the extent of the compromise (single row vs. systematic)

---

## Verification Rate Limiting

`GET /audit/verify` is rate-limited to **30 requests/minute** per `client_id`.
For continuous monitoring, use `AuditChainVerificationJob` (background job, no rate limit)
and poll `GET /compliance/controls` instead.

---

## SOC 2 Evidence Package

For auditors, provide:

1. `GET /audit/verify` response (full chain, no date filter) — save as JSON
2. Prometheus metric export: `agentidp_audit_chain_integrity` time series (30/60/90 days)
3. PostgreSQL trigger definition: `\d+ audit_events` in psql
4. `src/db/migrations/020_add_audit_chain_columns.sql` — shows immutability trigger DDL
5. `docs/openapi/compliance.yaml` — endpoint specification