Files
sentryagent-idp/docs/compliance/audit-log-runbook.md
SentryAgent.ai Developer fd90b2acd1 feat(phase-3): workstream 6 — SOC 2 Type II Preparation
Implements all 22 WS6 tasks completing Phase 3 Enterprise.

Column-level encryption (AES-256-CBC, Vault-backed key) via EncryptionService
applied to credentials.secret_hash, credentials.vault_path,
webhook_subscriptions.vault_secret_path, and agent_did_keys.vault_key_path.
Backward-compatible: isEncrypted() guard skips decryption for existing
plaintext rows until next read-write cycle.

Audit chain integrity (CC7.2): AuditRepository computes SHA-256 Merkle hash
on every INSERT (hash = SHA-256(eventId+timestamp+action+outcome+agentId+orgId+prevHash)).
AuditVerificationService walks the full chain verifying hash continuity.
AuditChainVerificationJob runs hourly; sets agentidp_audit_chain_integrity
Prometheus gauge to 1 (pass) or 0 (fail).

TLS enforcement (CC6.7): TLSEnforcementMiddleware registered as first
middleware in Express stack; 301 redirect on non-https X-Forwarded-Proto
in production.

SecretsRotationJob (CC9.2): hourly scan for credentials expiring within 7
days; increments agentidp_credentials_expiring_soon_total.

ComplianceController + routes: GET /audit/verify (auth+audit:read scope,
30/min rate-limit); GET /compliance/controls (public, Cache-Control 60s).
ComplianceStatusStore: module-level map updated by jobs, consumed by controller.

Prometheus: 2 new metrics (agentidp_credentials_expiring_soon_total,
agentidp_audit_chain_integrity); 6 alerting rules in alerts.yml.

Compliance docs: soc2-controls-matrix.md, encryption-runbook.md,
audit-log-runbook.md, incident-response.md, secrets-rotation.md.

Tests: 557 unit tests passing (35 suites); 26 new tests (EncryptionService,
AuditVerificationService); 19 compliance integration tests. TypeScript clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 00:41:53 +00:00

173 lines
5.6 KiB
Markdown

# Audit Log Chain Verification Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC7.2 — Audit Log Integrity
**Service:** `src/services/AuditVerificationService.ts`
**Job:** `src/jobs/AuditChainVerificationJob.ts`
**Endpoint:** `GET /api/v1/audit/verify`
---
## Overview
Every audit event in the `audit_events` PostgreSQL table is linked to the previous one
via a SHA-256 hash chain. Each event stores:
- `hash` — SHA-256 of `(eventId + timestamp.toISOString() + action + outcome + agentId + organizationId + previousHash)`
- `previous_hash` — the `hash` of the immediately preceding event (ordered by `timestamp ASC, event_id ASC`)
The first event in the chain uses `previous_hash = ''` (empty string sentinel).
A PostgreSQL trigger (`trg_audit_events_immutable`) prevents UPDATE and DELETE operations
on `audit_events`, making the log tamper-evident at the database level.
---
## Running GET /audit/verify
### Full chain verification (no date range)
```bash
# Requires Bearer token with audit:read scope
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/audit/verify"
```
**Response (chain intact):**
```json
{
"verified": true,
"checkedCount": 18504,
"brokenAtEventId": null
}
```
**Response (chain break detected):**
```json
{
"verified": false,
"checkedCount": 1203,
"brokenAtEventId": "c4d5e6f7-a8b9-0123-cdef-456789012345"
}
```
### Date-ranged verification
```bash
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/audit/verify?fromDate=2026-03-01T00:00:00.000Z&toDate=2026-03-31T23:59:59.999Z"
```
### Interpreting the response
| Field | Meaning |
|---|---|
| `verified: true` | All events in the checked range maintain valid hash chain linkage |
| `verified: false` | At least one chain break detected — see `brokenAtEventId` |
| `checkedCount` | Number of events examined (0 = no events in range) |
| `brokenAtEventId` | UUID of the first event where the chain fails (`null` if verified) |
| `fromDate` / `toDate` | Echo of the date range parameters (only present if supplied) |
---
## AuditChainVerificationJob
The `AuditChainVerificationJob` runs automatically in the background every hour (default).
Configure the interval via `AUDIT_CHAIN_VERIFICATION_INTERVAL_MS` (milliseconds).
On each tick it calls `verifyChain()` and:
- Sets Prometheus gauge `agentidp_audit_chain_integrity` to **1** (passing)
- Updates `ComplianceStatusStore` with `CC7.2 = passing`
If verification fails:
- Sets gauge to **0**
- Updates `ComplianceStatusStore` with `CC7.2 = failing`
- Prometheus alert `AuditChainIntegrityFailed` fires immediately (severity: critical)
- Application logs: `[AuditChainVerificationJob] Chain BROKEN at event <uuid>`
---
## What to Do When `brokenAtEventId` is Returned
### Step 1: Preserve Evidence
Immediately capture the full state of the audit log for forensic analysis:
```sql
-- Export all events around the break point
SELECT event_id, timestamp, action, outcome, agent_id, organization_id, hash, previous_hash
FROM audit_events
WHERE timestamp >= (
SELECT timestamp - INTERVAL '1 hour'
FROM audit_events WHERE event_id = '<brokenAtEventId>'
)
ORDER BY timestamp ASC, event_id ASC;
```
Save the output to a secure, immutable location (e.g. S3 with object locking).
### Step 2: Identify the Break Type
Compare the recomputed hash for the broken event with its stored hash:
```bash
# Using Node.js
node -e "
const crypto = require('crypto');
const eventId = '<event_id>';
const timestamp = '<timestamp_from_db>';
const action = '<action>';
const outcome = '<outcome>';
const agentId = '<agent_id>';
const orgId = '<organization_id>';
const prevHash = '<previous_hash_from_db>';
const expected = crypto.createHash('sha256')
.update(eventId + new Date(timestamp).toISOString() + action + outcome + agentId + orgId + prevHash)
.digest('hex');
console.log('Expected hash:', expected);
console.log('Stored hash: <hash_from_db>');
console.log('Match:', expected === '<hash_from_db>');
"
```
Possible break types:
- **Hash mismatch only** — event data was modified after insertion
- **previous_hash mismatch** — an event was inserted/deleted before this event in the chain
- **Both mismatched** — multiple modifications or an injection attack
### Step 3: Escalate
A chain break is a **critical security incident**. Immediately:
1. Notify the security team and CISO
2. Engage incident response procedure (`docs/compliance/incident-response.md` — Audit Chain Integrity Failure section)
3. Do NOT attempt to "fix" the hash — preserve the broken state as evidence
4. Consider temporarily suspending API access pending investigation
5. Notify affected customers per data breach notification obligations
### Step 4: Forensic Investigation
Using PostgreSQL audit logs, Vault audit logs, and application logs:
- Identify which application process or database connection modified the row
- Correlate with access logs and authentication events
- Determine the extent of the compromise (single row vs. systematic)
---
## Verification Rate Limiting
`GET /audit/verify` is rate-limited to **30 requests/minute** per `client_id`.
For continuous monitoring, use `AuditChainVerificationJob` (background job, no rate limit)
and poll `GET /compliance/controls` instead.
---
## SOC 2 Evidence Package
For auditors, provide:
1. `GET /audit/verify` response (full chain, no date filter) — save as JSON
2. Prometheus metric export: `agentidp_audit_chain_integrity` time series (30/60/90 days)
3. PostgreSQL trigger definition: `\d+ audit_events` in psql
4. `src/db/migrations/020_add_audit_chain_columns.sql` — shows immutability trigger DDL
5. `docs/openapi/compliance.yaml` — endpoint specification