feat(phase-3): workstream 6 — SOC 2 Type II Preparation
Implements all 22 WS6 tasks, completing Phase 3 Enterprise.

- Column-level encryption (AES-256-CBC, Vault-backed key) via EncryptionService, applied to credentials.secret_hash, credentials.vault_path, webhook_subscriptions.vault_secret_path, and agent_did_keys.vault_key_path. Backward-compatible: the isEncrypted() guard skips decryption for existing plaintext rows until their next read-write cycle.
- Audit chain integrity (CC7.2): AuditRepository computes a chained SHA-256 hash on every INSERT (hash = SHA-256(eventId+timestamp+action+outcome+agentId+orgId+prevHash)). AuditVerificationService walks the full chain verifying hash continuity. AuditChainVerificationJob runs hourly and sets the agentidp_audit_chain_integrity Prometheus gauge to 1 (pass) or 0 (fail).
- TLS enforcement (CC6.7): TLSEnforcementMiddleware is registered as the first middleware in the Express stack; 301 redirect on non-https X-Forwarded-Proto in production.
- SecretsRotationJob (CC9.2): hourly scan for credentials expiring within 7 days; increments agentidp_credentials_expiring_soon_total.
- ComplianceController + routes: GET /audit/verify (auth + audit:read scope, 30/min rate limit); GET /compliance/controls (public, Cache-Control 60s).
- ComplianceStatusStore: module-level map updated by jobs, consumed by the controller.
- Prometheus: 2 new metrics (agentidp_credentials_expiring_soon_total, agentidp_audit_chain_integrity); 6 alerting rules in alerts.yml.
- Compliance docs: soc2-controls-matrix.md, encryption-runbook.md, audit-log-runbook.md, incident-response.md, secrets-rotation.md.
- Tests: 557 unit tests passing (35 suites); 26 new tests (EncryptionService, AuditVerificationService); 19 compliance integration tests.

TypeScript clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
172
docs/compliance/audit-log-runbook.md
Normal file
@@ -0,0 +1,172 @@
# Audit Log Chain Verification Runbook - SentryAgent.ai AgentIdP

**Control:** SOC 2 CC7.2 - Audit Log Integrity
**Service:** `src/services/AuditVerificationService.ts`
**Job:** `src/jobs/AuditChainVerificationJob.ts`
**Endpoint:** `GET /api/v1/audit/verify`

---

## Overview

Every audit event in the `audit_events` PostgreSQL table is linked to the previous one
via a SHA-256 hash chain. Each event stores:

- `hash`: SHA-256 of `(eventId + timestamp.toISOString() + action + outcome + agentId + organizationId + previousHash)`
- `previous_hash`: the `hash` of the immediately preceding event (ordered by `timestamp ASC, event_id ASC`)

The first event in the chain uses `previous_hash = ''` (empty string sentinel).

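
The linkage described above can be sketched in TypeScript as follows. This is a minimal in-memory illustration under stated assumptions, not the code in `AuditRepository` or `AuditVerificationService`: the `AuditEvent` shape and the `computeEventHash`, `appendEvent`, and `verifyChain` helpers are hypothetical names, the fields are assumed to be concatenated as plain strings with no separator, and `checkedCount` is assumed to count events up to and including the first broken one.

```typescript
import { createHash } from "crypto";

// Hypothetical in-memory shape; fields mirror the hash inputs listed above.
interface AuditEvent {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

type UnhashedEvent = Omit<AuditEvent, "hash" | "previousHash">;

// SHA-256 over the plain concatenation of the fields, in the documented order.
function computeEventHash(e: UnhashedEvent, previousHash: string): string {
  const input =
    e.eventId + e.timestamp.toISOString() + e.action + e.outcome +
    e.agentId + e.organizationId + previousHash;
  return createHash("sha256").update(input).digest("hex");
}

// Link a new event to the current tail; '' is the sentinel for the first event.
function appendEvent(chain: AuditEvent[], e: UnhashedEvent): AuditEvent {
  const previousHash = chain.length > 0 ? chain[chain.length - 1].hash : "";
  const linked: AuditEvent = { ...e, previousHash, hash: computeEventHash(e, previousHash) };
  chain.push(linked);
  return linked;
}

// Walk the chain in order and report the first event whose link or hash fails.
function verifyChain(chain: AuditEvent[]): {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
} {
  let expectedPrevious = "";
  for (let i = 0; i < chain.length; i++) {
    const e = chain[i];
    const linkOk = e.previousHash === expectedPrevious;
    const hashOk = e.hash === computeEventHash(e, e.previousHash);
    if (!linkOk || !hashOk) {
      return { verified: false, checkedCount: i + 1, brokenAtEventId: e.eventId };
    }
    expectedPrevious = e.hash;
  }
  return { verified: true, checkedCount: chain.length, brokenAtEventId: null };
}
```

Because each hash covers the previous hash, changing any stored field of an earlier event invalidates that event's own hash, which is what `verifyChain` reports as the break point.
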
A PostgreSQL trigger (`trg_audit_events_immutable`) rejects UPDATE and DELETE operations
on `audit_events`, so rows cannot be changed in place through ordinary SQL access. Combined
with the hash chain, this makes the log tamper-evident at the database level.

---

## Running GET /audit/verify

### Full chain verification (no date range)

```bash
# Requires Bearer token with audit:read scope
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/audit/verify"
```

**Response (chain intact):**
```json
{
  "verified": true,
  "checkedCount": 18504,
  "brokenAtEventId": null
}
```

**Response (chain break detected):**
```json
{
  "verified": false,
  "checkedCount": 1203,
  "brokenAtEventId": "c4d5e6f7-a8b9-0123-cdef-456789012345"
}
```

### Date-ranged verification

```bash
curl -s -H "Authorization: Bearer <token>" \
  "https://api.sentryagent.ai/v1/audit/verify?fromDate=2026-03-01T00:00:00.000Z&toDate=2026-03-31T23:59:59.999Z"
```

### Interpreting the response

| Field | Meaning |
|---|---|
| `verified: true` | All events in the checked range maintain valid hash chain linkage |
| `verified: false` | At least one chain break was detected; see `brokenAtEventId` |
| `checkedCount` | Number of events examined (0 = no events in range) |
| `brokenAtEventId` | UUID of the first event where the chain fails (`null` if verified) |
| `fromDate` / `toDate` | Echo of the date range parameters (only present if supplied) |

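
For tooling that consumes this endpoint, the fields in the table map naturally onto a small TypeScript type. The sketch below is illustrative only: `VerifyResponse` and `summarizeVerification` are hypothetical names, not part of the service, and the message wording is an assumption.

```typescript
// Response shape as documented in the table above.
interface VerifyResponse {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
  fromDate?: string;
  toDate?: string;
}

// Hypothetical helper: turn a response into a one-line operator message.
function summarizeVerification(res: VerifyResponse): string {
  const range =
    res.fromDate && res.toDate ? ` (${res.fromDate} to ${res.toDate})` : "";
  if (!res.verified) {
    return `Chain BROKEN at event ${res.brokenAtEventId} after ${res.checkedCount} events${range}`;
  }
  if (res.checkedCount === 0) {
    return `No events in range${range}; nothing was verified`;
  }
  return `Chain intact: ${res.checkedCount} events checked${range}`;
}

// Usage with the "chain intact" example response shown earlier.
console.log(
  summarizeVerification({ verified: true, checkedCount: 18504, brokenAtEventId: null }),
);
```
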

---

## AuditChainVerificationJob

The `AuditChainVerificationJob` runs in the background, every hour by default.
Configure the interval via `AUDIT_CHAIN_VERIFICATION_INTERVAL_MS` (milliseconds).

On each tick it calls `verifyChain()`. If verification passes, it:
- Sets Prometheus gauge `agentidp_audit_chain_integrity` to **1** (passing)
- Updates `ComplianceStatusStore` with `CC7.2 = passing`

If verification fails:
- Sets gauge to **0**
- Updates `ComplianceStatusStore` with `CC7.2 = failing`
- Prometheus alert `AuditChainIntegrityFailed` fires immediately (severity: critical)
- Application logs: `[AuditChainVerificationJob] Chain BROKEN at event <uuid>`

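
The tick behaviour above can be sketched as follows. This is an assumption-laden outline rather than the contents of `src/jobs/AuditChainVerificationJob.ts`: the `Gauge` and `StatusStore` interfaces are minimal hypothetical stand-ins for the Prometheus gauge and `ComplianceStatusStore`, and `resolveIntervalMs`, `recordResult`, and `runTick` are illustrative names. How the real job handles a `verifyChain()` error is not covered here.

```typescript
interface VerifyResult {
  verified: boolean;
  brokenAtEventId: string | null;
}

// Minimal stand-ins for the real dependencies (both hypothetical here).
interface Gauge {
  set(value: number): void;
}
interface StatusStore {
  set(control: string, status: "passing" | "failing"): void;
}

// Fall back to one hour when the variable is unset or not a positive number.
function resolveIntervalMs(raw: string | undefined): number {
  const parsed = Number(raw);
  return raw !== undefined && Number.isFinite(parsed) && parsed > 0
    ? parsed
    : 60 * 60 * 1000;
}

// Apply one verification result to the gauge, the status store, and the log.
function recordResult(
  result: VerifyResult,
  gauge: Gauge,
  store: StatusStore,
  log: (message: string) => void,
): void {
  if (result.verified) {
    gauge.set(1);
    store.set("CC7.2", "passing");
    return;
  }
  gauge.set(0);
  store.set("CC7.2", "failing");
  log(`[AuditChainVerificationJob] Chain BROKEN at event ${result.brokenAtEventId}`);
}

// One tick: verify, then record. A scheduler would invoke this once per interval.
async function runTick(
  verifyChain: () => Promise<VerifyResult>,
  gauge: Gauge,
  store: StatusStore,
  log: (message: string) => void,
): Promise<void> {
  recordResult(await verifyChain(), gauge, store, log);
}
```
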

---

## What to Do When `brokenAtEventId` is Returned

### Step 1: Preserve Evidence

Immediately capture the full state of the audit log for forensic analysis:

```sql
-- Export all events from one hour before the break point onward
SELECT event_id, timestamp, action, outcome, agent_id, organization_id, hash, previous_hash
FROM audit_events
WHERE timestamp >= (
  SELECT timestamp - INTERVAL '1 hour'
  FROM audit_events WHERE event_id = '<brokenAtEventId>'
)
ORDER BY timestamp ASC, event_id ASC;
```

Save the output to a secure, immutable location (e.g. S3 with object locking).

### Step 2: Identify the Break Type

Compare the recomputed hash for the broken event with its stored hash:

```bash
# Using Node.js
node -e "
const crypto = require('crypto');
const eventId = '<event_id>';
const timestamp = '<timestamp_from_db>';
const action = '<action>';
const outcome = '<outcome>';
const agentId = '<agent_id>';
const orgId = '<organization_id>';
const prevHash = '<previous_hash_from_db>';
const storedHash = '<hash_from_db>';
// Recompute the hash from the stored fields, in the same order used at insert time
const expected = crypto.createHash('sha256')
  .update(eventId + new Date(timestamp).toISOString() + action + outcome + agentId + orgId + prevHash)
  .digest('hex');
console.log('Expected hash:', expected);
console.log('Stored hash:  ', storedHash);
console.log('Match:', expected === storedHash);
"
```

Also compare the event's `previous_hash` with the stored `hash` of the immediately preceding
event (ordered by `timestamp ASC, event_id ASC`).

Possible break types:
- **Hash mismatch only**: event data was modified after insertion
- **`previous_hash` mismatch**: an event was inserted or deleted before this event in the chain
- **Both mismatched**: multiple modifications or an injection attack

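
The three cases can be told apart mechanically once both comparisons have been made. The sketch below is a hypothetical helper, not part of the codebase: `ChainRow`, `BreakType`, and `classifyBreak` are illustrative names, `recomputedHash` is the value printed by the Node.js snippet in this step, and `predecessorHash` is the stored `hash` of the preceding row (or `''` for the first event).

```typescript
// Only the two chain columns are needed to classify a break.
interface ChainRow {
  hash: string;
  previousHash: string;
}

type BreakType = "none" | "hash_mismatch" | "previous_hash_mismatch" | "both";

function classifyBreak(
  row: ChainRow,
  recomputedHash: string,
  predecessorHash: string,
): BreakType {
  const hashOk = row.hash === recomputedHash; // stored hash matches the row's own fields
  const linkOk = row.previousHash === predecessorHash; // row still points at its predecessor
  if (hashOk && linkOk) return "none";
  if (!hashOk && !linkOk) return "both";
  return hashOk ? "previous_hash_mismatch" : "hash_mismatch";
}
```

A result of `"none"` for the reported event would suggest the wrong row or the wrong predecessor was compared, so the inputs are worth rechecking before drawing conclusions.
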
### Step 3: Escalate

A chain break is a **critical security incident**. Immediately:

1. Notify the security team and CISO
2. Engage the incident response procedure (`docs/compliance/incident-response.md`, Audit Chain Integrity Failure section)
3. Do NOT attempt to "fix" the hash; preserve the broken state as evidence
4. Consider temporarily suspending API access pending investigation
5. Notify affected customers per data breach notification obligations

### Step 4: Forensic Investigation

Using PostgreSQL audit logs, Vault audit logs, and application logs:
- Identify which application process or database connection modified the row
- Correlate with access logs and authentication events
- Determine the extent of the compromise (single row vs. systematic)


---

## Verification Rate Limiting

`GET /audit/verify` is rate-limited to **30 requests/minute** per `client_id`.
For continuous monitoring, rely on `AuditChainVerificationJob` (a background job that is not
rate-limited) and poll `GET /compliance/controls` instead.

---

## SOC 2 Evidence Package

For auditors, provide:

1. `GET /audit/verify` response (full chain, no date filter), saved as JSON
2. Prometheus metric export: `agentidp_audit_chain_integrity` time series (30/60/90 days)
3. PostgreSQL trigger definition: `\d+ audit_events` in psql
4. `src/db/migrations/020_add_audit_chain_columns.sql`: shows immutability trigger DDL
5. `docs/openapi/compliance.yaml`: endpoint specification