feat(phase-3): workstream 6 — SOC 2 Type II Preparation

Implements all 22 WS6 tasks completing Phase 3 Enterprise.

Column-level encryption (AES-256-CBC, Vault-backed key) via EncryptionService
applied to credentials.secret_hash, credentials.vault_path,
webhook_subscriptions.vault_secret_path, and agent_did_keys.vault_key_path.
Backward-compatible: isEncrypted() guard skips decryption for existing
plaintext rows until next read-write cycle.

Audit chain integrity (CC7.2): AuditRepository computes SHA-256 Merkle hash
on every INSERT (hash = SHA-256(eventId+timestamp+action+outcome+agentId+orgId+prevHash)).
AuditVerificationService walks the full chain verifying hash continuity.
AuditChainVerificationJob runs hourly; sets agentidp_audit_chain_integrity
Prometheus gauge to 1 (pass) or 0 (fail).

TLS enforcement (CC6.7): TLSEnforcementMiddleware registered as first
middleware in Express stack; 301 redirect on non-https X-Forwarded-Proto
in production.

SecretsRotationJob (CC9.2): hourly scan for credentials expiring within 7
days; increments agentidp_credentials_expiring_soon_total.

ComplianceController + routes: GET /audit/verify (auth+audit:read scope,
30/min rate-limit); GET /compliance/controls (public, Cache-Control 60s).
ComplianceStatusStore: module-level map updated by jobs, consumed by controller.

Prometheus: 2 new metrics (agentidp_credentials_expiring_soon_total,
agentidp_audit_chain_integrity); 6 alerting rules in alerts.yml.

Compliance docs: soc2-controls-matrix.md, encryption-runbook.md,
audit-log-runbook.md, incident-response.md, secrets-rotation.md.

Tests: 557 unit tests passing (35 suites); 26 new tests (EncryptionService,
AuditVerificationService); 19 compliance integration tests. TypeScript clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-03-31 00:41:53 +00:00
parent 272b69f18d
commit fd90b2acd1
35 changed files with 3715 additions and 26 deletions


@@ -0,0 +1,172 @@
# Audit Log Chain Verification Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC7.2 — Audit Log Integrity
**Service:** `src/services/AuditVerificationService.ts`
**Job:** `src/jobs/AuditChainVerificationJob.ts`
**Endpoint:** `GET /api/v1/audit/verify`
---
## Overview
Every audit event in the `audit_events` PostgreSQL table is linked to the previous one
via a SHA-256 hash chain. Each event stores:
- `hash` — SHA-256 of `(eventId + timestamp.toISOString() + action + outcome + agentId + organizationId + previousHash)`
- `previous_hash` — the `hash` of the immediately preceding event (ordered by `timestamp ASC, event_id ASC`)
The first event in the chain uses `previous_hash = ''` (empty string sentinel).
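
The hashing and linking rule can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `AuditRepository.ts` or `AuditVerificationService.ts`. The `AuditEventRow` shape and the `computeEventHash`/`verifyChain` names are hypothetical. The sketch assumes nullable IDs are already coerced to strings, and it counts the broken event itself in `checkedCount`. The optional `seedPreviousHash` parameter stands in for how a date-ranged check might start mid-chain.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape mirroring the audit_events columns used by the chain.
interface AuditEventRow {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  hash: string;
  previousHash: string;
}

interface ChainResult {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
}

// SHA-256 over the concatenated fields, in the documented order.
function computeEventHash(e: Omit<AuditEventRow, "hash">): string {
  return createHash("sha256")
    .update(
      e.eventId + e.timestamp.toISOString() + e.action + e.outcome +
        e.agentId + e.organizationId + e.previousHash,
    )
    .digest("hex");
}

// Walks events sorted by (timestamp ASC, event_id ASC) and stops at the first break.
function verifyChain(events: AuditEventRow[], seedPreviousHash = ""): ChainResult {
  let expectedPrevious = seedPreviousHash; // "" is the sentinel for the first event
  let checked = 0;
  for (const e of events) {
    checked++;
    const linked = e.previousHash === expectedPrevious;
    const intact = computeEventHash(e) === e.hash;
    if (!linked || !intact) {
      return { verified: false, checkedCount: checked, brokenAtEventId: e.eventId };
    }
    expectedPrevious = e.hash;
  }
  return { verified: true, checkedCount: checked, brokenAtEventId: null };
}
```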
A PostgreSQL trigger (`trg_audit_events_immutable`) prevents UPDATE and DELETE operations
on `audit_events`, making the log tamper-evident at the database level.
---
## Running GET /audit/verify
### Full chain verification (no date range)
```bash
# Requires Bearer token with audit:read scope
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/audit/verify"
```
**Response (chain intact):**
```json
{
"verified": true,
"checkedCount": 18504,
"brokenAtEventId": null
}
```
**Response (chain break detected):**
```json
{
"verified": false,
"checkedCount": 1203,
"brokenAtEventId": "c4d5e6f7-a8b9-0123-cdef-456789012345"
}
```
### Date-ranged verification
```bash
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/audit/verify?fromDate=2026-03-01T00:00:00.000Z&toDate=2026-03-31T23:59:59.999Z"
```
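
When scripting these calls, a small helper keeps the optional query parameters well-formed. A sketch assuming Node 18+ (global `fetch` and `URL`); `buildVerifyUrl` and `verifyAuditChain` are hypothetical helper names, and the response type simply mirrors the fields shown in this runbook.

```typescript
// Builds the /audit/verify URL; fromDate/toDate are optional ISO 8601 strings.
function buildVerifyUrl(baseUrl: string, fromDate?: string, toDate?: string): string {
  const url = new URL(baseUrl.replace(/\/+$/, "") + "/audit/verify");
  if (fromDate) url.searchParams.set("fromDate", fromDate);
  if (toDate) url.searchParams.set("toDate", toDate);
  return url.toString();
}

interface VerifyResponse {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
  fromDate?: string;
  toDate?: string;
}

// Calls the endpoint with a Bearer token that carries the audit:read scope.
async function verifyAuditChain(
  baseUrl: string,
  token: string,
  fromDate?: string,
  toDate?: string,
): Promise<VerifyResponse> {
  const res = await fetch(buildVerifyUrl(baseUrl, fromDate, toDate), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`GET /audit/verify failed with HTTP ${res.status}`);
  return (await res.json()) as VerifyResponse;
}
```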
### Interpreting the response
| Field | Meaning |
|---|---|
| `verified: true` | All events in the checked range maintain valid hash chain linkage |
| `verified: false` | At least one chain break detected — see `brokenAtEventId` |
| `checkedCount` | Number of events examined (0 = no events in range) |
| `brokenAtEventId` | UUID of the first event where the chain fails (`null` if verified) |
| `fromDate` / `toDate` | Echo of the date range parameters (only present if supplied) |
---
## AuditChainVerificationJob
The `AuditChainVerificationJob` runs automatically in the background every hour (default).
Configure the interval via `AUDIT_CHAIN_VERIFICATION_INTERVAL_MS` (milliseconds).
On each tick it calls `verifyChain()` and:
- Sets Prometheus gauge `agentidp_audit_chain_integrity` to **1** (passing)
- Updates `ComplianceStatusStore` with `CC7.2 = passing`
If verification fails:
- Sets gauge to **0**
- Updates `ComplianceStatusStore` with `CC7.2 = failing`
- Prometheus alert `AuditChainIntegrityFailed` fires immediately (severity: critical)
- Application logs: `[AuditChainVerificationJob] Chain BROKEN at event <uuid>`
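
Each tick therefore maps one verification result onto a gauge value, a CC7.2 status, and an optional log line. The sketch below isolates that mapping as a pure function so it is easy to test. `applyVerificationResult` and `intervalMs` are illustrative names rather than the job's actual API, and the real job also handles scheduling and the Prometheus client. In the job, such a function would be called from a timer such as `setInterval(tick, intervalMs(process.env))`.

```typescript
type ControlStatus = "passing" | "failing" | "unknown";

interface ChainResult {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
}

interface TickOutcome {
  gauge: 0 | 1;
  cc72: ControlStatus;
  logLine: string | null;
}

// Default interval is one hour unless AUDIT_CHAIN_VERIFICATION_INTERVAL_MS holds a positive number.
function intervalMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env.AUDIT_CHAIN_VERIFICATION_INTERVAL_MS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 60 * 60 * 1000;
}

// Maps one verifyChain() result to the gauge value, CC7.2 status, and an optional error log.
function applyVerificationResult(r: ChainResult): TickOutcome {
  if (r.verified) return { gauge: 1, cc72: "passing", logLine: null };
  return {
    gauge: 0,
    cc72: "failing",
    logLine: `[AuditChainVerificationJob] Chain BROKEN at event ${r.brokenAtEventId}`,
  };
}
```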
---
## What to Do When `brokenAtEventId` is Returned
### Step 1: Preserve Evidence
Immediately capture the full state of the audit log for forensic analysis:
```sql
-- Export all events from one hour before the break point onward
SELECT event_id, timestamp, action, outcome, agent_id, organization_id, hash, previous_hash
FROM audit_events
WHERE timestamp >= (
  SELECT timestamp - INTERVAL '1 hour'
  FROM audit_events WHERE event_id = '<brokenAtEventId>'
)
ORDER BY timestamp ASC, event_id ASC;
```
Save the output to a secure, immutable location (e.g. S3 with object locking).
### Step 2: Identify the Break Type
Recompute the hash of the broken event from its stored fields and compare it with the stored `hash`. Then compare its `previous_hash` with the `hash` of the row immediately before it in the Step 1 export. Paste `timestamp` as an ISO 8601 string (e.g. `2026-03-01T12:00:00.000Z`) so that `new Date()` parses it reliably.
```bash
# Using Node.js
node -e "
const crypto = require('crypto');
const eventId = '<event_id>';
const timestamp = '<timestamp_from_db>'; // ISO 8601, e.g. 2026-03-01T12:00:00.000Z
const action = '<action>';
const outcome = '<outcome>';
const agentId = '<agent_id>';
const orgId = '<organization_id>';
const prevHash = '<previous_hash_from_db>';
const storedHash = '<hash_from_db>';
const expected = crypto.createHash('sha256')
  .update(eventId + new Date(timestamp).toISOString() + action + outcome + agentId + orgId + prevHash)
  .digest('hex');
console.log('Expected hash:', expected);
console.log('Stored hash:  ', storedHash);
console.log('Match:', expected === storedHash);
"
```
Possible break types:
- **Hash mismatch only** — event data was modified after insertion
- **previous_hash mismatch** — an event was inserted/deleted before this event in the chain
- **Both mismatched** — multiple modifications or an injection attack
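
The three cases can be told apart by evaluating two conditions separately. First, does the stored `hash` match a recomputation? Second, does `previous_hash` match the stored `hash` of the preceding row? The sketch below uses the hashing rule from the Overview; `classifyBreak`, `recompute`, and the `Row` shape are hypothetical. Because `previous_hash` is itself a hash input, editing it in place shows up as `both`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape with the fields used by the chain.
interface Row {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  hash: string;
  previousHash: string;
}

type BreakType = "ok" | "hash_mismatch" | "previous_hash_mismatch" | "both";

function recompute(r: Row): string {
  return createHash("sha256")
    .update(r.eventId + r.timestamp.toISOString() + r.action + r.outcome + r.agentId + r.organizationId + r.previousHash)
    .digest("hex");
}

// `predecessor` is the row immediately before `row` in (timestamp, event_id) order, or null for the first event.
function classifyBreak(row: Row, predecessor: Row | null): BreakType {
  const hashOk = recompute(row) === row.hash;
  const linkOk = row.previousHash === (predecessor ? predecessor.hash : "");
  if (hashOk && linkOk) return "ok";
  if (!hashOk && !linkOk) return "both";
  return hashOk ? "previous_hash_mismatch" : "hash_mismatch";
}
```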
### Step 3: Escalate
A chain break is a **critical security incident**. Immediately:
1. Notify the security team and CISO
2. Engage incident response procedure (`docs/compliance/incident-response.md` — Audit Chain Integrity Failure section)
3. Do NOT attempt to "fix" the hash — preserve the broken state as evidence
4. Consider temporarily suspending API access pending investigation
5. Notify affected customers per data breach notification obligations
### Step 4: Forensic Investigation
Using PostgreSQL audit logs, Vault audit logs, and application logs:
- Identify which application process or database connection modified the row
- Correlate with access logs and authentication events
- Determine the extent of the compromise (single row vs. systematic)
---
## Verification Rate Limiting
`GET /audit/verify` is rate-limited to **30 requests/minute** per `client_id`.
For continuous monitoring, use `AuditChainVerificationJob` (background job, no rate limit)
and poll `GET /compliance/controls` instead.
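
A monitoring script that polls `GET /compliance/controls` usually needs only one control's status. The sketch below assumes the `controls[]` response shape from the OpenAPI spec (`id`, `name`, `status`, `lastChecked`) and Node 18+ global `fetch`; `controlStatus` and `pollControl` are hypothetical names. Treating a missing control or a non-2xx response as `unknown` is a design choice here, not documented behaviour.

```typescript
type ControlStatus = "passing" | "failing" | "unknown";

interface ComplianceControl {
  id: string;
  name: string;
  status: ControlStatus;
  lastChecked: string;
}

interface ComplianceControlsResponse {
  controls: ComplianceControl[];
}

// Returns one control's status, treating a missing control as "unknown".
function controlStatus(body: ComplianceControlsResponse, id: string): ControlStatus {
  return body.controls.find((c) => c.id === id)?.status ?? "unknown";
}

// Polls the public endpoint; no bearer token is required.
async function pollControl(baseUrl: string, id: string): Promise<ControlStatus> {
  const res = await fetch(`${baseUrl}/compliance/controls`);
  if (!res.ok) return "unknown";
  return controlStatus((await res.json()) as ComplianceControlsResponse, id);
}
```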
---
## SOC 2 Evidence Package
For auditors, provide:
1. `GET /audit/verify` response (full chain, no date filter) — save as JSON
2. Prometheus metric export: `agentidp_audit_chain_integrity` time series (30/60/90 days)
3. PostgreSQL trigger definition: `\d+ audit_events` in psql
4. `src/db/migrations/020_add_audit_chain_columns.sql` — shows immutability trigger DDL
5. `docs/openapi/compliance.yaml` — endpoint specification


@@ -0,0 +1,159 @@
# Encryption Key Rotation Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC6.1 — Encryption at Rest
**Service:** `src/services/EncryptionService.ts`
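**Vault path:** Configured via `ENCRYPTION_KEY_VAULT_PATH` env var (default: `secret/data/agentidp/encryption-key`, the KV v2 API path; the `vault kv` commands below address the same secret as `secret/agentidp/encryption-key`)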
---
## Overview
AgentIdP uses AES-256-CBC column-level encryption for sensitive PostgreSQL columns.
The encryption key is a 64-character hex string (32 bytes) stored in HashiCorp Vault.
The `EncryptionService` fetches the key once and caches it in process memory.
Encrypted format: `base64(IV):base64(ciphertext)` where IV is 16 random bytes per encryption call.
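
The stored format can be reproduced with Node's built-in `crypto` module. This is a minimal sketch, not the `EncryptionService` source. The function names are hypothetical, and the `isEncrypted()` heuristic is an assumption about how the guard might work: exactly two colon-separated segments, with the first decoding to a 16-byte IV.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// The Vault value is a 64-character hex string, i.e. 32 bytes for AES-256.
function keyFromHex(keyHex: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(keyHex)) {
    throw new Error("expected a 64-character hex string");
  }
  return Buffer.from(keyHex, "hex");
}

// Produces base64(IV):base64(ciphertext) with a fresh random 16-byte IV per call.
function encrypt(plaintext: string, keyHex: string): string {
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", keyFromHex(keyHex), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return `${iv.toString("base64")}:${ciphertext.toString("base64")}`;
}

// With a wrong key, final() usually throws a "bad decrypt" error, but CBC padding
// checks are not a reliable key check: occasionally garbage is returned instead.
function decrypt(value: string, keyHex: string): string {
  const [ivB64, ciphertextB64] = value.split(":");
  const decipher = createDecipheriv("aes-256-cbc", keyFromHex(keyHex), Buffer.from(ivB64, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(ciphertextB64, "base64")),
    decipher.final(),
  ]).toString("utf8");
}

// Assumed guard: two base64 segments, the first decoding to a 16-byte IV.
function isEncrypted(value: string): boolean {
  const parts = value.split(":");
  return (
    parts.length === 2 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(parts[0]) &&
    Buffer.from(parts[0], "base64").length === 16
  );
}
```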
---
## Key Rotation Procedure
### Prerequisites
- Access to HashiCorp Vault with write permissions to the encryption key path
- Access to the production application environment (to trigger restart)
- At least one backup of the current key stored securely offline
### Step 1: Generate a New Key
Generate a cryptographically strong 32-byte (64-character hex) key:
```bash
openssl rand -hex 32
# Example output: a1b2c3d4e5f6... (64 hex chars)
```
Record the new key securely.
### Step 2: Backup the Current Key
Before overwriting, read and securely store the current key:
```bash
vault kv get -field=encryptionKey secret/agentidp/encryption-key > /secure/backup/encryption-key-$(date +%Y%m%d).txt
```
Store in a hardware security module (HSM) or offline key store.
### Step 3: Write the New Key to Vault
```bash
vault kv put secret/agentidp/encryption-key encryptionKey="<new-64-char-hex-key>"
```
Verify the write:
```bash
vault kv get secret/agentidp/encryption-key
```
Confirm the `encryptionKey` field contains exactly 64 hex characters.
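
The same checks can be scripted before restarting, for example against the value printed by `vault kv get -field=encryptionKey`. A sketch; `generateEncryptionKey` and `checkEncryptionKey` are hypothetical helpers, and rejecting surrounding whitespace assumes the service does not trim the value it reads from Vault.

```typescript
import { randomBytes } from "node:crypto";

// Equivalent of `openssl rand -hex 32`: 32 random bytes as 64 lowercase hex characters.
function generateEncryptionKey(): string {
  return randomBytes(32).toString("hex");
}

// Returns null when the value is usable as an AES-256 key, otherwise a reason string.
function checkEncryptionKey(value: string): string | null {
  if (value !== value.trim()) return "contains leading or trailing whitespace";
  if (value.length !== 64) return `expected 64 characters, got ${value.length}`;
  if (!/^[0-9a-fA-F]+$/.test(value)) return "contains non-hex characters";
  return null;
}
```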
### Step 4: Restart the Application
The `EncryptionService` caches the key in process memory. A restart forces a re-fetch from Vault:
```bash
# Kubernetes rolling restart
kubectl rollout restart deployment/agentidp
# Docker Compose
docker-compose restart agentidp
# PM2
pm2 restart agentidp
```
### Step 5: Verify Key Pick-Up
Check the application logs for:
```
[AgentIdP] EncryptionService enabled — sensitive columns encrypted at rest (SOC 2 CC6.1)
```
Call the compliance controls endpoint to confirm the control is passing:
```bash
curl -s https://api.sentryagent.ai/v1/compliance/controls | jq '.controls[] | select(.id == "CC6.1")'
```
Expected output:
```json
{ "id": "CC6.1", "name": "Encryption at Rest", "status": "passing", "lastChecked": "..." }
```
### Step 6: Re-encryption of Existing Rows
Rows encrypted with the old key cannot be decrypted once the application holds only the new key. `EncryptionService` caches a single key, so reads of those rows fail with `AES-256-CBC decryption failed` (see Troubleshooting).
Lazy encryption on the next read-write cycle (e.g. credential rotation, webhook update) applies only to legacy plaintext rows, which the `isEncrypted()` guard returns unchanged.
Run the re-encryption script as part of the rotation, because old-key rows stay unreadable until it completes:
```bash
# Run the re-encryption migration script (reads old key from backup, encrypts with new key)
# Note: This script requires both old and new keys to be available
ts-node scripts/reencrypt-columns.ts --old-key-file /secure/backup/encryption-key-<date>.txt
```
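
At its core, a re-encryption pass is a per-value transform: decrypt with the old key, encrypt with the new one, and encrypt legacy plaintext values directly. The sketch below is not `scripts/reencrypt-columns.ts`; the helper names and the `isEncrypted()` heuristic are assumptions. The transform is not idempotent. Applying it twice to the same value would try to decrypt new-key ciphertext with the old key, so a real script should record which rows it has already processed.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const ALGO = "aes-256-cbc";

// base64(IV):base64(ciphertext) with a fresh 16-byte IV.
function encryptWith(plaintext: string, key: Buffer): string {
  const iv = randomBytes(16);
  const c = createCipheriv(ALGO, key, iv);
  const ct = Buffer.concat([c.update(plaintext, "utf8"), c.final()]);
  return `${iv.toString("base64")}:${ct.toString("base64")}`;
}

function decryptWith(value: string, key: Buffer): string {
  const [iv, ct] = value.split(":");
  const d = createDecipheriv(ALGO, key, Buffer.from(iv, "base64"));
  return Buffer.concat([d.update(Buffer.from(ct, "base64")), d.final()]).toString("utf8");
}

// Assumed guard: two segments, the first decoding to a 16-byte IV.
function isEncrypted(value: string): boolean {
  const parts = value.split(":");
  return parts.length === 2 && Buffer.from(parts[0], "base64").length === 16;
}

// Re-encrypts one column value under the new key; plaintext values are encrypted for the first time.
function reencryptValue(value: string, oldKeyHex: string, newKeyHex: string): string {
  const oldKey = Buffer.from(oldKeyHex, "hex");
  const newKey = Buffer.from(newKeyHex, "hex");
  const plaintext = isEncrypted(value) ? decryptWith(value, oldKey) : value;
  return encryptWith(plaintext, newKey);
}
```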
---
## Emergency Rollback
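If the new key causes issues (e.g. test failures, decryption errors), roll back. Any rows written or re-encrypted under the new key since the rotation will then fail to decrypt under the old key and must be re-encrypted back to it.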
### Step 1: Restore Old Key to Vault
```bash
vault kv put secret/agentidp/encryption-key encryptionKey="<old-64-char-hex-key-from-backup>"
```
### Step 2: Restart the Application
```bash
kubectl rollout restart deployment/agentidp
```
### Step 3: Verify Recovery
```bash
curl -s https://api.sentryagent.ai/v1/compliance/controls | jq '.controls[] | select(.id == "CC6.1")'
```
### Step 4: Investigate Root Cause
Review application logs for `AES-256-CBC decryption failed` errors and audit the cause before
reattempting rotation.
---
## Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| `Invalid encryption key ... expected a 64-character hex string` | Key in Vault is wrong length or encoding | Re-write correct key to Vault, restart |
| `AES-256-CBC decryption failed — possible key mismatch` | Key rotated but rows still encrypted with old key | Rollback to old key, then migrate properly |
| `CC6.1` status shows `unknown` | Vault unreachable, key fetch failed | Check Vault connectivity, `VAULT_ADDR`, `VAULT_TOKEN` |
---
## Audit Evidence
After rotation, record the following for SOC 2 evidence:
- Date of rotation
- Who performed the rotation (approver + executor)
- Vault audit log entry confirming the key write
- Application log confirming EncryptionService initialised with new key
- `GET /compliance/controls` response showing CC6.1 = passing


@@ -0,0 +1,229 @@
# Incident Response Runbook — SentryAgent.ai AgentIdP
**Owner:** Security Engineering
**Last updated:** 2026-03-31
**Applies to:** Production AgentIdP deployments
This runbook covers the four incident types most relevant to SOC 2 Type II compliance monitoring.
---
## 1. Auth Failure Spike
### Detection
**Prometheus alert:** `AuthFailureSpike`
```yaml
expr: rate(agentidp_http_requests_total{status_code="401"}[5m]) > 0.5
for: 2m
labels:
  severity: warning
```
Triggers when the rate of HTTP 401 responses exceeds 0.5 per second sustained over 2 minutes.
### Immediate Actions
1. Acknowledge the alert in PagerDuty / alerting system
2. Check whether the spike correlates with a scheduled process (e.g. batch agent key rotation, deployment)
3. Check Prometheus dashboard for the geographic distribution of the failing requests
### Investigation Steps
1. **Identify source agents:**
```bash
# Query audit log for recent auth failures
curl -s -H "Authorization: Bearer <admin-token>" \
"https://api.sentryagent.ai/v1/audit?action=auth.failed&limit=100"
```
2. **Check for brute-force patterns:**
Look for repeated failures from the same `client_id` or IP address.
3. **Check if an agent's credentials expired:**
```bash
# Look for expired credentials
psql "$DATABASE_URL" -c "
SELECT credential_id, client_id, expires_at
FROM credentials
WHERE status = 'active' AND expires_at < NOW()
ORDER BY expires_at DESC LIMIT 20;"
```
4. **Check for key compromise signals:**
- Multiple agents failing simultaneously → possible key store issue
- Single agent with high failure rate → possible credential stuffing or misconfiguration
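
To spot brute-force patterns in the audit query results, count failures per `client_id` (or per source IP) and look at the top entries. A sketch; the `AuthFailureEvent` fields (`clientId`, `ip`) are assumptions about the audit response shape, and any threshold you pass is illustrative rather than a documented limit.

```typescript
// Hypothetical fields of an auth.failed audit event.
interface AuthFailureEvent {
  clientId: string;
  ip?: string;
  timestamp: string;
}

// Counts events per key and returns keys with at least `threshold` failures, highest first.
function topOffenders(
  events: AuthFailureEvent[],
  keyOf: (e: AuthFailureEvent) => string | undefined,
  threshold: number,
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const key = keyOf(e);
    if (key === undefined) continue; // e.g. events without a recorded IP
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= threshold)
    .sort((a, b) => b[1] - a[1]);
}
```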
### Escalation Path
- **Warning (< 2 req/s):** Engineering on-call investigates within 1 hour
- **Critical (> 2 req/s sustained):** CISO notified, potential account compromise investigation
- **If credential compromise confirmed:** Revoke affected credentials immediately via `POST /agents/:id/credentials/:credId/revoke`
---
## 2. Anomalous Token Issuance
### Detection
**Prometheus alert:** `AnomalousTokenIssuance`
```yaml
expr: rate(agentidp_tokens_issued_total[5m]) > 10
for: 5m
labels:
  severity: warning
```
Triggers when the issuance rate of any single `agentidp_tokens_issued_total` series (for example, one `scope` label value) exceeds 10 tokens per second for 5 continuous minutes.
### Immediate Actions
1. Acknowledge the alert
2. Determine if a legitimate mass-scale operation is underway (e.g. new customer onboarding, load test)
3. Check the `scope` label breakdown on `agentidp_tokens_issued_total` to identify what scopes are being requested
### Investigation Steps
1. **Identify top issuing agents:**
```bash
# Query audit log for recent token issuances
curl -s -H "Authorization: Bearer <admin-token>" \
"https://api.sentryagent.ai/v1/audit?action=token.issued&limit=100"
```
2. **Check monthly token budget:**
Each agent is limited to 10,000 tokens/month (free tier). A single agent hitting the limit may indicate automation abuse.
3. **Check for abnormal scope combinations:**
If tokens are being issued with `admin:orgs` or `audit:read` at high volume, this warrants immediate investigation.
4. **Check for valid business reason:**
Contact the organization owner for the top-issuing agents.
### Escalation Path
- **Warning:** Engineering on-call investigates within 4 hours
- **If compromise suspected:** Revoke affected agent tokens via Redis revocation list, rotate credentials
- **If systematic abuse confirmed:** Suspend the issuing agent(s) via `PATCH /agents/:id` with `status: suspended`
---
## 3. Audit Chain Integrity Failure
### Detection
**Prometheus alert:** `AuditChainIntegrityFailed`
```yaml
expr: agentidp_audit_chain_integrity == 0
for: 0m
labels:
  severity: critical
```
Fires immediately when `AuditChainVerificationJob` detects a break in the audit event hash chain.
This is a **CRITICAL** security event — possible evidence of log tampering.
### Immediate Actions
1. **Do NOT attempt to repair the broken chain** — preserve all evidence
2. Notify CISO and security team immediately
3. Page the on-call security engineer with P0 priority
4. Capture the current state:
```bash
curl -s -H "Authorization: Bearer <audit-token>" \
"https://api.sentryagent.ai/v1/audit/verify" | tee /secure/incident-$(date +%Y%m%d-%H%M).json
```
### Investigation Steps
1. **Determine the broken event:**
The `brokenAtEventId` field in the `/audit/verify` response identifies the first broken event.
2. **Forensic analysis:**
Follow the steps in `docs/compliance/audit-log-runbook.md` — "What to Do When brokenAtEventId is Returned".
3. **Check database access logs:**
Review PostgreSQL `pg_stat_activity` (current sessions only) and the server's connection logs (recorded when `log_connections` is enabled) for unauthorized direct DB access.
4. **Check application logs:**
Look for any errors raised by the immutability trigger (`trg_audit_events_immutable`).
5. **Check Vault audit logs:**
Review whether any encryption key access was abnormal.
### Escalation Path
- **Immediate:** CISO + Legal + Security Engineering
- **Within 1 hour:** Begin forensic preservation per incident response plan
- **Within 24 hours:** Determine scope of compromise and notification obligations
- **Customer notification:** Per contractual and regulatory obligations (GDPR, SOC 2 requirements)
---
## 4. Webhook Dead-Letter Accumulation
### Detection
**Prometheus alert:** `WebhookDeadLetterAccumulating`
```yaml
expr: increase(agentidp_webhook_dead_letters_total[1h]) > 10
for: 0m
labels:
  severity: critical
```
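Fires when more than 10 webhook deliveries for a single organization reach dead-letter status within an hour. The expression is not aggregated, so each `organization_id` series is evaluated separately.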
### Immediate Actions
1. Acknowledge the alert
2. Check which `organization_id` labels are accumulating dead-letters:
```promql
# Top organizations by dead-letters over the last hour (Prometheus expression browser or Grafana)
topk(10, sum by (organization_id) (increase(agentidp_webhook_dead_letters_total[1h])))
```
3. Check if the destination endpoints are reachable:
```bash
curl -I https://<webhook-destination-url>/
```
### Investigation Steps
1. **List affected webhook subscriptions:**
```bash
# Query delivery records for dead-letter status
psql "$DATABASE_URL" -c "
SELECT s.id, s.organization_id, s.url, COUNT(d.id) AS dead_letters
FROM webhook_subscriptions s
JOIN webhook_deliveries d ON d.subscription_id = s.id
WHERE d.status = 'dead_letter'
AND d.updated_at > NOW() - INTERVAL '2 hours'
GROUP BY s.id
ORDER BY dead_letters DESC
LIMIT 20;"
```
2. **Check delivery failure reasons:**
```bash
psql "$DATABASE_URL" -c "
SELECT http_status_code, COUNT(*) as count
FROM webhook_deliveries
WHERE status = 'dead_letter'
AND updated_at > NOW() - INTERVAL '2 hours'
GROUP BY http_status_code;"
```
3. **Common causes and resolutions:**
| HTTP Status | Likely Cause | Resolution |
|---|---|---|
| 0 / null | Network unreachable / DNS failure | Check recipient endpoint availability |
| 401 / 403 | HMAC signature validation failing | Customer to verify HMAC secret |
| 404 | Endpoint URL changed | Customer to update webhook URL |
| 5xx | Recipient server error | Customer to investigate their endpoint |
| Timeout | Slow recipient endpoint | Customer to optimize endpoint response time |
4. **Notify affected customers:**
Contact the organization owner for high-volume dead-letter subscriptions.
### Escalation Path
- **Warning (10-50/hr):** Engineering notifies affected customers, investigates endpoint health
- **Critical (> 50/hr):** Engineering on-call + Platform reliability team engaged
- **If systemic delivery infrastructure failure:** Activate incident bridge, escalate to VP Engineering


@@ -0,0 +1,142 @@
# Secrets Rotation Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC9.2 — Secrets Rotation
**Last updated:** 2026-03-31
---
## Overview
AgentIdP manages three categories of secrets that require periodic rotation:
1. **Agent client secrets** — Per-credential client secrets used for OAuth 2.0 token issuance
2. **OIDC signing keys** — RSA/EC keys used to sign ID tokens
3. **AES-256-CBC encryption key** — Column-level database encryption key (see `encryption-runbook.md`)
---
## 1. Agent Credential (Client Secret) Rotation
### API endpoint
```
POST /api/v1/agents/:agentId/credentials/:credentialId/rotate
```
Requires Bearer token with `agents:write` scope.
### Procedure
```bash
# 1. List active credentials for the agent
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/agents/<agentId>/credentials?status=active"
# 2. Rotate the credential (generate new secret)
curl -s -X POST \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"expiresAt": "2027-03-31T00:00:00.000Z"}' \
"https://api.sentryagent.ai/v1/agents/<agentId>/credentials/<credentialId>/rotate"
# Response includes the new clientSecret — store it immediately; it is never shown again
```
### Key points
- The new `clientSecret` is returned **once only** — store it securely before the response is discarded
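- The agent's previous secret is immediately invalidated. Vault KV v2 stores the new secret as a new version rather than overwriting in place; earlier versions remain in Vault until they are deleted, destroyed, or pruned by the mount's `max_versions` setting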
- An audit event `credential.rotated` is logged to the immutable audit chain
- A `credential.rotated` webhook event is dispatched to all active subscriptions
### Recommended rotation schedule
| Credential type | Recommended rotation interval |
|---|---|
| Production agent credentials | 90 days |
| Staging / development credentials | 180 days |
| Service account credentials | 365 days (annual) |
| Credentials involved in a security incident | Immediately |
### Automated expiry detection
`SecretsRotationJob` runs hourly and queries credentials expiring within 7 days.
Prometheus alert `CredentialExpiryApproaching` fires immediately when any are detected.
Respond to this alert by rotating the flagged credential(s) before the expiry date.
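
The 7-day selection amounts to a window comparison on `expires_at`. A sketch of the filter logic only, not the job's SQL. The `CredentialRow` shape is hypothetical, and excluding credentials that have already expired is an assumption. The per-agent count mirrors how the counter is labelled by `agent_id`.

```typescript
// Hypothetical shape of the credential fields the check needs.
interface CredentialRow {
  credentialId: string;
  agentId: string;
  status: string;
  expiresAt: Date;
}

const EXPIRY_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// Active credentials whose expiry falls in (now, now + 7 days]; already-expired rows are excluded.
function expiringSoon(rows: CredentialRow[], now: Date): CredentialRow[] {
  const t = now.getTime();
  return rows.filter((r) => {
    const exp = r.expiresAt.getTime();
    return r.status === "active" && exp > t && exp <= t + EXPIRY_WINDOW_MS;
  });
}

// Per-agent counts of the selected credentials.
function countByAgent(rows: CredentialRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of rows) counts.set(r.agentId, (counts.get(r.agentId) ?? 0) + 1);
  return counts;
}
```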
---
## 2. OIDC Signing Key Rotation
### Overview
OIDC signing keys are managed by `OIDCKeyService` (`src/services/OIDCKeyService.ts`).
Keys are stored in the `oidc_keys` PostgreSQL table. The current active key is used to
sign all new ID tokens; public keys are exposed via `GET /.well-known/jwks.json`.
### When to rotate
- Key compromise or suspected exposure
- Scheduled rotation (recommended every 90 days for production)
- Algorithm upgrade (e.g. RS256 → ES256)
### Rotation procedure
`OIDCKeyService.ensureCurrentKey()` generates a new signing key on startup when no active key exists. To rotate manually, deactivate the current key and restart:
```bash
# Force generation of a new signing key by calling the internal rotate endpoint
# (or trigger by redeploying with OIDC_FORCE_KEY_ROTATION=true)
# 1. Mark current key as inactive (if manual rotation is required)
psql "$DATABASE_URL" -c "
UPDATE oidc_keys
SET active = false
WHERE active = true;"
# 2. Restart the application — ensureCurrentKey() will generate a new key on startup
kubectl rollout restart deployment/agentidp
```
### JWKS update behavior
- Old public keys remain in `GET /.well-known/jwks.json` for **24 hours** after rotation
(grace period for in-flight tokens)
- After the grace period, old keys are removed from the JWKS endpoint
- Redis JWKS cache TTL is configured by `JWKS_CACHE_TTL_SECONDS` (default: 3600)
### Impact on existing tokens
Existing valid tokens signed with the old key **continue to work** until they expire,
as long as the old public key remains in JWKS. After the grace period, old tokens
will fail verification.
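
The grace-period rule can be expressed as a check on both the key's publication window and the token's own expiry. The sketch below assumes the 24-hour grace period above and assumes that verifiers only accept tokens whose signing key is currently published; `stillVerifiable` is a hypothetical helper. It makes explicit that a token whose `exp` lies more than 24 hours after rotation stops verifying before it expires.

```typescript
const GRACE_PERIOD_MS = 24 * 60 * 60 * 1000;

// keyRotatedAt: when the key stopped being the active signing key.
// tokenExpSeconds: the token's `exp` claim (seconds since the Unix epoch).
function stillVerifiable(keyRotatedAt: Date, tokenExpSeconds: number, now: Date): boolean {
  const keyPublished = now.getTime() < keyRotatedAt.getTime() + GRACE_PERIOD_MS;
  const notExpired = now.getTime() < tokenExpSeconds * 1000;
  return keyPublished && notExpired;
}
```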
---
## 3. Encryption Key Rotation
See `docs/compliance/encryption-runbook.md` for the full AES-256-CBC encryption key rotation procedure.
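**Summary:** Generate new 32-byte hex key → write to Vault at `ENCRYPTION_KEY_VAULT_PATH` → restart app → re-encrypt rows encrypted under the old key with `scripts/reencrypt-columns.ts` (legacy plaintext rows are still encrypted lazily on their next read-write cycle).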
---
## Schedule Recommendations
| Secret Type | Production Interval | Staging Interval | Trigger for Immediate Rotation |
|---|---|---|---|
| Agent client secrets | 90 days | 180 days | Credential suspected compromised |
| OIDC signing keys | 90 days | 180 days | Key file exposed, algorithm upgrade |
| AES-256-CBC encryption key | 365 days (annual) | On demand | Key exposed, Vault breach, compliance audit requirement |
| Webhook HMAC secrets | Per customer policy | N/A | Webhook endpoint compromised |
---
## Compliance Evidence
For SOC 2 CC9.2 evidence collection:
- Prometheus metric history: `agentidp_credentials_expiring_soon_total`
- Audit log entries with `action: credential.rotated` — query via `GET /audit?action=credential.rotated`
- Key rotation records from Vault audit log
- This runbook + sign-off from Security Engineering


@@ -0,0 +1,42 @@
# SOC 2 Type II Controls Matrix — SentryAgent.ai AgentIdP
This document maps the five in-scope SOC 2 Trust Services Criteria (TSC) controls to their
corresponding implementation artefacts, mechanisms, and automated verification methods.
---
## Controls Matrix
| Control ID | TSC Criterion Name | Implementation File | Mechanism | Automated Check |
|---|---|---|---|---|
| **CC6.1** | Encryption at Rest | `src/services/EncryptionService.ts` | AES-256-CBC column-level encryption on `credentials.secret_hash`, `credentials.vault_path`, `webhook_subscriptions.vault_secret_path`, `agent_did_keys.vault_key_path`. Key is stored in HashiCorp Vault KV v2 at path configured by `ENCRYPTION_KEY_VAULT_PATH`. IV is randomised per encryption call. Backward-compat: `isEncrypted()` gate allows plaintext rows to coexist during migration. | `GET /api/v1/compliance/controls` returns `CC6.1` status. Status is set to `passing` on service startup when `EncryptionService` initialises. |
| **CC6.7** | TLS Enforcement | `src/middleware/TLSEnforcementMiddleware.ts` | Express middleware registered as the **first** middleware in the app stack (before all routes and body parsers). In `NODE_ENV=production`, checks `X-Forwarded-Proto` header set by the upstream load balancer/reverse proxy. Any non-HTTPS request receives a `301 Moved Permanently` redirect to `https://`. | `GET /api/v1/compliance/controls` returns `CC6.7` status. TLS enforcement is a static configuration control; status is set to `passing` on application startup. |
| **CC7.2** | Audit Log Integrity | `src/services/AuditVerificationService.ts`, `src/repositories/AuditRepository.ts`, `src/jobs/AuditChainVerificationJob.ts` | Each audit event (`audit_events` table) stores a `hash` (SHA-256 of `eventId + timestamp + action + outcome + agentId + organizationId + previousHash`) and `previous_hash` linking it to the prior event. An immutability trigger prevents UPDATE/DELETE on `audit_events`. `AuditChainVerificationJob` re-walks the entire chain every hour. | Prometheus gauge `agentidp_audit_chain_integrity` (1 = passing, 0 = failing). Prometheus alert `AuditChainIntegrityFailed` fires when gauge = 0. `GET /api/v1/audit/verify` triggers an on-demand verification. `GET /api/v1/compliance/controls` returns `CC7.2` status. |
| **CC9.2** | Secrets Rotation | `src/jobs/SecretsRotationJob.ts` | `SecretsRotationJob` runs every hour (configurable via `SECRETS_ROTATION_CHECK_INTERVAL_MS`) and queries `credentials` for `active` credentials expiring within 7 days. For each, it increments the `agentidp_credentials_expiring_soon_total` Prometheus counter with the owning `agent_id`. Operators are expected to act on the alert within the 7-day window. | Prometheus counter `agentidp_credentials_expiring_soon_total` per `agent_id`. Prometheus alert `CredentialExpiryApproaching` fires when any increase is detected. `GET /api/v1/compliance/controls` returns `CC9.2` status. |
| **CC7.1** | Webhook Dead-Letter Monitoring | `src/workers/WebhookDeliveryWorker.ts` | `WebhookDeliveryWorker` processes webhook deliveries from a Redis queue. After exhausting all retry attempts (configurable `WEBHOOK_MAX_RETRIES`), the delivery is moved to dead-letter status and `agentidp_webhook_dead_letters_total` is incremented. | Prometheus counter `agentidp_webhook_dead_letters_total` per `organization_id`. Prometheus alert `WebhookDeadLetterAccumulating` fires when > 10 dead-letters accumulate in 1 hour. `GET /api/v1/compliance/controls` returns `CC7.1` status. |
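
The CC6.7 redirect rule can be sketched as an Express-style middleware. This is a minimal illustration, not `TLSEnforcementMiddleware.ts`. The request/response interfaces are stripped down so the sketch has no dependencies. Two behaviours are assumptions: a missing header is treated as non-HTTPS, and only the first value of a comma-separated `X-Forwarded-Proto` is honoured. Trusting the header is only safe when the app is reachable solely through the load balancer, because clients can set `X-Forwarded-Proto` themselves.

```typescript
// Minimal stand-ins for the Express request/response members used here.
interface MinimalRequest {
  headers: Record<string, string | string[] | undefined>;
  hostname: string;
  originalUrl: string;
}
interface MinimalResponse {
  redirect(status: number, url: string): void;
}
type Next = () => void;

// In production, redirects requests whose proxy-reported protocol is not https.
function tlsEnforcement(nodeEnv: string | undefined) {
  return (req: MinimalRequest, res: MinimalResponse, next: Next): void => {
    if (nodeEnv !== "production") return next();
    const raw = req.headers["x-forwarded-proto"]; // Node lowercases header names
    const first = (Array.isArray(raw) ? raw[0] : raw ?? "").split(",")[0].trim().toLowerCase();
    if (first === "https") return next();
    res.redirect(301, `https://${req.hostname}${req.originalUrl}`);
  };
}
```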
---
## Evidence Collection
For a SOC 2 Type II audit, the following evidence should be collected:
| Evidence Type | Collection Method |
|---|---|
| Encryption at rest configuration | Export Vault KV v2 policy + `_encryption_migration_log` table contents |
| TLS certificate and enforcement logs | Load balancer access logs + `X-Forwarded-Proto` middleware responses |
| Audit chain integrity report | `GET /api/v1/audit/verify` with full date range |
| Secrets rotation compliance | Prometheus metric history for `agentidp_credentials_expiring_soon_total` |
| Webhook dead-letter rate | Prometheus metric history for `agentidp_webhook_dead_letters_total` |
| Immutable audit log dump | Direct PostgreSQL export of `audit_events` table with hash verification |
---
## References
- SOC 2 Trust Services Criteria: [AICPA TSC 2017](https://www.aicpa.org/resources/article/trust-services-criteria)
- OpenAPI spec: `docs/openapi/compliance.yaml`
- Encryption runbook: `docs/compliance/encryption-runbook.md`
- Audit log runbook: `docs/compliance/audit-log-runbook.md`
- Incident response: `docs/compliance/incident-response.md`
- Secrets rotation: `docs/compliance/secrets-rotation.md`


@@ -0,0 +1,548 @@
openapi: 3.0.3
info:
  title: SentryAgent.ai — Compliance & SOC 2 Type II Service
  version: 1.0.0
  description: |
    The Compliance Service exposes endpoints supporting SentryAgent.ai's
    **SOC 2 Type II** audit readiness programme.

    Two categories of control are surfaced:

    **Audit chain verification** (`GET /audit/verify`) — Confirms cryptographic
    integrity of the immutable audit log chain across an optional date range.
    This endpoint provides auditors and compliance tooling with a single call to
    assert that no audit events have been tampered with, deleted, or reordered
    after initial capture.

    **SOC 2 control status** (`GET /compliance/controls`) — Returns a live status
    snapshot for each of the five in-scope SOC 2 Trust Services Criteria controls
    monitored by the platform. Designed as a lightweight, public health-style
    endpoint so that monitoring infrastructure can poll without bearer credentials.

    **In-scope SOC 2 controls:**

    | Control ID | Name | Description |
    |------------|------|-------------|
    | `CC6.1` | Encryption at Rest | Reports whether column-level encryption of sensitive database fields is active |
    | `CC6.7` | TLS Enforcement | Reports whether non-HTTPS requests are redirected to HTTPS in production |
    | `CC7.2` | Audit Log Integrity | Validates audit chain hash continuity |
    | `CC9.2` | Secrets Rotation | Monitors for active credentials expiring within 7 days |
    | `CC7.1` | Webhook Dead-Letter Monitoring | Monitors the rate at which webhook deliveries reach dead-letter status |

    **Required scope (audit chain verify only):** `audit:read`
servers:
  - url: http://localhost:3000/api/v1
    description: Local development server
  - url: https://api.sentryagent.ai/v1
    description: Production server
tags:
  - name: Audit Chain
    description: Cryptographic integrity verification of the immutable audit event chain
  - name: Compliance Controls
    description: SOC 2 Type II control status — public health-style monitoring endpoint
components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
      description: |
        JWT access token with `audit:read` scope, obtained via `POST /token`.
        Include as: `Authorization: Bearer <token>`
  schemas:
    ChainVerificationResult:
      type: object
      description: |
        Result of an audit event chain integrity verification run.

        The audit log is structured as a hash-linked chain. Each event stores a
        reference to the hash of the preceding event. `verified: true` means every
        event in the requested window was checked and no breaks in the chain were
        detected.

        When `verified` is `false`, `brokenAtEventId` identifies the first event
        where the chain integrity check failed, enabling targeted forensic investigation.
      required:
        - verified
        - checkedCount
        - brokenAtEventId
      properties:
        verified:
          type: boolean
          description: >
            `true` if every audit event in the checked range maintains an unbroken
            cryptographic hash chain; `false` if at least one chain break was detected.
          example: true
        checkedCount:
          type: integer
          description: Total number of audit events examined during this verification run.
          minimum: 0
          example: 2847
        brokenAtEventId:
          type: string
          format: uuid
          nullable: true
          description: >
            UUID of the first audit event where chain continuity failed, or `null`
            when `verified` is `true`. Only the first detected break is reported;
            subsequent events are not checked after a break is found.
          example: null
        fromDate:
          type: string
          format: date-time
          description: >
            The ISO 8601 lower bound of the date range that was verified.
            Present only when a `fromDate` query parameter was supplied.
          example: "2026-03-01T00:00:00.000Z"
        toDate:
          type: string
          format: date-time
          description: >
            The ISO 8601 upper bound of the date range that was verified.
            Present only when a `toDate` query parameter was supplied.
          example: "2026-03-31T23:59:59.999Z"
    ControlStatus:
      type: string
      description: Operational status of a SOC 2 control at the time of the last check.
      enum:
        - passing
        - failing
        - unknown
      example: passing
    ComplianceControl:
      type: object
      description: Status record for a single SOC 2 Trust Services Criteria control.
      required:
        - id
        - name
        - status
        - lastChecked
      properties:
        id:
          type: string
          description: SOC 2 Trust Services Criteria control identifier.
          enum:
            - CC6.1
            - CC6.7
            - CC7.2
            - CC9.2
            - CC7.1
          example: "CC6.1"
        name:
          type: string
          description: Human-readable name of the control.
          example: "Encryption at Rest"
        status:
          $ref: '#/components/schemas/ControlStatus'
        lastChecked:
          type: string
          format: date-time
          description: ISO 8601 timestamp of the most recent automated check for this control.
          example: "2026-03-31T06:00:00.000Z"
    ComplianceControlsResponse:
      type: object
      description: SOC 2 compliance control status summary for all in-scope controls.
      required:
        - controls
      properties:
        controls:
          type: array
          description: Status record for each of the five in-scope SOC 2 controls.
          minItems: 5
          maxItems: 5
          items:
$ref: '#/components/schemas/ComplianceControl'
example:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
ErrorResponse:
type: object
description: Standard error response envelope used across all SentryAgent.ai APIs.
required:
- code
- message
properties:
code:
type: string
description: Machine-readable error code.
example: "UNAUTHORIZED"
message:
type: string
description: Human-readable description of the error.
example: "A valid Bearer token is required."
details:
type: object
description: Optional structured details providing additional context.
additionalProperties: true
example: {}
responses:
Unauthorized:
description: Missing or invalid Bearer token.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "UNAUTHORIZED"
message: "A valid Bearer token is required to access this resource."
Forbidden:
description: The token is valid but lacks the required `audit:read` scope.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "INSUFFICIENT_SCOPE"
message: "The 'audit:read' scope is required to verify the audit chain."
TooManyRequests:
description: |
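Rate limit exceeded. Limits are set per endpoint (30 requests/minute for `/audit/verify`, 120 requests/minute for `/compliance/controls`); retry after the Unix timestamp given in `X-RateLimit-Reset`.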
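A sketch of a client-side retry policy that respects `X-RateLimit-Reset`, which this spec defines as a Unix timestamp in seconds. `retryDelayMs` and `withRateLimitRetry` are hypothetical helpers; the 60-second fallback when the header is missing and the three-attempt cap are assumptions, not platform behavior.

```typescript
// Minimal response shape needed for retry decisions (fetch's Response satisfies it).
interface RateLimitedResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Assumed fallback when X-RateLimit-Reset is absent or malformed: one full window.
const FALLBACK_DELAY_MS = 60_000;

// Milliseconds to wait until the window resets; X-RateLimit-Reset is Unix seconds.
function retryDelayMs(resetHeader: string | null, nowMs: number = Date.now()): number {
  if (resetHeader === null || resetHeader.trim() === "") return FALLBACK_DELAY_MS;
  const resetSeconds = Number(resetHeader);
  if (!Number.isFinite(resetSeconds)) return FALLBACK_DELAY_MS;
  return Math.max(0, resetSeconds * 1000 - nowMs);
}

// Re-sends the request after each 429 until it is not rate-limited or maxAttempts is reached.
async function withRateLimitRetry<T extends RateLimitedResponse>(
  send: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    await sleep(retryDelayMs(res.headers.get("X-RateLimit-Reset")));
  }
}
```

Waiting for the advertised reset, rather than retrying immediately, avoids burning the next window's quota on requests that are certain to be rejected.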
headers:
X-RateLimit-Limit:
schema:
type: integer
description: Maximum requests allowed per minute.
example: 30
X-RateLimit-Remaining:
schema:
type: integer
description: Requests remaining in the current window.
example: 0
X-RateLimit-Reset:
schema:
type: integer
description: Unix timestamp when the rate limit window resets.
example: 1743155400
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "RATE_LIMIT_EXCEEDED"
message: "Too many requests. Please retry after the rate limit window resets."
InternalServerError:
description: Unexpected server error.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "INTERNAL_SERVER_ERROR"
message: "An unexpected error occurred. Please try again later."
paths:
/audit/verify:
get:
operationId: verifyAuditChain
tags:
- Audit Chain
summary: Verify audit log chain integrity
description: |
Runs an integrity verification pass over the immutable audit event chain, or
over the requested date window. Each event stores a SHA-256 hash computed over
its own fields and the previous event's hash; this endpoint walks the events in
order, recomputes each hash, and confirms that no breaks exist.
**Use cases:**
- Auditor evidence collection for SOC 2 Type II assessment
- Continuous compliance monitoring (cron-driven)
- Incident response — confirm audit log has not been tampered with
**Requires:** Bearer token with `audit:read` scope.
**Rate limit:** 30 requests/minute per `client_id`. Verification recomputes a
hash for every event in the window, so it is rate-limited more aggressively
than standard read endpoints. For continuous monitoring, poll no more than
once per minute, or read the `CC7.2` status from `/compliance/controls`, which
is refreshed by the platform's hourly chain verification job.
**Date range filtering:** Supply `fromDate` and/or `toDate` to restrict
verification to a specific window. When omitted, the entire retained audit
log is verified. `fromDate` must be before or equal to `toDate` when both
are provided.
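The date-range rules above can be sketched as a validator that produces the same `VALIDATION_ERROR` bodies shown in the `400` examples. This is a hypothetical helper (`validateDateRange`, `isValidationError`), not the server's actual validation code, and it assumes the RFC 3339 profile of ISO 8601 that OpenAPI's `format: date-time` refers to.

```typescript
interface ValidationError {
  code: "VALIDATION_ERROR";
  message: string;
  details: { field?: string; reason: string };
}

interface DateRange {
  fromDate?: Date;
  toDate?: Date;
}

// RFC 3339 date-time, the ISO 8601 profile used by OpenAPI `format: date-time`.
const DATE_TIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function isValidationError(value: unknown): value is ValidationError {
  return typeof value === "object" && value !== null && (value as { code?: unknown }).code === "VALIDATION_ERROR";
}

// An omitted bound is valid and means "unbounded" on that side.
function parseBound(field: "fromDate" | "toDate", raw: string | undefined): Date | undefined | ValidationError {
  if (raw === undefined) return undefined;
  const parsed = new Date(raw);
  if (!DATE_TIME.test(raw) || Number.isNaN(parsed.getTime())) {
    return {
      code: "VALIDATION_ERROR",
      message: "Invalid query parameter value.",
      details: { field, reason: "Must be a valid ISO 8601 date-time string." },
    };
  }
  return parsed;
}

function validateDateRange(query: { fromDate?: string; toDate?: string }): DateRange | ValidationError {
  const fromDate = parseBound("fromDate", query.fromDate);
  if (isValidationError(fromDate)) return fromDate;
  const toDate = parseBound("toDate", query.toDate);
  if (isValidationError(toDate)) return toDate;
  // Both bounds are inclusive, so fromDate equal to toDate is accepted.
  if (fromDate !== undefined && toDate !== undefined && fromDate.getTime() > toDate.getTime()) {
    return {
      code: "VALIDATION_ERROR",
      message: "Invalid date range.",
      details: { reason: "fromDate must be before or equal to toDate." },
    };
  }
  return { fromDate, toDate };
}
```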
**Result interpretation:**
- `verified: true` — chain is intact across all checked events
- `verified: false` — at least one chain break detected; `brokenAtEventId`
identifies the first affected event
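A minimal client sketch, assuming Node.js 18+ (global `fetch`) and an access token already obtained from `POST /token` with the `audit:read` scope. `verifyAuditChain` and `FetchLike` are illustrative names; the fetch function is injected so the call can be exercised without a network.

```typescript
interface ChainVerificationResult {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
  fromDate?: string;
  toDate?: string;
}

// Narrow fetch-compatible signature; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function verifyAuditChain(
  baseUrl: string,     // e.g. "https://api.sentryagent.ai/v1"
  accessToken: string, // JWT carrying the audit:read scope
  range: { fromDate?: string; toDate?: string },
  fetchImpl: FetchLike,
): Promise<ChainVerificationResult> {
  const params = new URLSearchParams();
  if (range.fromDate) params.set("fromDate", range.fromDate);
  if (range.toDate) params.set("toDate", range.toDate);
  const query = params.toString();
  const url = `${baseUrl}/audit/verify${query ? `?${query}` : ""}`;

  const res = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${accessToken}`, Accept: "application/json" },
  });
  if (!res.ok) {
    // 400/401/403/429/500 carry the ErrorResponse envelope; tolerate non-JSON bodies.
    const err = (await res.json().catch(() => ({}))) as { code?: string; message?: string };
    throw new Error(`verifyAuditChain failed: ${res.status} ${err.code ?? ""} ${err.message ?? ""}`.trim());
  }
  // A 200 does not mean the chain is intact; callers must inspect `verified`.
  return (await res.json()) as ChainVerificationResult;
}
```

In production the runtime's global `fetch` is passed as `fetchImpl`. The function deliberately treats a `200` with `verified: false` as a successful call; deciding whether that result opens an incident is left to the caller.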
security:
- BearerAuth: []
parameters:
- name: fromDate
in: query
description: |
ISO 8601 date-time lower bound for the verification window (inclusive).
When omitted, verification starts from the earliest available audit event.
Must be before or equal to `toDate` when both are supplied.
required: false
schema:
type: string
format: date-time
example: "2026-03-01T00:00:00.000Z"
- name: toDate
in: query
description: |
ISO 8601 date-time upper bound for the verification window (inclusive).
When omitted, verification runs up to and including the most recent
audit event. Must be after or equal to `fromDate` when both are supplied.
required: false
schema:
type: string
format: date-time
example: "2026-03-31T23:59:59.999Z"
responses:
'200':
description: |
Audit chain verification completed. Inspect `verified` to determine
whether chain integrity is intact. A `200` is returned regardless of
whether verification passed or failed — check the response body.
headers:
X-RateLimit-Limit:
schema:
type: integer
description: Maximum requests allowed per minute for this endpoint.
example: 30
X-RateLimit-Remaining:
schema:
type: integer
description: Requests remaining in the current rate limit window.
example: 29
X-RateLimit-Reset:
schema:
type: integer
description: Unix timestamp when the rate limit window resets.
example: 1743155400
content:
application/json:
schema:
$ref: '#/components/schemas/ChainVerificationResult'
examples:
chainIntact:
summary: Verification passed — chain is intact
value:
verified: true
checkedCount: 2847
brokenAtEventId: null
fromDate: "2026-03-01T00:00:00.000Z"
toDate: "2026-03-31T23:59:59.999Z"
chainBroken:
summary: Verification failed — chain break detected
value:
verified: false
checkedCount: 1203
brokenAtEventId: "c4d5e6f7-a8b9-0123-cdef-456789012345"
fromDate: "2026-03-01T00:00:00.000Z"
toDate: "2026-03-31T23:59:59.999Z"
noDateRange:
summary: Full log verified (no date range supplied)
value:
verified: true
checkedCount: 18504
brokenAtEventId: null
'400':
description: Invalid query parameter value or date range.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
examples:
invalidFromDate:
summary: fromDate is not a valid ISO 8601 date-time
value:
code: "VALIDATION_ERROR"
message: "Invalid query parameter value."
details:
field: "fromDate"
reason: "Must be a valid ISO 8601 date-time string (e.g. 2026-03-01T00:00:00.000Z)."
invalidToDate:
summary: toDate is not a valid ISO 8601 date-time
value:
code: "VALIDATION_ERROR"
message: "Invalid query parameter value."
details:
field: "toDate"
reason: "Must be a valid ISO 8601 date-time string (e.g. 2026-03-31T23:59:59.999Z)."
invalidDateRange:
summary: fromDate is after toDate
value:
code: "VALIDATION_ERROR"
message: "Invalid date range."
details:
reason: "fromDate must be before or equal to toDate."
'401':
$ref: '#/components/responses/Unauthorized'
'403':
$ref: '#/components/responses/Forbidden'
'429':
$ref: '#/components/responses/TooManyRequests'
'500':
$ref: '#/components/responses/InternalServerError'
/compliance/controls:
get:
operationId: getComplianceControls
tags:
- Compliance Controls
summary: Get SOC 2 control status summary
description: |
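Returns the most recently recorded status of each of the five in-scope SOC 2
Type II Trust Services Criteria controls monitored by the SentryAgent.ai
platform. Statuses are written by the platform's scheduled compliance jobs,
not computed on each request.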
**No authentication required.** This endpoint is intentionally public
(analogous to a health check) so that external monitoring infrastructure,
status pages, and audit tooling can poll it without bearer credentials.
**Controls monitored:**
| Control ID | Name | What is checked |
|------------|------|-----------------|
| `CC6.1` | Encryption at Rest | Database and secrets store encryption is active and configured |
| `CC6.7` | TLS Enforcement | TLS 1.2+ is enforced on all platform inbound connections |
| `CC7.2` | Audit Log Integrity | Audit chain hash continuity, as last reported by the hourly chain verification job (the same check `/audit/verify` performs) |
| `CC9.2` | Secrets Rotation | All managed secrets are within the rotation policy window |
| `CC7.1` | Webhook Dead-Letter Monitoring | Dead-letter queue depth is within the acceptable threshold |
**Status values:**
- `passing` — control is operating within policy
- `failing` — control has breached policy; immediate attention required
- `unknown` — automated check could not complete (e.g. dependency unavailable)
**Caching note:** Responses may be cached for up to 60 seconds by
intermediate proxies. The `lastChecked` field on each control indicates
the timestamp of the most recent automated evaluation.
**Rate limit:** 120 requests/minute per IP address.
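A sketch of how a monitor might evaluate a response from this endpoint, flagging any control that is not `passing` or whose `lastChecked` is stale. The two-hour staleness threshold is an assumption (roughly two cycles of an hourly job), and `controlsNeedingAttention` is a hypothetical helper.

```typescript
type ControlStatus = "passing" | "failing" | "unknown";

interface ComplianceControl {
  id: "CC6.1" | "CC6.7" | "CC7.2" | "CC9.2" | "CC7.1";
  name: string;
  status: ControlStatus;
  lastChecked: string; // ISO 8601
}

interface ComplianceControlsResponse {
  controls: ComplianceControl[];
}

// Assumed threshold: about two cycles of an hourly compliance job.
const DEFAULT_MAX_AGE_MS = 2 * 60 * 60 * 1000;

// Returns controls a monitor should alert on: any non-passing status, or a
// lastChecked timestamp that is older than maxAgeMs (or unparseable).
function controlsNeedingAttention(
  body: ComplianceControlsResponse,
  nowMs: number = Date.now(),
  maxAgeMs: number = DEFAULT_MAX_AGE_MS,
): ComplianceControl[] {
  return body.controls.filter((control) => {
    const ageMs = nowMs - Date.parse(control.lastChecked);
    const fresh = ageMs <= maxAgeMs; // false when Date.parse returns NaN
    return control.status !== "passing" || !fresh;
  });
}
```

A poller would `GET /compliance/controls` without an `Authorization` header, parse the body as `ComplianceControlsResponse`, and alert when the returned array is non-empty. Polling more often than every 60 seconds gains little, because responses may be cached for that long.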
security: []
responses:
'200':
description: SOC 2 control status summary returned successfully.
headers:
Cache-Control:
schema:
type: string
description: >
Downstream caches may serve this response for up to 60 seconds.
example: "public, max-age=60"
X-RateLimit-Limit:
schema:
type: integer
description: Maximum requests allowed per minute for this endpoint.
example: 120
X-RateLimit-Remaining:
schema:
type: integer
description: Requests remaining in the current rate limit window.
example: 119
X-RateLimit-Reset:
schema:
type: integer
description: Unix timestamp when the rate limit window resets.
example: 1743155400
content:
application/json:
schema:
$ref: '#/components/schemas/ComplianceControlsResponse'
examples:
allPassing:
summary: All controls passing
value:
controls:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
oneControlFailing:
summary: One control failing (secrets rotation overdue)
value:
controls:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "failing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
unknownControl:
summary: One control in unknown state (dependency unavailable)
value:
controls:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "unknown"
lastChecked: "2026-03-31T05:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
'429':
$ref: '#/components/responses/TooManyRequests'
'500':
$ref: '#/components/responses/InternalServerError'