chore(openspec): archive all completed changes, sync 14 new specs to library

Archived 4 completed OpenSpec changes (2026-04-02):
- phase-3-enterprise (100/100 tasks) — 6 Phase 3 capabilities synced
- devops-documentation (48/48 tasks) — 3 new + 1 merged capability
- bedroom-developer-docs (33/33 tasks) — 4 new capabilities synced
- engineering-docs (superseded by 2026-03-29 archive) — no tasks

Main spec library grows from 21 → 35 capabilities (+14 new):
federation, multi-tenancy, oidc, soc2, w3c-dids, webhooks,
database, operations, system-overview, api-reference, core-concepts,
developer-guides, quick-start, plus additive requirements merged into the existing deployment capability

Active changes: 0 — project board is clear for Phase 4 planning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-04-02 03:50:47 +00:00
parent ceec22f714
commit f1fbe0e29a
53 changed files with 3019 additions and 0 deletions

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-29

@@ -0,0 +1,269 @@
# Phase 3: Enterprise — Technical Design
**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Draft — pending CEO approval of proposal
---
## Architecture Overview
Phase 3 transforms AgentIdP from a single-tenant OAuth 2.0 server into a multi-tenant, W3C DID-issuing, OIDC-compliant, federated enterprise identity platform. The architecture remains monolithic Express (no microservices split) to avoid operational complexity, but clear service boundaries are enforced internally.
```
┌──────────────────────────────────────────────────────┐
│              AgentIdP Server (Express)               │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ Middleware Stack (ordered)                     │  │
│  │ TLS Enforcement → Auth → Org Context → OPA     │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────┐    │
│  │ OrgSvc   │ │ DIDSvc   │ │ OIDCSvc  │ │FedSvc │    │
│  └──────────┘ └──────────┘ └──────────┘ └───────┘    │
│  ┌──────────┐ ┌──────────┐                           │
│  │ WebhookQ │ │ SOC2Ctrl │                           │
│  └──────────┘ └──────────┘                           │
└──────────────────────────────────────────────────────┘
         │                │                │
   ┌─────▼──────┐   ┌─────▼──────┐   ┌─────▼──────┐
   │ PostgreSQL │   │ Redis      │   │ Vault      │
   │ (org rows) │   │ (webhook   │   │ (secrets)  │
   └────────────┘   │  queue)    │   └────────────┘
                    └────────────┘
```
---
## Architectural Decision Records
---
### D1: Multi-Tenancy Model
**Status**: Accepted
**Decision**: Row-level tenancy — add `organization_id` (UUID, NOT NULL) to every domain table. No schema-per-tenant, no database-per-tenant.
**Rationale**: Row-level tenancy is operationally the simplest approach: a single database, a single schema, a single connection pool. All queries are augmented with an `organization_id` filter extracted from the authenticated JWT. PostgreSQL Row-Level Security (RLS) is enabled on all tenant-scoped tables as a defense-in-depth measure — even if the application filter is accidentally omitted, the database enforces isolation.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Schema-per-tenant | Strong isolation, independent migrations | Complex migration tooling, connection pool explosion at scale | Operational overhead exceeds threat model requirement |
| Database-per-tenant | Maximum isolation | Separate connection pool, backup, monitoring per tenant | Prohibitive at 100+ orgs; overkill for our threat model |
| Row-level (chosen) | Simple, fast, single migration path | RLS must be enforced consistently | Chosen — enforce via both application and RLS |
**Consequences**:
- Every domain table gets an `organization_id` column and a corresponding index
- All service methods accept `organizationId: string` as a required parameter
- JWT payload extended to include `organization_id` claim
- Existing single-tenant data migrated to a default `system` organization
- PostgreSQL RLS policies written for all tenant tables
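D1 pairs an application-level `organization_id` filter with PostgreSQL RLS as a backstop. One way the two could meet is sketched below, assuming a hypothetical session setting `app.current_organization_id` that each policy reads; the helpers only build SQL text and parameters, so they would plug into any PostgreSQL client.

```typescript
// Hypothetical session setting name; the design does not name one.
const ORG_SETTING = "app.current_organization_id";

interface Statement {
  text: string;
  values: string[];
}

/** DDL enabling an organization-isolation RLS policy on one tenant-scoped table. */
function rlsPolicyDdl(table: string): string[] {
  // Identifiers cannot be bound as parameters, so only plain snake_case names are accepted.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY`,
    // FORCE makes the policy apply to the table owner as well.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY`,
    // missing_ok = true: an unset setting yields NULL (or '' if it was set earlier in the
    // session), so no rows match and the policy fails closed.
    `CREATE POLICY ${table}_org_isolation ON ${table} ` +
      `USING (organization_id::text = current_setting('${ORG_SETTING}', true))`,
  ];
}

/** Wrap one org-filtered query so the RLS setting is scoped to its transaction. */
function orgScopedTransaction(organizationId: string, query: Statement): Statement[] {
  return [
    { text: "BEGIN", values: [] },
    // is_local = true behaves like SET LOCAL: the value is discarded at COMMIT/ROLLBACK.
    { text: "SELECT set_config($1, $2, true)", values: [ORG_SETTING, organizationId] },
    query,
    { text: "COMMIT", values: [] },
  ];
}
```

Each statement would run, in order, on one pooled connection; the query itself still carries its own `WHERE organization_id = $1`. RLS is not applied to superusers or roles with `BYPASSRLS`, so the application's database role would need neither for the backstop to hold.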
---
### D2: DID Method Selection
**Status**: Accepted
**Decision**: `did:web` — DID Documents served over HTTPS at well-known and per-agent URLs.
**Rationale**: `did:web` requires no blockchain, no ledger, and no external infrastructure beyond the HTTPS server already running. It is W3C DID Core 1.0 compliant, supported by all major DID resolvers, and is the preferred method for enterprise deployments where an organization controls its own domain. It aligns directly with the `did:web` identifier scheme used in AGNTCY agent card specifications.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `did:web` (chosen) | No blockchain, HTTPS-based, enterprise-friendly | DID tied to domain; moving the domain invalidates DIDs | Accepted tradeoff — enterprise deployments have stable domains |
| `did:key` | Self-contained, no infrastructure | Not anchored — anyone can generate any `did:key`; no discovery | No trust anchor; not suitable for enterprise identity |
| `did:ethr` | Ethereum-anchored, decentralized | Blockchain dependency, gas costs, not enterprise-standard | Blockchain dependency is a non-starter for regulated enterprises |
**Consequences**:
- DID for the AgentIdP instance: `did:web:<hostname>`
- DID for an agent: `did:web:<hostname>:agents:<agentId>`
- DID Documents served at `/.well-known/did.json` and `/agents/:id/did`
- Domain change requires DID migration — document this in ops runbook
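The DID shapes in D2 can be produced and resolved with a few string rules. The sketch follows the did:web method specification, under which a DID with path segments resolves to `https://<host>/<segments>/did.json`. A standard resolver such as `web-did-resolver` would therefore be expected to fetch `https://<hostname>/agents/<agentId>/did.json` for an agent DID rather than `/agents/:id/did`, so serving the document at the `did.json` path as well may be needed for interoperability. Function names are hypothetical.

```typescript
/** Instance DID per D2. A port's colon must be percent-encoded under did:web. */
function instanceDid(host: string): string {
  return `did:web:${encodeURIComponent(host)}`;
}

/** Agent DID per D2: did:web:<hostname>:agents:<agentId>. */
function agentDid(host: string, agentId: string): string {
  return `${instanceDid(host)}:agents:${encodeURIComponent(agentId)}`;
}

/** The HTTPS URL a did:web resolver fetches for a given DID. */
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`not a did:web DID: ${did}`);
  }
  // ":" separates path segments; each segment is percent-decoded (%3A becomes a port colon).
  const [domain, ...path] = did.slice(prefix.length).split(":").map(decodeURIComponent);
  if (!domain) {
    throw new Error(`missing domain in ${did}`);
  }
  const base = `https://${domain}`;
  // No path: the root DID Document lives under /.well-known; otherwise under the path.
  return path.length === 0 ? `${base}/.well-known/did.json` : `${base}/${path.join("/")}/did.json`;
}
```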
---
### D3: OIDC Library Selection
**Status**: Accepted
**Decision**: `oidc-provider` npm package — a certified, RFC-compliant OIDC server library.
**Rationale**: `oidc-provider` is the most widely deployed Node.js OIDC library, passing the OpenID Foundation's official conformance test suite. Building OIDC from scratch on top of our existing JWT infrastructure would require implementing Discovery, JWKS rotation, ID token construction, and claim aggregation correctly against multiple RFCs. The certified library eliminates that risk and reduces implementation surface area. It integrates cleanly with Express as a mounted middleware.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `oidc-provider` (chosen) | Certified, RFC-complete, actively maintained | Adds a significant dependency | Risk of non-compliance from custom implementation outweighs dependency cost |
| Custom JWT extension | Full control, no new dependency | High risk of spec deviation; ID token, Discovery, JWKS are complex | RFC compliance cannot be self-certified |
| `keycloak` sidecar | Battle-tested, full-featured | Heavyweight Java service; architectural mismatch | Not Node.js; adds operational complexity |
**Consequences**:
- `oidc-provider` is mounted at `/oidc` in Express
- OIDC Discovery served at `/.well-known/openid-configuration` (proxied from oidc-provider)
- JWKS served at `/.well-known/jwks.json`
- Adapter written to store OIDC sessions in Redis (oidc-provider's adapter interface)
- Existing `POST /oauth2/token` route extended, not replaced — maintains backward compatibility
---
### D4: Federation Protocol
**Status**: Accepted
**Decision**: Signed JWT assertions — remote AgentIdP instances present a signed JWT; the receiving instance verifies the signature against the registered JWKS of the issuing instance.
**Rationale**: JWT assertion federation reuses the existing JWT infrastructure (`jsonwebtoken`, plus the JWKS endpoint from the OIDC workstream). No new protocol is introduced. The trust model is explicit: operators register partner instances with their JWKS URL. This aligns with RFC 7523 (JSON Web Token (JWT) Profile for OAuth 2.0 Client Authentication and Authorization Grants) and the AGNTCY inter-agent trust model.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Signed JWT assertions (chosen) | Uses existing JWT infra, explicit trust registry, RFC-aligned | JWKS URL must be reachable at verification time | Acceptable operational constraint; JWKS can be cached |
| mTLS | Strong cryptographic identity | Certificate management overhead, PKI required per partner | Cert management complexity not justified when JWT assertions suffice |
| AGNTCY-specific protocol | Native alignment | Spec still evolving; risk of churn | Build on stable JWT base; adapt to AGNTCY extensions as spec matures |
**Consequences**:
- New `federation_partners` table: `partner_id`, `name`, `jwks_uri`, `issuer`, `trusted_since`, `organization_id`
- JWKS of partner instances cached in Redis with TTL
- `POST /federation/verify` accepts a bearer token from a remote instance and returns verification result
- Federation tokens are not accepted for agent management endpoints — only for identity assertion
---
### D5: Webhook Delivery Architecture
**Status**: Accepted
**Decision**: Async delivery via Redis-backed `bull` queue with exponential backoff retry (max 10 attempts over 24 hours).
**Rationale**: Synchronous webhook delivery from within a request handler would add latency and create tight coupling between event generation and delivery outcome. The Redis queue (`bull`) decouples delivery: events are enqueued immediately, a background worker delivers them. `bull` provides built-in retry, delay, and failure tracking without introducing a new infrastructure component (Redis is already present). HMAC-SHA256 signing on every delivery allows recipients to verify authenticity.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Redis queue via `bull` (chosen) | Reuses existing Redis, retry built-in, low operational overhead | Delivery tied to Redis availability | Acceptable — Redis is already a required dependency |
| Synchronous in-request delivery | Simplest implementation | Adds latency to event-generating requests; failure blocks response | Unacceptable latency and coupling |
| Dedicated message broker (RabbitMQ) | Robust, durable | New infrastructure dependency | Operational overhead; Redis already present |
| Kafka (primary) | High-throughput, durable | Overkill for webhook delivery; complex operations | Optional adapter only; not primary delivery mechanism |
**Consequences**:
- New `webhook_subscriptions` and `webhook_deliveries` tables
- The `bull` worker runs in the same Node.js process as the API server (`bull` can alternatively run job processors as sandboxed child processes)
- Retry schedule: 1m, 5m, 15m, 1h, 4h, 12h, 24h (exponential backoff)
- Failed delivery after 10 attempts moves to dead-letter; operator alerted
- Optional Kafka adapter: if `KAFKA_BROKERS` env var is set, events are also produced to Kafka
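D5 and the proposal specify HMAC-SHA256 signing but not the signed input or the header format. The sketch below assumes the signature covers the raw JSON body and is sent as `sha256=<hex>` in a hypothetical `X-AgentIdP-Signature` header; receivers compare in constant time.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Value format for a hypothetical "X-AgentIdP-Signature" header.
const SIGNATURE_PREFIX = "sha256=";

/** Signature the delivery worker would attach to each webhook POST. */
function signWebhook(secret: string, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return SIGNATURE_PREFIX + digest;
}

/** Receiver-side check, run on the raw request body before JSON parsing. */
function verifyWebhook(secret: string, rawBody: string, signatureHeader: string): boolean {
  const expected = Buffer.from(signWebhook(secret, rawBody), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so the lengths are compared first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

If replay protection matters, a timestamp could be added to the signed string and checked by the receiver; the design does not currently call for one.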
---
### D6: SOC 2 Scope
**Status**: Accepted
**Decision**: Target SOC 2 Type II (operational, not just design). All controls implemented in code. Audit period: 6 months post-Phase 3 launch.
**Rationale**: SOC 2 Type I certifies that controls are designed correctly. SOC 2 Type II certifies that they operate continuously over a period of time. Enterprise customers in regulated industries (finance, healthcare, government) require Type II. Implementing the controls now, with the 6-month operational window beginning at Phase 3 launch, puts us on the fastest possible path to Type II certification.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Type II from launch (chosen) | Satisfies enterprise requirements | Requires 6-month operation window | Accepted — the controls are implemented in Phase 3; audit window starts after launch |
| Type I only | Faster to certify | Not accepted by most enterprise procurement | Insufficient for target customers |
| ISO 27001 instead | International standard | Larger scope, longer implementation | SOC 2 is standard for US market; add ISO 27001 in Phase 4 |
**Consequences**:
- Encryption at rest: `pgcrypto` extension for column-level encryption on `credentials.secret_hash` and `credentials.vault_path`
- TLS enforcement: Express middleware rejects plain-HTTP (non-TLS) requests in production
- Secrets rotation: cron-based job that triggers credential rotation reminders and Vault lease renewals
- Security alerting: Prometheus alerting rules for auth failure spikes, rate limit exhaustion, anomalous token issuance
- Audit log immutability: Merkle hash chain (each row's hash includes the previous row's hash)
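The TLS enforcement control from D6 reduces to a small middleware placed first in the stack. This sketch uses structural stand-ins for Express's request and response types so it compiles on its own; the 403 status and the `TLS_REQUIRED` code are assumptions.

```typescript
// Structural stand-ins for Express's Request/Response so the sketch has no dependencies.
interface RequestLike {
  // In Express, req.secure is true for TLS connections, or when "trust proxy" is
  // enabled and X-Forwarded-Proto is "https".
  secure: boolean;
}
interface ResponseLike {
  status(code: number): ResponseLike;
  json(body: unknown): void;
}

/** Reject plain-HTTP requests in production instead of redirecting them. */
function requireTls(isProduction: boolean) {
  return (req: RequestLike, res: ResponseLike, next: () => void): void => {
    if (!isProduction || req.secure) {
      next();
      return;
    }
    // Hypothetical error body; D6 does not specify the status or code.
    res.status(403).json({ code: "TLS_REQUIRED", message: "HTTPS is required" });
  };
}
```

In the server it would be mounted before authentication, e.g. `app.use(requireTls(process.env.NODE_ENV === "production"))`, with Express's `trust proxy` setting enabled when TLS terminates at a load balancer. Rejecting rather than redirecting avoids quietly accepting a request whose bearer token has already crossed the network in cleartext.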
---
### D7: Audit Log Immutability — Merkle Hash Chain
**Status**: Accepted
**Decision**: Each `audit_logs` row carries a `hash` field: `SHA-256(eventId + timestamp + action + outcome + agentId + previousHash)`. The chain starts with a genesis hash. Verification is a sequential pass over all rows in insertion order.
**Rationale**: Append-only logs in PostgreSQL can be altered by a DBA with sufficient access. A Merkle-style hash chain makes tampering detectable without requiring a blockchain. Any modification to a historical row breaks the chain from that point forward. Verification is a simple sequential computation that can be run on demand or as a scheduled integrity check.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Merkle hash chain in PostgreSQL (chosen) | No new infra, tamper-evident, verifiable | The hash is unkeyed, so a DBA with write access could recompute every later hash after tampering | Acceptable: the threat model is accidental or low-sophistication modification, and the cryptographic chain deters opportunistic tampering |
| Blockchain anchor | Cryptographically immutable | Blockchain dependency, cost, latency | Excessive for current threat model |
| Write-once S3/GCS export | External immutability | Delayed; operational complexity | Added complexity; hash chain provides continuous coverage |
**Consequences**:
- New `hash` (VARCHAR 64) and `previous_hash` (VARCHAR 64) columns on `audit_logs`
- `AuditService.create()` computes hash before insert — adds ~1ms latency per audit event
- New `GET /audit/verify` endpoint: returns chain integrity status (admin only)
- A PostgreSQL trigger on `audit_logs` rejects `UPDATE` and `DELETE`, making the table insert-only
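A minimal sketch of the D7 chain: computing each row's hash on insert and verifying the chain in insertion order. Two details are assumptions: the genesis hash is taken to be 64 zeros, and the fields are serialized as a JSON array rather than concatenated directly, so that characters cannot shift between adjacent fields without changing the hash. As D7 notes, the hash is unkeyed, so anyone able to rewrite every later row can rebuild a consistent chain.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  eventId: string;
  timestamp: string; // ISO-8601 string exactly as stored, so re-hashing is stable
  action: string;
  outcome: string;
  agentId: string;
}

interface ChainedAuditRow extends AuditEvent {
  previousHash: string;
  hash: string;
}

// Assumed genesis value: D7 specifies a genesis hash but not what it is.
const GENESIS_HASH = "0".repeat(64);

/** SHA-256 over the D7 fields plus previousHash, encoded as a JSON array. */
function computeHash(e: AuditEvent, previousHash: string): string {
  const input = JSON.stringify([e.eventId, e.timestamp, e.action, e.outcome, e.agentId, previousHash]);
  return createHash("sha256").update(input, "utf8").digest("hex");
}

/** What AuditService.create() would do before the INSERT. */
function appendEvent(chain: ChainedAuditRow[], e: AuditEvent): ChainedAuditRow {
  const previousHash = chain.length === 0 ? GENESIS_HASH : chain[chain.length - 1].hash;
  const row: ChainedAuditRow = { ...e, previousHash, hash: computeHash(e, previousHash) };
  chain.push(row);
  return row;
}

/** Sequential pass in insertion order; returns the first broken index, or -1 if intact. */
function verifyChain(rows: readonly ChainedAuditRow[]): number {
  let expectedPrevious = GENESIS_HASH;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (row.previousHash !== expectedPrevious || row.hash !== computeHash(row, row.previousHash)) {
      return i;
    }
    expectedPrevious = row.hash;
  }
  return -1;
}
```

A pass like `verifyChain` is what `GET /audit/verify` would run. Editing one row and recomputing only that row's hash still breaks the link to the next row.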
---
### D8: Organization Context in JWT
**Status**: Accepted
**Decision**: Add `organization_id` claim to JWT access tokens issued by `POST /oauth2/token`. All downstream middleware extracts `organization_id` from the token — no separate lookup required.
**Rationale**: Including `organization_id` in the JWT keeps the middleware stack stateless. The alternative — looking up the organization from the database on every request — adds latency and a database round-trip to every authenticated call. The JWT is already signed; adding a claim costs nothing cryptographically.
**Consequences**:
- `ITokenPayload` interface extended: `organization_id: string`
- All service methods receive `organizationId` from `req.user.organization_id`
- Token introspection response includes `organization_id`
- Agents registered before multi-tenancy belong to the default `system` organization
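With D8 in place, tenant context is read from the verified token rather than looked up. The sketch below shows an extended payload type and a fail-closed accessor; every field other than `organization_id`, and the decision to reject tokens that lack the claim, are assumptions for illustration.

```typescript
// Sketch of the extended payload. Apart from organization_id (D8), these are standard
// JWT claims assumed for illustration, not the project's actual ITokenPayload.
interface ITokenPayload {
  sub: string;
  iss: string;
  iat: number;
  exp: number;
  organization_id: string;
}

/** Read the tenant from an already-verified token payload; no database lookup needed. */
function organizationIdFrom(payload: Partial<ITokenPayload>): string {
  const orgId = payload.organization_id;
  // Fail closed: a token without org context must not reach tenant-scoped services.
  if (typeof orgId !== "string" || orgId.length === 0) {
    throw new Error("access token has no organization_id claim");
  }
  return orgId;
}
```

A route handler would then call, for example, `organizationIdFrom(req.user)` and pass the result to service methods as their required `organizationId` argument.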
---
## Component Interaction Map (Phase 3)
```
                 ┌──────────────────────┐
                 │    Web Dashboard     │
                 │  (+ Org Mgmt pages)  │
                 └──────────┬───────────┘
                            │ HTTPS
┌───────────────────────────▼──────────────────────────┐
│                   AgentIdP Server                    │
│                                                      │
│  TLS MW → Auth MW → OrgContext MW → OPA MW           │
│                                                      │
│  ┌───────────┐ ┌───────────┐ ┌───────────────────┐   │
│  │OrgService │ │DIDService │ │ OIDCProvider      │   │
│  └───────────┘ └───────────┘ │ (oidc-provider)   │   │
│  ┌───────────┐ ┌───────────┐ └───────────────────┘   │
│  │FedService │ │WebhookSvc │                         │
│  └───────────┘ └───────────┘                         │
│  ┌──────────────────────────┐                        │
│  │ SOC2Controls (cross-cut) │                        │
│  └──────────────────────────┘                        │
└────────┬────────────────┬────────────────┬───────────┘
         │                │                │
   ┌─────▼──────┐   ┌─────▼──────┐   ┌─────▼──────┐
   │ PostgreSQL │   │ Redis      │   │ Vault      │
   │ + RLS      │   │ + bull Q   │   │ (secrets)  │
   └────────────┘   └────────────┘   └────────────┘
   ┌─────▼──────┐
   │ Prometheus │
   │ + Alerting │
   └─────┬──────┘
   ┌─────▼──────┐
   │ Grafana    │
   └────────────┘
```

@@ -0,0 +1,165 @@
# Phase 3: Enterprise — Change Proposal
**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Proposed — awaiting CEO approval
---
## Summary
Phase 1 delivered a complete, working AgentIdP MVP. Phase 2 made it production-ready: Vault-backed secrets, multi-language SDKs, OPA policy engine, React dashboard, Prometheus/Grafana observability, and multi-region Terraform deployment. Phase 3 makes AgentIdP enterprise-grade: the platform moves from a single-tenant developer tool to a multi-tenant enterprise identity platform with W3C DID support, OIDC compliance, AGNTCY federation, real-time event streaming, and SOC 2 Type II controls.
---
## Problem Statement
Phase 1 and Phase 2 are functional and production-ready but have the following enterprise gaps:
| Gap | Risk |
|-----|------|
| Single-tenant architecture | Cannot serve enterprise customers with isolated data requirements |
| No W3C DID support | Not fully AGNTCY-compliant; agents lack interoperable decentralized identifiers |
| OAuth 2.0 only, no OIDC | Cannot integrate with standard enterprise identity ecosystems (SSO, SCIM) |
| No cross-instance federation | Multi-organization agent identity cannot be verified across AgentIdP deployments |
| No webhook/event streaming | Operators cannot react to agent lifecycle events in real time |
| No SOC 2 controls | Cannot pass enterprise security reviews; blocks revenue from regulated industries |
---
## Proposed Changes
### 1. Multi-Tenancy
Introduce an Organization model so a single AgentIdP instance can serve multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit log, and rate limits. A new Admin API provides organization lifecycle management. All existing agent, credential, and audit endpoints become organization-scoped.
### 2. W3C Decentralized Identifiers (DIDs)
Issue a W3C `did:web` identifier for every registered agent. Serve DID Documents at `/.well-known/did.json` (instance root) and `/agents/:id/did` (per-agent). Expose a DID resolution endpoint. Produce AGNTCY-format agent cards from DID Documents.
### 3. AGNTCY Federation
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted federation partners. Tokens issued by a trusted remote AgentIdP instance can be verified locally, enabling multi-organization and cross-enterprise agent identity interoperability aligned with AGNTCY standards.
### 4. OpenID Connect (OIDC)
Add a full OIDC layer on top of the existing OAuth 2.0 implementation using the `oidc-provider` certified library. Exposes OIDC Discovery, JWKS, ID tokens with agent claims, and an `/agent-info` endpoint (the agent-identity equivalent of the OIDC `/userinfo` endpoint).
### 5. Webhooks and Event Streaming
Real-time event notifications for all agent lifecycle events: agent created, suspended, revoked, credential rotated, token issued. Operators create webhook subscriptions with HMAC-SHA256 signing. Delivery is async via a Redis-backed queue with exponential backoff retry. An optional Kafka/NATS adapter is available for high-throughput environments.
### 6. SOC 2 Type II Preparation
Implement the technical controls required for SOC 2 Type II audit: encryption at rest via PostgreSQL column-level encryption for secrets, TLS enforcement on all inbound connections, automated secrets rotation, security event alerting via Prometheus alerting rules, and audit log immutability proof using a Merkle hash chain appended to each `audit_logs` row.
---
## Out of Scope for Phase 3
- Rust/C++ SDKs (Phase 4)
- Azure Terraform module (Phase 4)
- SCIM provisioning (Phase 4)
- End-user (human operator) identity management (out of product scope — AgentIdP is agent-first)
---
## Capabilities Table
### New Capabilities
| Workstream | Capability | Type |
|-----------|-----------|------|
| Multi-Tenancy | Organization model with isolated agent namespaces | New |
| Multi-Tenancy | Admin API: create, list, update, delete organizations | New |
| Multi-Tenancy | Per-organization rate limits and audit logs | New |
| Multi-Tenancy | Organization member management | New |
| W3C DIDs | `did:web` identifier on every registered agent | New |
| W3C DIDs | DID Document endpoint per agent | New |
| W3C DIDs | Instance-level root DID Document | New |
| W3C DIDs | DID resolution endpoint | New |
| W3C DIDs | AGNTCY-format agent card from DID Document | New |
| OIDC | OIDC Discovery endpoint (`/.well-known/openid-configuration`) | New |
| OIDC | JWKS endpoint (`/.well-known/jwks.json`) | New |
| OIDC | ID token with agent claims in token response | Modified |
| OIDC | `/agent-info` endpoint (agent claims) | New |
| Federation | Trust registry: register and list federation partners | New |
| Federation | Cross-instance token verification endpoint | New |
| Federation | Signed JWT assertion inter-IdP protocol | New |
| Webhooks | Webhook subscription management (CRUD) | New |
| Webhooks | HMAC-SHA256 signed delivery with retry | New |
| Webhooks | Delivery history log | New |
| Webhooks | Kafka/NATS adapter (optional) | New |
| SOC 2 | PostgreSQL column-level encryption for secrets at rest | New |
| SOC 2 | TLS enforcement middleware (reject non-TLS) | New |
| SOC 2 | Automated secrets rotation schedule | New |
| SOC 2 | Security event alerting (Prometheus alerting rules) | New |
| SOC 2 | Merkle hash chain on `audit_logs` for immutability proof | New |
| SOC 2 | Compliance documentation (controls matrix, runbook) | New |
### Modified Capabilities
| Workstream | Capability | Change |
|-----------|-----------|--------|
| Multi-Tenancy | `POST /agents` | Now scoped to `organizationId` |
| Multi-Tenancy | `GET /agents` | Filters restricted to caller's organization |
| Multi-Tenancy | `GET /audit` | Restricted to caller's organization by default |
| Multi-Tenancy | Rate limiting | Per-organization limits in addition to global |
| OIDC | `POST /oauth2/token` | Returns `id_token` in addition to `access_token` |
| SOC 2 | Audit log write path | Computes and appends Merkle hash on insert |
---
## Repository Impact
| Area | Impact |
|------|--------|
| `src/` | New services: OrgService, DIDService, OIDCService, FederationService, WebhookService, SOC2Controls |
| `src/db/migrations/` | 8-10 new migration files |
| `src/types/index.ts` | ~80 new interfaces/types |
| `src/middleware/` | New TLS enforcement middleware, updated auth middleware for org context |
| `src/routes/` | 6 new route files |
| `/.well-known/` | 3 new well-known endpoints |
| `policies/` | Updated Rego policies for org-scoped permissions |
| `dashboard/` | New Organization management pages |
| `monitoring/` | New alerting rules for SOC 2 security events |
| `docs/` | Compliance documentation, federation setup guide, webhook integration guide |
---
## New Dependencies
| Workstream | Package | Purpose | CEO Approval Required |
|-----------|---------|---------|----------------------|
| Multi-Tenancy | No new packages — row-level tenancy in existing PostgreSQL | — | No |
| W3C DIDs | `did-resolver` | W3C DID resolution | Yes |
| W3C DIDs | `web-did-resolver` | `did:web` method resolver | Yes |
| OIDC | `oidc-provider` | Certified OIDC server library | Yes |
| Federation | No new packages — signed JWT assertions use existing `jsonwebtoken` | — | No |
| Webhooks | `bull` (Redis-backed queue) | Async webhook delivery queue | Yes |
| Webhooks | `kafkajs` (optional, Kafka adapter) | Kafka event streaming | Yes |
| SOC 2 | `node-forge` | Column-level encryption primitives | Yes |
---
## Delivery Sequence
Multi-tenancy is a prerequisite for all enterprise customer work — it must land first. DID support and OIDC are independent and can proceed in parallel. Federation depends on DIDs being in place. Webhooks are standalone. SOC 2 controls cut across the entire codebase and are implemented last to ensure all features they protect are already present.
```
1. Multi-Tenancy (prerequisite — all enterprise features assume org context)
2. W3C DIDs (parallel)
   OIDC (parallel)
3. Federation (depends on DIDs)
4. Webhooks (standalone)
5. SOC 2 (cuts across all workstreams — implemented after all features are stable)
```
---
## Success Criteria
- All new dependencies CEO-approved before implementation begins
- All new API endpoints have OpenAPI 3.0 specs before implementation
- Multi-tenancy isolation verified: no cross-organization data leakage
- DID Documents are W3C DID Core 1.0 compliant and resolve correctly
- OIDC Discovery passes the OpenID Foundation conformance test suite
- Federation token verification rejects tampered assertions
- Webhook delivery achieves >99.9% success rate with retry logic
- SOC 2 controls pass independent technical review
- TypeScript strict mode + zero `any` maintained throughout
- >80% test coverage on all new services

@@ -0,0 +1,370 @@
# AGNTCY Federation — Specification
**Workstream**: 4 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted remote AgentIdP instances as federation partners. When an agent presents a token issued by a trusted partner instance, the local AgentIdP can verify it by fetching and caching the partner's JWKS. This enables multi-organization agent identity interoperability aligned with AGNTCY standards.
Federation is opt-in per organization. Only tokens from explicitly registered, trusted partners are accepted.
---
## API Endpoints
### POST /federation/trust
Register a new federation trust partner. Requires `admin:orgs` scope.
```yaml
POST /federation/trust
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [name, issuer, jwksUri]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Human-readable name for this federation partner
        example: "Contoso AgentIdP"
      issuer:
        type: string
        format: uri
        description: OIDC issuer URL of the partner instance (must match iss claim in tokens)
        example: "https://agentidp.contoso.com"
      jwksUri:
        type: string
        format: uri
        description: URL of the partner's JWKS endpoint
        example: "https://agentidp.contoso.com/.well-known/jwks.json"
      allowedOrganizations:
        type: array
        items:
          type: string
        description: Optional list of organization IDs in the partner instance whose tokens are accepted. Empty means all partner orgs are trusted.
        example: ["org_contoso_engineering"]
      expiresAt:
        type: string
        format: date-time
        description: Optional expiry for this trust relationship. If omitted, trust does not expire automatically.
Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/FederationPartner'
    example:
      partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
      name: "Contoso AgentIdP"
      issuer: "https://agentidp.contoso.com"
      jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
      status: "active"
      allowedOrganizations: []
      trustedSince: "2026-03-29T12:00:00Z"
      expiresAt: null
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      duplicate_issuer:
        code: "DUPLICATE_ISSUER"
        message: "A trust relationship with this issuer already exists"
      unreachable_jwks:
        code: "JWKS_UNREACHABLE"
        message: "Could not fetch JWKS from the provided jwksUri"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
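Before the JWKS reachability and duplicate-issuer checks (which need network and database access), the request body can be validated against the schema above. This sketch requires `https` URLs for `issuer` and `jwksUri`, which is stricter than `format: uri` and is an assumption; the function names are hypothetical.

```typescript
/** True for absolute https: URLs; requiring https rather than any URI is an assumption. */
function isHttpsUrl(value: unknown): boolean {
  if (typeof value !== "string") return false;
  try {
    return new URL(value).protocol === "https:";
  } catch {
    return false; // not an absolute URL
  }
}

/** Validate a POST /federation/trust body; an empty list means the shape is acceptable. */
function validateTrustRequest(body: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const { name, issuer, jwksUri, allowedOrganizations, expiresAt } = body;
  if (typeof name !== "string" || name.length < 2 || name.length > 100) {
    errors.push("name must be a string of 2-100 characters");
  }
  if (!isHttpsUrl(issuer)) errors.push("issuer must be an absolute https URL");
  if (!isHttpsUrl(jwksUri)) errors.push("jwksUri must be an absolute https URL");
  if (
    allowedOrganizations !== undefined &&
    !(Array.isArray(allowedOrganizations) && allowedOrganizations.every((o: unknown) => typeof o === "string"))
  ) {
    errors.push("allowedOrganizations must be an array of strings");
  }
  if (expiresAt !== undefined && (typeof expiresAt !== "string" || Number.isNaN(Date.parse(expiresAt)))) {
    errors.push("expiresAt must be a date-time string");
  }
  return errors;
}
```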
---
### GET /federation/partners
List all registered federation partners for the caller's organization. Requires `admin:orgs` scope.
```yaml
GET /federation/partners
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
  status:
    type: string
    enum: [active, suspended, expired]
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/FederationPartner'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
          name: "Contoso AgentIdP"
          issuer: "https://agentidp.contoso.com"
          jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
          status: "active"
          trustedSince: "2026-03-29T12:00:00Z"
          expiresAt: null
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
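The `page` and `limit` parameters map onto SQL `LIMIT`/`OFFSET`. This sketch clamps out-of-range values to the documented bounds instead of returning 400, a choice the spec leaves open; the names are hypothetical.

```typescript
interface Pagination {
  page: number;
  limit: number;
  offset: number;
}

// Defaults and maximum come from the query-parameter schema above.
const DEFAULT_PAGE = 1;
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

/** Parse ?page=&limit= into LIMIT/OFFSET values, clamping out-of-range input. */
function parsePagination(query: { page?: string; limit?: string }): Pagination {
  // parseInt yields NaN for missing/non-numeric input; `|| default` replaces NaN and 0.
  const page = Math.max(1, Number.parseInt(query.page ?? "", 10) || DEFAULT_PAGE);
  const requestedLimit = Number.parseInt(query.limit ?? "", 10) || DEFAULT_LIMIT;
  const limit = Math.min(MAX_LIMIT, Math.max(1, requestedLimit));
  return { page, limit, offset: (page - 1) * limit };
}
```

The result would feed `LIMIT $2 OFFSET $3` in a query already filtered by `organization_id = $1`, with `total` coming from a matching `COUNT(*)`.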
---
### DELETE /federation/partners/:partnerId
Remove a federation trust relationship. Requires `admin:orgs` scope.
```yaml
DELETE /federation/partners/{partnerId}
Authorization: Bearer <token with admin:orgs scope>
Path Parameters:
partnerId:
type: string
Responses:
204 No Content: {}
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### POST /federation/verify
Verify a token issued by a federated partner AgentIdP instance. The caller presents the token; this endpoint resolves the issuer, fetches (or cache-hits) the partner's JWKS, and verifies the signature and claims.
```yaml
POST /federation/verify
Authorization: Bearer <local access_token with agents:read scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [token]
properties:
token:
type: string
description: The JWT token issued by the remote AgentIdP instance to verify
expectedIssuer:
type: string
format: uri
description: Optional — if provided, verification fails if token issuer does not match
expectedOrganizationId:
type: string
description: Optional — if provided, verification fails if token organization_id does not match
Responses:
200 OK:
schema:
type: object
properties:
valid:
type: boolean
claims:
type: object
description: Decoded JWT claims from the verified token
properties:
sub:
type: string
iss:
type: string
iat:
type: integer
exp:
type: integer
agent_id:
type: string
agent_type:
type: string
organization_id:
type: string
capabilities:
type: array
items:
type: string
did:
type: string
partner:
type: object
description: The federation partner record that vouches for this token
properties:
partnerId:
type: string
name:
type: string
issuer:
type: string
example:
valid: true
claims:
sub: "agt_contoso_abc123"
iss: "https://agentidp.contoso.com"
iat: 1743249600
exp: 1743253200
agent_id: "agt_contoso_abc123"
agent_type: "classifier"
organization_id: "org_contoso_engineering"
capabilities: ["text-classification"]
did: "did:web:agentidp.contoso.com:agents:agt_contoso_abc123"
partner:
partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
name: "Contoso AgentIdP"
issuer: "https://agentidp.contoso.com"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized (local token invalid):
schema:
$ref: '#/components/schemas/ErrorResponse'
422 Unprocessable Entity (token invalid or untrusted issuer):
schema:
type: object
properties:
valid:
type: boolean
example: false
reason:
type: string
enum:
- TOKEN_EXPIRED
- INVALID_SIGNATURE
- UNTRUSTED_ISSUER
- JWKS_FETCH_FAILED
- ORGANIZATION_NOT_ALLOWED
message:
type: string
example:
valid: false
reason: "UNTRUSTED_ISSUER"
message: "No trust relationship registered for issuer https://unknown.example.com"
```
---
## Database Schema Changes
### New Table: federation_partners
```sql
CREATE TABLE federation_partners (
    partner_id VARCHAR(40) PRIMARY KEY,
    organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
    name VARCHAR(100) NOT NULL,
    issuer VARCHAR(255) NOT NULL,
    jwks_uri VARCHAR(255) NOT NULL,
    allowed_organizations JSONB NOT NULL DEFAULT '[]',
    status VARCHAR(20) NOT NULL DEFAULT 'active',
    trusted_since TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at TIMESTAMPTZ,
    last_jwks_fetch TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT federation_partners_status_check CHECK (status IN ('active', 'suspended', 'expired')),
    UNIQUE (organization_id, issuer)
);
CREATE INDEX idx_federation_partners_org_id ON federation_partners(organization_id);
CREATE INDEX idx_federation_partners_issuer ON federation_partners(issuer);
CREATE INDEX idx_federation_partners_status ON federation_partners(status);
```
### Redis: JWKS Cache
Partner JWKS documents are cached in Redis with a TTL:
```
Key: federation:jwks:<issuer_url_sha256>
Value: JSON string of the JWKS document
TTL: 1 hour (configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS)
```
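A TypeScript sketch of the cache key and TTL lookup follows. It assumes the issuer URL is hashed exactly as registered (UTF-8, no normalization) and that the digest is hex-encoded. `jwksCacheKey` and `jwksCacheTtlSeconds` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: Redis key for a partner's cached JWKS document.
// Assumes the issuer URL is hashed as-is (no trailing-slash or case normalization).
function jwksCacheKey(issuer: string): string {
  const digest = createHash("sha256").update(issuer, "utf8").digest("hex");
  return `federation:jwks:${digest}`;
}

// Reads FEDERATION_JWKS_CACHE_TTL_SECONDS and falls back to the 3600-second default
// when the variable is unset or is not a positive integer.
function jwksCacheTtlSeconds(env: Record<string, string | undefined>): number {
  const raw = env["FEDERATION_JWKS_CACHE_TTL_SECONDS"];
  const parsed = raw === undefined ? Number.NaN : Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3600;
}
```

With a Redis client, the entry would be written with `SET <key> <jwks-json> EX <ttl>` and deleted whenever the partner record changes.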
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `FEDERATION_ENABLED` | Enable federation endpoints | `true` |
| `FEDERATION_JWKS_CACHE_TTL_SECONDS` | Redis TTL for cached partner JWKS | `3600` |
| `FEDERATION_JWKS_FETCH_TIMEOUT_MS` | HTTP timeout for fetching partner JWKS | `5000` |
| `FEDERATION_MAX_PARTNERS_PER_ORG` | Max federation partners per organization | `50` |
---
## Dependencies
No new npm packages. Federation uses `jsonwebtoken` (already present) for JWT verification and the existing HTTP client for JWKS fetches.
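`jsonwebtoken` does not read a JWKS document itself, so the matching JWK has to be turned into a key object first. Node's `crypto.createPublicKey` accepts a JWK directly (`format: "jwk"`). The sketch below selects the key by the token header's `kid` and refuses `alg: none`. The accepted algorithm list (`RS256`, `ES256`, mirroring the OIDC spec) and all names here are assumptions.

```typescript
import { createPublicKey, KeyObject } from "node:crypto";

// Minimal JWK shape for the fields used here (RFC 7517); a hypothetical type, not a library export.
type PartnerJwk = {
  kty: string;
  kid?: string;
  alg?: string;
  use?: string;
  n?: string;
  e?: string;
  crv?: string;
  x?: string;
  y?: string;
};

type PartnerJwks = { keys: PartnerJwk[] };

// Algorithms accepted from partners (an assumption); "none" is deliberately absent.
const ALLOWED_ALGS = new Set(["RS256", "ES256"]);

// Picks the signing key whose kid matches the JWT header and converts it to a KeyObject.
function resolvePartnerKey(jwks: PartnerJwks, kid: string, alg: string): KeyObject {
  if (!ALLOWED_ALGS.has(alg)) {
    throw new Error(`INVALID_SIGNATURE: algorithm ${alg} is not accepted`);
  }
  const jwk = jwks.keys.find((k) => k.kid === kid && (k.use === undefined || k.use === "sig"));
  if (jwk === undefined) {
    throw new Error(`INVALID_SIGNATURE: no signing key with kid ${kid} in partner JWKS`);
  }
  if (jwk.alg !== undefined && jwk.alg !== alg) {
    throw new Error(`INVALID_SIGNATURE: key ${kid} is for ${jwk.alg}, token uses ${alg}`);
  }
  return createPublicKey({ key: jwk, format: "jwk" });
}
```

Recent `jsonwebtoken` releases (v9) accept a `KeyObject` as the verification key, so the result could be passed to `jwt.verify` together with `algorithms: ["RS256", "ES256"]` and `clockTolerance: 30`. Check this against the installed version.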
---
## Security Considerations
- Only tokens from explicitly registered, active federation partners are accepted in `POST /federation/verify`
- Partner JWKS documents are cached so that each verification does not hit the partner's JWKS endpoint; the cache entry is invalidated whenever the partner record is updated
- Token signature verification uses the partner's JWKS; `alg: none` is always rejected
- `allowedOrganizations` enables fine-grained trust: a partner can be trusted only for tokens whose `organization_id` claim is in the list, and tokens from other organizations under the same partner are rejected with `ORGANIZATION_NOT_ALLOWED`
- Expired federation partners (`expiresAt` in the past) are automatically treated as status `expired` — their tokens are rejected
- `POST /federation/verify` does not grant any local permissions — it is a verification-only endpoint. Callers must make their own access control decisions based on the returned claims.
- Clock skew tolerance: `exp` claim verification allows 30 seconds of clock skew (standard JWT practice)
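The post-signature checks above can be expressed as a small pure function. This sketch makes three assumptions: signature verification has already succeeded; a suspended or past-`expiresAt` partner is reported as `UNTRUSTED_ISSUER`, because the spec does not name a reason for that case; and an empty `allowedOrganizations` list means "no restriction". All type and function names are hypothetical.

```typescript
// Subset of the 422 reason enum produced by these checks.
type VerifyFailure = "TOKEN_EXPIRED" | "UNTRUSTED_ISSUER" | "ORGANIZATION_NOT_ALLOWED";

interface FederationPartner {
  issuer: string;
  status: "active" | "suspended" | "expired";
  expiresAt: Date | null;
  allowedOrganizations: string[];
}

interface FederatedClaims {
  iss: string;
  exp: number; // seconds since the Unix epoch
  organization_id?: string;
}

const CLOCK_SKEW_SECONDS = 30;

// Returns null when the token is acceptable, otherwise the failure reason.
function checkFederatedClaims(
  partner: FederationPartner | undefined,
  claims: FederatedClaims,
  now: Date,
): VerifyFailure | null {
  if (partner === undefined || partner.issuer !== claims.iss) return "UNTRUSTED_ISSUER";
  // Assumption: inactive or past-expiry partners are reported as untrusted issuers.
  const partnerExpired = partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime();
  if (partner.status !== "active" || partnerExpired) return "UNTRUSTED_ISSUER";

  // The token counts as expired once now >= exp + 30 s of allowed skew.
  const nowSeconds = Math.floor(now.getTime() / 1000);
  if (nowSeconds >= claims.exp + CLOCK_SKEW_SECONDS) return "TOKEN_EXPIRED";

  // Assumption: an empty allowedOrganizations list places no restriction.
  const allowed = partner.allowedOrganizations;
  if (allowed.length > 0 && (claims.organization_id === undefined || !allowed.includes(claims.organization_id))) {
    return "ORGANIZATION_NOT_ALLOWED";
  }
  return null;
}
```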
---
## Acceptance Criteria
- [ ] `POST /federation/trust` registers a partner and fetches JWKS; returns 400 if JWKS unreachable
- [ ] `POST /federation/verify` returns `valid: true` for a correctly signed token from a trusted partner
- [ ] `POST /federation/verify` returns `valid: false` with `reason: UNTRUSTED_ISSUER` for unknown issuers
- [ ] `POST /federation/verify` returns `valid: false` with `reason: TOKEN_EXPIRED` for expired tokens
- [ ] Expired trust relationships (past `expiresAt`) are rejected automatically
- [ ] JWKS cache hit is used on second verification request for same issuer (Redis key present)
- [ ] TypeScript strict, zero `any`, >80% test coverage on FederationService

# Multi-Tenancy — Specification
**Workstream**: 1 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Introduce an Organization model so a single AgentIdP instance serves multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit events, and rate limits. Row-level tenancy in PostgreSQL is enforced by both application-layer `organization_id` filtering and PostgreSQL Row-Level Security (RLS) policies.
All existing endpoints that operate on agents, credentials, or audit events are augmented to be organization-scoped. A new Admin API provides organization lifecycle management. Organization membership controls which agents a caller can manage.
---
## API Endpoints
### POST /organizations
Create a new organization. Requires system-admin scope (`admin:orgs`).
```yaml
POST /organizations
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [name, slug]
properties:
name:
type: string
minLength: 2
maxLength: 100
description: Display name of the organization
example: "Acme AI Platform"
slug:
type: string
minLength: 2
maxLength: 50
pattern: "^[a-z0-9-]+$"
description: URL-safe unique identifier
example: "acme-ai"
planTier:
type: string
enum: [free, pro, enterprise]
default: free
maxAgents:
type: integer
minimum: 1
default: 100
maxTokensPerMonth:
type: integer
minimum: 1
default: 10000
Responses:
201 Created:
schema:
$ref: '#/components/schemas/Organization'
example:
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
name: "Acme AI Platform"
slug: "acme-ai"
planTier: "free"
maxAgents: 100
maxTokensPerMonth: 10000
status: "active"
createdAt: "2026-03-29T12:00:00Z"
updatedAt: "2026-03-29T12:00:00Z"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "VALIDATION_ERROR"
message: "slug must be unique"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "INSUFFICIENT_SCOPE"
message: "admin:orgs scope required"
```
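Request-body validation for `POST /organizations` can be sketched as below. It applies the documented defaults and bounds. Function and type names are hypothetical. Slug uniqueness is left to the `UNIQUE` constraint on `organizations.slug`: a PostgreSQL unique violation carries SQLSTATE `23505`, which the handler would map to the 400 `VALIDATION_ERROR` shown above.

```typescript
type PlanTier = "free" | "pro" | "enterprise";

interface CreateOrganizationInput {
  name: string;
  slug: string;
  planTier: PlanTier;
  maxAgents: number;
  maxTokensPerMonth: number;
}

type CreateOrganizationResult =
  | { ok: true; value: CreateOrganizationInput }
  | { ok: false; errors: string[] };

const SLUG_PATTERN = /^[a-z0-9-]+$/;

function isStringOfLength(v: unknown, min: number, max: number): v is string {
  return typeof v === "string" && v.length >= min && v.length <= max;
}

function isPositiveInteger(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= 1;
}

function isPlanTier(v: unknown): v is PlanTier {
  return v === "free" || v === "pro" || v === "enterprise";
}

// Hypothetical validator: applies the documented defaults, then checks each field's bounds.
function validateCreateOrganization(body: Record<string, unknown>): CreateOrganizationResult {
  const name = body["name"];
  const slug = body["slug"];
  const planTier = body["planTier"] ?? "free";
  const maxAgents = body["maxAgents"] ?? 100;
  const maxTokensPerMonth = body["maxTokensPerMonth"] ?? 10000;

  const errors: string[] = [];
  if (!isStringOfLength(name, 2, 100)) errors.push("name must be 2-100 characters");
  if (!isStringOfLength(slug, 2, 50) || !SLUG_PATTERN.test(slug)) errors.push("slug must be 2-50 characters matching ^[a-z0-9-]+$");
  if (!isPlanTier(planTier)) errors.push("planTier must be free, pro, or enterprise");
  if (!isPositiveInteger(maxAgents)) errors.push("maxAgents must be an integer >= 1");
  if (!isPositiveInteger(maxTokensPerMonth)) errors.push("maxTokensPerMonth must be an integer >= 1");
  if (errors.length > 0) return { ok: false, errors };

  // Every guard above passed, so these assertions match the checked types.
  return {
    ok: true,
    value: {
      name: name as string,
      slug: slug as string,
      planTier: planTier as PlanTier,
      maxAgents: maxAgents as number,
      maxTokensPerMonth: maxTokensPerMonth as number,
    },
  };
}
```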
---
### GET /organizations
List all organizations. Requires `admin:orgs` scope.
```yaml
GET /organizations
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
status:
type: string
enum: [active, suspended, deleted]
page:
type: integer
minimum: 1
default: 1
limit:
type: integer
minimum: 1
maximum: 100
default: 20
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/Organization'
total:
type: integer
page:
type: integer
limit:
type: integer
example:
data:
- organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
name: "Acme AI Platform"
slug: "acme-ai"
planTier: "free"
status: "active"
createdAt: "2026-03-29T12:00:00Z"
updatedAt: "2026-03-29T12:00:00Z"
total: 1
page: 1
limit: 20
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
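Pagination for this endpoint reduces to an offset computation. The sketch below uses hypothetical names and assumes that out-of-range values are rejected (producing a 400) rather than silently clamped.

```typescript
interface PageWindow {
  page: number;
  limit: number;
  offset: number;
}

// Parses an optional integer query parameter, using the fallback when it is absent.
function intParam(raw: string | undefined, fallback: number, field: string): number {
  if (raw === undefined) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value)) throw new RangeError(`${field} must be an integer`);
  return value;
}

// Hypothetical helper for GET /organizations: defaults page=1, limit=20; limit capped at 100.
function parsePagination(query: Record<string, string | undefined>): PageWindow {
  const page = intParam(query["page"], 1, "page");
  const limit = intParam(query["limit"], 20, "limit");
  if (page < 1) throw new RangeError("page must be >= 1");
  if (limit < 1 || limit > 100) throw new RangeError("limit must be between 1 and 100");
  // OFFSET for the SQL query: skip the rows of all earlier pages.
  return { page, limit, offset: (page - 1) * limit };
}
```

`total` would then come from a separate `COUNT(*)` with the same `status` filter.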
---
### GET /organizations/:orgId
Get a single organization. Requires `admin:orgs` scope or membership in the organization.
```yaml
GET /organizations/{orgId}
Authorization: Bearer <token>
Path Parameters:
orgId:
type: string
description: Organization ID (org_... prefix)
Responses:
200 OK:
schema:
$ref: '#/components/schemas/Organization'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "ORG_NOT_FOUND"
message: "Organization not found"
```
---
### PATCH /organizations/:orgId
Partially update an organization. Requires `admin:orgs` scope.
```yaml
PATCH /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
properties:
name:
type: string
minLength: 2
maxLength: 100
planTier:
type: string
enum: [free, pro, enterprise]
maxAgents:
type: integer
minimum: 1
maxTokensPerMonth:
type: integer
minimum: 1
status:
type: string
enum: [active, suspended]
Responses:
200 OK:
schema:
$ref: '#/components/schemas/Organization'
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /organizations/:orgId
Soft-delete an organization (sets status to `deleted`). Requires `admin:orgs` scope. Hard deletion is not supported — data is retained for compliance.
```yaml
DELETE /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Responses:
204 No Content: {}
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
409 Conflict:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "ORG_HAS_ACTIVE_AGENTS"
message: "Organization has active agents; decommission all agents before deleting"
```
---
### POST /organizations/:orgId/members
Add an existing, already-registered agent to an organization as a member. Requires `admin:orgs` scope.
```yaml
POST /organizations/{orgId}/members
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [agentId, role]
properties:
agentId:
type: string
description: ID of an already-registered agent to add as a member
role:
type: string
enum: [member, admin]
description: Role within the organization
Responses:
201 Created:
schema:
$ref: '#/components/schemas/OrgMember'
example:
memberId: "mem_01HXK7Z9P3FKWABCDEF99999"
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
role: "member"
joinedAt: "2026-03-29T12:00:00Z"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
409 Conflict:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "ALREADY_MEMBER"
message: "Agent is already a member of this organization"
```
---
### Modified: All /agents, /audit endpoints
All existing agent, credential, and audit endpoints now operate within the caller's organization context, taken from the `organization_id` claim in the caller's JWT. URLs are unchanged, so the scoping is transparent to callers already using the API.
---
## Database Schema Changes
### New Table: organizations
```sql
CREATE TABLE organizations (
organization_id VARCHAR(40) PRIMARY KEY, -- org_... prefixed ULID
name VARCHAR(100) NOT NULL,
slug VARCHAR(50) NOT NULL UNIQUE,
plan_tier VARCHAR(20) NOT NULL DEFAULT 'free',
max_agents INTEGER NOT NULL DEFAULT 100,
max_tokens_per_month INTEGER NOT NULL DEFAULT 10000,
status VARCHAR(20) NOT NULL DEFAULT 'active',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT organizations_status_check CHECK (status IN ('active', 'suspended', 'deleted')),
CONSTRAINT organizations_plan_check CHECK (plan_tier IN ('free', 'pro', 'enterprise'))
);
CREATE INDEX idx_organizations_slug ON organizations(slug);
CREATE INDEX idx_organizations_status ON organizations(status);
```
### New Table: organization_members
```sql
CREATE TABLE organization_members (
member_id VARCHAR(40) PRIMARY KEY,
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
agent_id VARCHAR(40) NOT NULL REFERENCES agents(agent_id),
role VARCHAR(20) NOT NULL DEFAULT 'member',
joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT organization_members_role_check CHECK (role IN ('member', 'admin')),
UNIQUE (organization_id, agent_id)
);
CREATE INDEX idx_org_members_org_id ON organization_members(organization_id);
CREATE INDEX idx_org_members_agent_id ON organization_members(agent_id);
```
### Modified: agents table
```sql
ALTER TABLE agents
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_agents_organization_id ON agents(organization_id);
-- RLS
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
CREATE POLICY agents_org_isolation ON agents
USING (organization_id = current_setting('app.organization_id', true));
```
### Modified: credentials table
```sql
ALTER TABLE credentials
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_credentials_organization_id ON credentials(organization_id);
ALTER TABLE credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY credentials_org_isolation ON credentials
USING (organization_id = current_setting('app.organization_id', true));
```
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_audit_logs_organization_id ON audit_logs(organization_id);
ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_logs_org_isolation ON audit_logs
USING (organization_id = current_setting('app.organization_id', true));
```
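### Seed: Default system organization
This row must exist before the `ALTER TABLE` statements above are run: each new `organization_id` column defaults to `org_system` and references `organizations`, so PostgreSQL validates the foreign key against every pre-existing row when the column is added.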
```sql
INSERT INTO organizations (organization_id, name, slug, plan_tier, max_agents, max_tokens_per_month, status)
VALUES ('org_system', 'System', 'system', 'enterprise', 999999, 999999999, 'active');
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MULTI_TENANCY_ENABLED` | Enable organization enforcement (set false for single-tenant mode) | `true` |
| `DEFAULT_ORG_ID` | Organization ID to assign pre-tenancy data during migration | `org_system` |
| `MAX_ORGS_PER_INSTANCE` | Hard cap on number of organizations per instance | `1000` |
---
## Dependencies
No new npm packages. Row-level tenancy uses existing PostgreSQL client (`pg`) and query patterns.
---
## Security Considerations
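- PostgreSQL RLS is enabled as defense-in-depth: even if the application layer accidentally omits an `organization_id` filter, the database still restricts rows to the current organization. Superusers, roles with `BYPASSRLS`, and the table owner (unless `FORCE ROW LEVEL SECURITY` is set on the table) are not subject to these policies, so the application should connect as a role that does not own these tables, or the tables should use `FORCE ROW LEVEL SECURITY`
- `app.organization_id` is set transaction-locally at the start of every database transaction, either with `SET LOCAL` or with `set_config('app.organization_id', $1, true)` when the value is passed as a bind parameter
- The `admin:orgs` scope is a new privileged scope; only system-level agent credentials carry it
- Organization slugs are public-facing, while organization IDs are internal; avoid exposing organization IDs in public URLs where possible
- `DELETE /organizations/:orgId` is soft-delete only; hard deletion requires a separate admin runbook to prevent accidental data loss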
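Because `SET LOCAL` is a utility command and cannot take a bind parameter such as `$1`, the per-transaction setting can be made with `set_config('app.organization_id', $1, true)` instead; the third argument makes the setting local to the current transaction. Below is a sketch built on a hypothetical `TxClient` interface, which a `pg` `PoolClient` would satisfy structurally.

```typescript
// Hypothetical minimal interface: the only part of a pg client this sketch uses.
interface TxClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Runs fn inside a transaction whose RLS context is pinned to one organization.
async function withOrganizationContext<T>(
  client: TxClient,
  organizationId: string,
  fn: (c: TxClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true: the value is discarded at COMMIT/ROLLBACK, so it cannot
    // leak to the next transaction that reuses this pooled connection.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Keeping the setting transaction-local matters with a connection pool. A session-level `SET` would remain on the connection and carry one tenant's context into another tenant's request.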
---
## Acceptance Criteria
- [ ] Single AgentIdP instance can serve 2+ organizations with zero cross-organization data leakage
- [ ] All agent/credential/audit operations are scoped to caller's organization_id from JWT
- [ ] PostgreSQL RLS policies verified: a direct DB query made as the application's (non-owner) database role, without `app.organization_id` set, returns 0 rows
- [ ] Organization CRUD endpoints return correct 403 when caller lacks admin:orgs scope
- [ ] Pre-existing agents assigned to default system organization without data loss
- [ ] TypeScript strict, zero `any`, >80% test coverage on OrgService

# OpenID Connect (OIDC) — Specification
**Workstream**: 3 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Add a full OpenID Connect 1.0 layer on top of the existing OAuth 2.0 `client_credentials` implementation using the certified `oidc-provider` npm library. The OIDC layer exposes the Discovery and JWKS endpoints, extends the token endpoint to return ID tokens with agent claims, and provides an `/agent-info` endpoint (the agent-identity equivalent of OIDC's `/userinfo`).
The existing `POST /oauth2/token` endpoint is extended, not replaced. Callers that do not request the `openid` scope continue to receive standard OAuth 2.0 responses unchanged.
---
## API Endpoints
### GET /.well-known/openid-configuration
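OIDC Discovery document. No authentication required. This is the standard discovery endpoint defined by OpenID Connect Discovery 1.0; RFC 8414 (OAuth 2.0 Authorization Server Metadata) defines the analogous `/.well-known/oauth-authorization-server` document.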
```yaml
GET /.well-known/openid-configuration
No authentication required
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: OIDC Discovery document per OpenID Connect Discovery 1.0
example:
issuer: "https://idp.sentryagent.ai"
authorization_endpoint: "https://idp.sentryagent.ai/oauth2/authorize"
token_endpoint: "https://idp.sentryagent.ai/oauth2/token"
jwks_uri: "https://idp.sentryagent.ai/.well-known/jwks.json"
userinfo_endpoint: "https://idp.sentryagent.ai/agent-info"
introspection_endpoint: "https://idp.sentryagent.ai/oauth2/introspect"
revocation_endpoint: "https://idp.sentryagent.ai/oauth2/revoke"
response_types_supported:
- "token"
grant_types_supported:
- "client_credentials"
subject_types_supported:
- "public"
id_token_signing_alg_values_supported:
- "RS256"
- "ES256"
scopes_supported:
- "openid"
- "agents:read"
- "agents:write"
- "tokens:read"
- "audit:read"
claims_supported:
- "sub"
- "iss"
- "iat"
- "exp"
- "agent_id"
- "agent_type"
- "organization_id"
- "capabilities"
- "deployment_env"
- "owner"
token_endpoint_auth_methods_supported:
- "client_secret_post"
- "client_secret_basic"
500 Internal Server Error:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
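A client or CI smoke test could sanity-check the served document along these lines. The REQUIRED field list follows OpenID Connect Discovery 1.0, where `token_endpoint` is required unless only the Implicit Flow is used. Rejecting an advertised `none` algorithm reflects this spec's own policy rather than the Discovery standard. The function name is hypothetical.

```typescript
// Provider metadata fields that OpenID Connect Discovery 1.0 marks as REQUIRED
// (token_endpoint is exempt only for Implicit-Flow-only providers).
const REQUIRED_DISCOVERY_FIELDS = [
  "issuer",
  "authorization_endpoint",
  "token_endpoint",
  "jwks_uri",
  "response_types_supported",
  "subject_types_supported",
  "id_token_signing_alg_values_supported",
] as const;

// Hypothetical check: returns a list of problems (empty when the document looks valid).
function checkDiscoveryDocument(doc: Record<string, unknown>, expectedIssuer: string): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_DISCOVERY_FIELDS) {
    if (doc[field] === undefined) problems.push(`missing ${field}`);
  }
  // The issuer must be identical to the URL the client used, with no trailing-slash normalization.
  if (doc["issuer"] !== expectedIssuer) problems.push(`issuer mismatch: ${String(doc["issuer"])}`);
  const algs = doc["id_token_signing_alg_values_supported"];
  if (Array.isArray(algs) && algs.includes("none")) problems.push("alg none must not be advertised");
  return problems;
}
```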
---
### GET /.well-known/jwks.json
JSON Web Key Set. Contains the public keys that correspond to the private keys used to sign ID tokens and access tokens. No authentication required. Clients use this endpoint to verify token signatures.
```yaml
GET /.well-known/jwks.json
No authentication required
Responses:
200 OK:
Content-Type: application/json
Cache-Control: public, max-age=3600
schema:
type: object
required: [keys]
properties:
keys:
type: array
items:
type: object
description: JSON Web Key (RFC 7517)
properties:
kty:
type: string
example: "RSA"
use:
type: string
example: "sig"
kid:
type: string
description: Key ID — matches `kid` header in issued JWTs
alg:
type: string
example: "RS256"
n:
type: string
description: RSA modulus (base64url)
e:
type: string
description: RSA exponent (base64url)
example:
keys:
- kty: "RSA"
use: "sig"
kid: "key-2026-03-29-01"
alg: "RS256"
n: "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAt..."
e: "AQAB"
500 Internal Server Error:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### POST /oauth2/token (extended)
The existing token endpoint is extended to return an `id_token` when the `openid` scope is requested. All existing behavior is preserved when `openid` is not in the scope list.
```yaml
POST /oauth2/token
Content-Type: application/x-www-form-urlencoded
Request Body:
schema:
type: object
required: [grant_type, client_id, client_secret]
properties:
grant_type:
type: string
enum: [client_credentials]
client_id:
type: string
client_secret:
type: string
scope:
type: string
description: Space-separated scopes. Include "openid" to receive an id_token.
example: "openid agents:read"
Responses:
200 OK (with openid scope):
schema:
type: object
properties:
access_token:
type: string
token_type:
type: string
example: "Bearer"
expires_in:
type: integer
scope:
type: string
id_token:
type: string
description: Signed JWT ID token containing agent identity claims. Only present when openid scope was requested.
example:
access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
token_type: "Bearer"
expires_in: 3600
scope: "openid agents:read"
id_token: "eyJhbGciOiJSUzI1NiIsImtpZCI6ImtleS0yMDI2LTAzLTI5LTAxIn0..."
200 OK (without openid scope):
schema:
type: object
properties:
access_token:
type: string
token_type:
type: string
expires_in:
type: integer
scope:
type: string
example:
access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
token_type: "Bearer"
expires_in: 3600
scope: "agents:read"
400 Bad Request:
schema:
$ref: '#/components/schemas/OAuthErrorResponse'
example:
error: "invalid_client"
error_description: "Invalid client credentials"
401 Unauthorized:
schema:
$ref: '#/components/schemas/OAuthErrorResponse'
```
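The choice between the two 200 responses can be sketched as follows. Scopes are space-delimited (RFC 6749, Section 3.3), so `openid` must match a whole scope value. `mintIdToken` is a hypothetical callback standing in for the actual ID token signer.

```typescript
interface TokenResponse {
  access_token: string;
  token_type: "Bearer";
  expires_in: number;
  scope: string;
  id_token?: string;
}

// True only when "openid" is one of the space-delimited scope values.
function requestsOpenId(scope: string | undefined): boolean {
  return (scope ?? "").split(" ").includes("openid");
}

// Hypothetical response builder: callers without the openid scope get the unchanged OAuth 2.0 body.
function buildTokenResponse(
  scope: string,
  accessToken: string,
  expiresIn: number,
  mintIdToken: () => string,
): TokenResponse {
  const response: TokenResponse = { access_token: accessToken, token_type: "Bearer", expires_in: expiresIn, scope };
  if (requestsOpenId(scope)) {
    response.id_token = mintIdToken();
  }
  return response;
}
```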
#### ID Token Claims
When `openid` scope is requested, the ID token (a signed JWT) contains the following claims:
```json
{
"iss": "https://idp.sentryagent.ai",
"sub": "agt_01HXK7Z9P3FKWABCDEF67890",
"aud": "agt_01HXK7Z9P3FKWABCDEF67890",
"iat": 1743249600,
"exp": 1743253200,
"agent_id": "agt_01HXK7Z9P3FKWABCDEF67890",
"agent_type": "orchestrator",
"organization_id": "org_01HXK7Z9P3FKWABCDEF12345",
"capabilities": ["task-planning", "tool-use"],
"deployment_env": "production",
"owner": "acme-ai",
"did": "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
}
```
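After verifying an ID token's signature against `/.well-known/jwks.json`, a relying party might check its claims roughly as below. This is a sketch. `decodeJwtPayload` performs no signature verification of its own, and treating `sub === agent_id` as an invariant is an assumption drawn from the example above.

```typescript
interface AgentIdTokenClaims {
  iss: string;
  sub: string;
  aud: string | string[];
  iat: number;
  exp: number;
  agent_id: string;
  organization_id: string;
}

// Decodes the payload segment of a compact JWS WITHOUT verifying the signature;
// call it only after the signature has been checked.
function decodeJwtPayload(jwt: string): unknown {
  const parts = jwt.split(".");
  if (parts.length !== 3) throw new Error("not a compact JWS");
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// Hypothetical relying-party checks; returns a list of problems (empty when acceptable).
function checkIdTokenClaims(
  claims: AgentIdTokenClaims,
  expectedIssuer: string,
  clientId: string,
  nowSeconds: number,
): string[] {
  const problems: string[] = [];
  if (claims.iss !== expectedIssuer) problems.push("iss mismatch");
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(clientId)) problems.push("aud does not contain client_id");
  if (claims.exp <= nowSeconds) problems.push("expired");
  if (claims.sub !== claims.agent_id) problems.push("sub and agent_id differ");
  return problems;
}
```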
---
### GET /agent-info
Returns claims about the authenticated agent identity. This is the agent-first equivalent of the OIDC `/userinfo` endpoint. Authentication required with any valid access token.
```yaml
GET /agent-info
Authorization: Bearer <access_token>
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: Agent identity claims (subset of registered agent data)
properties:
sub:
type: string
description: Subject — agentId
agent_id:
type: string
agent_type:
type: string
organization_id:
type: string
capabilities:
type: array
items:
type: string
deployment_env:
type: string
owner:
type: string
version:
type: string
status:
type: string
did:
type: string
description: W3C DID for this agent (if DID workstream is active)
created_at:
type: string
format: date-time
example:
sub: "agt_01HXK7Z9P3FKWABCDEF67890"
agent_id: "agt_01HXK7Z9P3FKWABCDEF67890"
agent_type: "orchestrator"
organization_id: "org_01HXK7Z9P3FKWABCDEF12345"
capabilities: ["task-planning", "tool-use"]
deployment_env: "production"
owner: "acme-ai"
version: "1.2.0"
status: "active"
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
created_at: "2026-03-29T12:00:00Z"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "UNAUTHORIZED"
message: "Invalid or expired access token"
```
---
## Database Schema Changes
### New Table: oidc_keys
Stores metadata for the RSA/EC key pairs used to sign ID tokens. Private keys are stored in Vault (referenced by `vault_key_path`); only the public key, as a JWK, is stored in PostgreSQL to serve the JWKS endpoint.
```sql
CREATE TABLE oidc_keys (
key_id VARCHAR(40) PRIMARY KEY,
kid VARCHAR(100) NOT NULL UNIQUE, -- Key ID in JWKS
algorithm VARCHAR(10) NOT NULL,
use_purpose VARCHAR(10) NOT NULL DEFAULT 'sig',
public_key_jwk JSONB NOT NULL,
vault_key_path VARCHAR(255) NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
retired_at TIMESTAMPTZ,
CONSTRAINT oidc_keys_alg_check CHECK (algorithm IN ('RS256', 'ES256')),
CONSTRAINT oidc_keys_use_check CHECK (use_purpose IN ('sig', 'enc'))
);
CREATE INDEX idx_oidc_keys_is_current ON oidc_keys(is_current) WHERE is_current = TRUE;
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `OIDC_ISSUER` | OIDC issuer URL (must match token `iss` claim) | `https://${HOST}` |
| `OIDC_ID_TOKEN_TTL_SECONDS` | ID token lifetime | `3600` |
| `OIDC_SIGNING_ALG` | ID token signing algorithm | `RS256` |
| `OIDC_JWKS_CACHE_TTL_SECONDS` | JWKS response cache TTL | `3600` |
| `OIDC_KEY_ROTATION_DAYS` | Days between signing key rotations | `90` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `oidc-provider` | `^8.4.6` | Certified OIDC server library (OpenID Foundation conformant) |
---
## Security Considerations
- ID token signing keys are stored in Vault; public keys only are served via JWKS
- The JWKS response is cached in Redis (`OIDC_JWKS_CACHE_TTL_SECONDS`) so that frequent JWKS fetches do not rebuild it from PostgreSQL on every request
- Key rotation: when a new signing key is created, the old key remains in JWKS until all tokens signed with it have expired
- The `openid` scope is only issued to callers explicitly requesting it — not included by default
- `GET /agent-info` returns the ID token claims plus non-sensitive registry metadata (`version`, `status`, `created_at`); it never returns secrets or other sensitive data
- ID tokens for agent credentials must not contain client secrets or internal system paths
- `alg: none` is explicitly rejected — all ID tokens must be signed
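The key rotation rule above can be sketched as a JWKS builder over `oidc_keys` rows. It assumes `maxTokenTtlSeconds` is the longest lifetime of any token these keys sign (for example, the larger of the access token TTL and `OIDC_ID_TOKEN_TTL_SECONDS`). As a precaution, it also strips private JWK members in case one was stored by mistake. Names are hypothetical.

```typescript
// Columns of oidc_keys that this sketch reads.
interface OidcKeyRow {
  kid: string;
  algorithm: "RS256" | "ES256";
  use_purpose: "sig" | "enc";
  public_key_jwk: Record<string, unknown>;
  is_current: boolean;
  retired_at: Date | null;
}

// JWK members that only appear in private or symmetric keys (RSA, EC, oct).
const PRIVATE_JWK_MEMBERS = ["d", "p", "q", "dp", "dq", "qi", "k"];

// A key retired at time R signed no token after R, so every such token has expired by
// R + maxTokenTtlSeconds; until then the key stays published.
function buildJwks(rows: OidcKeyRow[], now: Date, maxTokenTtlSeconds: number): { keys: Record<string, unknown>[] } {
  const keys = rows
    .filter(
      (row) =>
        row.is_current ||
        row.retired_at === null ||
        row.retired_at.getTime() + maxTokenTtlSeconds * 1000 > now.getTime(),
    )
    .map((row) => {
      const jwk: Record<string, unknown> = { ...row.public_key_jwk, kid: row.kid, alg: row.algorithm, use: row.use_purpose };
      for (const member of PRIVATE_JWK_MEMBERS) delete jwk[member];
      return jwk;
    });
  return { keys };
}
```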
---
## Acceptance Criteria
- [ ] `/.well-known/openid-configuration` passes OIDC Discovery conformance validation
- [ ] `/.well-known/jwks.json` returns valid JWKS with current signing public key
- [ ] ID token returned when `openid` scope is in token request; not returned otherwise
- [ ] ID token is verifiable against JWKS endpoint using standard JWT libraries
- [ ] ID token claims match agent record (agent_type, capabilities, organization_id, did)
- [ ] `/agent-info` returns correct claims for authenticated agent
- [ ] Key rotation: old JWKS key is kept until all signed tokens expire
- [ ] TypeScript strict, zero `any`, >80% test coverage on OIDCService

# SOC 2 Type II Preparation — Specification
**Workstream**: 6 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Implement the technical controls required for SOC 2 Type II audit readiness. SOC 2 Type II certifies that security controls operate continuously over a defined period — not just that they exist. Controls are implemented in code, not just documented.
This workstream cuts across all other Phase 3 workstreams. It delivers: encryption at rest for sensitive columns, TLS enforcement middleware, automated secrets rotation, security event alerting, and audit log immutability via a Merkle hash chain. A compliance documentation package (controls matrix and runbook) is produced for auditors.
---
## Technical Controls
### Control C1: Encryption at Rest (Column-Level Encryption)
Sensitive columns in PostgreSQL are encrypted using `pgcrypto` symmetric encryption. The encryption key is stored in Vault and fetched at application startup, never written to disk.
**Columns encrypted** (AES-256, via `pgp_sym_encrypt` with the `cipher-algo=aes256` option):
- `credentials.secret_hash`
- `credentials.vault_path`
- `webhook_subscriptions.vault_secret_path`
- `agent_did_keys.vault_key_path`
**Implementation**: An `EncryptionService` wraps `pgcrypto` `pgp_sym_encrypt` / `pgp_sym_decrypt`. These functions produce OpenPGP messages, which encrypt with AES in OpenPGP's CFB mode, not CBC. The key is a 256-bit symmetric key stored at `secret/agentidp/encryption/column-key` in Vault. All INSERT/SELECT operations for encrypted columns go through `EncryptionService`.
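A minimal `EncryptionService` sketch follows. It assumes a `pg`-style client reduced to a hypothetical `SqlClient` interface. The column key (fetched from Vault) travels as a bind parameter instead of being interpolated into SQL text. Bind parameters may still show up in PostgreSQL logs, depending on logging settings. `armor()` and `dearmor()` are pgcrypto functions that convert the `bytea` ciphertext to and from ASCII so it fits text columns.

```typescript
// Hypothetical minimal client interface (a pg Pool or PoolClient has a compatible query method).
interface SqlClient {
  query(text: string, params: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

class EncryptionService {
  constructor(private readonly db: SqlClient, private readonly columnKey: string) {}

  // Encrypts in PostgreSQL with AES-256 and returns ASCII-armored ciphertext.
  async encrypt(plaintext: string): Promise<string> {
    const res = await this.db.query(
      "SELECT armor(pgp_sym_encrypt($1, $2, 'cipher-algo=aes256')) AS ct",
      [plaintext, this.columnKey],
    );
    return EncryptionService.singleText(res.rows, "ct");
  }

  // Reverses encrypt(): de-armors, then decrypts with the same key.
  async decrypt(ciphertext: string): Promise<string> {
    const res = await this.db.query("SELECT pgp_sym_decrypt(dearmor($1), $2) AS pt", [ciphertext, this.columnKey]);
    return EncryptionService.singleText(res.rows, "pt");
  }

  private static singleText(rows: Array<Record<string, unknown>>, column: string): string {
    const value = rows[0]?.[column];
    if (typeof value !== "string") throw new Error(`expected a text value in column ${column}`);
    return value;
  }
}
```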
---
### Control C2: TLS Enforcement
In production, requests that did not arrive over TLS are never served. This is enforced at two levels:
1. Express middleware (`TLSEnforcementMiddleware`): if `X-Forwarded-Proto` is not `https` and `NODE_ENV=production`, respond with `301 Moved Permanently`, redirecting to the HTTPS URL.
2. Terraform: Load balancers (Phase 2 Terraform modules) already enforce TLS; TLS enforcement middleware provides defense-in-depth.
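Here is a sketch of `TLSEnforcementMiddleware`. It uses minimal request/response interfaces so it does not depend on Express typings. The interface names are hypothetical, though Express's `res.redirect(status, url)` has this shape. Trusting `X-Forwarded-Proto` is only safe when the load balancer overwrites any client-supplied value of that header.

```typescript
// Minimal request/response shapes (hypothetical) covering what the middleware reads and calls.
interface TlsRequest {
  headers: Record<string, string | string[] | undefined>;
  hostname: string;
  originalUrl: string;
}

interface TlsResponse {
  redirect(status: number, url: string): void;
}

// Hypothetical factory; `enabled` would come from TLS_ENFORCEMENT_ENABLED
// (true by default when NODE_ENV=production).
function tlsEnforcementMiddleware(enabled: boolean) {
  return (req: TlsRequest, res: TlsResponse, next: () => void): void => {
    const header = req.headers["x-forwarded-proto"];
    const first = Array.isArray(header) ? header[0] : header;
    // A proxy chain may send "https,http"; use the first value, as Express does.
    const proto = first?.split(",")[0]?.trim().toLowerCase();
    if (!enabled || proto === "https") {
      next();
      return;
    }
    res.redirect(301, `https://${req.hostname}${req.originalUrl}`);
  };
}
```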
---
### Control C3: Automated Secrets Rotation
A scheduled job (`SecretsRotationJob`) runs on a configurable cron schedule. It:
1. Identifies credentials whose `expires_at` is within `ROTATION_WARNING_DAYS` days
2. Emits a Prometheus metric `agentidp_credentials_expiring_soon_total` (labelled by `org_id`, `days_remaining`)
3. Renews Vault leases for all active credentials
4. Sends a webhook event `credential.expiring_soon` to subscribers who have opted in
The job does not rotate credentials on its own; it alerts and prepares. Rotation requires an operator to call the existing `POST /agents/:id/credentials/:credId/rotate` endpoint.
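Step 1 of the job can be sketched as a filter over credential rows. A count per (`org_id`, `days_remaining`) label pair then feeds the metric in step 2. The row shape, the `status` values, and rounding `days_remaining` up to whole days are assumptions, not part of the spec.

```typescript
// Hypothetical row shape for the columns this sketch reads.
interface CredentialRow {
  credentialId: string;
  organizationId: string;
  status: "active" | "revoked";
  expiresAt: Date | null;
}

interface ExpiringCredential {
  credentialId: string;
  organizationId: string;
  daysRemaining: number;
}

const MS_PER_DAY = 86_400_000;

// Active, not-yet-expired credentials whose expiry falls within the warning window.
// daysRemaining is rounded up, so 6.5 days left is reported as 7.
function findExpiringCredentials(rows: CredentialRow[], now: Date, warningDays: number): ExpiringCredential[] {
  const out: ExpiringCredential[] = [];
  for (const r of rows) {
    if (r.status !== "active" || r.expiresAt === null) continue;
    const msLeft = r.expiresAt.getTime() - now.getTime();
    if (msLeft <= 0 || msLeft > warningDays * MS_PER_DAY) continue;
    out.push({ credentialId: r.credentialId, organizationId: r.organizationId, daysRemaining: Math.ceil(msLeft / MS_PER_DAY) });
  }
  return out;
}

// Counts per "orgId|daysRemaining" pair, the label combinations for the gauge in step 2.
function countByLabels(expiring: ExpiringCredential[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const c of expiring) {
    const key = `${c.organizationId}|${c.daysRemaining}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```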
---
### Control C4: Audit Log Immutability (Merkle Hash Chain)
Every `audit_logs` row carries two new columns:
- `hash`: SHA-256 of `(eventId || timestamp.toISOString() || action || outcome || agentId || organizationId || previousHash)`, where `||` is string concatenation and the digest is stored as a 64-character hex string
- `previous_hash`: hash of the immediately preceding `audit_logs` row (by `created_at` order), or the genesis string `"GENESIS"` for the first row
A PostgreSQL trigger prevents `UPDATE` and `DELETE` on `audit_logs`.
A new admin endpoint `GET /audit/verify` runs a sequential chain verification pass and returns the integrity status.
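The chain construction and verification can be sketched with Node's `crypto` module. This sketch reads `||` as plain string concatenation and hex-encodes the digest. `startPreviousHash` lets a `fromDate`-bounded check start from the stored `previous_hash` of the first row in range. One caveat: concatenating fields without a separator means different field splits (for example `action` `"a"` + `outcome` `"bc"` versus `"ab"` + `"c"`) hash identically, and a delimiter or length prefix would rule that out. Names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface AuditRow {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
}

interface ChainedAuditRow extends AuditRow {
  previousHash: string;
  hash: string;
}

const GENESIS = "GENESIS";

// SHA-256 over the fields concatenated in the documented order, hex-encoded.
function computeAuditHash(row: AuditRow, previousHash: string): string {
  const input =
    row.eventId + row.timestamp.toISOString() + row.action + row.outcome + row.agentId + row.organizationId + previousHash;
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Links rows (already sorted by created_at) into a chain that starts from GENESIS.
function chainAuditRows(rows: AuditRow[]): ChainedAuditRow[] {
  let previousHash = GENESIS;
  return rows.map((row) => {
    const hash = computeAuditHash(row, previousHash);
    const chained = { ...row, previousHash, hash };
    previousHash = hash;
    return chained;
  });
}

// Returns the first eventId where the chain breaks (brokenAtEventId), or null if intact.
function findChainBreak(rows: ChainedAuditRow[], startPreviousHash: string = GENESIS): string | null {
  let expectedPrevious = startPreviousHash;
  for (const row of rows) {
    if (row.previousHash !== expectedPrevious || row.hash !== computeAuditHash(row, row.previousHash)) {
      return row.eventId;
    }
    expectedPrevious = row.hash;
  }
  return null;
}
```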
---
### Control C5: Security Event Alerting
Prometheus alerting rules are written for the following security events:
| Alert | Condition | Severity |
|-------|-----------|---------|
| `AuthFailureSpike` | >50 `auth.failed` events in 5 minutes | Warning |
| `RateLimitExhaustion` | >80% of org rate limit consumed in 1 minute | Warning |
| `AnomalousTokenIssuance` | Token issuance rate exceeds 3x its 7-day average | Warning |
| `WebhookDeadLetterAccumulating` | `agentidp_webhook_dead_letters_total` increases by >10 in 1 hour | Warning |
| `AuditChainIntegrityFailed` | `agentidp_audit_chain_integrity` metric is 0 | Critical |
| `CredentialExpiryApproaching` | `agentidp_credentials_expiring_soon_total{days_remaining="7"}` > 0 | Info |
---
## API Endpoints
### GET /audit/verify
Verify the Merkle hash chain integrity of the audit log. Requires `admin:orgs` scope. This is a potentially expensive operation on large audit logs — it is rate-limited to once per 5 minutes per organization.
```yaml
GET /audit/verify
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
fromDate:
type: string
format: date-time
description: Start of verification range. If omitted, verifies from genesis.
toDate:
type: string
format: date-time
description: End of verification range. If omitted, verifies to the latest row.
Responses:
200 OK:
schema:
type: object
properties:
valid:
type: boolean
description: True if the chain is intact across the entire range
rowsVerified:
type: integer
description: Number of audit rows verified
firstEventId:
type: string
lastEventId:
type: string
firstTimestamp:
type: string
format: date-time
lastTimestamp:
type: string
format: date-time
verifiedAt:
type: string
format: date-time
brokenAtEventId:
type: string
nullable: true
description: Present only if valid=false — the first eventId where the chain breaks
example:
valid: true
rowsVerified: 15420
firstEventId: "evt_genesis_00001"
lastEventId: "evt_01HXK7Z9P3FKWABCDEFZZZZZ"
firstTimestamp: "2026-01-01T00:00:00Z"
lastTimestamp: "2026-03-29T12:00:00Z"
verifiedAt: "2026-03-29T14:00:00Z"
brokenAtEventId: null
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
429 Too Many Requests:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "RATE_LIMITED"
message: "Audit verification can be run at most once per 5 minutes"
```
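The once-per-5-minutes limit can be sketched as a per-organization cooldown. This in-process version is only correct for a single replica. With several replicas the state would need to be shared, for example through a Redis key set with `SET <key> 1 NX EX 300`. The class name is hypothetical. The remaining wait it returns could populate a `Retry-After` header (in seconds) on the 429 response.

```typescript
// Hypothetical in-process limiter: at most one verification run per organization per window.
class PerOrgCooldown {
  private readonly lastRun = new Map<string, number>();

  constructor(private readonly windowMs: number) {}

  // Returns 0 and records the run if it may proceed; otherwise the ms until the next allowed run.
  tryAcquire(organizationId: string, nowMs: number): number {
    const last = this.lastRun.get(organizationId);
    if (last !== undefined && nowMs - last < this.windowMs) {
      return this.windowMs - (nowMs - last);
    }
    this.lastRun.set(organizationId, nowMs);
    return 0;
  }
}
```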
---
### GET /compliance/controls
Returns the current status of all SOC 2 technical controls. Requires `admin:orgs` scope. Used by auditors and compliance dashboards.
```yaml
GET /compliance/controls
Authorization: Bearer <token with admin:orgs scope>
Responses:
200 OK:
schema:
type: object
properties:
generatedAt:
type: string
format: date-time
controls:
type: array
items:
type: object
properties:
controlId:
type: string
name:
type: string
status:
type: string
enum: [pass, fail, warning, not_applicable]
description:
type: string
lastChecked:
type: string
format: date-time
example:
generatedAt: "2026-03-29T14:00:00Z"
controls:
- controlId: "C1"
name: "Encryption at Rest"
status: "pass"
description: "Column-level encryption active for all sensitive columns"
lastChecked: "2026-03-29T14:00:00Z"
- controlId: "C2"
name: "TLS Enforcement"
status: "pass"
description: "All non-TLS requests redirected to HTTPS in production"
lastChecked: "2026-03-29T14:00:00Z"
- controlId: "C3"
name: "Secrets Rotation"
status: "warning"
description: "3 credentials expiring within 7 days"
lastChecked: "2026-03-29T14:00:00Z"
- controlId: "C4"
name: "Audit Log Immutability"
status: "pass"
description: "Merkle chain intact — last verified 2026-03-29T13:55:00Z"
lastChecked: "2026-03-29T14:00:00Z"
- controlId: "C5"
name: "Security Event Alerting"
status: "pass"
description: "All 6 alerting rules active in Prometheus"
lastChecked: "2026-03-29T14:00:00Z"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
## Database Schema Changes
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
ADD COLUMN hash VARCHAR(64), -- SHA-256 hex string of chain node
ADD COLUMN previous_hash VARCHAR(64); -- Hash of preceding row, or "GENESIS"
-- One-time migration: back-fill hash/previous_hash for existing rows, computing the chain in created_at order.
-- The back-fill issues UPDATEs, so it must complete before the immutability trigger below is created.
-- Prevent updates and deletes (immutability trigger)
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
RAISE EXCEPTION 'audit_logs rows are immutable — modification is not permitted';
END;
$$ LANGUAGE plpgsql;
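CREATE TRIGGER audit_logs_immutability
BEFORE UPDATE OR DELETE ON audit_logs
FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();
-- Row-level DELETE triggers do not fire on TRUNCATE, so block it with a statement-level trigger.
CREATE TRIGGER audit_logs_no_truncate
BEFORE TRUNCATE ON audit_logs
FOR EACH STATEMENT EXECUTE FUNCTION prevent_audit_modification();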
```
### Modified: credentials table
```sql
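-- Columns keep their existing text type; the application stores encrypted values.
-- pgp_sym_encrypt returns bytea, so EncryptionService stores an ASCII encoding of it
-- (for example armor() or encode(..., 'base64')). Encryption is otherwise transparent at the
-- application layer; DDL is needed only if the encoded ciphertext exceeds a column's length limit.
-- Add comments for documentation
COMMENT ON COLUMN credentials.secret_hash IS 'AES-256 encrypted via EncryptionService (pgcrypto pgp_sym_encrypt). Not a plain bcrypt hash.';
COMMENT ON COLUMN credentials.vault_path IS 'AES-256 encrypted via EncryptionService (pgcrypto pgp_sym_encrypt).';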
```
### New Table: compliance_check_log
```sql
CREATE TABLE compliance_check_log (
check_id VARCHAR(40) PRIMARY KEY,
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
control_id VARCHAR(10) NOT NULL,
status VARCHAR(20) NOT NULL,
details JSONB NOT NULL DEFAULT '{}',
checked_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_compliance_check_org ON compliance_check_log(organization_id, checked_at DESC);
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `SOC2_CONTROLS_ENABLED` | Enable SOC 2 controls enforcement | `true` |
| `TLS_ENFORCEMENT_ENABLED` | Enforce HTTPS in production | `true` in production, `false` in development |
| `COLUMN_ENCRYPTION_KEY_PATH` | Vault path for AES-256 column encryption key | `secret/agentidp/encryption/column-key` |
| `ROTATION_WARNING_DAYS` | Days before expiry to emit rotation warning | `30` |
| `SECRETS_ROTATION_CRON` | Cron schedule for rotation check job | `0 3 * * *` (daily at 3 AM UTC) |
| `AUDIT_CHAIN_VERIFY_CRON` | Cron schedule for automated chain verification | `0 2 * * *` (daily at 2 AM UTC) |
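The rotation check itself reduces to a date-window filter. Below is a sketch of what `SecretsRotationJob` might evaluate on each run. It assumes a hypothetical credential record with an `expiresAt` field. Scheduling via `SECRETS_ROTATION_CRON` and emission of the Prometheus metric are not shown.

```typescript
// Hypothetical credential record; the real credentials schema may differ.
interface CredentialExpiry {
  credentialId: string;
  expiresAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Reads ROTATION_WARNING_DAYS, falling back to the documented default of 30.
function rotationWarningDays(env: Record<string, string | undefined> = process.env): number {
  const parsed = Number.parseInt(env["ROTATION_WARNING_DAYS"] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 30;
}

// Credentials that have not yet expired but will expire within the warning window.
function findExpiringCredentials(
  creds: CredentialExpiry[],
  now: Date,
  warningDays: number,
): CredentialExpiry[] {
  const nowMs = now.getTime();
  const windowEndMs = nowMs + warningDays * MS_PER_DAY;
  return creds.filter((c) => {
    const expiresMs = c.expiresAt.getTime();
    return expiresMs > nowMs && expiresMs <= windowEndMs;
  });
}
```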
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `node-forge` | `^1.3.1` | AES-256-CBC column-level encryption primitives |
Note: `pgcrypto` PostgreSQL extension must be enabled: `CREATE EXTENSION IF NOT EXISTS pgcrypto;`
---
## Compliance Documentation
The following documents are produced as part of this workstream:
| Document | Path | Description |
|----------|------|-------------|
| Controls Matrix | `docs/compliance/soc2-controls-matrix.md` | Maps SOC 2 Trust Services Criteria to implemented controls |
| Encryption Runbook | `docs/compliance/encryption-runbook.md` | Key rotation procedure, Vault key path map |
| Audit Log Runbook | `docs/compliance/audit-log-runbook.md` | How to run chain verification, interpret results |
| Incident Response | `docs/compliance/incident-response.md` | Security event response procedures |
| Secrets Rotation Guide | `docs/compliance/secrets-rotation.md` | Operator guide for credential and key rotation |
---
## Security Considerations
- Column encryption key is fetched from Vault at startup and held in process memory — never written to disk or logged
- Key rotation: new encryption key generates re-encrypted copies of all sensitive columns in a migration; the old key is retained in Vault history
- The immutability trigger on `audit_logs` prevents application-layer modification; a `SUPERUSER` can still bypass triggers — document this in the controls matrix as a residual risk requiring compensating controls (e.g., read-only replica verification)
- `GET /audit/verify` is rate-limited to prevent denial-of-service via repeated expensive sequential scans
- `GET /compliance/controls` never returns raw secrets or key material — only control status
---
## Acceptance Criteria
- [ ] `pgcrypto` extension enabled; sensitive columns are encrypted at rest (verified: plaintext not visible in direct DB query)
- [ ] TLS enforcement middleware redirects HTTP to HTTPS in production; passthrough in development
- [ ] `SecretsRotationJob` runs on schedule; emits Prometheus metric for expiring credentials
- [ ] Audit log immutability trigger prevents UPDATE/DELETE on `audit_logs` table
- [ ] `GET /audit/verify` returns `valid: true` for an unmodified chain
- [ ] `GET /audit/verify` returns `valid: false` with `brokenAtEventId` after a row is manually tampered with (test scenario)
- [ ] All 6 Prometheus alerting rules are present in `monitoring/prometheus/alerts.yml`
- [ ] `GET /compliance/controls` returns correct status for all 5 controls
- [ ] Compliance documentation written and reviewed
- [ ] TypeScript strict, zero `any`, >80% test coverage on SOC2 control implementations

@@ -0,0 +1,353 @@
# W3C Decentralized Identifiers (DIDs) — Specification
**Workstream**: 2 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Issue a W3C `did:web` identifier for every registered agent and serve DID Documents over HTTPS. The AgentIdP instance itself has a root DID Document at `/.well-known/did.json`. Each agent has an individual DID Document at `/agents/:id/did`. A DID resolution endpoint returns an agent's DID Document wrapped in the W3C DID Resolution result format. Agent cards in AGNTCY format are derivable from DID Documents.
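Under the `did:web` method, the instance DID `did:web:<host>` resolves to `https://<host>/.well-known/did.json`, and a per-agent DID `did:web:<host>:agents:<agentId>` resolves to `https://<host>/agents/<agentId>/did.json`. Per-agent documents are served at `https://<host>/agents/<agentId>/did`, so standard `did:web` resolvers will only find them if the `did.json` path is also served or redirected. All DID Documents are W3C DID Core 1.0 compliant.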
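To make the DID-to-URL mapping concrete, the sketch below builds a per-agent DID in the format used throughout this spec. It then derives the URL a standard `did:web` resolver fetches for that DID, following the W3C `did:web` method rules: colons become path separators, a port is percent-encoded, and `/did.json` is appended. `buildAgentDid` and `didWebToUrl` are hypothetical helper names.

```typescript
// Builds a per-agent DID in this spec's format: did:web:<domain>:agents:<agentId>.
// A port in the domain (e.g. "localhost:3000") must be percent-encoded as %3A.
function buildAgentDid(domain: string, agentId: string): string {
  return `did:web:${encodeURIComponent(domain)}:agents:${encodeURIComponent(agentId)}`;
}

// Maps a did:web DID to the HTTPS URL that a standard did:web resolver fetches.
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`Not a did:web DID: ${did}`);
  }
  // Drop any fragment (e.g. "#key-1"); it identifies a key, not a document.
  const methodSpecificId = did.slice(prefix.length).split("#")[0] ?? "";
  const [host, ...path] = methodSpecificId.split(":").map((s) => decodeURIComponent(s));
  if (!host) {
    throw new Error(`Missing domain in DID: ${did}`);
  }
  // No path: /.well-known/did.json. With a path: colons become slashes, then /did.json.
  const base = path.length > 0 ? path.join("/") : ".well-known";
  return `https://${host}/${base}/did.json`;
}
```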
---
## API Endpoints
### GET /.well-known/did.json
Root DID Document for the AgentIdP instance. No authentication required — this is a public discovery endpoint.
```yaml
GET /.well-known/did.json
No authentication required
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: W3C DID Core 1.0 compliant DID Document
required: [id, "@context", verificationMethod, authentication]
properties:
"@context":
type: array
items:
type: string
example:
- "https://www.w3.org/ns/did/v1"
- "https://w3id.org/security/suites/jws-2020/v1"
id:
type: string
description: DID for this AgentIdP instance
example: "did:web:idp.sentryagent.ai"
controller:
type: string
example: "did:web:idp.sentryagent.ai"
verificationMethod:
type: array
items:
$ref: '#/components/schemas/VerificationMethod'
authentication:
type: array
items:
type: string
description: References to verification methods for authentication
assertionMethod:
type: array
items:
type: string
service:
type: array
items:
$ref: '#/components/schemas/DIDService'
example:
"@context":
- "https://www.w3.org/ns/did/v1"
id: "did:web:idp.sentryagent.ai"
controller: "did:web:idp.sentryagent.ai"
verificationMethod:
- id: "did:web:idp.sentryagent.ai#key-1"
type: "JsonWebKey2020"
controller: "did:web:idp.sentryagent.ai"
publicKeyJwk:
kty: "EC"
crv: "P-256"
x: "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU"
y: "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
authentication:
- "did:web:idp.sentryagent.ai#key-1"
service:
- id: "did:web:idp.sentryagent.ai#agent-registry"
type: "AgentIdentityProvider"
serviceEndpoint: "https://idp.sentryagent.ai"
500 Internal Server Error:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
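Below is a sketch of assembling the root DID Document above from a freshly generated P-256 key, using `generateKeyPairSync` and JWK export from `node:crypto`. In the real flow the private key goes to Vault. Here it is simply discarded, so only the public `x`/`y` coordinates can reach the document. The sketch includes the `jws-2020` context listed in the schema example, since that context defines `JsonWebKey2020`. The function and type names are illustrative, not the `DIDService` API.

```typescript
import { generateKeyPairSync } from "node:crypto";

interface PublicJwk { kty: "EC"; crv: "P-256"; x: string; y: string; }

interface VerificationMethod {
  id: string;
  type: "JsonWebKey2020";
  controller: string;
  publicKeyJwk: PublicJwk;
}

interface DidServiceEntry { id: string; type: string; serviceEndpoint: string; }

interface DidDocument {
  "@context": string[];
  id: string;
  controller: string;
  verificationMethod: VerificationMethod[];
  authentication: string[];
  service: DidServiceEntry[];
}

// Generates a P-256 key pair and returns only the public coordinates.
// In the real flow the private key would be written to Vault; here it is discarded.
function generatePublicJwk(): PublicJwk {
  const { publicKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });
  const jwk = publicKey.export({ format: "jwk" });
  if (typeof jwk.x !== "string" || typeof jwk.y !== "string") {
    throw new Error("EC public key export did not include x/y coordinates");
  }
  // Copy only the public fields explicitly so nothing else can leak into the document.
  return { kty: "EC", crv: "P-256", x: jwk.x, y: jwk.y };
}

function buildInstanceDidDocument(domain: string, jwk: PublicJwk): DidDocument {
  const did = `did:web:${encodeURIComponent(domain)}`;
  const keyId = `${did}#key-1`;
  return {
    "@context": ["https://www.w3.org/ns/did/v1", "https://w3id.org/security/suites/jws-2020/v1"],
    id: did,
    controller: did,
    verificationMethod: [{ id: keyId, type: "JsonWebKey2020", controller: did, publicKeyJwk: jwk }],
    authentication: [keyId],
    service: [{ id: `${did}#agent-registry`, type: "AgentIdentityProvider", serviceEndpoint: `https://${domain}` }],
  };
}
```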
---
### GET /agents/:id/did
Per-agent DID Document. No authentication required — DID Documents are public.
```yaml
GET /agents/{agentId}/did
No authentication required
Path Parameters:
agentId:
type: string
description: Agent ID
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: W3C DID Core 1.0 compliant per-agent DID Document
example:
"@context":
- "https://www.w3.org/ns/did/v1"
- "https://w3id.org/agntcy/v1"
id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
controller: "did:web:idp.sentryagent.ai"
verificationMethod:
- id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
type: "JsonWebKey2020"
controller: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
publicKeyJwk:
kty: "EC"
crv: "P-256"
x: "abc123"
y: "def456"
authentication:
- "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
service:
- id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#agent-card"
type: "AgentCard"
serviceEndpoint: "https://idp.sentryagent.ai/agents/agt_01HXK7Z9P3FKWABCDEF67890/did/card"
agntcy:
agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
agentType: "orchestrator"
capabilities:
- "task-planning"
- "tool-use"
deploymentEnv: "production"
owner: "acme-ai"
version: "1.2.0"
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "AGENT_NOT_FOUND"
message: "Agent not found"
410 Gone:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "AGENT_DECOMMISSIONED"
message: "Agent has been decommissioned — DID Document is no longer active"
```
---
### GET /agents/:id/did/resolve
DID resolution endpoint: resolves the specified agent's `did:web` DID and returns the result in W3C DID Resolution format. This lets external systems use AgentIdP as a resolver for agent DIDs. Authentication required (`agents:read` scope).
```yaml
GET /agents/{agentId}/did/resolve
Authorization: Bearer <token with agents:read scope>
Path Parameters:
agentId:
type: string
Responses:
200 OK:
Content-Type: application/ld+json;profile="https://w3id.org/did-resolution"
schema:
type: object
required: [didDocument, didDocumentMetadata, didResolutionMetadata]
properties:
didDocument:
type: object
description: The resolved DID Document
didDocumentMetadata:
type: object
properties:
created:
type: string
format: date-time
updated:
type: string
format: date-time
deactivated:
type: boolean
didResolutionMetadata:
type: object
properties:
contentType:
type: string
example: "application/did+ld+json"
retrieved:
type: string
format: date-time
example:
didDocument:
"@context": ["https://www.w3.org/ns/did/v1"]
id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
didDocumentMetadata:
created: "2026-03-29T12:00:00Z"
updated: "2026-03-29T12:00:00Z"
deactivated: false
didResolutionMetadata:
contentType: "application/did+ld+json"
retrieved: "2026-03-29T14:00:00Z"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
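The sketch below shows how a stored agent record could map onto the resolution result above. The `AgentDidRecord` shape is hypothetical, as is the rule that a `decommissioned` status means `deactivated: true`; the latter is drawn from this spec's decommissioning notes rather than from a confirmed implementation.

```typescript
// HTTP Content-Type for the whole resolution result, as given in the endpoint spec.
const RESOLUTION_CONTENT_TYPE = 'application/ld+json;profile="https://w3id.org/did-resolution"';

type AgentStatus = "active" | "suspended" | "decommissioned";

// Hypothetical stored record for an agent's DID.
interface AgentDidRecord {
  didDocument: Record<string, unknown>;
  status: AgentStatus;
  createdAt: Date;
  updatedAt: Date;
}

interface DidResolutionResult {
  didDocument: Record<string, unknown>;
  didDocumentMetadata: { created: string; updated: string; deactivated: boolean };
  didResolutionMetadata: { contentType: string; retrieved: string };
}

function buildResolutionResult(record: AgentDidRecord, now: Date = new Date()): DidResolutionResult {
  return {
    didDocument: record.didDocument,
    didDocumentMetadata: {
      created: record.createdAt.toISOString(),
      updated: record.updatedAt.toISOString(),
      deactivated: record.status === "decommissioned",
    },
    didResolutionMetadata: {
      contentType: "application/did+ld+json", // media type of the inner DID Document
      retrieved: now.toISOString(),
    },
  };
}
```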
---
### GET /agents/:id/did/card
AGNTCY-format agent card derived from DID Document. Returns a JSON object representing the agent's identity and capabilities in the AGNTCY agent card format. No authentication required.
```yaml
GET /agents/{agentId}/did/card
No authentication required
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: AGNTCY-format agent card
properties:
did:
type: string
name:
type: string
agentType:
type: string
capabilities:
type: array
items:
type: string
owner:
type: string
version:
type: string
deploymentEnv:
type: string
identityProvider:
type: string
description: DID of the issuing AgentIdP instance
issuedAt:
type: string
format: date-time
example:
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
name: "acme-orchestrator"
agentType: "orchestrator"
capabilities: ["task-planning", "tool-use"]
owner: "acme-ai"
version: "1.2.0"
deploymentEnv: "production"
identityProvider: "did:web:idp.sentryagent.ai"
issuedAt: "2026-03-29T12:00:00Z"
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
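The card must expose only data already present in the agent registration, so building it is a direct field mapping. Below is a sketch that assumes a hypothetical `AgentRegistration` record and takes `issuedAt` from the `did_created_at` column. `toISOString()` emits millisecond precision (`...12:00:00.000Z`), which is still a valid `date-time`.

```typescript
// Hypothetical registration record; only fields stored at registration are used.
interface AgentRegistration {
  agentId: string;
  name: string;
  agentType: string;
  capabilities: string[];
  owner: string;
  version: string;
  deploymentEnv: string;
  did: string;
  didCreatedAt: Date;
}

interface AgentCard {
  did: string;
  name: string;
  agentType: string;
  capabilities: string[];
  owner: string;
  version: string;
  deploymentEnv: string;
  identityProvider: string;
  issuedAt: string;
}

function buildAgentCard(agent: AgentRegistration, instanceDid: string): AgentCard {
  return {
    did: agent.did,
    name: agent.name,
    agentType: agent.agentType,
    capabilities: [...agent.capabilities], // copy so callers cannot mutate the stored record
    owner: agent.owner,
    version: agent.version,
    deploymentEnv: agent.deploymentEnv,
    identityProvider: instanceDid,
    issuedAt: agent.didCreatedAt.toISOString(),
  };
}
```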
---
## Database Schema Changes
### New Table: agent_did_keys
Stores the public/private key pair used to sign each agent's DID Document. The private key is stored in Vault; only the public key JWK is stored in PostgreSQL.
```sql
CREATE TABLE agent_did_keys (
key_id VARCHAR(40) PRIMARY KEY,
agent_id VARCHAR(40) NOT NULL UNIQUE REFERENCES agents(agent_id),
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
public_key_jwk JSONB NOT NULL,
vault_key_path VARCHAR(255) NOT NULL, -- Vault path where private key is stored
key_type VARCHAR(20) NOT NULL DEFAULT 'EC',
curve VARCHAR(10) NOT NULL DEFAULT 'P-256',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
rotated_at TIMESTAMPTZ,
CONSTRAINT agent_did_keys_key_type_check CHECK (key_type IN ('EC', 'RSA'))
);
CREATE INDEX idx_agent_did_keys_agent_id ON agent_did_keys(agent_id);
CREATE INDEX idx_agent_did_keys_org_id ON agent_did_keys(organization_id);
```
### New Column: agents.did
```sql
ALTER TABLE agents
ADD COLUMN did VARCHAR(255),
ADD COLUMN did_created_at TIMESTAMPTZ;
-- Populated automatically on agent creation
-- Example value: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `DID_WEB_DOMAIN` | Domain name used to construct `did:web` identifiers | Derived from `HOST` if not set |
| `DID_KEY_TYPE` | Cryptographic key type for DID keys | `EC` |
| `DID_KEY_CURVE` | Elliptic curve for EC keys | `P-256` |
| `DID_DOCUMENT_CACHE_TTL_SECONDS` | How long to cache DID Documents in Redis | `300` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `did-resolver` | `^4.1.0` | W3C DID resolution interface |
| `web-did-resolver` | `^2.0.27` | `did:web` method resolver |
---
## Security Considerations
- DID Documents are served from public, unauthenticated endpoints; they contain only public key material and registration metadata, so no sensitive data is exposed
- Private keys for DID signing are stored in Vault; never written to PostgreSQL
- DID Document cache in Redis has a TTL — stale documents are evicted automatically
- Requests for a decommissioned agent's DID Document return HTTP 410 Gone, and the resolution result reports `deactivated: true` in `didDocumentMetadata`
- DID rotation: when a credential is rotated, the DID Document key can optionally be rotated; the old key is retained in history
- `GET /agents/:id/did/card` exposes only data already present in the agent registration — no new sensitive fields
---
## Acceptance Criteria
- [ ] Every new agent registration automatically generates a `did:web` DID and key pair
- [ ] Root DID Document at `/.well-known/did.json` is W3C DID Core 1.0 compliant (validated by `did-resolver`)
- [ ] Per-agent DID Document returns correct `did:web` identifier and public key JWK
- [ ] DID resolution endpoint returns W3C DID Resolution format
- [ ] Decommissioned agent DID Document returns 410 Gone with `deactivated: true`
- [ ] Agent card at `/agents/:id/did/card` matches AGNTCY agent card format
- [ ] Private keys never appear in any API response or log
- [ ] TypeScript strict, zero `any`, >80% test coverage on DIDService

@@ -0,0 +1,476 @@
# Webhooks and Event Streaming — Specification
**Workstream**: 5 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Real-time event notifications for agent lifecycle events via HTTP webhooks. Operators create webhook subscriptions specifying a target URL, the events they want to receive, and a secret for HMAC-SHA256 signature verification. Delivery is asynchronous via a Redis-backed `bull` queue with exponential backoff retry (max 10 attempts). All deliveries are logged for observability.
Supported events: `agent.created`, `agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `token.issued`, `token.revoked`.
An optional Kafka/NATS adapter enables high-throughput event streaming alongside webhook delivery.
---
## API Endpoints
### POST /webhooks
Create a new webhook subscription. Requires `agents:write` scope.
```yaml
POST /webhooks
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [url, events, secret]
properties:
url:
type: string
format: uri
description: HTTPS endpoint to deliver events to
example: "https://app.example.com/hooks/agentidp"
events:
type: array
items:
type: string
enum:
- agent.created
- agent.updated
- agent.suspended
- agent.reactivated
- agent.decommissioned
- credential.generated
- credential.rotated
- credential.revoked
- token.issued
- token.revoked
- "*"
minItems: 1
description: List of event types to subscribe to. Use ["*"] to subscribe to all events.
example: ["agent.created", "credential.rotated"]
secret:
type: string
minLength: 16
description: Secret used to compute the HMAC-SHA256 signature. It is never returned by the API after creation, so store it securely on the client side.
example: "whsec_super_secret_value_here"
description:
type: string
maxLength: 255
description: Optional human-readable description for this subscription
active:
type: boolean
default: true
Responses:
201 Created:
schema:
$ref: '#/components/schemas/WebhookSubscription'
example:
subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
url: "https://app.example.com/hooks/agentidp"
events: ["agent.created", "credential.rotated"]
description: "Production event sink"
active: true
createdAt: "2026-03-29T12:00:00Z"
updatedAt: "2026-03-29T12:00:00Z"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
examples:
invalid_url:
code: "VALIDATION_ERROR"
message: "url must be a valid HTTPS URI"
invalid_event:
code: "VALIDATION_ERROR"
message: "Unknown event type: agent.unknown"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
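Below is a sketch of the request validation these rules imply, including how a `"*"` subscription matches every event. The two messages taken from the examples above are reused verbatim. The other messages and all function names are hypothetical.

```typescript
const WEBHOOK_EVENTS = [
  "agent.created", "agent.updated", "agent.suspended", "agent.reactivated", "agent.decommissioned",
  "credential.generated", "credential.rotated", "credential.revoked", "token.issued", "token.revoked",
] as const;

type WebhookEventType = (typeof WEBHOOK_EVENTS)[number];

interface CreateWebhookRequest {
  url: string;
  events: string[];
  secret: string;
  description?: string;
  active?: boolean;
}

// Returns validation messages; an empty list means the request is valid.
function validateCreateWebhook(req: CreateWebhookRequest): string[] {
  const errors: string[] = [];
  let parsed: URL | null = null;
  try {
    parsed = new URL(req.url);
  } catch {
    parsed = null; // not a parseable URI
  }
  if (parsed === null || parsed.protocol !== "https:") {
    errors.push("url must be a valid HTTPS URI");
  }
  if (req.events.length === 0) {
    errors.push("events must contain at least one event type");
  }
  const known: readonly string[] = WEBHOOK_EVENTS;
  for (const event of req.events) {
    if (event !== "*" && !known.includes(event)) {
      errors.push(`Unknown event type: ${event}`);
    }
  }
  if (req.secret.length < 16) {
    errors.push("secret must be at least 16 characters");
  }
  if (req.description !== undefined && req.description.length > 255) {
    errors.push("description must be at most 255 characters");
  }
  return errors;
}

// A subscription receives an event if it lists that type or the "*" wildcard.
function subscriptionMatches(events: readonly string[], eventType: WebhookEventType): boolean {
  return events.includes("*") || events.includes(eventType);
}
```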
---
### GET /webhooks
List webhook subscriptions for the caller's organization. Requires `agents:read` scope.
```yaml
GET /webhooks
Authorization: Bearer <token with agents:read scope>
Query Parameters:
active:
type: boolean
description: Filter by active/inactive subscriptions
page:
type: integer
default: 1
limit:
type: integer
default: 20
maximum: 100
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/WebhookSubscription'
total:
type: integer
page:
type: integer
limit:
type: integer
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id
Get a single webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:read scope>
Path Parameters:
subscriptionId:
type: string
Responses:
200 OK:
schema:
$ref: '#/components/schemas/WebhookSubscription'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "WEBHOOK_NOT_FOUND"
message: "Webhook subscription not found"
```
---
### PATCH /webhooks/:id
Update a webhook subscription (e.g., pause/resume, change events). Requires `agents:write` scope.
```yaml
PATCH /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
schema:
type: object
properties:
url:
type: string
format: uri
events:
type: array
items:
type: string
description:
type: string
maxLength: 255
active:
type: boolean
Responses:
200 OK:
schema:
$ref: '#/components/schemas/WebhookSubscription'
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /webhooks/:id
Delete a webhook subscription. Requires `agents:write` scope.
```yaml
DELETE /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Responses:
204 No Content: {}
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id/deliveries
List delivery attempts for a specific webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}/deliveries
Authorization: Bearer <token with agents:read scope>
Query Parameters:
status:
type: string
enum: [pending, success, failed, dead_letter]
eventType:
type: string
description: Filter by event type
fromDate:
type: string
format: date-time
toDate:
type: string
format: date-time
page:
type: integer
default: 1
limit:
type: integer
default: 50
maximum: 200
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/WebhookDelivery'
total:
type: integer
page:
type: integer
limit:
type: integer
example:
data:
- deliveryId: "del_01HXK7Z9P3FKWABCDEF77777"
subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
eventType: "agent.created"
eventId: "evt_01HXK7Z9P3FKWABCDEF99999"
status: "success"
httpStatusCode: 200
attemptCount: 1
nextRetryAt: null
deliveredAt: "2026-03-29T12:00:05Z"
createdAt: "2026-03-29T12:00:00Z"
total: 1
page: 1
limit: 50
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
## Webhook Payload Format
Every webhook delivery uses this envelope format:
```json
{
"id": "evt_01HXK7Z9P3FKWABCDEF99999",
"type": "agent.created",
"organizationId": "org_01HXK7Z9P3FKWABCDEF12345",
"timestamp": "2026-03-29T12:00:00Z",
"data": {
"agentId": "agt_01HXK7Z9P3FKWABCDEF67890",
"agentType": "orchestrator",
"status": "active",
"owner": "acme-ai",
"version": "1.0.0",
"deploymentEnv": "production"
}
}
```
### HMAC-SHA256 Signature
Every delivery includes the following HTTP headers:
```
X-AgentIdP-Event: agent.created
X-AgentIdP-Delivery-Id: del_01HXK7Z9P3FKWABCDEF77777
X-AgentIdP-Timestamp: 1774785600
X-AgentIdP-Signature-256: sha256=<HMAC-SHA256 of timestamp.payload using subscription secret>
```
Signature computation:
```
signed_content = timestamp + "." + JSON.stringify(payload)
signature = HMAC-SHA256(secret, signed_content)
header_value = "sha256=" + hex(signature)
```
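Below is a sketch of the signature scheme above using `node:crypto`, covering both the sender and receiver sides. One detail the pseudocode leaves implicit: a receiver should compute the HMAC over the raw request body exactly as received, because re-serializing parsed JSON can change key order or whitespace. The 5-minute replay window comes from the security considerations below. Also rejecting timestamps too far in the future is an added assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_SECONDS = 5 * 60; // reject deliveries more than 5 minutes old (or ahead)

// signed_content = timestamp + "." + body; header value = "sha256=" + hex(HMAC).
function signDelivery(secret: string, timestamp: number | string, rawBody: string): string {
  const signedContent = `${timestamp}.${rawBody}`;
  return "sha256=" + createHmac("sha256", secret).update(signedContent).digest("hex");
}

// Receiver check: rawBody must be the exact bytes received, not re-serialized JSON.
function verifyDelivery(
  secret: string,
  timestampHeader: string,
  signatureHeader: string,
  rawBody: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const timestamp = Number.parseInt(timestampHeader, 10);
  if (!Number.isFinite(timestamp) || Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) {
    return false;
  }
  // Sign the header string as received so the signed content matches the sender's.
  const expected = Buffer.from(signDelivery(secret, timestampHeader, rawBody), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```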
---
## Database Schema Changes
### New Table: webhook_subscriptions
```sql
CREATE TABLE webhook_subscriptions (
subscription_id VARCHAR(40) PRIMARY KEY,
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
url VARCHAR(2048) NOT NULL,
events JSONB NOT NULL DEFAULT '[]',
secret_hash VARCHAR(255) NOT NULL, -- bcrypt hash of secret; plain text stored in Vault
vault_secret_path VARCHAR(255) NOT NULL,
description VARCHAR(255),
active BOOLEAN NOT NULL DEFAULT TRUE,
failure_count INTEGER NOT NULL DEFAULT 0,
last_delivery_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_webhook_subs_org_id ON webhook_subscriptions(organization_id);
CREATE INDEX idx_webhook_subs_active ON webhook_subscriptions(active) WHERE active = TRUE;
```
### New Table: webhook_deliveries
```sql
CREATE TABLE webhook_deliveries (
delivery_id VARCHAR(40) PRIMARY KEY,
subscription_id VARCHAR(40) NOT NULL REFERENCES webhook_subscriptions(subscription_id),
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
event_id VARCHAR(40) NOT NULL,
event_type VARCHAR(100) NOT NULL,
payload JSONB NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
http_status_code SMALLINT,
response_body TEXT,
attempt_count SMALLINT NOT NULL DEFAULT 0,
next_retry_at TIMESTAMPTZ,
delivered_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT webhook_deliveries_status_check CHECK (status IN ('pending', 'success', 'failed', 'dead_letter'))
);
CREATE INDEX idx_webhook_deliveries_sub_id ON webhook_deliveries(subscription_id);
CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status);
CREATE INDEX idx_webhook_deliveries_org_id ON webhook_deliveries(organization_id);
CREATE INDEX idx_webhook_deliveries_created ON webhook_deliveries(created_at);
```
---
## Retry Schedule
```
Attempt 1: immediate
Attempt 2: 1 minute after failure
Attempt 3: 5 minutes after failure
Attempt 4: 15 minutes after failure
Attempt 5: 1 hour after failure
Attempt 6: 4 hours after failure
Attempt 7: 12 hours after failure
Attempt 8: 24 hours after failure
Attempt 9: 48 hours after failure
Attempt 10: 72 hours after failure
After attempt 10: status = dead_letter; operator alerted via Prometheus metric
```
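The schedule is a fixed lookup table rather than a formula. Below is a sketch of that lookup, where `attemptsMade` counts the attempts that have already failed and `null` means the delivery should move to `dead_letter`. In `bull`, a delay table like this would typically be wired in through a custom backoff strategy, which is not shown. The names are hypothetical.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Delay before attempts 2..10, taken from the schedule above.
const RETRY_DELAYS_MS: readonly number[] = [
  1 * MINUTE_MS, 5 * MINUTE_MS, 15 * MINUTE_MS,
  1 * HOUR_MS, 4 * HOUR_MS, 12 * HOUR_MS,
  24 * HOUR_MS, 48 * HOUR_MS, 72 * HOUR_MS,
];

const MAX_ATTEMPTS = RETRY_DELAYS_MS.length + 1; // attempt 1 is immediate, so 10 in total

// attemptsMade: number of attempts that have already failed (>= 1).
// Returns the wait before the next attempt, or null when the delivery should be dead-lettered.
function nextRetryDelayMs(attemptsMade: number): number | null {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 1) {
    throw new RangeError("attemptsMade must be a positive integer");
  }
  if (attemptsMade >= MAX_ATTEMPTS) {
    return null;
  }
  return RETRY_DELAYS_MS[attemptsMade - 1] ?? null;
}
```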
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `WEBHOOKS_ENABLED` | Enable webhook functionality | `true` |
| `WEBHOOK_DELIVERY_TIMEOUT_MS` | HTTP delivery request timeout | `10000` |
| `WEBHOOK_MAX_RETRIES` | Maximum delivery attempts before dead-letter | `10` |
| `WEBHOOK_WORKER_CONCURRENCY` | Number of concurrent delivery workers | `5` |
| `KAFKA_BROKERS` | Comma-separated Kafka broker list (optional; activates Kafka adapter) | `""` |
| `KAFKA_TOPIC_PREFIX` | Prefix for Kafka topic names | `agentidp` |
| `NATS_URL` | NATS server URL (optional; activates NATS adapter) | `""` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `bull` | `^4.16.3` | Redis-backed async job queue for webhook delivery |
| `kafkajs` | `^2.2.4` | Kafka producer adapter (optional) |
---
## Security Considerations
- Webhook secrets are stored in Vault, where the delivery worker reads them to compute signatures; PostgreSQL stores only a bcrypt hash, so a database leak does not expose usable secrets
- All deliveries must be to HTTPS endpoints — HTTP endpoints are rejected at subscription creation
- Private/internal IP ranges (RFC 1918, loopback) are blocked at delivery time to prevent SSRF
- HMAC signature allows the receiving server to verify the delivery is authentic
- Replay attacks are mitigated by including a timestamp in the signed content; receivers should reject deliveries with timestamps older than 5 minutes
- Dead-letter events generate a Prometheus metric `agentidp_webhook_dead_letters_total` for alerting
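The SSRF guard needs an address check applied to every IP that the webhook hostname resolves to at delivery time. The DNS lookup itself is not shown. Below is a sketch using `isIP` from `node:net`. Besides RFC 1918 and loopback, it also blocks link-local, unspecified, and IPv6 unique-local ranges, which goes beyond the list above as an assumption.

```typescript
import { isIP } from "node:net";

// IPv4 ranges to refuse: [network, prefix length].
const BLOCKED_IPV4: ReadonlyArray<readonly [string, number]> = [
  ["10.0.0.0", 8], // RFC 1918
  ["172.16.0.0", 12], // RFC 1918
  ["192.168.0.0", 16], // RFC 1918
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local (includes cloud metadata endpoints)
  ["0.0.0.0", 8], // "this network" / unspecified
];

function ipv4ToNumber(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function inIpv4Range(ip: string, network: string, prefix: number): boolean {
  const blockSize = 2 ** (32 - prefix); // arithmetic avoids 32-bit signed bitwise pitfalls
  return Math.floor(ipv4ToNumber(ip) / blockSize) === Math.floor(ipv4ToNumber(network) / blockSize);
}

// True if deliveries to this address must be refused.
function isBlockedAddress(ip: string): boolean {
  const version = isIP(ip);
  if (version === 4) {
    return BLOCKED_IPV4.some(([network, prefix]) => inIpv4Range(ip, network, prefix));
  }
  if (version === 6) {
    const lower = ip.toLowerCase();
    if (lower === "::1" || lower === "::") return true; // loopback, unspecified
    if (lower.startsWith("::ffff:")) return isBlockedAddress(lower.slice("::ffff:".length)); // IPv4-mapped
    const firstHextet = Number.parseInt(lower.split(":")[0] || "0", 16);
    const uniqueLocal = (firstHextet & 0xfe00) === 0xfc00; // fc00::/7
    const linkLocal = (firstHextet & 0xffc0) === 0xfe80; // fe80::/10
    return uniqueLocal || linkLocal;
  }
  return true; // not an IP literal: resolve the hostname first and check each address
}
```

Checking resolved addresses at delivery time, rather than only at subscription creation, matters because a hostname's DNS records can change after the subscription is accepted.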
---
## Acceptance Criteria
- [ ] `POST /webhooks` creates a subscription; secret stored in Vault, not returned after creation
- [ ] Webhook delivery occurs within 30 seconds of event generation for healthy subscribers
- [ ] Delivery includes correct `X-AgentIdP-Signature-256` header verifiable with provided secret
- [ ] Failed delivery is retried per schedule; status updates in `webhook_deliveries` table
- [ ] After max retries, status is `dead_letter` and metric is incremented
- [ ] Delivery to HTTP (non-HTTPS) URL is rejected at subscription creation
- [ ] Delivery to private IP range is rejected (SSRF protection)
- [ ] `GET /webhooks/:id/deliveries` returns accurate delivery history
- [ ] TypeScript strict, zero `any`, >80% test coverage on WebhookService

@@ -0,0 +1,142 @@
# Phase 3: Enterprise — Tasks
**Status**: COMPLETE — All 6 workstreams done ✅
## CEO Approval Gates (required before implementation)
- [x] A0.1 Approve dependency: `did-resolver` + `web-did-resolver` (W3C DID support)
- [x] A0.2 Approve dependency: `oidc-provider` (certified OIDC server library)
- [x] A0.3 Approve dependency: `bull` (Redis-backed webhook delivery queue)
- [x] A0.4 Approve dependency: `kafkajs` (optional Kafka adapter for webhooks)
- [x] A0.5 Approve dependency: `node-forge` (column-level encryption for SOC 2)
---
## Workstream 1: Multi-Tenancy
- [x] 1.1 Write `src/db/migrations/006_create_organizations_table.sql` — organizations table with slug, plan_tier, max_agents, max_tokens_per_month, status
- [x] 1.2 Write `src/db/migrations/007_create_organization_members_table.sql` — organization_members with agent_id FK and role
- [x] 1.3 Write `src/db/migrations/008_add_organization_id_to_agents.sql` — add organization_id column + index + RLS policy on agents
- [x] 1.4 Write `src/db/migrations/009_add_organization_id_to_credentials.sql` — add organization_id column + index + RLS policy on credentials
- [x] 1.5 Write `src/db/migrations/010_add_organization_id_to_audit_logs.sql` — add organization_id column + index + RLS policy on audit_logs
- [x] 1.6 Write `src/db/migrations/011_seed_system_organization.sql` — insert default system org and backfill existing rows
- [x] 1.7 Write `src/types/organization.ts` — IOrganization, ICreateOrgRequest, IUpdateOrgRequest, IOrgMember, IPaginatedOrgsResponse, OrgStatus, PlanTier interfaces
- [x] 1.8 Write `src/services/OrgService.ts` — createOrg, listOrgs, getOrg, updateOrg, deleteOrg, addMember; all methods accept organizationId context
- [x] 1.9 Write `src/controllers/OrgController.ts` — request parsing and validation for all 6 org endpoints
- [x] 1.10 Write `src/routes/organizations.ts` — mount all 6 org endpoints with admin:orgs scope guard
- [x] 1.11 Write `src/middleware/orgContext.ts` — OrgContextMiddleware: extracts organization_id from JWT and calls SET app.organization_id before each DB query
- [x] 1.12 Update `src/middleware/auth.ts` — extend ITokenPayload with organization_id claim; backfill from DEFAULT_ORG_ID for backward compat
- [x] 1.13 Update `src/services/AgentService.ts` — organizationId propagated via RLS session variable (orgContext middleware)
- [x] 1.14 Update `src/services/CredentialService.ts` — organizationId propagated via RLS session variable
- [x] 1.15 Update `src/services/AuditService.ts` — organizationId propagated via RLS session variable
- [x] 1.16 Update `src/services/OAuth2Service.ts` — include organization_id claim in issued JWT payload
- [x] 1.17 Update `src/types/index.ts` — extend ITokenPayload with organization_id field, admin:orgs scope, org audit actions
- [x] 1.18 Update OPA policy `policies/authz.rego` + `policies/data/scopes.json` — 6 new org endpoint → admin:orgs mappings
- [x] 1.19 Write unit tests for OrgService (CRUD, member management, org isolation)
- [x] 1.20 Write integration tests — all 6 /organizations endpoints, cross-org isolation via RLS
- [x] 1.21 QA sign-off: 373 tests passing, 80.64% branch coverage, zero `any`, TypeScript clean
---
## Workstream 2: W3C DIDs
- [x] 2.1 Write `src/db/migrations/012_create_agent_did_keys_table.sql` — agent_did_keys table with public_key_jwk JSONB and vault_key_path
- [x] 2.2 Write `src/db/migrations/013_add_did_columns_to_agents.sql` — add did and did_created_at columns to agents
- [x] 2.3 Write `src/types/did.ts` — IDIDDocument, IVerificationMethod, IDIDService, IDIDResolutionResult, IAgentCard interfaces
- [x] 2.4 Write `src/services/DIDService.ts` — generateDIDForAgent (EC P-256 key pair, Vault storage, public key in DB), buildInstanceDIDDocument, buildAgentDIDDocument, buildAgentCard, buildResolutionResult
- [x] 2.5 Update `src/services/AgentService.ts` — call DIDService.generateDIDForAgent on every new agent registration
- [x] 2.6 Write `src/controllers/DIDController.ts` — handlers for root DID Document, per-agent DID Document (410 for decommissioned), resolution endpoint, agent card
- [x] 2.7 Write `src/routes/did.ts` — createDIDRouter for `/agents/:id/did`, `/did/resolve`, `/did/card`; `/.well-known/did.json` registered in app.ts
- [x] 2.8 Implement Redis caching in DIDService — cache DID Documents with TTL from DID_DOCUMENT_CACHE_TTL_SECONDS (default 300s)
- [x] 2.9 Handle decommissioned agents — deactivated: true in metadata; HTTP 410 Gone from DIDController
- [x] 2.10 Write unit tests for DIDService — 39 tests, 98.93% coverage; private key security asserted
- [x] 2.11 Write integration tests — all 4 DID endpoints; 22 tests
- [x] 2.12 QA sign-off: 429 tests passing, 98.93% DIDService coverage, private key never in response, zero `any`
---
## Workstream 3: OpenID Connect (OIDC)
- [x] 3.1 Write `src/db/migrations/014_create_oidc_keys_table.sql` — oidc_keys table with kid, public_key_jwk, vault_key_path, is_current
- [x] 3.2 Write `src/services/OIDCKeyService.ts` — generateSigningKeyPair (RSA-2048 or EC P-256), storeKeyInVault, getPublicJWKS, getCurrentKeyId, rotateKey
- [x] 3.3 Write `src/services/IDTokenService.ts` — buildIDTokenClaims (agent claims), signIDToken using current Vault-stored key, verifyIDToken
- [x] 3.4 Write `src/types/oidc.ts` — IIDTokenClaims, IJWKSResponse, IOIDCDiscoveryDocument, IAgentInfoResponse interfaces
- [x] 3.5 Write `src/controllers/OIDCController.ts` — handlers for discovery, JWKS, agent-info
- [x] 3.6 Write `src/routes/oidc.ts` — mount `/.well-known/openid-configuration`, `/.well-known/jwks.json`, `/agent-info`
- [x] 3.7 Update `src/services/OAuth2Service.ts` — when `openid` scope is present in request, generate and append `id_token` to token response
- [x] 3.8 Implement JWKS caching — cache JWKS in Redis with TTL; invalidate on key rotation
- [x] 3.9 Implement key rotation logic — on rotation, old key remains in JWKS until all tokens signed with it have expired
- [x] 3.10 Write unit tests for OIDCKeyService and IDTokenService — key generation, token signing, JWKS format
- [x] 3.11 Write integration tests — POST /oauth2/token with `openid` scope returns id_token; validate id_token against JWKS; GET /agent-info returns correct claims
- [x] 3.12 QA sign-off: OIDC discovery document passes conformance checks, id_token verifiable, `alg: none` rejected, zero `any`, >80% coverage
---
## Workstream 4: AGNTCY Federation
- [x] 4.1 Write `src/db/migrations/015_create_federation_partners_table.sql` — federation_partners table with issuer, jwks_uri, allowed_organizations JSONB, status, expires_at
- [x] 4.2 Write `src/types/federation.ts` — IFederationPartner, ICreatePartnerRequest, IVerifyFederatedTokenRequest, IFederationVerifyResult interfaces
- [x] 4.3 Write `src/services/FederationService.ts` — registerPartner (validates by fetching JWKS), listPartners, deletePartner, verifyFederatedToken (fetch-or-cache JWKS, verify signature, validate claims)
- [x] 4.4 Implement JWKS caching in FederationService — store partner JWKS in Redis with TTL configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS
- [x] 4.5 Write `src/controllers/FederationController.ts` — handlers for POST /federation/trust, GET /federation/partners, DELETE /federation/partners/:id, POST /federation/verify
- [x] 4.6 Write `src/routes/federation.ts` — mount all 4 federation endpoints
- [x] 4.7 Implement partner expiry check — partners past `expires_at` are treated as status `expired`, and their tokens are rejected
- [x] 4.8 Implement `allowedOrganizations` filter — reject tokens whose `organization_id` is not in the allow list (if list is non-empty)
- [x] 4.9 Write unit tests for FederationService — trust registration, token verification (valid/expired/untrusted/tampered), JWKS cache behavior
- [x] 4.10 Write integration tests — end-to-end: register partner, verify a valid token from that partner, verify rejection for unknown issuer
- [x] 4.11 QA sign-off: tampered token rejected, expired partner rejected, JWKS cache verified, zero `any`, >80% coverage
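Tasks 4.7 and 4.8 describe the policy gate that `verifyFederatedToken` applies once a partner token's signature has been checked against the cached JWKS. The following is a minimal sketch of that gate. The `FederationPartner` shape and the `suspended` status value are assumptions that approximate `IFederationPartner`, and `checkFederationPolicy` is a hypothetical name. JWKS fetching, caching and signature verification are intentionally left out.

```typescript
// Hypothetical shapes approximating IFederationPartner; "suspended" is an assumed status.
type PartnerStatus = "active" | "suspended" | "expired";

interface FederationPartner {
  id: string;
  issuer: string;
  status: PartnerStatus;
  expiresAt: Date | null;
  allowedOrganizations: string[];
}

interface FederatedClaims {
  iss: string;
  organization_id?: string;
}

type PolicyResult = { allowed: true } | { allowed: false; reason: string };

// Task 4.7: once now is past expires_at, the partner counts as expired whatever status is stored.
function effectiveStatus(partner: FederationPartner, now: Date): PartnerStatus {
  if (partner.expiresAt !== null && now.getTime() > partner.expiresAt.getTime()) return "expired";
  return partner.status;
}

// Runs after signature verification; decides whether a verified token is acceptable.
function checkFederationPolicy(
  partners: FederationPartner[],
  claims: FederatedClaims,
  now: Date,
): PolicyResult {
  const partner = partners.find((p) => p.issuer === claims.iss);
  if (partner === undefined) return { allowed: false, reason: "untrusted issuer" };

  const status = effectiveStatus(partner, now);
  if (status !== "active") return { allowed: false, reason: `partner ${status}` };

  // Task 4.8: an empty allow list admits any organization; otherwise the claim must be listed.
  if (partner.allowedOrganizations.length > 0) {
    const org = claims.organization_id;
    if (org === undefined || !partner.allowedOrganizations.includes(org)) {
      return { allowed: false, reason: "organization not allowed" };
    }
  }
  return { allowed: true };
}
```

Computing expiry at check time avoids relying on a background job to flip stored statuses before the cutoff.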
---
## Workstream 5: Webhooks and Event Streaming
- [x] 5.1 Write `src/db/migrations/016_create_webhook_subscriptions_table.sql` — webhook_subscriptions with url, events JSONB, secret_hash, vault_secret_path, active, failure_count
- [x] 5.2 Write `src/db/migrations/017_create_webhook_deliveries_table.sql` — webhook_deliveries with status, http_status_code, attempt_count, next_retry_at
- [x] 5.3 Write `src/types/webhook.ts` — IWebhookSubscription, ICreateWebhookRequest, IWebhookDelivery, IWebhookPayload, WebhookEventType interfaces
- [x] 5.4 Write `src/services/WebhookService.ts` — createSubscription (store secret in Vault), listSubscriptions, getSubscription, updateSubscription, deleteSubscription, listDeliveries
- [x] 5.5 Write `src/workers/WebhookDeliveryWorker.ts` — bull queue worker: fetch subscription, compute HMAC-SHA256 signature, POST to URL with headers, update delivery status, schedule retry on failure
- [x] 5.6 Write `src/services/EventPublisher.ts` — buildEventPayload, publishEvent (enqueues to bull queue; also produces to Kafka if KAFKA_BROKERS is set)
- [x] 5.7 Update `src/services/AgentService.ts` — call EventPublisher.publishEvent for: agent.created, agent.updated, agent.suspended, agent.reactivated, agent.decommissioned
- [x] 5.8 Update `src/services/CredentialService.ts` — call EventPublisher.publishEvent for: credential.generated, credential.rotated, credential.revoked
- [x] 5.9 Update `src/services/OAuth2Service.ts` — call EventPublisher.publishEvent for: token.issued, token.revoked
- [x] 5.10 Write `src/controllers/WebhookController.ts` — handlers for all 6 webhook endpoints
- [x] 5.11 Write `src/routes/webhooks.ts` — mount all 6 webhook endpoints with correct scope guards
- [x] 5.12 Implement SSRF protection in WebhookDeliveryWorker — reject delivery to RFC 1918 addresses, loopback, and link-local ranges
- [x] 5.13 Implement dead-letter handling — after max retries, set status to dead_letter and increment `agentidp_webhook_dead_letters_total` Prometheus metric
- [x] 5.14 Write `src/adapters/KafkaAdapter.ts` — optional Kafka producer; activated only when KAFKA_BROKERS env var is set
- [x] 5.15 Write unit tests for WebhookService, WebhookDeliveryWorker, EventPublisher — HMAC computation, retry schedule, dead-letter logic
- [x] 5.16 Write integration tests — create subscription, trigger an event, verify delivery; verify SSRF rejection; verify retry on 5xx response
- [x] 5.17 QA sign-off: HMAC verifiable, SSRF protection active, retry schedule correct, dead-letter metric fires, zero `any`, >80% coverage
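Tasks 5.5, 5.12 and 5.17 require that a subscriber can verify the HMAC-SHA256 signature and that the worker refuses private destinations. The following is a minimal sketch of both checks using Node's `crypto` and `net` modules. It assumes a hex-encoded signature computed over the exact JSON body. The function names are hypothetical, and the header that carries the signature is not specified here. `isBlockedAddress` is meant to run on every address the target hostname resolves to, for example via `dns.promises.lookup(host, { all: true })`. The worker should then connect to the address it checked rather than resolving again, which closes the DNS-rebinding gap.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { isIP } from "node:net";

// Sender side: hex HMAC-SHA256 over the exact bytes of the delivered body.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Receiver side: constant-time comparison; a length mismatch (or bad hex) simply fails.
function verifySignature(secret: string, body: string, received: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const actual = Buffer.from(received, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// Dotted-quad IPv4 to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// [network, prefix length]: RFC 1918 private ranges, loopback, IPv4 link-local.
const BLOCKED_V4: Array<[string, number]> = [
  ["10.0.0.0", 8],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
  ["127.0.0.0", 8],
  ["169.254.0.0", 16],
];

function inCidr(ip: number, network: string, prefix: number): boolean {
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  return ((ip & mask) >>> 0) === ((ipv4ToInt(network) & mask) >>> 0);
}

function isBlockedAddress(ip: string): boolean {
  const version = isIP(ip);
  if (version === 4) {
    const n = ipv4ToInt(ip);
    return BLOCKED_V4.some(([network, prefix]) => inCidr(n, network, prefix));
  }
  if (version === 6) {
    const lower = ip.toLowerCase();
    // IPv4-mapped IPv6 (::ffff:a.b.c.d) is judged by the embedded IPv4 address.
    const mapped = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
    if (mapped !== null) return isBlockedAddress(mapped[1]);
    if (lower === "::1") return true; // IPv6 loopback (compressed form only in this sketch)
    // fe80::/10 link-local: first hextet between fe80 and febf.
    const firstHextet = parseInt(lower.split(":")[0] || "0", 16);
    return firstHextet >= 0xfe80 && firstHextet <= 0xfebf;
  }
  return true; // not an IP literal: refuse rather than guess
}
```

A fuller implementation would normalize uncompressed IPv6 forms and would likely block additional ranges, such as `0.0.0.0/8` and IPv6 unique-local `fc00::/7`, beyond the three families task 5.12 names.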
---
## Workstream 6: SOC 2 Type II Preparation
- [x] 6.1 Enable `pgcrypto` PostgreSQL extension in `src/db/migrations/018_enable_pgcrypto.sql`
- [x] 6.2 Write `src/services/EncryptionService.ts` — AES-256-CBC encrypt/decrypt using key from Vault; methods: encryptColumn, decryptColumn, isEncrypted
- [x] 6.3 Write `src/db/migrations/019_encrypt_sensitive_columns.sql` — encrypt existing plaintext credentials.secret_hash and credentials.vault_path values using EncryptionService (migration script)
- [x] 6.4 Update `src/services/CredentialService.ts` — all reads/writes of secret_hash and vault_path go through EncryptionService
- [x] 6.5 Update `src/services/WebhookService.ts` — vault_secret_path column encrypted via EncryptionService
- [x] 6.6 Update `src/services/DIDService.ts` — vault_key_path in agent_did_keys encrypted via EncryptionService
- [x] 6.7 Write `src/middleware/TLSEnforcementMiddleware.ts` — redirect HTTP to HTTPS in production using X-Forwarded-Proto header; passthrough in development
- [x] 6.8 Register TLSEnforcementMiddleware in `src/app.ts` — first in the middleware stack
- [x] 6.9 Write `src/db/migrations/020_add_audit_chain_columns.sql` — add hash and previous_hash columns to audit_logs; add immutability trigger; backfill chain for existing rows
- [x] 6.10 Update `src/services/AuditService.ts` — compute a chained hash on every insert: hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
- [x] 6.11 Write `src/services/AuditVerificationService.ts` — verifyChain(fromDate?, toDate?): reads rows in order, recomputes hashes, returns IChainVerificationResult
- [x] 6.12 Write `src/jobs/SecretsRotationJob.ts` — cron job: identify expiring credentials, emit `agentidp_credentials_expiring_soon_total` metric, renew Vault leases
- [x] 6.13 Write `src/jobs/AuditChainVerificationJob.ts` — cron job: runs verifyChain on a schedule, sets `agentidp_audit_chain_integrity` Prometheus gauge to 1 (pass) or 0 (fail)
- [x] 6.14 Write `src/controllers/ComplianceController.ts` — handlers for GET /audit/verify and GET /compliance/controls
- [x] 6.15 Write `src/routes/compliance.ts` — mount /audit/verify (rate-limited) and /compliance/controls
- [x] 6.16 Write `monitoring/prometheus/alerts.yml` — all 6 alerting rules: AuthFailureSpike, RateLimitExhaustion, AnomalousTokenIssuance, WebhookDeadLetterAccumulating, AuditChainIntegrityFailed, CredentialExpiryApproaching
- [x] 6.17 Update `monitoring/prometheus/prometheus.yml` — add alerting rules file reference
- [x] 6.18 Write compliance documentation package: `docs/compliance/soc2-controls-matrix.md` (Trust Services Criteria → controls map), `docs/compliance/encryption-runbook.md` (key rotation procedure), `docs/compliance/audit-log-runbook.md` (chain verification guide)
- [x] 6.19 Write operational runbooks: `docs/compliance/incident-response.md` (security event procedures), `docs/compliance/secrets-rotation.md` (credential and signing key rotation guide)
- [x] 6.20 Write unit tests for EncryptionService (encrypt/decrypt round-trip, Vault key fetch) and AuditVerificationService (intact chain, tampered chain with correct brokenAtEventId)
- [x] 6.21 Write integration tests — TLS enforcement verified, encrypted columns not plaintext-readable in direct DB query, chain verification returns correct results
- [x] 6.22 QA sign-off: all 5 controls report passing via GET /compliance/controls, all 6 Prometheus alert rules valid, zero `any`, >80% coverage
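Tasks 6.10, 6.11 and 6.20 describe a hash chain. Each row's hash covers its own fields plus the previous row's hash, so an edited, deleted or reordered row breaks verification from that point on. The following is a minimal in-memory sketch of the insert-time hash and of `verifyChain` reporting `brokenAtEventId`. It follows the concatenation formula in task 6.10 and assumes a 64-zero genesis value for the first row's `previousHash`. `AuditRow`, `appendEvent` and the `valid`/`checked` result fields are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape mirroring the audit_logs columns named in tasks 6.9 and 6.10.
interface AuditRow {
  eventId: string;
  timestamp: string; // ISO-8601 string, hashed exactly as stored
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

type AuditFields = Omit<AuditRow, "previousHash" | "hash">;

interface ChainVerificationResult {
  valid: boolean;
  checked: number; // rows verified intact before the first break
  brokenAtEventId: string | null;
}

// Assumed sentinel for the first row in the chain.
const GENESIS_HASH = "0".repeat(64);

// hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
function computeHash(e: AuditFields, previousHash: string): string {
  return createHash("sha256")
    .update(e.eventId + e.timestamp + e.action + e.outcome + e.agentId + e.organizationId + previousHash, "utf8")
    .digest("hex");
}

function appendEvent(chain: AuditRow[], e: AuditFields): AuditRow {
  const previousHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS_HASH;
  const row: AuditRow = { ...e, previousHash, hash: computeHash(e, previousHash) };
  chain.push(row);
  return row;
}

// Walk rows in insertion order: each must link to its predecessor and match its recomputed hash.
function verifyChain(rows: AuditRow[]): ChainVerificationResult {
  let expectedPrev = GENESIS_HASH;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (row.previousHash !== expectedPrev || computeHash(row, row.previousHash) !== row.hash) {
      return { valid: false, checked: i, brokenAtEventId: row.eventId };
    }
    expectedPrev = row.hash;
  }
  return { valid: true, checked: rows.length, brokenAtEventId: null };
}
```

This sketch always verifies from the genesis row. A date-bounded `verifyChain(fromDate, toDate)` would instead seed `expectedPrev` from the `previousHash` of the first row in the window. Plain concatenation is kept here to match the task's formula. A delimiter or length-prefixed encoding would remove any ambiguity at field boundaries.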
---
## Phase 3 Complete Criteria
All 6 workstreams done. All tasks checked. All QA gates passed. CEO reviewed. SOC 2 audit window begins.