feat(openspec): Phase 3 Enterprise — proposal, design, specs, and tasks
Scaffolds the phase-3-enterprise OpenSpec change (proposal only — awaiting CEO approval before implementation).

6 workstreams, 95 implementation tasks:

WS1: Multi-Tenancy (21 tasks) — org model, RLS, admin API
WS2: W3C DIDs (12 tasks) — DID:WEB, agent DID documents, AGNTCY cards
WS3: OIDC (12 tasks) — oidc-provider, ID tokens, JWKS, discovery
WS4: Federation (11 tasks) — cross-instance trust, JWT assertions
WS5: Webhooks (17 tasks) — subscriptions, Bull queue, HMAC, retry
WS6: SOC2 (22 tasks) — encryption at rest, Merkle audit chain, controls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2
openspec/changes/phase-3-enterprise/.openspec.yaml
Normal file
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-29
269
openspec/changes/phase-3-enterprise/design.md
Normal file
@@ -0,0 +1,269 @@
# Phase 3: Enterprise — Technical Design

**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Draft — pending CEO approval of proposal

---

## Architecture Overview

Phase 3 transforms AgentIdP from a single-tenant OAuth 2.0 server into a multi-tenant, W3C DID-issuing, OIDC-compliant, federated enterprise identity platform. The architecture remains monolithic Express (no microservices split) to avoid operational complexity, but clear service boundaries are enforced internally.

```
┌──────────────────────────────────────────────────────┐
│              AgentIdP Server (Express)               │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │           Middleware Stack (ordered)           │  │
│  │   TLS Enforcement → Auth → Org Context → OPA   │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────┐ │
│  │ OrgSvc   │  │ DIDSvc   │  │OIDCSvc   │  │FedSvc │ │
│  └──────────┘  └──────────┘  └──────────┘  └───────┘ │
│  ┌──────────┐  ┌──────────┐                          │
│  │ WebhookQ │  │ SOC2Ctrl │                          │
│  └──────────┘  └──────────┘                          │
└──────────────────────────────────────────────────────┘
           │            │          │
  ┌────────▼──┐   ┌─────▼───┐   ┌──▼──────────┐
  │PostgreSQL │   │  Redis  │   │    Vault    │
  │(org rows) │   │(webhook │   │  (secrets)  │
  └───────────┘   │ queue)  │   └─────────────┘
                  └─────────┘
```

---

## Architectural Decision Records

---

### D1: Multi-Tenancy Model

**Status**: Accepted

**Decision**: Row-level tenancy — add `organization_id` (UUID, NOT NULL) to every domain table. No schema-per-tenant, no database-per-tenant.

**Rationale**: Row-level tenancy is operationally the simplest approach: a single database, a single schema, a single connection pool. All queries are augmented with an `organization_id` filter extracted from the authenticated JWT. PostgreSQL Row-Level Security (RLS) is enabled on all tenant-scoped tables as a defense-in-depth measure — even if the application filter is accidentally omitted, the database enforces isolation.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Schema-per-tenant | Strong isolation, independent migrations | Complex migration tooling, connection pool explosion at scale | Operational overhead exceeds threat model requirement |
| Database-per-tenant | Maximum isolation | Separate connection pool, backup, monitoring per tenant | Prohibitive at 100+ orgs; overkill for our threat model |
| Row-level (chosen) | Simple, fast, single migration path | RLS must be enforced consistently | Chosen — enforce via both application and RLS |

**Consequences**:
- Every domain table gets an `organization_id` column and a corresponding index
- All service methods accept `organizationId: string` as a required parameter
- JWT payload extended to include `organization_id` claim
- Existing single-tenant data migrated to a default `system` organization
- PostgreSQL RLS policies written for all tenant tables
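
The defense-in-depth idea in D1 can be sketched as follows. This is a minimal illustration, not the migration the design will ship: the policy name `org_isolation` and the session setting `app.current_org_id` are hypothetical, and the sketch assumes the application sets that setting inside each transaction before running tenant queries.

```typescript
// Hypothetical names throughout: `org_isolation` and `app.current_org_id`
// are illustrative and do not come from the design document.
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

/** Builds the statements that enable RLS on one tenant-scoped table. */
function buildRlsSql(table: string): string[] {
  // Table names cannot be bound as parameters, so validate before interpolating.
  if (!IDENTIFIER.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY`,
    // FORCE makes the policy apply to the table owner as well.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY`,
    `CREATE POLICY org_isolation ON ${table} ` +
      `USING (organization_id = current_setting('app.current_org_id')::uuid)`,
  ];
}

/** Parameterized statement that pins the organization for the current transaction. */
function setOrgContext(organizationId: string): { text: string; values: string[] } {
  // set_config(name, value, is_local): is_local = true scopes it to the transaction.
  return {
    text: "SELECT set_config('app.current_org_id', $1, true)",
    values: [organizationId],
  };
}
```

With this shape, a query that forgets its `WHERE organization_id = ...` clause still only sees rows of the organization pinned for the transaction, which is the isolation property the rationale describes.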

---

### D2: DID Method Selection

**Status**: Accepted

**Decision**: `did:web` — DID Documents served over HTTPS at well-known and per-agent URLs.

**Rationale**: `did:web` requires no blockchain, no ledger, and no external infrastructure beyond the HTTPS server already running. It is W3C DID Core 1.0 compliant, supported by widely used DID resolver libraries, and is the preferred method for enterprise deployments where an organization controls its own domain. It aligns directly with the `did:web` identifier scheme used in AGNTCY agent card specifications.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `did:web` (chosen) | No blockchain, HTTPS-based, enterprise-friendly | DID tied to domain; moving the domain invalidates DIDs | Accepted tradeoff — enterprise deployments have stable domains |
| `did:key` | Self-contained, no infrastructure | Not anchored — anyone can generate any `did:key`; no discovery | No trust anchor; not suitable for enterprise identity |
| `did:ethr` | Ethereum-anchored, decentralized | Blockchain dependency, gas costs, not enterprise-standard | Blockchain dependency is a non-starter for regulated enterprises |

**Consequences**:
- DID for the AgentIdP instance: `did:web:<hostname>`
- DID for an agent: `did:web:<hostname>:agents:<agentId>`
- DID Documents served at `/.well-known/did.json` and `/agents/:id/did`
- Domain change requires DID migration — document this in ops runbook
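
To make the identifier scheme concrete, the sketch below maps a `did:web` identifier to the HTTPS URL that a resolver following the `did:web` method specification would fetch. The function name `didWebToUrl` and the example hostnames are assumptions for illustration only.

```typescript
/**
 * Maps a did:web identifier to the URL of its DID Document, following the
 * did:web method rules: colons become path separators, a bare domain
 * resolves under /.well-known, and the document is always named did.json.
 */
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`not a did:web identifier: ${did}`);
  }
  const segments = did.slice(prefix.length).split(":");
  if (segments.some((segment) => segment.length === 0)) {
    throw new Error(`malformed did:web identifier: ${did}`);
  }
  // A port is percent-encoded in the DID (e.g. localhost%3A8443), so decode each part.
  const [host, ...path] = segments.map((segment) => decodeURIComponent(segment));
  return path.length === 0
    ? `https://${host}/.well-known/did.json`
    : `https://${host}/${path.join("/")}/did.json`;
}
```

Under these rules an agent DID of the form `did:web:<hostname>:agents:<agentId>` resolves to `/agents/<agentId>/did.json`. If standard resolvers are expected to resolve per-agent DIDs, the per-agent document would presumably need to be reachable at that path in addition to `/agents/:id/did`.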

---

### D3: OIDC Library Selection

**Status**: Accepted

**Decision**: Use the `oidc-provider` npm package, a certified, standards-compliant OIDC server library.

**Rationale**: `oidc-provider` is the most widely deployed Node.js OIDC library, passing the OpenID Foundation's official conformance test suite. Building OIDC from scratch on top of our existing JWT infrastructure would require implementing Discovery, JWKS rotation, ID token construction, and claim aggregation correctly against multiple RFCs. The certified library eliminates that risk and reduces implementation surface area. It integrates cleanly with Express as a mounted middleware.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `oidc-provider` (chosen) | Certified, RFC-complete, actively maintained | Adds a significant dependency | Risk of non-compliance from custom implementation outweighs dependency cost |
| Custom JWT extension | Full control, no new dependency | High risk of spec deviation; ID token, Discovery, JWKS are complex | RFC compliance cannot be self-certified |
| `keycloak` sidecar | Battle-tested, full-featured | Heavyweight Java service; architectural mismatch | Not Node.js; adds operational complexity |

**Consequences**:
- `oidc-provider` is mounted at `/oidc` in Express
- OIDC Discovery served at `/.well-known/openid-configuration` (proxied from oidc-provider)
- JWKS served at `/.well-known/jwks.json`
- Adapter written to store OIDC sessions in Redis (oidc-provider's adapter interface)
- Existing `POST /oauth2/token` route extended, not replaced — maintains backward compatibility

---

### D4: Federation Protocol

**Status**: Accepted

**Decision**: Signed JWT assertions — remote AgentIdP instances present a signed JWT; the receiving instance verifies the signature against the registered JWKS of the issuing instance.

**Rationale**: JWT assertion federation reuses the existing JWT infrastructure (`jsonwebtoken`, JWKS endpoint from OIDC workstream). No new protocol is introduced. The trust model is explicit: operators register partner instances with their JWKS URL. This aligns with RFC 7523 (JSON Web Token (JWT) Profile for OAuth 2.0 Client Authentication and Authorization Grants) and the AGNTCY inter-agent trust model.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Signed JWT assertions (chosen) | Uses existing JWT infra, explicit trust registry, RFC-aligned | JWKS URL must be reachable at verification time | Acceptable operational constraint; JWKS can be cached |
| mTLS | Strong cryptographic identity | Certificate management overhead, PKI required per partner | Cert management complexity not justified when JWT assertions suffice |
| AGNTCY-specific protocol | Native alignment | Spec still evolving; risk of churn | Build on stable JWT base; adapt to AGNTCY extensions as spec matures |

**Consequences**:
- New `federation_partners` table: `id`, `name`, `jwks_url`, `issuer`, `trusted_since`, `organization_id`
- JWKS of partner instances cached in Redis with TTL
- `POST /federation/verify` accepts a token issued by a remote instance and returns the verification result
- Federation tokens are not accepted for agent management endpoints — only for identity assertion
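
The explicit trust model can be illustrated with the checks that would run after a remote token's signature has already been verified against the partner's JWKS. This is only a sketch: the field names mirror the `FederationPartner` example in the federation spec of this change, while `evaluateTrust`, the `RemoteClaims` shape, and the reason codes are hypothetical.

```typescript
type PartnerStatus = "active" | "suspended" | "expired";

interface FederationPartner {
  issuer: string;
  status: PartnerStatus;
  allowedOrganizations: string[]; // empty list: all partner orgs are trusted
  expiresAt: string | null;       // ISO 8601 timestamp, null: no automatic expiry
}

// Assumed claim set of a remote token; only these claims are inspected here.
interface RemoteClaims {
  iss: string;
  exp: number; // seconds since the Unix epoch
  organization_id?: string;
}

type TrustDecision = { trusted: true } | { trusted: false; reason: string };

/** Registry checks only. Signature verification must already have succeeded. */
function evaluateTrust(claims: RemoteClaims, partners: FederationPartner[], now: Date): TrustDecision {
  const partner = partners.find((candidate) => candidate.issuer === claims.iss);
  if (!partner) return { trusted: false, reason: "unknown_issuer" };
  if (partner.status !== "active") return { trusted: false, reason: "partner_not_active" };
  if (partner.expiresAt !== null && new Date(partner.expiresAt).getTime() <= now.getTime()) {
    return { trusted: false, reason: "trust_expired" };
  }
  if (claims.exp * 1000 <= now.getTime()) return { trusted: false, reason: "token_expired" };
  const allowed = partner.allowedOrganizations;
  if (allowed.length > 0 && (claims.organization_id === undefined || !allowed.includes(claims.organization_id))) {
    return { trusted: false, reason: "organization_not_allowed" };
  }
  return { trusted: true };
}
```

Keeping these checks in a pure function separates the registry policy from the cryptographic step, so each can be tested on its own.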

---

### D5: Webhook Delivery Architecture

**Status**: Accepted

**Decision**: Async delivery via Redis-backed `bull` queue with exponential backoff retry (max 10 attempts over 24 hours).

**Rationale**: Synchronous webhook delivery from within a request handler would add latency and create tight coupling between event generation and delivery outcome. The Redis queue (`bull`) decouples delivery: events are enqueued immediately, a background worker delivers them. `bull` provides built-in retry, delay, and failure tracking without introducing a new infrastructure component (Redis is already present). HMAC-SHA256 signing on every delivery allows recipients to verify authenticity.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Redis queue via `bull` (chosen) | Reuses existing Redis, retry built-in, low operational overhead | Delivery tied to Redis availability | Acceptable — Redis is already a required dependency |
| Synchronous in-request delivery | Simplest implementation | Adds latency to event-generating requests; failure blocks response | Unacceptable latency and coupling |
| Dedicated message broker (RabbitMQ) | Robust, durable | New infrastructure dependency | Operational overhead; Redis already present |
| Kafka (primary) | High-throughput, durable | Overkill for webhook delivery; complex operations | Optional adapter only; not primary delivery mechanism |

**Consequences**:
- New `webhook_subscriptions` and `webhook_deliveries` tables
- `bull` worker runs in the same Node.js instance as the API (optionally as a sandboxed processor in a child process, which `bull` supports)
- Retry schedule: 1m, 5m, 15m, 1h, 4h, 12h, 24h (exponential backoff)
- Failed delivery after 10 attempts moves to dead-letter; operator alerted
- Optional Kafka adapter: if `KAFKA_BROKERS` env var is set, events are also produced to Kafka
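
The HMAC-SHA256 signing mentioned in the rationale could look roughly like this, using only Node's built-in `crypto` module. The function names and the hex encoding are assumptions; the design does not specify the signature header or its encoding.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** Sender side: signs the exact bytes of the delivery body with the subscription secret. */
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

/** Recipient side: recomputes the signature and compares it in constant time. */
function verifySignature(secret: string, body: string, receivedHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws when lengths differ, so check the length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature has to be computed over the raw request body as sent. Re-serializing parsed JSON on the recipient side can change the bytes and make a valid delivery fail verification.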

---

### D6: SOC 2 Scope

**Status**: Accepted

**Decision**: Target SOC 2 Type II (operational, not just design). All controls implemented in code. Audit period: 6 months post-Phase 3 launch.

**Rationale**: A SOC 2 Type I report attests that controls are suitably designed at a point in time. A SOC 2 Type II report attests that they operated effectively over a period of time. Enterprise customers in regulated industries (finance, healthcare, government) require Type II. Implementing the controls now, with the 6-month operational window beginning at Phase 3 launch, puts us on the fastest possible path to a Type II report.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Type II from launch (chosen) | Satisfies enterprise requirements | Requires 6-month operation window | Accepted — the controls are implemented in Phase 3; audit window starts after launch |
| Type I only | Faster to certify | Not accepted by most enterprise procurement | Insufficient for target customers |
| ISO 27001 instead | International standard | Larger scope, longer implementation | SOC 2 is standard for US market; add ISO 27001 in Phase 4 |

**Consequences**:
- Encryption at rest: `pgcrypto` extension for column-level encryption on `credentials.secret_hash` and `credentials.vault_path`
- TLS enforcement: Express middleware rejects plain-HTTP (non-TLS) requests in production
- Secrets rotation: cron-based job that triggers credential rotation reminders and Vault lease renewals
- Security alerting: Prometheus alerting rules for auth failure spikes, rate limit exhaustion, anomalous token issuance
- Audit log immutability: Merkle hash chain (each row's hash includes the previous row's hash)
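
As one example of these controls, the decision behind the TLS enforcement middleware can be sketched as a framework-agnostic function. The `RequestLike` and `TlsPolicy` shapes and the `trustProxy` flag are assumptions; in Express itself, `req.secure` already reflects `X-Forwarded-Proto` when `trust proxy` is enabled, so the sketch only spells that check out.

```typescript
// Minimal structural stand-in for the parts of a request the check needs.
interface RequestLike {
  socketEncrypted: boolean; // true when the connection itself is TLS
  headers: Record<string, string | undefined>;
}

// Hypothetical policy object; the design only says "rejects in production".
interface TlsPolicy {
  production: boolean;
  trustProxy: boolean; // honor X-Forwarded-Proto set by a TLS-terminating proxy
}

/** Returns true when the request may proceed, false when it should be rejected. */
function isTransportAllowed(req: RequestLike, policy: TlsPolicy): boolean {
  if (!policy.production) return true; // enforcement applies in production only
  if (req.socketEncrypted) return true;
  if (policy.trustProxy) {
    // The header may carry a comma-separated chain; the first entry is the client hop.
    const proto = (req.headers["x-forwarded-proto"] ?? "").split(",")[0].trim().toLowerCase();
    return proto === "https";
  }
  return false;
}
```

The forwarded header should only be honored when the server is reachable exclusively through the proxy; otherwise a client could set it directly.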

---

### D7: Audit Log Immutability — Merkle Hash Chain

**Status**: Accepted

**Decision**: Each `audit_logs` row carries a `hash` field: `SHA-256(eventId + timestamp + action + outcome + agentId + previousHash)`. The chain starts with a genesis hash. Verification is a sequential pass over all rows in insertion order.

**Rationale**: Append-only logs in PostgreSQL can be altered by a DBA with sufficient access. A Merkle-style hash chain makes tampering detectable without requiring a blockchain. Any modification to a historical row breaks the chain from that point forward. Verification is a simple sequential computation that can be run on demand or as a scheduled integrity check.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Merkle hash chain in PostgreSQL (chosen) | No new infra, tamper-evident, verifiable | DBA can re-compute hashes after tampering if they control the algorithm | Acceptable — threat model is accidental/low-sophistication modification; cryptographic chain deters opportunistic tampering |
| Blockchain anchor | Cryptographically immutable | Blockchain dependency, cost, latency | Excessive for current threat model |
| Write-once S3/GCS export | External immutability | Delayed; operational complexity | Added complexity; hash chain provides continuous coverage |

**Consequences**:
- New `hash` (VARCHAR 64) and `previous_hash` (VARCHAR 64) columns on `audit_logs`
- `AuditService.create()` computes hash before insert — adds ~1ms latency per audit event
- New `GET /audit/verify` endpoint: returns chain integrity status (admin only)
- A PostgreSQL trigger on `audit_logs` rejects `UPDATE` and `DELETE`, making the table insert-only
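
The chain construction and the sequential verification pass can be sketched as below. The field concatenation follows the formula in the decision; the all-zero genesis value and the in-memory array standing in for the `audit_logs` table are assumptions.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
}

interface ChainedRow extends AuditEvent {
  previousHash: string;
  hash: string;
}

// Assumption: the design mentions a genesis hash but does not fix its value.
const GENESIS_HASH = "0".repeat(64);

/** SHA-256 over the fields in the order given in D7, as a 64-char hex string. */
function computeHash(event: AuditEvent, previousHash: string): string {
  return createHash("sha256")
    .update(event.eventId + event.timestamp + event.action + event.outcome + event.agentId + previousHash, "utf8")
    .digest("hex");
}

/** Returns a new chain with the event appended and linked to the previous row. */
function appendEvent(chain: ChainedRow[], event: AuditEvent): ChainedRow[] {
  const previousHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS_HASH;
  return [...chain, { ...event, previousHash, hash: computeHash(event, previousHash) }];
}

/** Sequential pass in insertion order: index of the first broken row, or -1 if intact. */
function verifyChain(chain: ChainedRow[]): number {
  let previousHash = GENESIS_HASH;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.previousHash !== previousHash || row.hash !== computeHash(row, previousHash)) {
      return i;
    }
    previousHash = row.hash;
  }
  return -1;
}
```

Because plain concatenation does not mark field boundaries, an implementation might prefer a separator or length prefix between fields; that would be a change to the formula and is not part of the design as written.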

---

### D8: Organization Context in JWT

**Status**: Accepted

**Decision**: Add `organization_id` claim to JWT access tokens issued by `POST /oauth2/token`. All downstream middleware extracts `organization_id` from the token — no separate lookup required.

**Rationale**: Including `organization_id` in the JWT keeps the middleware stack stateless. The alternative — looking up the organization from the database on every request — adds latency and a database round-trip to every authenticated call. The JWT is already signed; adding a claim costs nothing cryptographically.

**Consequences**:
- `ITokenPayload` interface extended: `organization_id: string`
- All service methods receive `organizationId` from `req.user.organization_id`
- Token introspection response includes `organization_id`
- Agents registered before multi-tenancy belong to the default `system` organization
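
A minimal sketch of the stateless lookup follows, assuming the token has already been verified and decoded by the auth middleware. Only `organization_id` comes from the design; the `sub` field, the error class, and the function name are hypothetical, and how tokens without the claim should be handled is not specified, so the sketch simply rejects them.

```typescript
// `organization_id` is the claim added in D8; `sub` is assumed for illustration.
interface ITokenPayload {
  sub: string;
  organization_id: string;
}

class MissingOrgContextError extends Error {}

/** Reads the organization from an already-verified payload, without a database lookup. */
function resolveOrganizationId(payload: Partial<ITokenPayload>): string {
  const organizationId = payload.organization_id;
  if (typeof organizationId !== "string" || organizationId.length === 0) {
    throw new MissingOrgContextError("token has no organization_id claim");
  }
  return organizationId;
}
```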

---

## Component Interaction Map (Phase 3)

```
             ┌──────────────────────┐
             │    Web Dashboard     │
             │  (+ Org Mgmt pages)  │
             └──────────┬───────────┘
                        │ HTTPS
┌───────────────────────▼─────────────────────────────┐
│                   AgentIdP Server                   │
│                                                     │
│  TLS MW → Auth MW → OrgContext MW → OPA MW          │
│                                                     │
│  ┌───────────┐ ┌───────────┐ ┌───────────────────┐  │
│  │ OrgService│ │DIDService │ │   OIDCProvider    │  │
│  └───────────┘ └───────────┘ │  (oidc-provider)  │  │
│  ┌───────────┐ ┌───────────┐ └───────────────────┘  │
│  │ FedService│ │WebhookSvc │                        │
│  └───────────┘ └───────────┘                        │
│  ┌─────────────────────────┐                        │
│  │ SOC2Controls (cross-cut)│                        │
│  └─────────────────────────┘                        │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
  ┌────────▼──┐   ┌───────▼──┐    ┌──────▼──────┐
  │PostgreSQL │   │  Redis   │    │    Vault    │
  │  + RLS    │   │ +bull Q  │    │  (secrets)  │
  └───────────┘   └──────────┘    └─────────────┘
           │
  ┌────────▼──────┐
  │  Prometheus   │
  │  + Alerting   │
  └────────┬──────┘
           │
  ┌────────▼──────┐
  │    Grafana    │
  └───────────────┘
```
165
openspec/changes/phase-3-enterprise/proposal.md
Normal file
@@ -0,0 +1,165 @@
# Phase 3: Enterprise — Change Proposal

**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Proposed — awaiting CEO approval

---

## Summary

Phase 1 delivered a complete, working AgentIdP MVP. Phase 2 made it production-ready: Vault-backed secrets, multi-language SDKs, OPA policy engine, React dashboard, Prometheus/Grafana observability, and multi-region Terraform deployment. Phase 3 makes AgentIdP enterprise-grade: the platform moves from a single-tenant developer tool to a multi-tenant enterprise identity platform with W3C DID support, OIDC compliance, AGNTCY federation, real-time event streaming, and SOC 2 Type II controls.

---

## Problem Statement

Phase 1 and Phase 2 are functional and production-ready but have the following enterprise gaps:

| Gap | Risk |
|-----|------|
| Single-tenant architecture | Cannot serve enterprise customers with isolated data requirements |
| No W3C DID support | Not fully AGNTCY-compliant; agents lack interoperable decentralized identifiers |
| OAuth 2.0 only, no OIDC | Cannot integrate with standard enterprise identity ecosystems (SSO, SCIM) |
| No cross-instance federation | Multi-organization agent identity cannot be verified across AgentIdP deployments |
| No webhook/event streaming | Operators cannot react to agent lifecycle events in real time |
| No SOC 2 controls | Cannot pass enterprise security reviews; blocks revenue from regulated industries |

---

## Proposed Changes

### 1. Multi-Tenancy
Introduce an Organization model so a single AgentIdP instance can serve multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit log, and rate limits. A new Admin API provides organization lifecycle management. All existing agent, credential, and audit endpoints become organization-scoped.

### 2. W3C Decentralized Identifiers (DIDs)
Issue a W3C `did:web` identifier for every registered agent. Serve DID Documents at `/.well-known/did.json` (instance root) and `/agents/:id/did` (per-agent). Expose a DID resolution endpoint. Produce AGNTCY-format agent cards from DID Documents.

### 3. AGNTCY Federation
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted federation partners. Tokens issued by a trusted remote AgentIdP instance can be verified locally, enabling multi-organization and cross-enterprise agent identity interoperability aligned with AGNTCY standards.

### 4. OpenID Connect (OIDC)
Add a full OIDC layer on top of the existing OAuth 2.0 implementation using the `oidc-provider` certified library. Exposes OIDC Discovery, JWKS, ID tokens with agent claims, and an `/agent-info` endpoint (the agent-identity equivalent of the OIDC `/userinfo` endpoint).

### 5. Webhooks and Event Streaming
Real-time event notifications for all agent lifecycle events: agent created, suspended, revoked, credential rotated, token issued. Operators create webhook subscriptions with HMAC-SHA256 signing. Delivery is async via a Redis-backed queue with exponential backoff retry. An optional Kafka/NATS adapter is available for high-throughput environments.

### 6. SOC 2 Type II Preparation
Implement the technical controls required for SOC 2 Type II audit: encryption at rest via PostgreSQL column-level encryption for secrets, TLS enforcement on all inbound connections, automated secrets rotation, security event alerting via Prometheus alerting rules, and audit log immutability proof using a Merkle hash chain appended to each `audit_logs` row.

---

## Out of Scope for Phase 3

- Rust/C++ SDKs (Phase 4)
- Azure Terraform module (Phase 4)
- SCIM provisioning (Phase 4)
- End-user (human operator) identity management (out of product scope — AgentIdP is agent-first)

---

## Capabilities Table

### New Capabilities

| Workstream | Capability | Type |
|-----------|-----------|------|
| Multi-Tenancy | Organization model with isolated agent namespaces | New |
| Multi-Tenancy | Admin API: create, list, update, delete organizations | New |
| Multi-Tenancy | Per-organization rate limits and audit logs | New |
| Multi-Tenancy | Organization member management | New |
| W3C DIDs | `did:web` identifier on every registered agent | New |
| W3C DIDs | DID Document endpoint per agent | New |
| W3C DIDs | Instance-level root DID Document | New |
| W3C DIDs | DID resolution endpoint | New |
| W3C DIDs | AGNTCY-format agent card from DID Document | New |
| OIDC | OIDC Discovery endpoint (`/.well-known/openid-configuration`) | New |
| OIDC | JWKS endpoint (`/.well-known/jwks.json`) | New |
| OIDC | ID token with agent claims in token response | Modified |
| OIDC | `/agent-info` endpoint (agent claims) | New |
| Federation | Trust registry: register and list federation partners | New |
| Federation | Cross-instance token verification endpoint | New |
| Federation | Signed JWT assertion inter-IdP protocol | New |
| Webhooks | Webhook subscription management (CRUD) | New |
| Webhooks | HMAC-SHA256 signed delivery with retry | New |
| Webhooks | Delivery history log | New |
| Webhooks | Kafka/NATS adapter (optional) | New |
| SOC 2 | PostgreSQL column-level encryption for secrets at rest | New |
| SOC 2 | TLS enforcement middleware (reject non-TLS) | New |
| SOC 2 | Automated secrets rotation schedule | New |
| SOC 2 | Security event alerting (Prometheus alerting rules) | New |
| SOC 2 | Merkle hash chain on `audit_logs` for immutability proof | New |
| SOC 2 | Compliance documentation (controls matrix, runbook) | New |

### Modified Capabilities

| Workstream | Capability | Change |
|-----------|-----------|--------|
| Multi-Tenancy | `POST /agents` | Now scoped to `organizationId` |
| Multi-Tenancy | `GET /agents` | Filters restricted to caller's organization |
| Multi-Tenancy | `GET /audit` | Restricted to caller's organization by default |
| Multi-Tenancy | Rate limiting | Per-organization limits in addition to global |
| OIDC | `POST /oauth2/token` | Returns `id_token` in addition to `access_token` |
| SOC 2 | Audit log write path | Computes and appends Merkle hash on insert |

---

## Repository Impact

| Area | Impact |
|------|--------|
| `src/` | New services: OrgService, DIDService, OIDCService, FederationService, WebhookService, SOC2Controls |
| `src/db/migrations/` | 8–10 new migration files |
| `src/types/index.ts` | ~80 new interfaces/types |
| `src/middleware/` | New TLS enforcement middleware, updated auth middleware for org context |
| `src/routes/` | 6 new route files |
| `/.well-known/` | 3 new well-known endpoints |
| `policies/` | Updated Rego policies for org-scoped permissions |
| `dashboard/` | New Organization management pages |
| `monitoring/` | New alerting rules for SOC 2 security events |
| `docs/` | Compliance documentation, federation setup guide, webhook integration guide |

---

## New Dependencies

| Workstream | Package | Purpose | CEO Approval Required |
|-----------|---------|---------|----------------------|
| Multi-Tenancy | No new packages — row-level tenancy in existing PostgreSQL | — | No |
| W3C DIDs | `did-resolver` | W3C DID resolution | Yes |
| W3C DIDs | `web-did-resolver` | DID:WEB method resolver | Yes |
| OIDC | `oidc-provider` | Certified OIDC server library | Yes |
| Federation | No new packages — signed JWT assertions use existing `jsonwebtoken` | — | No |
| Webhooks | `bull` (Redis-backed queue) | Async webhook delivery queue | Yes |
| Webhooks | `kafkajs` (optional, Kafka adapter) | Kafka event streaming | Yes |
| SOC 2 | `node-forge` | Column-level encryption primitives | Yes |

---

## Delivery Sequence

Multi-tenancy is a prerequisite for all enterprise customer work — it must land first. DID support and OIDC are independent and can proceed in parallel. Federation depends on DIDs being in place. Webhooks are standalone. SOC 2 controls cut across the entire codebase and are implemented last to ensure all features they protect are already present.

```
1. Multi-Tenancy (prerequisite — all enterprise features assume org context)
2. W3C DIDs (parallel)
   OIDC (parallel)
3. Federation (depends on DIDs)
4. Webhooks (standalone)
5. SOC 2 (cuts across all workstreams — implemented after all features are stable)
```

---

## Success Criteria

- All new dependencies CEO-approved before implementation begins
- All new API endpoints have OpenAPI 3.0 specs before implementation
- Multi-tenancy isolation verified: no cross-organization data leakage
- DID Documents are W3C DID Core 1.0 compliant and resolve correctly
- OIDC Discovery passes the OpenID Foundation conformance test suite
- Federation token verification rejects tampered assertions
- Webhook delivery achieves >99.9% success rate with retry logic
- SOC 2 controls pass independent technical review
- TypeScript strict mode + zero `any` maintained throughout
- >80% test coverage on all new services
370
openspec/changes/phase-3-enterprise/specs/federation/spec.md
Normal file
@@ -0,0 +1,370 @@
# AGNTCY Federation — Specification

**Workstream**: 4 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29

---

## Overview

Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted remote AgentIdP instances as federation partners. When an agent presents a token issued by a trusted partner instance, the local AgentIdP can verify it by fetching and caching the partner's JWKS. This enables multi-organization agent identity interoperability aligned with AGNTCY standards.

Federation is opt-in per organization. Only tokens from explicitly registered, trusted partners are accepted.

---

## API Endpoints

### POST /federation/trust

Register a new federation trust partner. Requires `admin:orgs` scope.

```yaml
POST /federation/trust
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [name, issuer, jwksUri]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Human-readable name for this federation partner
        example: "Contoso AgentIdP"
      issuer:
        type: string
        format: uri
        description: OIDC issuer URL of the partner instance (must match iss claim in tokens)
        example: "https://agentidp.contoso.com"
      jwksUri:
        type: string
        format: uri
        description: URL of the partner's JWKS endpoint
        example: "https://agentidp.contoso.com/.well-known/jwks.json"
      allowedOrganizations:
        type: array
        items:
          type: string
        description: Optional list of organization IDs in the partner instance whose tokens are accepted. Empty means all partner orgs are trusted.
        example: ["org_contoso_engineering"]
      expiresAt:
        type: string
        format: date-time
        description: Optional expiry for this trust relationship. If omitted, trust does not expire automatically.

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/FederationPartner'
    example:
      partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
      name: "Contoso AgentIdP"
      issuer: "https://agentidp.contoso.com"
      jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
      status: "active"
      allowedOrganizations: []
      trustedSince: "2026-03-29T12:00:00Z"
      expiresAt: null
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      duplicate_issuer:
        code: "DUPLICATE_ISSUER"
        message: "A trust relationship with this issuer already exists"
      unreachable_jwks:
        code: "JWKS_UNREACHABLE"
        message: "Could not fetch JWKS from the provided jwksUri"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
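
The request schema above could be enforced with a small validator along these lines. This is a sketch, not the service's actual validation layer: `validateTrustRequest` and the error strings are hypothetical, and requiring `https` for `issuer` and `jwksUri` is an assumption that is stricter than the schema's `format: uri`.

```typescript
/** True for strings that parse as absolute https URLs (assumed stricter than `format: uri`). */
function isHttpsUrl(value: unknown): boolean {
  if (typeof value !== "string") return false;
  try {
    return new URL(value).protocol === "https:";
  } catch {
    return false;
  }
}

/** Returns a list of human-readable problems; an empty list means the body is acceptable. */
function validateTrustRequest(body: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const { name, issuer, jwksUri, allowedOrganizations, expiresAt } = body;

  if (typeof name !== "string" || name.length < 2 || name.length > 100) {
    errors.push("name must be a string of 2 to 100 characters");
  }
  if (!isHttpsUrl(issuer)) errors.push("issuer must be an https URL");
  if (!isHttpsUrl(jwksUri)) errors.push("jwksUri must be an https URL");
  if (
    allowedOrganizations !== undefined &&
    !(Array.isArray(allowedOrganizations) && allowedOrganizations.every((org) => typeof org === "string"))
  ) {
    errors.push("allowedOrganizations must be an array of strings");
  }
  if (expiresAt !== undefined && (typeof expiresAt !== "string" || Number.isNaN(Date.parse(expiresAt)))) {
    errors.push("expiresAt must be a date-time string");
  }
  return errors;
}
```

The `DUPLICATE_ISSUER` and `JWKS_UNREACHABLE` cases need a registry lookup and a network call respectively, so they would sit after this purely structural check.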

---

### GET /federation/partners

List all registered federation partners for the caller's organization. Requires `admin:orgs` scope.

```yaml
GET /federation/partners
Authorization: Bearer <token with admin:orgs scope>

Query Parameters:
  status:
    type: string
    enum: [active, suspended, expired]
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/FederationPartner'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
          name: "Contoso AgentIdP"
          issuer: "https://agentidp.contoso.com"
          jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
          status: "active"
          trustedSince: "2026-03-29T12:00:00Z"
          expiresAt: null
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### DELETE /federation/partners/:partnerId
|
||||
|
||||
Remove a federation trust relationship. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
|
||||
DELETE /federation/partners/{partnerId}
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
|
||||
Path Parameters:
|
||||
partnerId:
|
||||
type: string
|
||||
|
||||
Responses:
|
||||
204 No Content: {}
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### POST /federation/verify
|
||||
|
||||
Verify a token issued by a federated partner AgentIdP instance. The caller presents the token; this endpoint resolves the issuer, loads the partner's JWKS (from the Redis cache when available, otherwise by fetching it from the partner), and verifies the signature and claims.
|
||||
|
||||
```yaml
|
||||
POST /federation/verify
|
||||
Authorization: Bearer <local access_token with agents:read scope>
|
||||
Content-Type: application/json
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
required: [token]
|
||||
properties:
|
||||
token:
|
||||
type: string
|
||||
description: The JWT token issued by the remote AgentIdP instance to verify
|
||||
expectedIssuer:
|
||||
type: string
|
||||
format: uri
|
||||
description: Optional — if provided, verification fails if token issuer does not match
|
||||
expectedOrganizationId:
|
||||
type: string
|
||||
description: Optional — if provided, verification fails if token organization_id does not match
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
valid:
|
||||
type: boolean
|
||||
claims:
|
||||
type: object
|
||||
description: Decoded JWT claims from the verified token
|
||||
properties:
|
||||
sub:
|
||||
type: string
|
||||
iss:
|
||||
type: string
|
||||
iat:
|
||||
type: integer
|
||||
exp:
|
||||
type: integer
|
||||
agent_id:
|
||||
type: string
|
||||
agent_type:
|
||||
type: string
|
||||
organization_id:
|
||||
type: string
|
||||
capabilities:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
did:
|
||||
type: string
|
||||
partner:
|
||||
type: object
|
||||
description: The federation partner record that vouches for this token
|
||||
properties:
|
||||
partnerId:
|
||||
type: string
|
||||
name:
|
||||
type: string
|
||||
issuer:
|
||||
type: string
|
||||
example:
|
||||
valid: true
|
||||
claims:
|
||||
sub: "agt_contoso_abc123"
|
||||
iss: "https://agentidp.contoso.com"
|
||||
iat: 1743249600
|
||||
exp: 1743253200
|
||||
agent_id: "agt_contoso_abc123"
|
||||
agent_type: "classifier"
|
||||
organization_id: "org_contoso_engineering"
|
||||
capabilities: ["text-classification"]
|
||||
did: "did:web:agentidp.contoso.com:agents:agt_contoso_abc123"
|
||||
partner:
|
||||
partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
|
||||
name: "Contoso AgentIdP"
|
||||
issuer: "https://agentidp.contoso.com"
|
||||
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
|
||||
401 Unauthorized (local token invalid):
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
|
||||
422 Unprocessable Entity (token invalid or untrusted issuer):
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
valid:
|
||||
type: boolean
|
||||
example: false
|
||||
reason:
|
||||
type: string
|
||||
enum:
|
||||
- TOKEN_EXPIRED
|
||||
- INVALID_SIGNATURE
|
||||
- UNTRUSTED_ISSUER
|
||||
- JWKS_FETCH_FAILED
|
||||
- ORGANIZATION_NOT_ALLOWED
|
||||
message:
|
||||
type: string
|
||||
example:
|
||||
valid: false
|
||||
reason: "UNTRUSTED_ISSUER"
|
||||
message: "No trust relationship registered for issuer https://unknown.example.com"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: federation_partners
|
||||
|
||||
```sql
|
||||
CREATE TABLE federation_partners (
|
||||
partner_id VARCHAR(40) PRIMARY KEY,
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
name VARCHAR(100) NOT NULL,
|
||||
issuer VARCHAR(255) NOT NULL,
|
||||
jwks_uri VARCHAR(255) NOT NULL,
|
||||
allowed_organizations JSONB NOT NULL DEFAULT '[]',
|
||||
status VARCHAR(20) NOT NULL DEFAULT 'active',
|
||||
trusted_since TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
expires_at TIMESTAMPTZ,
|
||||
last_jwks_fetch TIMESTAMPTZ,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
CONSTRAINT federation_partners_status_check CHECK (status IN ('active', 'suspended', 'expired')),
|
||||
UNIQUE (organization_id, issuer)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_federation_partners_org_id ON federation_partners(organization_id);
|
||||
CREATE INDEX idx_federation_partners_issuer ON federation_partners(issuer);
|
||||
CREATE INDEX idx_federation_partners_status ON federation_partners(status);
|
||||
```
|
||||
|
||||
### Redis: JWKS Cache
|
||||
|
||||
Partner JWKS documents are cached in Redis with a TTL:
|
||||
|
||||
```
|
||||
Key: federation:jwks:<issuer_url_sha256>
|
||||
Value: JSON string of the JWKS document
|
||||
TTL: 1 hour (configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS)
|
||||
```
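The cache key and TTL above could be derived as in the following sketch. It assumes the `<issuer_url_sha256>` segment is the lowercase hex SHA-256 digest of the issuer URL exactly as registered (no normalization); the function names `jwksCacheKey` and `jwksCacheTtlSeconds` are hypothetical and not part of the spec.

```typescript
import { createHash } from "node:crypto";

const JWKS_KEY_PREFIX = "federation:jwks:";
const DEFAULT_TTL_SECONDS = 3600; // documented default: 1 hour

// Assumption: the key suffix is the hex-encoded SHA-256 of the issuer URL,
// hashed exactly as stored in federation_partners.issuer.
export function jwksCacheKey(issuer: string): string {
  const digest = createHash("sha256").update(issuer, "utf8").digest("hex");
  return `${JWKS_KEY_PREFIX}${digest}`;
}

// Reads FEDERATION_JWKS_CACHE_TTL_SECONDS and falls back to the default
// when the variable is missing or not a positive integer.
export function jwksCacheTtlSeconds(env: Record<string, string | undefined>): number {
  const raw = env["FEDERATION_JWKS_CACHE_TTL_SECONDS"];
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_TTL_SECONDS;
}
```

Hashing the issuer keeps the Redis key at a fixed length and free of URL punctuation, whatever the partner's issuer looks like.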
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `FEDERATION_ENABLED` | Enable federation endpoints | `true` |
|
||||
| `FEDERATION_JWKS_CACHE_TTL_SECONDS` | Redis TTL for cached partner JWKS | `3600` |
|
||||
| `FEDERATION_JWKS_FETCH_TIMEOUT_MS` | HTTP timeout for fetching partner JWKS | `5000` |
|
||||
| `FEDERATION_MAX_PARTNERS_PER_ORG` | Max federation partners per organization | `50` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
No new npm packages. Federation uses `jsonwebtoken` (already present) for JWT verification and the existing HTTP client for JWKS fetches.
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Only tokens from explicitly registered, active federation partners are accepted in `POST /federation/verify`
|
||||
- JWKS are cached to prevent JWKS endpoint hammering; cache is invalidated when a partner is updated
|
||||
- Token signature verification uses the partner's JWKS; `alg: none` is always rejected
|
||||
- `allowedOrganizations` field enables fine-grained trust: a partner can be trusted but only for tokens from specific organizations within that partner
|
||||
- Expired federation partners (`expiresAt` in the past) are automatically treated as status `expired` — their tokens are rejected
|
||||
- `POST /federation/verify` does not grant any local permissions — it is a verification-only endpoint. Callers must make their own access control decisions based on the returned claims.
|
||||
- Clock skew tolerance: `exp` claim verification allows 30 seconds of clock skew (standard JWT practice)
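The trust rules above can be sketched as a pure function that runs after the signature has been verified against the partner's JWKS. This is an illustration under stated assumptions, not the FederationService implementation: the names `evaluateTrust`, `PartnerRecord`, and `VerifiedClaims` are hypothetical, and mapping a suspended or lapsed partner to `UNTRUSTED_ISSUER` is an assumption, since the spec defines no dedicated reason code for that case.

```typescript
type TrustFailure = "TOKEN_EXPIRED" | "UNTRUSTED_ISSUER" | "ORGANIZATION_NOT_ALLOWED";

interface PartnerRecord {
  issuer: string;
  status: "active" | "suspended" | "expired";
  allowedOrganizations: string[]; // empty means every partner org is trusted
  expiresAt: Date | null; // null means the trust never expires automatically
}

interface VerifiedClaims {
  iss: string;
  exp: number; // seconds since the Unix epoch
  organization_id: string;
}

const CLOCK_SKEW_SECONDS = 30;

// Returns null when the token passes every trust check, else the failure reason.
export function evaluateTrust(
  partner: PartnerRecord | undefined,
  claims: VerifiedClaims,
  now: Date,
): TrustFailure | null {
  if (partner === undefined || partner.issuer !== claims.iss) return "UNTRUSTED_ISSUER";

  // A partner whose expiresAt has passed is treated like status "expired".
  const lapsed = partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime();
  if (partner.status !== "active" || lapsed) return "UNTRUSTED_ISSUER"; // assumption: no dedicated code

  const orgs = partner.allowedOrganizations;
  if (orgs.length > 0 && !orgs.includes(claims.organization_id)) return "ORGANIZATION_NOT_ALLOWED";

  // Tolerate up to 30 seconds of clock skew on the exp claim.
  const nowSeconds = Math.floor(now.getTime() / 1000);
  if (nowSeconds > claims.exp + CLOCK_SKEW_SECONDS) return "TOKEN_EXPIRED";

  return null;
}
```

Keeping these checks free of I/O makes each rejection reason straightforward to cover in unit tests.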
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `POST /federation/trust` registers a partner and fetches JWKS; returns 400 if JWKS unreachable
|
||||
- [ ] `POST /federation/verify` returns `valid: true` for a correctly signed token from a trusted partner
|
||||
- [ ] `POST /federation/verify` returns `valid: false` with `reason: UNTRUSTED_ISSUER` for unknown issuers
|
||||
- [ ] `POST /federation/verify` returns `valid: false` with `reason: TOKEN_EXPIRED` for expired tokens
|
||||
- [ ] Expired trust relationships (past `expiresAt`) are rejected automatically
|
||||
- [ ] JWKS cache hit is used on second verification request for same issuer (Redis key present)
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on FederationService
|
||||
444
openspec/changes/phase-3-enterprise/specs/multi-tenancy/spec.md
Normal file
@@ -0,0 +1,444 @@
|
||||
# Multi-Tenancy — Specification
|
||||
|
||||
**Workstream**: 1 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Introduce an Organization model so a single AgentIdP instance serves multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit events, and rate limits. Row-level tenancy in PostgreSQL is enforced by both application-layer `organization_id` filtering and PostgreSQL Row-Level Security (RLS) policies.
|
||||
|
||||
All existing endpoints that operate on agents, credentials, or audit events are augmented to be organization-scoped. A new Admin API provides organization lifecycle management. Organization membership controls which agents a caller can manage.
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### POST /organizations
|
||||
|
||||
Create a new organization. Requires system-admin scope (`admin:orgs`).
|
||||
|
||||
```yaml
|
||||
POST /organizations
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
Content-Type: application/json
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
required: [name, slug]
|
||||
properties:
|
||||
name:
|
||||
type: string
|
||||
minLength: 2
|
||||
maxLength: 100
|
||||
description: Display name of the organization
|
||||
example: "Acme AI Platform"
|
||||
slug:
|
||||
type: string
|
||||
minLength: 2
|
||||
maxLength: 50
|
||||
pattern: "^[a-z0-9-]+$"
|
||||
description: URL-safe unique identifier
|
||||
example: "acme-ai"
|
||||
planTier:
|
||||
type: string
|
||||
enum: [free, pro, enterprise]
|
||||
default: free
|
||||
maxAgents:
|
||||
type: integer
|
||||
minimum: 1
|
||||
default: 100
|
||||
maxTokensPerMonth:
|
||||
type: integer
|
||||
minimum: 1
|
||||
default: 10000
|
||||
|
||||
Responses:
|
||||
201 Created:
|
||||
schema:
|
||||
$ref: '#/components/schemas/Organization'
|
||||
example:
|
||||
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
|
||||
name: "Acme AI Platform"
|
||||
slug: "acme-ai"
|
||||
planTier: "free"
|
||||
maxAgents: 100
|
||||
maxTokensPerMonth: 10000
|
||||
status: "active"
|
||||
createdAt: "2026-03-29T12:00:00Z"
|
||||
updatedAt: "2026-03-29T12:00:00Z"
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "VALIDATION_ERROR"
|
||||
message: "slug must be unique"
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "INSUFFICIENT_SCOPE"
|
||||
message: "admin:orgs scope required"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /organizations
|
||||
|
||||
List all organizations. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
|
||||
GET /organizations
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
|
||||
Query Parameters:
|
||||
status:
|
||||
type: string
|
||||
enum: [active, suspended, deleted]
|
||||
page:
|
||||
type: integer
|
||||
minimum: 1
|
||||
default: 1
|
||||
limit:
|
||||
type: integer
|
||||
minimum: 1
|
||||
maximum: 100
|
||||
default: 20
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
data:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/Organization'
|
||||
total:
|
||||
type: integer
|
||||
page:
|
||||
type: integer
|
||||
limit:
|
||||
type: integer
|
||||
example:
|
||||
data:
|
||||
- organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
|
||||
name: "Acme AI Platform"
|
||||
slug: "acme-ai"
|
||||
planTier: "free"
|
||||
status: "active"
|
||||
createdAt: "2026-03-29T12:00:00Z"
|
||||
updatedAt: "2026-03-29T12:00:00Z"
|
||||
total: 1
|
||||
page: 1
|
||||
limit: 20
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
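The `page` and `limit` query parameters above could be validated and turned into a SQL window as follows. This is a sketch: the names `toPageWindow` and `parseBounded` are hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption about how the service maps bad input to a 400 response.

```typescript
interface PageQuery {
  page?: string;
  limit?: string;
}

interface PageWindow {
  page: number;
  limit: number;
  offset: number; // rows to skip, suitable for SQL OFFSET
}

// Parses a raw query-string value as an integer within [min, max].
function parseBounded(raw: string | undefined, fallback: number, min: number, max: number): number {
  if (raw === undefined) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`expected an integer between ${min} and ${max}`);
  }
  return value;
}

// Applies the documented defaults (page 1, limit 20) and bounds (limit 1..100).
export function toPageWindow(query: PageQuery): PageWindow {
  const page = parseBounded(query.page, 1, 1, Number.MAX_SAFE_INTEGER);
  const limit = parseBounded(query.limit, 20, 1, 100);
  return { page, limit, offset: (page - 1) * limit };
}
```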
|
||||
|
||||
---
|
||||
|
||||
### GET /organizations/:orgId
|
||||
|
||||
Get a single organization. Requires `admin:orgs` scope or membership in the organization.
|
||||
|
||||
```yaml
|
||||
GET /organizations/{orgId}
|
||||
Authorization: Bearer <token>
|
||||
|
||||
Path Parameters:
|
||||
orgId:
|
||||
type: string
|
||||
description: Organization ID (org_... prefix)
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
$ref: '#/components/schemas/Organization'
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "ORG_NOT_FOUND"
|
||||
message: "Organization not found"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### PATCH /organizations/:orgId
|
||||
|
||||
Partially update an organization. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
|
||||
PATCH /organizations/{orgId}
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
Content-Type: application/json
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
name:
|
||||
type: string
|
||||
minLength: 2
|
||||
maxLength: 100
|
||||
planTier:
|
||||
type: string
|
||||
enum: [free, pro, enterprise]
|
||||
maxAgents:
|
||||
type: integer
|
||||
minimum: 1
|
||||
maxTokensPerMonth:
|
||||
type: integer
|
||||
minimum: 1
|
||||
status:
|
||||
type: string
|
||||
enum: [active, suspended]
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
$ref: '#/components/schemas/Organization'
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### DELETE /organizations/:orgId
|
||||
|
||||
Soft-delete an organization (sets status to `deleted`). Requires `admin:orgs` scope. Hard deletion is not supported — data is retained for compliance.
|
||||
|
||||
```yaml
|
||||
DELETE /organizations/{orgId}
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
|
||||
Responses:
|
||||
204 No Content: {}
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
409 Conflict:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "ORG_HAS_ACTIVE_AGENTS"
|
||||
message: "Organization has active agents; decommission all agents before deleting"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### POST /organizations/:orgId/members
|
||||
|
||||
Add a member (agent credential) to an organization. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
|
||||
POST /organizations/{orgId}/members
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
Content-Type: application/json
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
required: [agentId, role]
|
||||
properties:
|
||||
agentId:
|
||||
type: string
|
||||
description: ID of an already-registered agent to add as a member
|
||||
role:
|
||||
type: string
|
||||
enum: [member, admin]
|
||||
description: Role within the organization
|
||||
|
||||
Responses:
|
||||
201 Created:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OrgMember'
|
||||
example:
|
||||
memberId: "mem_01HXK7Z9P3FKWABCDEF99999"
|
||||
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
|
||||
agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
role: "member"
|
||||
joinedAt: "2026-03-29T12:00:00Z"
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
409 Conflict:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "ALREADY_MEMBER"
|
||||
message: "Agent is already a member of this organization"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Modified: All /agents, /audit endpoints
|
||||
|
||||
All existing agent, credential, and audit endpoints now operate within the caller's organization context (extracted from the `organization_id` claim in the JWT). No URLs change; the scoping is transparent to callers already using the API.
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: organizations
|
||||
|
||||
```sql
|
||||
CREATE TABLE organizations (
|
||||
organization_id VARCHAR(40) PRIMARY KEY, -- org_... prefixed ULID
|
||||
name VARCHAR(100) NOT NULL,
|
||||
slug VARCHAR(50) NOT NULL UNIQUE,
|
||||
plan_tier VARCHAR(20) NOT NULL DEFAULT 'free',
|
||||
max_agents INTEGER NOT NULL DEFAULT 100,
|
||||
max_tokens_per_month INTEGER NOT NULL DEFAULT 10000,
|
||||
status VARCHAR(20) NOT NULL DEFAULT 'active',
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
CONSTRAINT organizations_status_check CHECK (status IN ('active', 'suspended', 'deleted')),
|
||||
CONSTRAINT organizations_plan_check CHECK (plan_tier IN ('free', 'pro', 'enterprise'))
|
||||
);
|
||||
|
||||
CREATE INDEX idx_organizations_slug ON organizations(slug);
|
||||
CREATE INDEX idx_organizations_status ON organizations(status);
|
||||
```
|
||||
|
||||
### New Table: organization_members
|
||||
|
||||
```sql
|
||||
CREATE TABLE organization_members (
|
||||
member_id VARCHAR(40) PRIMARY KEY,
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
agent_id VARCHAR(40) NOT NULL REFERENCES agents(agent_id),
|
||||
role VARCHAR(20) NOT NULL DEFAULT 'member',
|
||||
joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
CONSTRAINT organization_members_role_check CHECK (role IN ('member', 'admin')),
|
||||
UNIQUE (organization_id, agent_id)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_org_members_org_id ON organization_members(organization_id);
|
||||
CREATE INDEX idx_org_members_agent_id ON organization_members(agent_id);
|
||||
```
|
||||
|
||||
### Modified: agents table
|
||||
|
||||
```sql
|
||||
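-- The 'org_system' row from the seed section must already exist,
-- otherwise the foreign key check fails for pre-existing rows.
ALTER TABLE agents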
|
||||
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
|
||||
|
||||
CREATE INDEX idx_agents_organization_id ON agents(organization_id);
|
||||
|
||||
-- RLS
|
||||
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
|
||||
CREATE POLICY agents_org_isolation ON agents
|
||||
USING (organization_id = current_setting('app.organization_id', true));
|
||||
```
|
||||
|
||||
### Modified: credentials table
|
||||
|
||||
```sql
|
||||
ALTER TABLE credentials
|
||||
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
|
||||
|
||||
CREATE INDEX idx_credentials_organization_id ON credentials(organization_id);
|
||||
ALTER TABLE credentials ENABLE ROW LEVEL SECURITY;
|
||||
CREATE POLICY credentials_org_isolation ON credentials
|
||||
USING (organization_id = current_setting('app.organization_id', true));
|
||||
```
|
||||
|
||||
### Modified: audit_logs table
|
||||
|
||||
```sql
|
||||
ALTER TABLE audit_logs
|
||||
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
|
||||
|
||||
CREATE INDEX idx_audit_logs_organization_id ON audit_logs(organization_id);
|
||||
ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
|
||||
CREATE POLICY audit_logs_org_isolation ON audit_logs
|
||||
USING (organization_id = current_setting('app.organization_id', true));
|
||||
```
|
||||
|
||||
### Seed: Default system organization
|
||||
|
||||
```sql
|
||||
INSERT INTO organizations (organization_id, name, slug, plan_tier, max_agents, max_tokens_per_month, status)
|
||||
VALUES ('org_system', 'System', 'system', 'enterprise', 999999, 999999999, 'active');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `MULTI_TENANCY_ENABLED` | Enable organization enforcement (set false for single-tenant mode) | `true` |
|
||||
| `DEFAULT_ORG_ID` | Organization ID to assign pre-tenancy data during migration | `org_system` |
|
||||
| `MAX_ORGS_PER_INSTANCE` | Hard cap on number of organizations per instance | `1000` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
No new npm packages. Row-level tenancy uses the existing PostgreSQL client (`pg`) and query patterns.
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
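- PostgreSQL RLS is enabled as defense-in-depth: even if the application layer accidentally omits the `organization_id` filter, the database still blocks cross-organization rows. PostgreSQL exempts superusers, roles with `BYPASSRLS`, and (unless `FORCE ROW LEVEL SECURITY` is set) the table owner from RLS, so the application must connect as a role to which none of these apply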
|
||||
- `SET LOCAL app.organization_id` is called at the start of every database transaction
|
||||
- The `admin:orgs` scope is a new privileged scope — only system-level agent credentials carry it
|
||||
- Organization slugs are public-facing but organization IDs are internal — never expose organization IDs in public URLs where avoidable
|
||||
- `DELETE /organizations/:orgId` is soft-delete only; hard deletion requires a separate admin runbook to prevent accidental data loss
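Setting the organization context at the start of every transaction could look like the sketch below. PostgreSQL's `SET LOCAL` does not accept bind parameters, so the sketch uses `set_config(name, value, true)`, which applies a transaction-scoped setting in the same way. The `Queryable` interface, the `withOrgContext` name, and the ID pattern are assumptions made for illustration; any client with a compatible `query(text, values)` method would fit.

```typescript
// Minimal shape of a database client; assumed here so the sketch stays self-contained.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Assumption: organization IDs look like "org_" followed by word characters.
const ORG_ID_PATTERN = /^org_[A-Za-z0-9_]+$/;

// Runs `work` inside a transaction whose app.organization_id is set,
// so the RLS policies only expose rows for that organization.
export async function withOrgContext<T>(
  client: Queryable,
  organizationId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  if (!ORG_ID_PATTERN.test(organizationId)) {
    throw new Error("organization_id must look like org_...");
  }
  await client.query("BEGIN");
  try {
    // Third argument true makes the setting local to this transaction.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, a pooled connection returned after `COMMIT` or `ROLLBACK` does not carry one tenant's context into the next request.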
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Single AgentIdP instance can serve 2+ organizations with zero cross-organization data leakage
|
||||
- [ ] All agent/credential/audit operations are scoped to caller's organization_id from JWT
|
||||
- [ ] PostgreSQL RLS policies verified: direct DB query without app.organization_id setting returns 0 rows
|
||||
- [ ] Organization CRUD endpoints return correct 403 when caller lacks admin:orgs scope
|
||||
- [ ] Pre-existing agents assigned to default system organization without data loss
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on OrgService
|
||||
366
openspec/changes/phase-3-enterprise/specs/oidc/spec.md
Normal file
@@ -0,0 +1,366 @@
|
||||
# OpenID Connect (OIDC) — Specification
|
||||
|
||||
**Workstream**: 3 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Add a full OIDC 1.0 layer on top of the existing OAuth 2.0 `client_credentials` implementation using the certified `oidc-provider` npm library. The OIDC layer exposes Discovery and JWKS endpoints, extends the token endpoint to return ID tokens with agent claims, and provides an `/agent-info` endpoint (the agent-identity equivalent of OIDC's `/userinfo`).
|
||||
|
||||
The existing `POST /oauth2/token` endpoint is extended, not replaced. Callers that do not request the `openid` scope continue to receive standard OAuth 2.0 responses unchanged.
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /.well-known/openid-configuration
|
||||
|
||||
OIDC Discovery document. No authentication required. This is the standard discovery endpoint defined by OpenID Connect Discovery 1.0; RFC 8414 defines the analogous OAuth 2.0 metadata document at `/.well-known/oauth-authorization-server`.
|
||||
|
||||
```yaml
|
||||
GET /.well-known/openid-configuration
|
||||
No authentication required
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: OIDC Discovery document per OpenID Connect Discovery 1.0
|
||||
example:
|
||||
issuer: "https://idp.sentryagent.ai"
|
||||
authorization_endpoint: "https://idp.sentryagent.ai/oauth2/authorize"
|
||||
token_endpoint: "https://idp.sentryagent.ai/oauth2/token"
|
||||
jwks_uri: "https://idp.sentryagent.ai/.well-known/jwks.json"
|
||||
userinfo_endpoint: "https://idp.sentryagent.ai/agent-info"
|
||||
introspection_endpoint: "https://idp.sentryagent.ai/oauth2/introspect"
|
||||
revocation_endpoint: "https://idp.sentryagent.ai/oauth2/revoke"
|
||||
response_types_supported:
|
||||
- "token"
|
||||
grant_types_supported:
|
||||
- "client_credentials"
|
||||
subject_types_supported:
|
||||
- "public"
|
||||
id_token_signing_alg_values_supported:
|
||||
- "RS256"
|
||||
- "ES256"
|
||||
scopes_supported:
|
||||
- "openid"
|
||||
- "agents:read"
|
||||
- "agents:write"
|
||||
- "tokens:read"
|
||||
- "audit:read"
|
||||
claims_supported:
|
||||
- "sub"
|
||||
- "iss"
|
||||
- "iat"
|
||||
- "exp"
|
||||
- "agent_id"
|
||||
- "agent_type"
|
||||
- "organization_id"
|
||||
- "capabilities"
|
||||
- "deployment_env"
|
||||
- "owner"
|
||||
token_endpoint_auth_methods_supported:
|
||||
- "client_secret_post"
|
||||
- "client_secret_basic"
|
||||
500 Internal Server Error:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /.well-known/jwks.json
|
||||
|
||||
JSON Web Key Set. Contains the public keys used to sign ID tokens and access tokens. No authentication required. Clients use this endpoint to verify token signatures.
|
||||
|
||||
```yaml
|
||||
GET /.well-known/jwks.json
|
||||
No authentication required
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
Cache-Control: public, max-age=3600
|
||||
schema:
|
||||
type: object
|
||||
required: [keys]
|
||||
properties:
|
||||
keys:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
description: JSON Web Key (RFC 7517)
|
||||
properties:
|
||||
kty:
|
||||
type: string
|
||||
example: "RSA"
|
||||
use:
|
||||
type: string
|
||||
example: "sig"
|
||||
kid:
|
||||
type: string
|
||||
description: Key ID — matches `kid` header in issued JWTs
|
||||
alg:
|
||||
type: string
|
||||
example: "RS256"
|
||||
n:
|
||||
type: string
|
||||
description: RSA modulus (base64url)
|
||||
e:
|
||||
type: string
|
||||
description: RSA exponent (base64url)
|
||||
example:
|
||||
keys:
|
||||
- kty: "RSA"
|
||||
use: "sig"
|
||||
kid: "key-2026-03-29-01"
|
||||
alg: "RS256"
|
||||
n: "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAt..."
|
||||
e: "AQAB"
|
||||
500 Internal Server Error:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
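A client consuming this endpoint typically picks the verification key whose `kid` matches the `kid` header of the token it received. The sketch below shows that lookup only; `selectSigningKey` and the `Jwk` and `Jwks` types are hypothetical names, and the sketch does not perform signature verification itself.

```typescript
interface Jwk {
  kty: string;
  use?: string;
  kid?: string;
  alg?: string;
  [member: string]: unknown; // key-type specific members such as n and e
}

interface Jwks {
  keys: Jwk[];
}

// Finds the signature key matching the JWT header's kid and alg.
// Keys that omit `use` or `alg` are treated as unrestricted.
export function selectSigningKey(jwks: Jwks, kid: string, alg: string): Jwk | undefined {
  return jwks.keys.find(
    (key) =>
      key.kid === kid &&
      (key.use === undefined || key.use === "sig") &&
      (key.alg === undefined || key.alg === alg),
  );
}
```

Matching on `kid` is what lets older tokens keep verifying while a newly rotated key is already published in the same set.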
|
||||
|
||||
---
|
||||
|
||||
### POST /oauth2/token (extended)
|
||||
|
||||
The existing token endpoint is extended to return an `id_token` when the `openid` scope is requested. All existing behavior is preserved when `openid` is not in the scope list.
|
||||
|
||||
```yaml
|
||||
POST /oauth2/token
|
||||
Content-Type: application/x-www-form-urlencoded
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
required: [grant_type, client_id, client_secret]
|
||||
properties:
|
||||
grant_type:
|
||||
type: string
|
||||
enum: [client_credentials]
|
||||
client_id:
|
||||
type: string
|
||||
client_secret:
|
||||
type: string
|
||||
scope:
|
||||
type: string
|
||||
description: Space-separated scopes. Include "openid" to receive an id_token.
|
||||
example: "openid agents:read"
|
||||
|
||||
Responses:
|
||||
200 OK (with openid scope):
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
access_token:
|
||||
type: string
|
||||
token_type:
|
||||
type: string
|
||||
example: "Bearer"
|
||||
expires_in:
|
||||
type: integer
|
||||
scope:
|
||||
type: string
|
||||
id_token:
|
||||
type: string
|
||||
description: Signed JWT ID token containing agent identity claims. Only present when openid scope was requested.
|
||||
example:
|
||||
access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
|
||||
token_type: "Bearer"
|
||||
expires_in: 3600
|
||||
scope: "openid agents:read"
|
||||
id_token: "eyJhbGciOiJSUzI1NiIsImtpZCI6ImtleS0yMDI2LTAzLTI5LTAxIn0..."
|
||||
|
||||
200 OK (without openid scope):
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
access_token:
|
||||
type: string
|
||||
token_type:
|
||||
type: string
|
||||
expires_in:
|
||||
type: integer
|
||||
scope:
|
||||
type: string
|
||||
example:
|
||||
access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
|
||||
token_type: "Bearer"
|
||||
expires_in: 3600
|
||||
scope: "agents:read"
|
||||
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OAuthErrorResponse'
|
||||
example:
|
||||
error: "invalid_client"
|
||||
error_description: "Invalid client credentials"
|
||||
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/OAuthErrorResponse'
|
||||
```
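From the caller's side, the form-encoded request body could be assembled as in this sketch. The helper names `buildTokenRequestBody` and `expectsIdToken` are hypothetical; the field names come from the request schema above.

```typescript
// Builds the application/x-www-form-urlencoded body for POST /oauth2/token.
export function buildTokenRequestBody(
  clientId: string,
  clientSecret: string,
  scopes: string[],
): string {
  const form = new URLSearchParams();
  form.set("grant_type", "client_credentials");
  form.set("client_id", clientId);
  form.set("client_secret", clientSecret);
  // Scopes are sent as a single space-separated string.
  if (scopes.length > 0) form.set("scope", scopes.join(" "));
  return form.toString();
}

// An id_token is only returned when the openid scope was requested.
export function expectsIdToken(scopes: string[]): boolean {
  return scopes.includes("openid");
}
```

A caller can use `expectsIdToken` to decide whether a response without `id_token` is an error or the standard OAuth 2.0 response.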
|
||||
|
||||
#### ID Token Claims
|
||||
|
||||
When `openid` scope is requested, the ID token (a signed JWT) contains the following claims:
|
||||
|
||||
```json
|
||||
{
|
||||
"iss": "https://idp.sentryagent.ai",
|
||||
"sub": "agt_01HXK7Z9P3FKWABCDEF67890",
|
||||
"aud": "agt_01HXK7Z9P3FKWABCDEF67890",
|
||||
"iat": 1743249600,
|
||||
"exp": 1743253200,
|
||||
"agent_id": "agt_01HXK7Z9P3FKWABCDEF67890",
|
||||
"agent_type": "orchestrator",
|
||||
"organization_id": "org_01HXK7Z9P3FKWABCDEF12345",
|
||||
"capabilities": ["task-planning", "tool-use"],
|
||||
"deployment_env": "production",
|
||||
"owner": "acme-ai",
|
||||
"did": "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
}
|
||||
```
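The claim set above could be assembled from an agent record as in the following sketch. The `AgentRecord` shape and the `buildIdTokenClaims` name are assumptions for illustration; `aud` is set to the agent ID because that is what the example shows, and the `did` value follows the `did:web` convention of percent-encoding a port colon in the host.

```typescript
interface AgentRecord {
  agentId: string;
  agentType: string;
  organizationId: string;
  capabilities: string[];
  deploymentEnv: string;
  owner: string;
}

interface IdTokenClaims {
  iss: string;
  sub: string;
  aud: string;
  iat: number;
  exp: number;
  agent_id: string;
  agent_type: string;
  organization_id: string;
  capabilities: string[];
  deployment_env: string;
  owner: string;
  did: string;
}

// Builds the unsigned claim set; signing is left to the OIDC layer.
export function buildIdTokenClaims(
  agent: AgentRecord,
  issuer: string,
  nowSeconds: number,
  ttlSeconds = 3600, // mirrors the OIDC_ID_TOKEN_TTL_SECONDS default
): IdTokenClaims {
  // did:web identifiers percent-encode the colon before a port number.
  const host = new URL(issuer).host.replace(/:/g, "%3A");
  return {
    iss: issuer,
    sub: agent.agentId,
    aud: agent.agentId,
    iat: nowSeconds,
    exp: nowSeconds + ttlSeconds,
    agent_id: agent.agentId,
    agent_type: agent.agentType,
    organization_id: agent.organizationId,
    capabilities: [...agent.capabilities],
    deployment_env: agent.deploymentEnv,
    owner: agent.owner,
    did: `did:web:${host}:agents:${agent.agentId}`,
  };
}
```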
|
||||
|
||||
---
|
||||
|
||||
### GET /agent-info
|
||||
|
||||
Returns claims about the authenticated agent identity. This is the agent-first equivalent of the OIDC `/userinfo` endpoint. Authentication required with any valid access token.
|
||||
|
||||
```yaml
|
||||
GET /agent-info
|
||||
Authorization: Bearer <access_token>
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: Agent identity claims (subset of registered agent data)
|
||||
properties:
|
||||
sub:
|
||||
type: string
|
||||
description: Subject — agentId
|
||||
agent_id:
|
||||
type: string
|
||||
agent_type:
|
||||
type: string
|
||||
organization_id:
|
||||
type: string
|
||||
capabilities:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
deployment_env:
|
||||
type: string
|
||||
owner:
|
||||
type: string
|
||||
version:
|
||||
type: string
|
||||
status:
|
||||
type: string
|
||||
did:
|
||||
type: string
|
||||
description: W3C DID for this agent (if DID workstream is active)
|
||||
created_at:
|
||||
type: string
|
||||
format: date-time
|
||||
example:
|
||||
sub: "agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
agent_id: "agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
agent_type: "orchestrator"
|
||||
organization_id: "org_01HXK7Z9P3FKWABCDEF12345"
|
||||
capabilities: ["task-planning", "tool-use"]
|
||||
deployment_env: "production"
|
||||
owner: "acme-ai"
|
||||
version: "1.2.0"
|
||||
status: "active"
|
||||
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
created_at: "2026-03-29T12:00:00Z"
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "UNAUTHORIZED"
|
||||
message: "Invalid or expired access token"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: oidc_keys
|
||||
|
||||
Stores metadata for the RSA/EC key pairs used for ID token signing. Private keys are stored in Vault; each public key JWK is stored in PostgreSQL to serve the JWKS endpoint.
|
||||
|
||||
```sql
|
||||
CREATE TABLE oidc_keys (
|
||||
key_id VARCHAR(40) PRIMARY KEY,
|
||||
kid VARCHAR(100) NOT NULL UNIQUE, -- Key ID in JWKS
|
||||
algorithm VARCHAR(10) NOT NULL,
|
||||
use_purpose VARCHAR(10) NOT NULL DEFAULT 'sig',
|
||||
public_key_jwk JSONB NOT NULL,
|
||||
vault_key_path VARCHAR(255) NOT NULL,
|
||||
is_current BOOLEAN NOT NULL DEFAULT TRUE,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
retired_at TIMESTAMPTZ,
|
||||
CONSTRAINT oidc_keys_alg_check CHECK (algorithm IN ('RS256', 'ES256')),
|
||||
CONSTRAINT oidc_keys_use_check CHECK (use_purpose IN ('sig', 'enc'))
|
||||
);
|
||||
|
||||
CREATE INDEX idx_oidc_keys_is_current ON oidc_keys(is_current) WHERE is_current = TRUE;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `OIDC_ISSUER` | OIDC issuer URL (must match token `iss` claim) | `https://${HOST}` |
|
||||
| `OIDC_ID_TOKEN_TTL_SECONDS` | ID token lifetime | `3600` |
|
||||
| `OIDC_SIGNING_ALG` | ID token signing algorithm | `RS256` |
|
||||
| `OIDC_JWKS_CACHE_TTL_SECONDS` | JWKS response cache TTL | `3600` |
|
||||
| `OIDC_KEY_ROTATION_DAYS` | Days between signing key rotations | `90` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|
||||
|---------|---------|---------|
|
||||
| `oidc-provider` | `^8.4.6` | Certified OIDC server library (OpenID Foundation conformant) |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- ID token signing keys are stored in Vault; public keys only are served via JWKS
|
||||
- JWKS endpoint is cached in Redis (`OIDC_JWKS_CACHE_TTL_SECONDS`) to prevent key-hammering
|
||||
- Key rotation: when a new signing key is created, the old key remains in JWKS until all tokens signed with it have expired
|
||||
- The `openid` scope is only issued to callers explicitly requesting it — not included by default
|
||||
- `GET /agent-info` returns the same data as the ID token — no additional sensitive data
|
||||
- ID tokens for agent credentials must not contain client secrets or internal system paths
|
||||
- `alg: none` is explicitly rejected — all ID tokens must be signed
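The key rotation rule above (a retired key stays in JWKS until every token it signed has expired) can be sketched as a pure selection function. This is a minimal sketch, not the `oidc-provider` API: the `OidcKeyRow` shape and `buildJwks` name are hypothetical, and it assumes `retired_at` records the moment a key stopped signing, so the last token it signed expires `OIDC_ID_TOKEN_TTL_SECONDS` later.

```typescript
// Hypothetical in-memory view of an oidc_keys row (camelCased column names).
interface OidcKeyRow {
  kid: string;
  usePurpose: "sig" | "enc";
  publicKeyJwk: Record<string, unknown>;
  isCurrent: boolean;
  retiredAt: Date | null;
}

interface Jwks {
  keys: Record<string, unknown>[];
}

// Publish every key that may still be needed to verify an unexpired ID token.
function buildJwks(rows: OidcKeyRow[], now: Date, idTokenTtlSeconds: number): Jwks {
  const keys = rows
    .filter((row) => {
      // Keys that have not been retired are always published.
      if (row.retiredAt === null) return true;
      // Assumption: the key signed its last token at retiredAt.
      const lastTokenExpiryMs = row.retiredAt.getTime() + idTokenTtlSeconds * 1000;
      return lastTokenExpiryMs > now.getTime();
    })
    .map((row) => ({ ...row.publicKeyJwk, kid: row.kid, use: row.usePurpose }));
  return { keys };
}
```

Keeping the selection logic free of database and Redis calls makes the rotation acceptance criterion straightforward to unit test.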
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `/.well-known/openid-configuration` passes OIDC Discovery conformance validation
|
||||
- [ ] `/.well-known/jwks.json` returns valid JWKS with current signing public key
|
||||
- [ ] ID token returned when `openid` scope is in token request; not returned otherwise
|
||||
- [ ] ID token is verifiable against JWKS endpoint using standard JWT libraries
|
||||
- [ ] ID token claims match agent record (agent_type, capabilities, organization_id, did)
|
||||
- [ ] `/agent-info` returns correct claims for authenticated agent
|
||||
- [ ] Key rotation: old JWKS key is kept until all signed tokens expire
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on OIDCService
|
||||
335
openspec/changes/phase-3-enterprise/specs/soc2/spec.md
Normal file
@@ -0,0 +1,335 @@
|
||||
# SOC 2 Type II Preparation — Specification
|
||||
|
||||
**Workstream**: 6 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Implement the technical controls required for SOC 2 Type II audit readiness. SOC 2 Type II certifies that security controls operate continuously over a defined period — not just that they exist. Controls are implemented in code, not just documented.
|
||||
|
||||
This workstream cuts across all other Phase 3 workstreams. It delivers: encryption at rest for sensitive columns, TLS enforcement middleware, automated secrets rotation, security event alerting, and audit log immutability via a Merkle hash chain. A compliance documentation package (controls matrix and runbook) is produced for auditors.
|
||||
|
||||
---
|
||||
|
||||
## Technical Controls
|
||||
|
||||
### Control C1: Encryption at Rest (Column-Level Encryption)
|
||||
|
||||
Sensitive columns in PostgreSQL are encrypted using `pgcrypto` symmetric encryption. The encryption key is stored in Vault and fetched at application startup, never written to disk.
|
||||
|
||||
**Columns encrypted**:
|
||||
- `credentials.secret_hash` — encrypted with AES-256-CBC
|
||||
- `credentials.vault_path` — encrypted with AES-256-CBC
|
||||
- `webhook_subscriptions.vault_secret_path` — encrypted with AES-256-CBC
|
||||
- `agent_did_keys.vault_key_path` — encrypted with AES-256-CBC
|
||||
|
||||
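**Implementation**: An `EncryptionService` wraps `pgcrypto` `pgp_sym_encrypt` / `pgp_sym_decrypt`. The key is a 256-bit symmetric key stored at `secret/agentidp/encryption/column-key` in Vault. All INSERT/SELECT operations for encrypted columns go through `EncryptionService`. Note that `pgp_sym_encrypt` defaults to AES-128 in OpenPGP's CFB mode; AES-256 requires passing the `cipher-algo=aes256` option, and CBC mode is available only through pgcrypto's raw `encrypt()` function or application-side encryption.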
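For illustration, the encrypt/decrypt round trip that `EncryptionService` is expected to provide might look like the following. This is an application-side sketch that uses Node's built-in `crypto` module for AES-256-CBC in place of `pgcrypto` or `node-forge`; the function names and the `iv:ciphertext` storage format are assumptions, and in the real service the key would come from Vault rather than being passed in directly.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts a column value with AES-256-CBC and a fresh random IV per value.
// Hypothetical storage format: base64(iv) + ":" + base64(ciphertext).
function encryptColumn(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return `${iv.toString("base64")}:${ciphertext.toString("base64")}`;
}

// Reverses encryptColumn; throws if the value is malformed or the key is wrong.
function decryptColumn(stored: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const [ivB64, dataB64] = stored.split(":");
  if (!ivB64 || !dataB64) throw new Error("Malformed encrypted column value");
  const decipher = createDecipheriv("aes-256-cbc", key, Buffer.from(ivB64, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(dataB64, "base64")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}
```

CBC on its own provides confidentiality but no integrity check, so a production design would likely pair it with a MAC or use an authenticated mode.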
|
||||
|
||||
---
|
||||
|
||||
### Control C2: TLS Enforcement
|
||||
|
||||
In production, requests that did not arrive over TLS are never served; they are redirected to HTTPS. This is enforced at two levels:
|
||||
1. Express middleware: `TLSEnforcementMiddleware` — if `X-Forwarded-Proto` is not `https` and `NODE_ENV=production`, respond `301 Moved Permanently` to HTTPS.
|
||||
2. Terraform: Load balancers (Phase 2 Terraform modules) already enforce TLS; TLS enforcement middleware provides defense-in-depth.
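The middleware decision in item 1 can be sketched as a framework-agnostic function that an Express middleware would call with `req.headers["x-forwarded-proto"]`, `req.headers.host`, and `req.originalUrl`. The `decideTls` name and `TlsDecision` type are hypothetical, and the sketch assumes the load balancer sets `X-Forwarded-Proto` reliably.

```typescript
// Outcome of the TLS check: let the request through, or redirect it to HTTPS.
type TlsDecision =
  | { action: "pass" }
  | { action: "redirect"; status: 301; location: string };

function decideTls(
  forwardedProto: string | undefined,
  host: string,
  originalUrl: string,
  nodeEnv: string | undefined,
): TlsDecision {
  // Enforcement applies only in production; development passes through.
  if (nodeEnv !== "production") return { action: "pass" };
  // Proxies may append values as a comma-separated list; take the first entry.
  const first = (forwardedProto ?? "").split(",")[0] ?? "";
  if (first.trim().toLowerCase() === "https") return { action: "pass" };
  return { action: "redirect", status: 301, location: `https://${host}${originalUrl}` };
}
```

Separating the decision from the Express request and response objects keeps both branches (redirect in production, passthrough in development) testable without starting a server.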
|
||||
|
||||
---
|
||||
|
||||
### Control C3: Automated Secrets Rotation
|
||||
|
||||
A scheduled job (`SecretsRotationJob`) runs on a configurable cron schedule. It:
|
||||
1. Identifies credentials whose `expires_at` is within `ROTATION_WARNING_DAYS` days
|
||||
2. Emits a Prometheus metric `agentidp_credentials_expiring_soon_total` (labelled by `org_id`, `days_remaining`)
|
||||
3. Renews Vault leases for all active credentials
|
||||
4. Sends a webhook event `credential.expiring_soon` to subscribers who have opted in
|
||||
|
||||
This does not automatically rotate credentials without operator action — it alerts and prepares. Forced rotation requires an operator call to the existing `POST /agents/:id/credentials/:credId/rotate` endpoint.
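Step 1 of the job, selecting credentials that expire within `ROTATION_WARNING_DAYS`, can be sketched as below. The `Credential` shape and `findExpiringCredentials` name are hypothetical; the real job would read rows from PostgreSQL and feed the result into the Prometheus metric and webhook event.

```typescript
// Hypothetical projection of a credentials row.
interface Credential {
  credentialId: string;
  organizationId: string;
  expiresAt: Date;
}

interface ExpiringCredential extends Credential {
  daysRemaining: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns credentials that are still valid but expire within warningDays.
function findExpiringCredentials(
  credentials: Credential[],
  now: Date,
  warningDays: number,
): ExpiringCredential[] {
  const result: ExpiringCredential[] = [];
  for (const credential of credentials) {
    const msLeft = credential.expiresAt.getTime() - now.getTime();
    // Skip credentials that already expired or are outside the warning window.
    if (msLeft <= 0 || msLeft > warningDays * MS_PER_DAY) continue;
    result.push({ ...credential, daysRemaining: Math.ceil(msLeft / MS_PER_DAY) });
  }
  return result;
}
```

The `daysRemaining` value corresponds to the `days_remaining` label on `agentidp_credentials_expiring_soon_total`.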
|
||||
|
||||
---
|
||||
|
||||
### Control C4: Audit Log Immutability (Merkle Hash Chain)
|
||||
|
||||
Every `audit_logs` row carries two new columns:
|
||||
- `hash`: SHA-256 of `(eventId || timestamp.toISOString() || action || outcome || agentId || organizationId || previousHash)`
|
||||
- `previous_hash`: hash of the immediately preceding `audit_logs` row (by `created_at` order), or the genesis string `"GENESIS"` for the first row
|
||||
|
||||
A PostgreSQL trigger prevents `UPDATE` and `DELETE` on `audit_logs`.
|
||||
|
||||
A new admin endpoint `GET /audit/verify` runs a sequential chain verification pass and returns the integrity status.
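The hash construction and the sequential verification pass can be sketched as follows. This is a minimal in-memory sketch: the `AuditRow` shape and function names are hypothetical, rows are assumed to be supplied in `created_at` order, and verification always starts from genesis (a `fromDate` range would instead seed the expected hash from the row preceding the range).

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; fields mirror the hash inputs listed above.
interface AuditRow {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

const GENESIS = "GENESIS";

// SHA-256 over the concatenated fields, hex-encoded (64 chars, fits VARCHAR(64)).
function computeHash(row: Omit<AuditRow, "hash">): string {
  const payload =
    row.eventId + row.timestamp.toISOString() + row.action + row.outcome +
    row.agentId + row.organizationId + row.previousHash;
  return createHash("sha256").update(payload, "utf8").digest("hex");
}

// Appends a row linked to the previous row's hash, or to GENESIS if first.
function appendRow(
  chain: AuditRow[],
  fields: Omit<AuditRow, "hash" | "previousHash">,
): AuditRow {
  const last = chain[chain.length - 1];
  const unsigned = { ...fields, previousHash: last ? last.hash : GENESIS };
  const row: AuditRow = { ...unsigned, hash: computeHash(unsigned) };
  chain.push(row);
  return row;
}

interface VerifyResult {
  valid: boolean;
  rowsVerified: number;
  brokenAtEventId: string | null;
}

// Walks the chain in order; stops at the first row whose link or hash is wrong.
function verifyChain(rows: AuditRow[]): VerifyResult {
  let expectedPrevious = GENESIS;
  let verified = 0;
  for (const row of rows) {
    const { hash, ...rest } = row;
    if (row.previousHash !== expectedPrevious || computeHash(rest) !== hash) {
      return { valid: false, rowsVerified: verified, brokenAtEventId: row.eventId };
    }
    expectedPrevious = hash;
    verified += 1;
  }
  return { valid: true, rowsVerified: verified, brokenAtEventId: null };
}
```

Because each hash covers the previous row's hash, altering any field of an earlier row invalidates that row and everything after it, which is what `brokenAtEventId` reports.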
|
||||
|
||||
---
|
||||
|
||||
### Control C5: Security Event Alerting
|
||||
|
||||
Prometheus alerting rules are written for the following security events:
|
||||
|
||||
| Alert | Condition | Severity |
|
||||
|-------|-----------|---------|
|
||||
| `AuthFailureSpike` | >50 `auth.failed` events in 5 minutes | Warning |
|
||||
| `RateLimitExhaustion` | >80% of org rate limit consumed in 1 minute | Warning |
|
||||
| `AnomalousTokenIssuance` | Token issuance rate exceeds 3x the 7-day average | Warning |
|
||||
| `WebhookDeadLetterAccumulating` | `agentidp_webhook_dead_letters_total` increases by >10 in 1 hour | Warning |
|
||||
| `AuditChainIntegrityFailed` | `agentidp_audit_chain_integrity` metric is 0 | Critical |
|
||||
| `CredentialExpiryApproaching` | `agentidp_credentials_expiring_soon_total{days_remaining="7"}` > 0 | Info |
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /audit/verify
|
||||
|
||||
Verify the Merkle hash chain integrity of the audit log. Requires `admin:orgs` scope. This is a potentially expensive operation on large audit logs — it is rate-limited to once per 5 minutes per organization.
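The once-per-5-minutes-per-organization limit can be sketched with a small limiter keyed by organization ID. This is an in-memory illustration only; the `VerifyRateLimiter` class is hypothetical, and a multi-instance deployment would presumably keep this state in Redis instead.

```typescript
// Allows at most one successful acquire per organization per window.
class VerifyRateLimiter {
  private readonly lastRunMs = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = 5 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // Returns true and records the run if the organization is outside its window;
  // returns false (caller should respond 429) otherwise.
  tryAcquire(organizationId: string, nowMs: number): boolean {
    const last = this.lastRunMs.get(organizationId);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastRunMs.set(organizationId, nowMs);
    return true;
  }
}
```

A rejected call does not reset the window, so repeated attempts cannot postpone the next allowed verification.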
|
||||
|
||||
```yaml
|
||||
GET /audit/verify
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
|
||||
Query Parameters:
|
||||
fromDate:
|
||||
type: string
|
||||
format: date-time
|
||||
description: Start of verification range. If omitted, verifies from genesis.
|
||||
toDate:
|
||||
type: string
|
||||
format: date-time
|
||||
description: End of verification range. If omitted, verifies to the latest row.
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
valid:
|
||||
type: boolean
|
||||
description: True if the chain is intact across the entire range
|
||||
rowsVerified:
|
||||
type: integer
|
||||
description: Number of audit rows verified
|
||||
firstEventId:
|
||||
type: string
|
||||
lastEventId:
|
||||
type: string
|
||||
firstTimestamp:
|
||||
type: string
|
||||
format: date-time
|
||||
lastTimestamp:
|
||||
type: string
|
||||
format: date-time
|
||||
verifiedAt:
|
||||
type: string
|
||||
format: date-time
|
||||
brokenAtEventId:
|
||||
type: string
|
||||
nullable: true
|
||||
description: Present only if valid=false — the first eventId where the chain breaks
|
||||
example:
|
||||
valid: true
|
||||
rowsVerified: 15420
|
||||
firstEventId: "evt_genesis_00001"
|
||||
lastEventId: "evt_01HXK7Z9P3FKWABCDEFZZZZZ"
|
||||
firstTimestamp: "2026-01-01T00:00:00Z"
|
||||
lastTimestamp: "2026-03-29T12:00:00Z"
|
||||
verifiedAt: "2026-03-29T14:00:00Z"
|
||||
brokenAtEventId: null
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
429 Too Many Requests:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "RATE_LIMITED"
|
||||
message: "Audit verification can be run at most once per 5 minutes"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /compliance/controls
|
||||
|
||||
Returns the current status of all SOC 2 technical controls. Requires `admin:orgs` scope. Used by auditors and compliance dashboards.
|
||||
|
||||
```yaml
|
||||
GET /compliance/controls
|
||||
Authorization: Bearer <token with admin:orgs scope>
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
generatedAt:
|
||||
type: string
|
||||
format: date-time
|
||||
controls:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
properties:
|
||||
controlId:
|
||||
type: string
|
||||
name:
|
||||
type: string
|
||||
status:
|
||||
type: string
|
||||
enum: [pass, fail, warning, not_applicable]
|
||||
description:
|
||||
type: string
|
||||
lastChecked:
|
||||
type: string
|
||||
format: date-time
|
||||
example:
|
||||
generatedAt: "2026-03-29T14:00:00Z"
|
||||
controls:
|
||||
- controlId: "C1"
|
||||
name: "Encryption at Rest"
|
||||
status: "pass"
|
||||
description: "Column-level encryption active for all sensitive columns"
|
||||
lastChecked: "2026-03-29T14:00:00Z"
|
||||
- controlId: "C2"
|
||||
name: "TLS Enforcement"
|
||||
status: "pass"
|
||||
description: "All non-TLS requests redirected to HTTPS in production"
|
||||
lastChecked: "2026-03-29T14:00:00Z"
|
||||
- controlId: "C3"
|
||||
name: "Secrets Rotation"
|
||||
status: "warning"
|
||||
description: "3 credentials expiring within 7 days"
|
||||
lastChecked: "2026-03-29T14:00:00Z"
|
||||
- controlId: "C4"
|
||||
name: "Audit Log Immutability"
|
||||
status: "pass"
|
||||
description: "Merkle chain intact — last verified 2026-03-29T13:55:00Z"
|
||||
lastChecked: "2026-03-29T14:00:00Z"
|
||||
- controlId: "C5"
|
||||
name: "Security Event Alerting"
|
||||
status: "pass"
|
||||
description: "All 6 alerting rules active in Prometheus"
|
||||
lastChecked: "2026-03-29T14:00:00Z"
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### Modified: audit_logs table
|
||||
|
||||
```sql
|
||||
ALTER TABLE audit_logs
|
||||
ADD COLUMN hash VARCHAR(64), -- SHA-256 hex string of chain node
|
||||
ADD COLUMN previous_hash VARCHAR(64); -- Hash of preceding row, or "GENESIS"
|
||||
|
||||
-- Back-fill genesis hash for existing rows (one-time migration)
|
||||
-- Migration script computes chain in order of created_at
|
||||
|
||||
-- Prevent updates and deletes (immutability trigger)
|
||||
CREATE OR REPLACE FUNCTION prevent_audit_modification()
|
||||
RETURNS TRIGGER AS $$
|
||||
BEGIN
|
||||
RAISE EXCEPTION 'audit_logs rows are immutable — modification is not permitted';
|
||||
END;
|
||||
$$ LANGUAGE plpgsql;
|
||||
|
||||
CREATE TRIGGER audit_logs_immutability
|
||||
BEFORE UPDATE OR DELETE ON audit_logs
|
||||
FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();
|
||||
```
|
||||
|
||||
### Modified: credentials table
|
||||
|
||||
```sql
|
||||
-- Columns remain same type; application now stores encrypted values
|
||||
-- No DDL change — encryption is transparent at application layer
|
||||
-- Add comment for documentation
|
||||
COMMENT ON COLUMN credentials.secret_hash IS 'AES-256-CBC encrypted via EncryptionService (pgcrypto). Not a plain bcrypt hash.';
|
||||
COMMENT ON COLUMN credentials.vault_path IS 'AES-256-CBC encrypted via EncryptionService.';
|
||||
```
|
||||
|
||||
### New Table: compliance_check_log
|
||||
|
||||
```sql
|
||||
CREATE TABLE compliance_check_log (
|
||||
check_id VARCHAR(40) PRIMARY KEY,
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
control_id VARCHAR(10) NOT NULL,
|
||||
status VARCHAR(20) NOT NULL,
|
||||
details JSONB NOT NULL DEFAULT '{}',
|
||||
checked_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
CREATE INDEX idx_compliance_check_org ON compliance_check_log(organization_id, checked_at DESC);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `SOC2_CONTROLS_ENABLED` | Enable SOC 2 controls enforcement | `true` |
|
||||
| `TLS_ENFORCEMENT_ENABLED` | Enforce HTTPS in production | `true` in production, `false` in development |
|
||||
| `COLUMN_ENCRYPTION_KEY_PATH` | Vault path for AES-256 column encryption key | `secret/agentidp/encryption/column-key` |
|
||||
| `ROTATION_WARNING_DAYS` | Days before expiry to emit rotation warning | `30` |
|
||||
| `SECRETS_ROTATION_CRON` | Cron schedule for rotation check job | `0 3 * * *` (daily at 3 AM UTC) |
|
||||
| `AUDIT_CHAIN_VERIFY_CRON` | Cron schedule for automated chain verification | `0 2 * * *` (daily at 2 AM UTC) |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|
||||
|---------|---------|---------|
|
||||
| `node-forge` | `^1.3.1` | AES-256-CBC column-level encryption primitives |
|
||||
|
||||
Note: `pgcrypto` PostgreSQL extension must be enabled: `CREATE EXTENSION IF NOT EXISTS pgcrypto;`
|
||||
|
||||
---
|
||||
|
||||
## Compliance Documentation
|
||||
|
||||
The following documents are produced as part of this workstream:
|
||||
|
||||
| Document | Path | Description |
|
||||
|----------|------|-------------|
|
||||
| Controls Matrix | `docs/compliance/soc2-controls-matrix.md` | Maps SOC 2 Trust Services Criteria to implemented controls |
|
||||
| Encryption Runbook | `docs/compliance/encryption-runbook.md` | Key rotation procedure, Vault key path map |
|
||||
| Audit Log Runbook | `docs/compliance/audit-log-runbook.md` | How to run chain verification, interpret results |
|
||||
| Incident Response | `docs/compliance/incident-response.md` | Security event response procedures |
|
||||
| Secrets Rotation Guide | `docs/compliance/secrets-rotation.md` | Operator guide for credential and key rotation |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Column encryption key is fetched from Vault at startup and held in process memory — never written to disk or logged
|
||||
- Key rotation: when the column encryption key is rotated, a migration re-encrypts all sensitive columns with the new key; the old key is retained in Vault history
|
||||
- The immutability trigger on `audit_logs` prevents application-layer modification; a `SUPERUSER` can still bypass triggers — document this in the controls matrix as a residual risk requiring compensating controls (e.g., read-only replica verification)
|
||||
- `GET /audit/verify` is rate-limited to prevent denial-of-service via repeated expensive sequential scans
|
||||
- `GET /compliance/controls` never returns raw secrets or key material — only control status
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `pgcrypto` extension enabled; sensitive columns are encrypted at rest (verified: plaintext not visible in direct DB query)
|
||||
- [ ] TLS enforcement middleware redirects HTTP to HTTPS in production; passthrough in development
|
||||
- [ ] `SecretsRotationJob` runs on schedule; emits Prometheus metric for expiring credentials
|
||||
- [ ] Audit log immutability trigger prevents UPDATE/DELETE on `audit_logs` table
|
||||
- [ ] `GET /audit/verify` returns `valid: true` for an unmodified chain
|
||||
- [ ] `GET /audit/verify` returns `valid: false` with `brokenAtEventId` after a row is manually tampered with (test scenario)
|
||||
- [ ] All 6 Prometheus alerting rules are present in `monitoring/prometheus/alerts.yml`
|
||||
- [ ] `GET /compliance/controls` returns correct status for all 5 controls
|
||||
- [ ] Compliance documentation written and reviewed
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on SOC2 control implementations
|
||||
353
openspec/changes/phase-3-enterprise/specs/w3c-dids/spec.md
Normal file
@@ -0,0 +1,353 @@
|
||||
# W3C Decentralized Identifiers (DIDs) — Specification
|
||||
|
||||
**Workstream**: 2 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Issue a W3C `did:web` identifier for every registered agent and serve DID Documents over HTTPS. The AgentIdP instance itself has a root DID Document at `/.well-known/did.json`. Each agent has an individual DID Document at `/agents/:id/did`. A DID resolution endpoint wraps the standard resolution workflow. Agent cards in AGNTCY format are derivable from DID Documents.
|
||||
|
||||
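The instance DID `did:web:<host>` resolves to `https://<host>/.well-known/did.json`. Under the `did:web` method specification, a per-agent DID `did:web:<host>:agents:<agentId>` resolves to `https://<host>/agents/<agentId>/did.json`, so the per-agent document served at `/agents/<agentId>/did` should also be reachable at that `did.json` path for standard resolvers such as `web-did-resolver`. All DID Documents are W3C DID Core 1.0 compliant.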
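The mapping from a `did:web` identifier to the HTTPS URL a resolver fetches can be sketched as a small pure function. This is a minimal sketch of the resolution rule in the `did:web` method specification; the names `didWebToUrl` and `agentDid` are hypothetical helpers, not part of `did-resolver` or `web-did-resolver`.

```typescript
// Builds the per-agent DID in the form used throughout this spec.
function agentDid(domain: string, agentId: string): string {
  return `did:web:${domain}:agents:${agentId}`;
}

// Converts a did:web identifier to the URL of its DID Document.
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`Not a did:web identifier: ${did}`);
  }
  // Colons separate the host from path segments; a port is percent-encoded (%3A).
  const segments = did.slice(prefix.length).split(":").map(decodeURIComponent);
  const [host, ...path] = segments;
  if (!host) {
    throw new Error(`Missing host in did:web identifier: ${did}`);
  }
  // A bare host resolves under /.well-known; otherwise the path ends in /did.json.
  if (path.length === 0) {
    return `https://${host}/.well-known/did.json`;
  }
  return `https://${host}/${path.join("/")}/did.json`;
}
```

Running external DIDs through the same function is one way to check that the URLs served by this instance match what third-party resolvers will request.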
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /.well-known/did.json
|
||||
|
||||
Root DID Document for the AgentIdP instance. No authentication required — this is a public discovery endpoint.
|
||||
|
||||
```yaml
|
||||
GET /.well-known/did.json
|
||||
No authentication required
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: W3C DID Core 1.0 compliant DID Document
|
||||
required: [id, "@context", verificationMethod, authentication]
|
||||
properties:
|
||||
"@context":
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
example:
|
||||
- "https://www.w3.org/ns/did/v1"
|
||||
- "https://w3id.org/security/suites/jws-2020/v1"
|
||||
id:
|
||||
type: string
|
||||
description: DID for this AgentIdP instance
|
||||
example: "did:web:idp.sentryagent.ai"
|
||||
controller:
|
||||
type: string
|
||||
example: "did:web:idp.sentryagent.ai"
|
||||
verificationMethod:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/VerificationMethod'
|
||||
authentication:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: References to verification methods for authentication
|
||||
assertionMethod:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
service:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/DIDService'
|
||||
example:
|
||||
"@context":
|
||||
- "https://www.w3.org/ns/did/v1"
|
||||
id: "did:web:idp.sentryagent.ai"
|
||||
controller: "did:web:idp.sentryagent.ai"
|
||||
verificationMethod:
|
||||
- id: "did:web:idp.sentryagent.ai#key-1"
|
||||
type: "JsonWebKey2020"
|
||||
controller: "did:web:idp.sentryagent.ai"
|
||||
publicKeyJwk:
|
||||
kty: "EC"
|
||||
crv: "P-256"
|
||||
x: "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU"
|
||||
y: "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
|
||||
authentication:
|
||||
- "did:web:idp.sentryagent.ai#key-1"
|
||||
service:
|
||||
- id: "did:web:idp.sentryagent.ai#agent-registry"
|
||||
type: "AgentIdentityProvider"
|
||||
serviceEndpoint: "https://idp.sentryagent.ai"
|
||||
500 Internal Server Error:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /agents/:id/did
|
||||
|
||||
Per-agent DID Document. No authentication required — DID Documents are public.
|
||||
|
||||
```yaml
|
||||
GET /agents/{agentId}/did
|
||||
No authentication required
|
||||
|
||||
Path Parameters:
|
||||
agentId:
|
||||
type: string
|
||||
description: Agent ID
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: W3C DID Core 1.0 compliant per-agent DID Document
|
||||
example:
|
||||
"@context":
|
||||
- "https://www.w3.org/ns/did/v1"
|
||||
- "https://w3id.org/agntcy/v1"
|
||||
id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
controller: "did:web:idp.sentryagent.ai"
|
||||
verificationMethod:
|
||||
- id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
|
||||
type: "JsonWebKey2020"
|
||||
controller: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
publicKeyJwk:
|
||||
kty: "EC"
|
||||
crv: "P-256"
|
||||
x: "abc123"
|
||||
y: "def456"
|
||||
authentication:
|
||||
- "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
|
||||
service:
|
||||
- id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#agent-card"
|
||||
type: "AgentCard"
|
||||
serviceEndpoint: "https://idp.sentryagent.ai/agents/agt_01HXK7Z9P3FKWABCDEF67890/did/card"
|
||||
agntcy:
|
||||
agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
agentType: "orchestrator"
|
||||
capabilities:
|
||||
- "task-planning"
|
||||
- "tool-use"
|
||||
deploymentEnv: "production"
|
||||
owner: "acme-ai"
|
||||
version: "1.2.0"
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "AGENT_NOT_FOUND"
|
||||
message: "Agent not found"
|
||||
410 Gone:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "AGENT_DECOMMISSIONED"
|
||||
message: "Agent has been decommissioned — DID Document is no longer active"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /agents/:id/did/resolve
|
||||
|
||||
DID resolution endpoint: resolves the `did:web` DID of the agent identified by `:id` and returns the DID resolution result in W3C DID Resolution format. This enables external systems to use AgentIdP as a resolver for agent DIDs. Authentication required (`agents:read` scope).
|
||||
|
||||
```yaml
|
||||
GET /agents/{agentId}/did/resolve
|
||||
Authorization: Bearer <token with agents:read scope>
|
||||
|
||||
Path Parameters:
|
||||
agentId:
|
||||
type: string
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/ld+json;profile="https://w3id.org/did-resolution"
|
||||
schema:
|
||||
type: object
|
||||
required: [didDocument, didDocumentMetadata, didResolutionMetadata]
|
||||
properties:
|
||||
didDocument:
|
||||
type: object
|
||||
description: The resolved DID Document
|
||||
didDocumentMetadata:
|
||||
type: object
|
||||
properties:
|
||||
created:
|
||||
type: string
|
||||
format: date-time
|
||||
updated:
|
||||
type: string
|
||||
format: date-time
|
||||
deactivated:
|
||||
type: boolean
|
||||
didResolutionMetadata:
|
||||
type: object
|
||||
properties:
|
||||
contentType:
|
||||
type: string
|
||||
example: "application/did+ld+json"
|
||||
retrieved:
|
||||
type: string
|
||||
format: date-time
|
||||
example:
|
||||
didDocument:
|
||||
"@context": ["https://www.w3.org/ns/did/v1"]
|
||||
id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
didDocumentMetadata:
|
||||
created: "2026-03-29T12:00:00Z"
|
||||
updated: "2026-03-29T12:00:00Z"
|
||||
deactivated: false
|
||||
didResolutionMetadata:
|
||||
contentType: "application/did+ld+json"
|
||||
retrieved: "2026-03-29T14:00:00Z"
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /agents/:id/did/card
|
||||
|
||||
AGNTCY-format agent card derived from DID Document. Returns a JSON object representing the agent's identity and capabilities in the AGNTCY agent card format. No authentication required.
|
||||
|
||||
```yaml
|
||||
GET /agents/{agentId}/did/card
|
||||
No authentication required
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: AGNTCY-format agent card
|
||||
properties:
|
||||
did:
|
||||
type: string
|
||||
name:
|
||||
type: string
|
||||
agentType:
|
||||
type: string
|
||||
capabilities:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
owner:
|
||||
type: string
|
||||
version:
|
||||
type: string
|
||||
deploymentEnv:
|
||||
type: string
|
||||
identityProvider:
|
||||
type: string
|
||||
description: DID of the issuing AgentIdP instance
|
||||
issuedAt:
|
||||
type: string
|
||||
format: date-time
|
||||
example:
|
||||
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
name: "acme-orchestrator"
|
||||
agentType: "orchestrator"
|
||||
capabilities: ["task-planning", "tool-use"]
|
||||
owner: "acme-ai"
|
||||
version: "1.2.0"
|
||||
deploymentEnv: "production"
|
||||
identityProvider: "did:web:idp.sentryagent.ai"
|
||||
issuedAt: "2026-03-29T12:00:00Z"
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: agent_did_keys
|
||||
|
||||
Tracks the key pair used to sign each agent's DID Document. The private key is stored in Vault (referenced by `vault_key_path`); only the public key JWK is stored in PostgreSQL.
|
||||
|
||||
```sql
|
||||
CREATE TABLE agent_did_keys (
|
||||
key_id VARCHAR(40) PRIMARY KEY,
|
||||
agent_id VARCHAR(40) NOT NULL UNIQUE REFERENCES agents(agent_id),
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
public_key_jwk JSONB NOT NULL,
|
||||
vault_key_path VARCHAR(255) NOT NULL, -- Vault path where private key is stored
|
||||
key_type VARCHAR(20) NOT NULL DEFAULT 'EC',
|
||||
curve VARCHAR(10) NOT NULL DEFAULT 'P-256',
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
rotated_at TIMESTAMPTZ,
|
||||
CONSTRAINT agent_did_keys_key_type_check CHECK (key_type IN ('EC', 'RSA'))
|
||||
);
|
||||
|
||||
CREATE INDEX idx_agent_did_keys_agent_id ON agent_did_keys(agent_id);
|
||||
CREATE INDEX idx_agent_did_keys_org_id ON agent_did_keys(organization_id);
|
||||
```
|
||||
|
||||
### New Column: agents.did
|
||||
|
||||
```sql
|
||||
ALTER TABLE agents
|
||||
ADD COLUMN did VARCHAR(255),
|
||||
ADD COLUMN did_created_at TIMESTAMPTZ;
|
||||
|
||||
-- Populated automatically on agent creation
|
||||
-- Example value: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `DID_WEB_DOMAIN` | Domain name for `did:web` construction | Derived from `HOST` if not set |
|
||||
| `DID_KEY_TYPE` | Cryptographic key type for DID keys | `EC` |
|
||||
| `DID_KEY_CURVE` | Elliptic curve for EC keys | `P-256` |
|
||||
| `DID_DOCUMENT_CACHE_TTL_SECONDS` | How long to cache DID Documents in Redis | `300` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|
||||
|---------|---------|---------|
|
||||
| `did-resolver` | `^4.1.0` | W3C DID resolution interface |
|
||||
| `web-did-resolver` | `^2.0.27` | DID:WEB method resolver |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- DID Document endpoints are public: they require no authentication and expose no sensitive data
|
||||
- Private keys for DID signing are stored in Vault; never written to PostgreSQL
|
||||
- DID Document cache in Redis has a TTL — stale documents are evicted automatically
|
||||
- Decommissioned agents return HTTP 410 Gone with `deactivated: true` in DID Document metadata
|
||||
- DID rotation: when a credential is rotated, the DID Document key can optionally be rotated; the old key is retained in history
|
||||
- `GET /agents/:id/did/card` exposes only data already present in the agent registration — no new sensitive fields
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Every new agent registration automatically generates a `did:web` DID and key pair
|
||||
- [ ] Root DID Document at `/.well-known/did.json` is W3C DID Core 1.0 compliant (validated by `did-resolver`)
|
||||
- [ ] Per-agent DID Document returns correct `did:web` identifier and public key JWK
|
||||
- [ ] DID resolution endpoint returns W3C DID Resolution format
|
||||
- [ ] Decommissioned agent DID Document returns 410 Gone with `deactivated: true`
|
||||
- [ ] Agent card at `/agents/:id/did/card` matches AGNTCY agent card format
|
||||
- [ ] Private keys never appear in any API response or log
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on DIDService
|
||||
476
openspec/changes/phase-3-enterprise/specs/webhooks/spec.md
Normal file
@@ -0,0 +1,476 @@
|
||||
# Webhooks and Event Streaming — Specification
|
||||
|
||||
**Workstream**: 5 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Real-time event notifications for agent lifecycle events via HTTP webhooks. Operators create webhook subscriptions specifying a target URL, the events they want to receive, and a secret for HMAC-SHA256 signature verification. Delivery is asynchronous via a Redis-backed `bull` queue with exponential backoff retry (max 10 attempts). All deliveries are logged for observability.
|
||||
|
||||
Supported events: `agent.created`, `agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `token.issued`, `token.revoked`.
|
||||
|
||||
An optional Kafka/NATS adapter enables high-throughput event streaming alongside webhook delivery.
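The HMAC-SHA256 signature and the retry schedule described above can be sketched with Node's built-in `crypto` module. This is a minimal sketch under stated assumptions: the function names, the hex encoding of the signature, and the 1-second base delay are hypothetical, and the doubling schedule is a generic exponential backoff rather than the exact formula `bull` applies internally.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: HMAC-SHA256 over the raw request body, hex-encoded.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
function verifySignature(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Delay before the given retry attempt (1-based), doubling each time.
function backoffDelayMs(attempt: number, baseDelayMs: number = 1000): number {
  return baseDelayMs * 2 ** (attempt - 1);
}
```

Receivers should verify against the raw body bytes as delivered, since re-serializing parsed JSON can change whitespace or key order and invalidate the signature.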
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints

### POST /webhooks

Create a new webhook subscription. Requires `agents:write` scope.

```yaml
POST /webhooks
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [url, events, secret]
    properties:
      url:
        type: string
        format: uri
        description: HTTPS endpoint to deliver events to
        example: "https://app.example.com/hooks/agentidp"
      events:
        type: array
        items:
          type: string
          enum:
            - agent.created
            - agent.updated
            - agent.suspended
            - agent.reactivated
            - agent.decommissioned
            - credential.generated
            - credential.rotated
            - credential.revoked
            - token.issued
            - token.revoked
            - "*"
        minItems: 1
        description: List of event types to subscribe to. Use ["*"] to subscribe to all events.
        example: ["agent.created", "credential.rotated"]
      secret:
        type: string
        minLength: 16
        description: Secret used to compute HMAC-SHA256 signature. Store securely — it is returned only once.
        example: "whsec_super_secret_value_here"
      description:
        type: string
        maxLength: 255
        description: Optional human-readable description for this subscription
      active:
        type: boolean
        default: true

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
    example:
      subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      url: "https://app.example.com/hooks/agentidp"
      events: ["agent.created", "credential.rotated"]
      description: "Production event sink"
      active: true
      createdAt: "2026-03-29T12:00:00Z"
      updatedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      invalid_url:
        code: "VALIDATION_ERROR"
        message: "url must be a valid HTTPS URI"
      invalid_event:
        code: "VALIDATION_ERROR"
        message: "Unknown event type: agent.unknown"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```

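
The request rules above (HTTPS-only `url`, known event types, `minItems: 1`, `minLength: 16`) could be enforced roughly as follows. This is a minimal sketch, not the project's validator: `validateCreateWebhookRequest`, `WEBHOOK_EVENT_TYPES`, and the wording of the two error messages not shown in the spec are hypothetical.

```typescript
// Event types copied from the spec's enum, including the "*" wildcard.
const WEBHOOK_EVENT_TYPES: readonly string[] = [
  "agent.created", "agent.updated", "agent.suspended", "agent.reactivated",
  "agent.decommissioned", "credential.generated", "credential.rotated",
  "credential.revoked", "token.issued", "token.revoked", "*",
];

interface ValidationError {
  code: "VALIDATION_ERROR";
  message: string;
}

// Returns null when the body is acceptable, otherwise the first error found.
function validateCreateWebhookRequest(body: {
  url?: unknown;
  events?: unknown;
  secret?: unknown;
}): ValidationError | null {
  const fail = (message: string): ValidationError => ({ code: "VALIDATION_ERROR", message });

  // url: must parse as a URL and use the https: scheme.
  if (typeof body.url !== "string") return fail("url must be a valid HTTPS URI");
  let parsed: URL;
  try {
    parsed = new URL(body.url);
  } catch {
    return fail("url must be a valid HTTPS URI");
  }
  if (parsed.protocol !== "https:") return fail("url must be a valid HTTPS URI");

  // events: non-empty array of known event types (message text is an assumption).
  if (!Array.isArray(body.events) || body.events.length < 1) {
    return fail("events must contain at least one event type");
  }
  for (const event of body.events) {
    if (typeof event !== "string" || !WEBHOOK_EVENT_TYPES.includes(event)) {
      return fail(`Unknown event type: ${String(event)}`);
    }
  }

  // secret: at least 16 characters (message text is an assumption).
  if (typeof body.secret !== "string" || body.secret.length < 16) {
    return fail("secret must be at least 16 characters");
  }
  return null;
}
```
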
---

### GET /webhooks

List webhook subscriptions for the caller's organization. Requires `agents:read` scope.

```yaml
GET /webhooks
Authorization: Bearer <token with agents:read scope>

Query Parameters:
  active:
    type: boolean
    description: Filter by active/inactive subscriptions
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/WebhookSubscription'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```

---

### GET /webhooks/:id

Get a single webhook subscription. Requires `agents:read` scope.

```yaml
GET /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:read scope>

Path Parameters:
  subscriptionId:
    type: string

Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "WEBHOOK_NOT_FOUND"
      message: "Webhook subscription not found"
```

---

### PATCH /webhooks/:id

Update a webhook subscription (e.g., pause/resume, change events). Requires `agents:write` scope.

```yaml
PATCH /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    properties:
      url:
        type: string
        format: uri
      events:
        type: array
        items:
          type: string
      description:
        type: string
        maxLength: 255
      active:
        type: boolean

Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```

---

### DELETE /webhooks/:id

Delete a webhook subscription. Requires `agents:write` scope.

```yaml
DELETE /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>

Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```

---

### GET /webhooks/:id/deliveries

List delivery attempts for a specific webhook subscription. Requires `agents:read` scope.

```yaml
GET /webhooks/{subscriptionId}/deliveries
Authorization: Bearer <token with agents:read scope>

Query Parameters:
  status:
    type: string
    enum: [pending, success, failed, dead_letter]
  eventType:
    type: string
    description: Filter by event type
  fromDate:
    type: string
    format: date-time
  toDate:
    type: string
    format: date-time
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 50
    maximum: 200

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/WebhookDelivery'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - deliveryId: "del_01HXK7Z9P3FKWABCDEF77777"
          subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
          eventType: "agent.created"
          eventId: "evt_01HXK7Z9P3FKWABCDEF99999"
          status: "success"
          httpStatusCode: 200
          attemptCount: 1
          nextRetryAt: null
          deliveredAt: "2026-03-29T12:00:05Z"
          createdAt: "2026-03-29T12:00:00Z"
      total: 1
      page: 1
      limit: 50
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```

---

## Webhook Payload Format

Every webhook delivery uses this envelope format:

```json
{
  "id": "evt_01HXK7Z9P3FKWABCDEF99999",
  "type": "agent.created",
  "organizationId": "org_01HXK7Z9P3FKWABCDEF12345",
  "timestamp": "2026-03-29T12:00:00Z",
  "data": {
    "agentId": "agt_01HXK7Z9P3FKWABCDEF67890",
    "agentType": "orchestrator",
    "status": "active",
    "owner": "acme-ai",
    "version": "1.0.0",
    "deploymentEnv": "production"
  }
}
```

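
As a rough illustration of the envelope, the sketch below types it and shows how a subscription's `events` list, including the `"*"` wildcard, might be matched against an event type. `WebhookEnvelope`, `buildEnvelope`, and `subscriptionMatches` are hypothetical names; the spec's own `IWebhookPayload` interface may differ.

```typescript
// Envelope shape taken from the JSON example; TData is the event-specific body.
interface WebhookEnvelope<TData> {
  id: string;
  type: string;
  organizationId: string;
  timestamp: string; // ISO 8601, UTC, second precision as in the example
  data: TData;
}

// Builds an envelope; milliseconds are dropped to match the example's format.
function buildEnvelope<TData>(
  id: string,
  type: string,
  organizationId: string,
  occurredAt: Date,
  data: TData,
): WebhookEnvelope<TData> {
  const timestamp = occurredAt.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { id, type, organizationId, timestamp, data };
}

// A subscription receives an event if it lists the type or subscribes to "*".
function subscriptionMatches(subscribedEvents: readonly string[], eventType: string): boolean {
  return subscribedEvents.includes("*") || subscribedEvents.includes(eventType);
}
```
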
### HMAC-SHA256 Signature

Every delivery includes the following HTTP headers:

```
X-AgentIdP-Event: agent.created
X-AgentIdP-Delivery-Id: del_01HXK7Z9P3FKWABCDEF77777
X-AgentIdP-Timestamp: 1774785600
X-AgentIdP-Signature-256: sha256=<HMAC-SHA256 of timestamp.payload using subscription secret>
```

Signature computation:
```
signed_content = timestamp + "." + JSON.stringify(payload)
signature = HMAC-SHA256(secret, signed_content)
header_value = "sha256=" + hex(signature)
```

Here `timestamp` is the `X-AgentIdP-Timestamp` header value in Unix seconds (the example corresponds to 2026-03-29T12:00:00Z), and the serialized payload is the request body exactly as sent. Receivers should verify against the raw body rather than a re-serialized object, since key order and whitespace affect the digest.

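
The computation above, together with the 5-minute replay window recommended under Security Considerations, might look like this on both the sending and receiving side. It is a sketch using Node's built-in `node:crypto`; `signDelivery`, `verifyDelivery`, and `MAX_SKEW_SECONDS` are hypothetical names, and rejecting timestamps too far in the future as well as the past is an added assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Replay window: the spec recommends rejecting timestamps older than 5 minutes.
const MAX_SKEW_SECONDS = 5 * 60;

// Sender side: header_value = "sha256=" + hex(HMAC-SHA256(secret, timestamp + "." + body)).
function signDelivery(secret: string, timestamp: number, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
  return `sha256=${digest}`;
}

// Receiver side: recompute over the raw body and compare in constant time.
function verifyDelivery(
  secret: string,
  timestampHeader: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const timestamp = Number(timestampHeader);
  if (!Number.isInteger(timestamp)) return false;
  // Assumption: skew is checked in both directions, not only for old timestamps.
  if (Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) return false;

  const expected = Buffer.from(signDelivery(secret, timestamp, rawBody), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```
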
---

## Database Schema Changes

### New Table: webhook_subscriptions

```sql
CREATE TABLE webhook_subscriptions (
  subscription_id   VARCHAR(40)   PRIMARY KEY,
  organization_id   VARCHAR(40)   NOT NULL REFERENCES organizations(organization_id),
  url               VARCHAR(2048) NOT NULL,
  events            JSONB         NOT NULL DEFAULT '[]',
  secret_hash       VARCHAR(255)  NOT NULL, -- bcrypt hash of secret; plain text stored in Vault
  vault_secret_path VARCHAR(255)  NOT NULL,
  description       VARCHAR(255),
  active            BOOLEAN       NOT NULL DEFAULT TRUE,
  failure_count     INTEGER       NOT NULL DEFAULT 0,
  last_delivery_at  TIMESTAMPTZ,
  created_at        TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_webhook_subs_org_id ON webhook_subscriptions(organization_id);
CREATE INDEX idx_webhook_subs_active ON webhook_subscriptions(active) WHERE active = TRUE;
```

### New Table: webhook_deliveries

```sql
CREATE TABLE webhook_deliveries (
  delivery_id      VARCHAR(40)  PRIMARY KEY,
  subscription_id  VARCHAR(40)  NOT NULL REFERENCES webhook_subscriptions(subscription_id),
  organization_id  VARCHAR(40)  NOT NULL REFERENCES organizations(organization_id),
  event_id         VARCHAR(40)  NOT NULL,
  event_type       VARCHAR(100) NOT NULL,
  payload          JSONB        NOT NULL,
  status           VARCHAR(20)  NOT NULL DEFAULT 'pending',
  http_status_code SMALLINT,
  response_body    TEXT,
  attempt_count    SMALLINT     NOT NULL DEFAULT 0,
  next_retry_at    TIMESTAMPTZ,
  delivered_at     TIMESTAMPTZ,
  created_at       TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  CONSTRAINT webhook_deliveries_status_check CHECK (status IN ('pending', 'success', 'failed', 'dead_letter'))
);

CREATE INDEX idx_webhook_deliveries_sub_id ON webhook_deliveries(subscription_id);
CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status);
CREATE INDEX idx_webhook_deliveries_org_id ON webhook_deliveries(organization_id);
CREATE INDEX idx_webhook_deliveries_created ON webhook_deliveries(created_at);
```

---

## Retry Schedule

```
Attempt 1: immediate
Attempt 2: 1 minute after failure
Attempt 3: 5 minutes after failure
Attempt 4: 15 minutes after failure
Attempt 5: 1 hour after failure
Attempt 6: 4 hours after failure
Attempt 7: 12 hours after failure
Attempt 8: 24 hours after failure
Attempt 9: 48 hours after failure
Attempt 10: 72 hours after failure
After attempt 10: status = dead_letter; operator alerted via Prometheus metric
```

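
One way to encode this schedule is a lookup table indexed by the number of failed attempts. The sketch assumes each delay is measured from the preceding failed attempt; `RETRY_DELAYS_MS` and `nextRetryDelayMs` are hypothetical names, and how the result is fed into the `bull` queue is deliberately left out.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Index n holds the delay before attempt n + 1, so index 0 is the immediate first attempt.
const RETRY_DELAYS_MS: readonly number[] = [
  0,
  1 * MINUTE_MS,
  5 * MINUTE_MS,
  15 * MINUTE_MS,
  1 * HOUR_MS,
  4 * HOUR_MS,
  12 * HOUR_MS,
  24 * HOUR_MS,
  48 * HOUR_MS,
  72 * HOUR_MS,
];

// Given how many attempts have failed so far, returns the delay before the
// next attempt, or null when the delivery should move to dead_letter.
function nextRetryDelayMs(failedAttempts: number): number | null {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 0) {
    throw new RangeError("failedAttempts must be a non-negative integer");
  }
  return RETRY_DELAYS_MS[failedAttempts] ?? null;
}
```
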
---

## Configuration

| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `WEBHOOKS_ENABLED` | Enable webhook functionality | `true` |
| `WEBHOOK_DELIVERY_TIMEOUT_MS` | HTTP delivery request timeout | `10000` |
| `WEBHOOK_MAX_RETRIES` | Maximum delivery attempts before dead-letter | `10` |
| `WEBHOOK_WORKER_CONCURRENCY` | Number of concurrent delivery workers | `5` |
| `KAFKA_BROKERS` | Comma-separated Kafka broker list (optional; activates Kafka adapter) | `""` |
| `KAFKA_TOPIC_PREFIX` | Prefix for Kafka topic names | `agentidp` |
| `NATS_URL` | NATS server URL (optional; activates NATS adapter) | `""` |

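
A small loader can turn these variables into a typed object with the documented defaults. This is only a sketch: `WebhookConfig` and `loadWebhookConfig` are hypothetical names, and treating any value other than `true` as disabled is an assumption. In the server it would presumably be called with `process.env`.

```typescript
interface WebhookConfig {
  enabled: boolean;
  deliveryTimeoutMs: number;
  maxRetries: number;
  workerConcurrency: number;
  kafkaBrokers: string[]; // empty array means the Kafka adapter stays off
  kafkaTopicPrefix: string;
  natsUrl: string | null; // null means the NATS adapter stays off
}

function loadWebhookConfig(env: Readonly<Record<string, string | undefined>>): WebhookConfig {
  // Reads a positive integer, falling back to the documented default when unset.
  const positiveInt = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw === "") return fallback;
    const value = Number(raw);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer`);
    }
    return value;
  };

  const natsUrl = env["NATS_URL"] ?? "";
  return {
    enabled: (env["WEBHOOKS_ENABLED"] ?? "true").toLowerCase() === "true",
    deliveryTimeoutMs: positiveInt("WEBHOOK_DELIVERY_TIMEOUT_MS", 10000),
    maxRetries: positiveInt("WEBHOOK_MAX_RETRIES", 10),
    workerConcurrency: positiveInt("WEBHOOK_WORKER_CONCURRENCY", 5),
    kafkaBrokers: (env["KAFKA_BROKERS"] ?? "")
      .split(",")
      .map((broker) => broker.trim())
      .filter((broker) => broker.length > 0),
    kafkaTopicPrefix: env["KAFKA_TOPIC_PREFIX"] ?? "agentidp",
    natsUrl: natsUrl === "" ? null : natsUrl,
  };
}
```
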
---

## Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| `bull` | `^4.16.3` | Redis-backed async job queue for webhook delivery |
| `kafkajs` | `^2.2.4` | Kafka producer adapter (optional) |

---

## Security Considerations

- Webhook secrets are stored in Vault; PostgreSQL stores only a bcrypt hash of each secret, used for in-memory comparison
- All deliveries must be to HTTPS endpoints — HTTP endpoints are rejected at subscription creation
- Private/internal IP ranges (RFC 1918, loopback, link-local) are blocked at delivery time to prevent SSRF
- The HMAC signature allows the receiving server to verify that a delivery is authentic
- Replay attacks are mitigated by including a timestamp in the signed content; receivers should reject deliveries with timestamps older than 5 minutes
- Dead-letter events increment the Prometheus metric `agentidp_webhook_dead_letters_total` for alerting

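
The delivery-time SSRF check could be built on Node's `net.BlockList`. The sketch below is illustrative only: `isBlockedAddress` is a hypothetical helper, the IPv6 unique-local range (`fc00::/7`) is an addition not named in this spec, and a real worker would presumably resolve the hostname first and connect to the address it checked, so that a later DNS answer cannot bypass the filter.

```typescript
import { BlockList, isIP } from "node:net";

const blockedRanges = new BlockList();
// RFC 1918 private ranges
blockedRanges.addSubnet("10.0.0.0", 8, "ipv4");
blockedRanges.addSubnet("172.16.0.0", 12, "ipv4");
blockedRanges.addSubnet("192.168.0.0", 16, "ipv4");
// Loopback
blockedRanges.addSubnet("127.0.0.0", 8, "ipv4");
blockedRanges.addAddress("::1", "ipv6");
// Link-local
blockedRanges.addSubnet("169.254.0.0", 16, "ipv4");
blockedRanges.addSubnet("fe80::", 10, "ipv6");
// Assumption: IPv6 unique-local addresses are treated like RFC 1918 space.
blockedRanges.addSubnet("fc00::", 7, "ipv6");

// Returns true when delivery to this resolved IP address must be refused.
function isBlockedAddress(address: string): boolean {
  const family = isIP(address); // 4, 6, or 0 when the input is not an IP literal
  if (family === 0) return true; // fail closed on anything that is not an IP
  return blockedRanges.check(address, family === 4 ? "ipv4" : "ipv6");
}
```
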
---

## Acceptance Criteria

- [ ] `POST /webhooks` creates a subscription; secret stored in Vault, not returned after creation
- [ ] Webhook delivery occurs within 30 seconds of event generation for healthy subscribers
- [ ] Delivery includes correct `X-AgentIdP-Signature-256` header verifiable with provided secret
- [ ] Failed delivery is retried per schedule; status updates in `webhook_deliveries` table
- [ ] After max retries, status is `dead_letter` and metric is incremented
- [ ] A subscription with an HTTP (non-HTTPS) URL is rejected at creation
- [ ] Delivery to a private, loopback, or link-local address is rejected (SSRF protection)
- [ ] `GET /webhooks/:id/deliveries` returns accurate delivery history
- [ ] TypeScript strict, zero `any`, >80% test coverage on WebhookService
142
openspec/changes/phase-3-enterprise/tasks.md
Normal file
@@ -0,0 +1,142 @@
# Phase 3: Enterprise — Tasks

**Status**: Proposed — awaiting CEO approval

## CEO Approval Gates (required before implementation)

- [ ] A0.1 Approve dependency: `did-resolver` + `web-did-resolver` (W3C DID support)
- [ ] A0.2 Approve dependency: `oidc-provider` (certified OIDC server library)
- [ ] A0.3 Approve dependency: `bull` (Redis-backed webhook delivery queue)
- [ ] A0.4 Approve dependency: `kafkajs` (optional Kafka adapter for webhooks)
- [ ] A0.5 Approve dependency: `node-forge` (column-level encryption for SOC 2)

---

## Workstream 1: Multi-Tenancy

- [ ] 1.1 Write `src/db/migrations/006_create_organizations_table.sql` — organizations table with slug, plan_tier, max_agents, max_tokens_per_month, status
- [ ] 1.2 Write `src/db/migrations/007_create_organization_members_table.sql` — organization_members with agent_id FK and role
- [ ] 1.3 Write `src/db/migrations/008_add_organization_id_to_agents.sql` — add organization_id column + index + RLS policy on agents
- [ ] 1.4 Write `src/db/migrations/009_add_organization_id_to_credentials.sql` — add organization_id column + index + RLS policy on credentials
- [ ] 1.5 Write `src/db/migrations/010_add_organization_id_to_audit_logs.sql` — add organization_id column + index + RLS policy on audit_logs
- [ ] 1.6 Write `src/db/migrations/011_seed_system_organization.sql` — insert default system org and backfill existing rows
- [ ] 1.7 Write `src/types/organization.ts` — IOrganization, ICreateOrgRequest, IUpdateOrgRequest, IOrgMember, IPaginatedOrgsResponse, OrgStatus, PlanTier interfaces
- [ ] 1.8 Write `src/services/OrgService.ts` — createOrg, listOrgs, getOrg, updateOrg, deleteOrg, addMember; all methods accept organizationId context
- [ ] 1.9 Write `src/controllers/OrgController.ts` — request parsing and validation for all 6 org endpoints
- [ ] 1.10 Write `src/routes/organizations.ts` — mount all 6 org endpoints with admin:orgs scope guard
- [ ] 1.11 Write `src/middleware/orgContext.ts` — OrgContextMiddleware: extracts organization_id from JWT and calls SET LOCAL app.organization_id before each DB query
- [ ] 1.12 Update `src/middleware/auth.ts` — extend ITokenPayload with organization_id claim; validate org claim on every request
- [ ] 1.13 Update `src/services/AgentService.ts` — add organizationId parameter to all methods; enforce org scoping on all queries
- [ ] 1.14 Update `src/services/CredentialService.ts` — add organizationId parameter to all methods
- [ ] 1.15 Update `src/services/AuditService.ts` — add organizationId parameter to all methods; include organization_id on every audit event insert
- [ ] 1.16 Update `src/services/OAuth2Service.ts` — include organization_id claim in issued JWT payload
- [ ] 1.17 Update `src/types/index.ts` — extend ITokenPayload with organization_id field
- [ ] 1.18 Update OPA policy `policies/authz.rego` — add organization_id check: agents can only access resources in their own organization
- [ ] 1.19 Write unit tests for OrgService (CRUD, member management, org isolation)
- [ ] 1.20 Write integration tests — verify cross-org data isolation: agent in org A cannot be read with a token from org B
- [ ] 1.21 QA sign-off: RLS verified via direct DB query, org isolation test passes, zero `any`, >80% coverage

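
For task 1.11, note that in PostgreSQL `SET LOCAL` only takes effect inside a transaction and does not accept bind parameters; `set_config(name, value, true)` is the parameterizable equivalent. A minimal sketch of wrapping a unit of work under those constraints follows. `Queryable` and `withOrgContext` are hypothetical names, and adapting a real database driver to `Queryable` is left out.

```typescript
// Minimal structural type for a database client (an assumption, not a driver API).
interface Queryable {
  query(text: string, values?: readonly unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction in which app.organization_id is set, so
// that RLS policies reading that setting apply to every query in `work`.
async function withOrgContext<T>(
  client: Queryable,
  organizationId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument `true` makes the setting transaction-local, like SET LOCAL.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (error) {
    await client.query("ROLLBACK");
    throw error;
  }
}
```
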
---

## Workstream 2: W3C DIDs

- [ ] 2.1 Write `src/db/migrations/012_create_agent_did_keys_table.sql` — agent_did_keys table with public_key_jwk JSONB and vault_key_path
- [ ] 2.2 Write `src/db/migrations/013_add_did_columns_to_agents.sql` — add did and did_created_at columns to agents
- [ ] 2.3 Write `src/types/did.ts` — IDIDDocument, IVerificationMethod, IDIDService, IDIDResolutionResult, IAgentCard interfaces
- [ ] 2.4 Write `src/services/DIDService.ts` — generateDID (creates key pair, stores private in Vault, public in agent_did_keys), buildInstanceDIDDocument, buildAgentDIDDocument, buildAgentCard, buildResolutionResult
- [ ] 2.5 Update `src/services/AgentService.ts` — call DIDService.generateDID on every new agent registration; populate did column
- [ ] 2.6 Write `src/controllers/DIDController.ts` — handlers for root DID Document, per-agent DID Document, resolution endpoint, agent card
- [ ] 2.7 Write `src/routes/did.ts` — mount `/.well-known/did.json`, `/agents/:id/did`, `/agents/:id/did/resolve`, `/agents/:id/did/card`
- [ ] 2.8 Implement Redis caching in DIDService — cache DID Documents with TTL configurable via DID_DOCUMENT_CACHE_TTL_SECONDS
- [ ] 2.9 Handle decommissioned agents — DID Document returns `deactivated: true` in metadata; HTTP 410 Gone for the DID endpoint
- [ ] 2.10 Write unit tests for DIDService — DID construction, key pair generation, AGNTCY card format
- [ ] 2.11 Write integration tests — GET /.well-known/did.json and GET /agents/:id/did return valid DID Documents; validated by did-resolver
- [ ] 2.12 QA sign-off: DID Core 1.0 compliance verified, private key never in response, zero `any`, >80% coverage

---

## Workstream 3: OpenID Connect (OIDC)

- [ ] 3.1 Write `src/db/migrations/014_create_oidc_keys_table.sql` — oidc_keys table with kid, public_key_jwk, vault_key_path, is_current
- [ ] 3.2 Write `src/services/OIDCKeyService.ts` — generateSigningKeyPair (RSA-2048 or EC P-256), storeKeyInVault, getPublicJWKS, getCurrentKeyId, rotateKey
- [ ] 3.3 Write `src/services/IDTokenService.ts` — buildIDTokenClaims (agent claims), signIDToken using current Vault-stored key, verifyIDToken
- [ ] 3.4 Write `src/types/oidc.ts` — IIDTokenClaims, IJWKSResponse, IOIDCDiscoveryDocument, IAgentInfoResponse interfaces
- [ ] 3.5 Write `src/controllers/OIDCController.ts` — handlers for discovery, JWKS, agent-info
- [ ] 3.6 Write `src/routes/oidc.ts` — mount `/.well-known/openid-configuration`, `/.well-known/jwks.json`, `/agent-info`
- [ ] 3.7 Update `src/services/OAuth2Service.ts` — when `openid` scope is present in request, generate and append `id_token` to token response
- [ ] 3.8 Implement JWKS caching — cache JWKS in Redis with TTL; invalidate on key rotation
- [ ] 3.9 Implement key rotation logic — on rotation, old key remains in JWKS until all tokens signed with it have expired
- [ ] 3.10 Write unit tests for OIDCKeyService and IDTokenService — key generation, token signing, JWKS format
- [ ] 3.11 Write integration tests — POST /oauth2/token with `openid` scope returns id_token; validate id_token against JWKS; GET /agent-info returns correct claims
- [ ] 3.12 QA sign-off: OIDC discovery document passes conformance checks, id_token verifiable, `alg: none` rejected, zero `any`, >80% coverage

---

## Workstream 4: AGNTCY Federation

- [ ] 4.1 Write `src/db/migrations/015_create_federation_partners_table.sql` — federation_partners table with issuer, jwks_uri, allowed_organizations JSONB, status, expires_at
- [ ] 4.2 Write `src/types/federation.ts` — IFederationPartner, ICreatePartnerRequest, IVerifyFederatedTokenRequest, IFederationVerifyResult interfaces
- [ ] 4.3 Write `src/services/FederationService.ts` — registerPartner (validates by fetching JWKS), listPartners, deletePartner, verifyFederatedToken (fetch-or-cache JWKS, verify signature, validate claims)
- [ ] 4.4 Implement JWKS caching in FederationService — store partner JWKS in Redis with TTL configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS
- [ ] 4.5 Write `src/controllers/FederationController.ts` — handlers for POST /federation/trust, GET /federation/partners, DELETE /federation/partners/:id, POST /federation/verify
- [ ] 4.6 Write `src/routes/federation.ts` — mount all 4 federation endpoints
- [ ] 4.7 Implement partner expiry check — partners past `expires_at` are treated as status `expired`; their tokens rejected
- [ ] 4.8 Implement `allowedOrganizations` filter — reject tokens whose `organization_id` is not in the allow list (if list is non-empty)
- [ ] 4.9 Write unit tests for FederationService — trust registration, token verification (valid/expired/untrusted/tampered), JWKS cache behavior
- [ ] 4.10 Write integration tests — end-to-end: register partner, verify a valid token from that partner, verify rejection for unknown issuer
- [ ] 4.11 QA sign-off: tampered token rejected, expired partner rejected, JWKS cache verified, zero `any`, >80% coverage

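
Tasks 4.7 and 4.8 describe two policy checks that run after signature verification. They could be sketched as a pure function like the one below; `FederationPartnerPolicy`, `checkPartnerPolicy`, the `"active"` status value, and the reason strings are all hypothetical, since the partner status vocabulary is not defined in this section.

```typescript
interface FederationPartnerPolicy {
  status: string; // assumption: "active" marks a usable partner
  expiresAt: Date | null; // null means the trust relationship never expires
  allowedOrganizations: readonly string[]; // empty list means no restriction
}

type PartnerCheck =
  | { accepted: true }
  | { accepted: false; reason: "partner_inactive" | "partner_expired" | "organization_not_allowed" };

function checkPartnerPolicy(
  partner: FederationPartnerPolicy,
  tokenOrganizationId: string,
  now: Date,
): PartnerCheck {
  if (partner.status !== "active") {
    return { accepted: false, reason: "partner_inactive" };
  }
  // Task 4.7: a partner past expires_at is treated as expired.
  if (partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime()) {
    return { accepted: false, reason: "partner_expired" };
  }
  // Task 4.8: the allow list only applies when it is non-empty.
  if (
    partner.allowedOrganizations.length > 0 &&
    !partner.allowedOrganizations.includes(tokenOrganizationId)
  ) {
    return { accepted: false, reason: "organization_not_allowed" };
  }
  return { accepted: true };
}
```
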
---

## Workstream 5: Webhooks and Event Streaming

- [ ] 5.1 Write `src/db/migrations/016_create_webhook_subscriptions_table.sql` — webhook_subscriptions with url, events JSONB, secret_hash, vault_secret_path, active, failure_count
- [ ] 5.2 Write `src/db/migrations/017_create_webhook_deliveries_table.sql` — webhook_deliveries with status, http_status_code, attempt_count, next_retry_at
- [ ] 5.3 Write `src/types/webhook.ts` — IWebhookSubscription, ICreateWebhookRequest, IWebhookDelivery, IWebhookPayload, WebhookEventType interfaces
- [ ] 5.4 Write `src/services/WebhookService.ts` — createSubscription (store secret in Vault), listSubscriptions, getSubscription, updateSubscription, deleteSubscription, listDeliveries
- [ ] 5.5 Write `src/workers/WebhookDeliveryWorker.ts` — bull queue worker: fetch subscription, compute HMAC-SHA256 signature, POST to URL with headers, update delivery status, schedule retry on failure
- [ ] 5.6 Write `src/services/EventPublisher.ts` — buildEventPayload, publishEvent (enqueues to bull queue; also produces to Kafka if KAFKA_BROKERS is set)
- [ ] 5.7 Update `src/services/AgentService.ts` — call EventPublisher.publishEvent for: agent.created, agent.updated, agent.suspended, agent.reactivated, agent.decommissioned
- [ ] 5.8 Update `src/services/CredentialService.ts` — call EventPublisher.publishEvent for: credential.generated, credential.rotated, credential.revoked
- [ ] 5.9 Update `src/services/OAuth2Service.ts` — call EventPublisher.publishEvent for: token.issued, token.revoked
- [ ] 5.10 Write `src/controllers/WebhookController.ts` — handlers for all 6 webhook endpoints
- [ ] 5.11 Write `src/routes/webhooks.ts` — mount all 6 webhook endpoints with correct scope guards
- [ ] 5.12 Implement SSRF protection in WebhookDeliveryWorker — reject delivery to RFC 1918 addresses, loopback, and link-local ranges
- [ ] 5.13 Implement dead-letter handling — after max retries, set status to dead_letter and increment `agentidp_webhook_dead_letters_total` Prometheus metric
- [ ] 5.14 Write `src/adapters/KafkaAdapter.ts` — optional Kafka producer; activated only when KAFKA_BROKERS env var is set
- [ ] 5.15 Write unit tests for WebhookService, WebhookDeliveryWorker, EventPublisher — HMAC computation, retry schedule, dead-letter logic
- [ ] 5.16 Write integration tests — create subscription, trigger an event, verify delivery; verify SSRF rejection; verify retry on 5xx response
- [ ] 5.17 QA sign-off: HMAC verifiable, SSRF protection active, retry schedule correct, dead-letter metric fires, zero `any`, >80% coverage

---

## Workstream 6: SOC 2 Type II Preparation

- [ ] 6.1 Enable `pgcrypto` PostgreSQL extension in `src/db/migrations/018_enable_pgcrypto.sql`
- [ ] 6.2 Write `src/services/EncryptionService.ts` — AES-256-CBC encrypt/decrypt using key from Vault; methods: encryptColumn, decryptColumn, isEncrypted
- [ ] 6.3 Write `src/db/migrations/019_encrypt_sensitive_columns.sql` — re-encrypt existing credentials.secret_hash and credentials.vault_path values using EncryptionService (migration script)
- [ ] 6.4 Update `src/services/CredentialService.ts` — all reads/writes of secret_hash and vault_path go through EncryptionService
- [ ] 6.5 Update `src/services/WebhookService.ts` — vault_secret_path column encrypted via EncryptionService
- [ ] 6.6 Update `src/services/DIDService.ts` — vault_key_path in agent_did_keys encrypted via EncryptionService
- [ ] 6.7 Write `src/middleware/TLSEnforcementMiddleware.ts` — redirect HTTP to HTTPS in production using X-Forwarded-Proto header; passthrough in development
- [ ] 6.8 Register TLSEnforcementMiddleware in `src/app.ts` — first in middleware stack
- [ ] 6.9 Write `src/db/migrations/020_add_audit_chain_columns.sql` — add hash and previous_hash columns to audit_logs; add immutability trigger; backfill chain for existing rows
- [ ] 6.10 Update `src/services/AuditService.ts` — compute Merkle hash on every insert: hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
- [ ] 6.11 Write `src/services/AuditVerificationService.ts` — verifyChain(fromDate?, toDate?): reads rows in order, recomputes hashes, returns IChainVerificationResult
- [ ] 6.12 Write `src/jobs/SecretsRotationJob.ts` — cron job: identify expiring credentials, emit `agentidp_credentials_expiring_soon_total` metric, renew Vault leases
- [ ] 6.13 Write `src/jobs/AuditChainVerificationJob.ts` — cron job: runs verifyChain on a schedule, sets `agentidp_audit_chain_integrity` Prometheus gauge to 1 (pass) or 0 (fail)
- [ ] 6.14 Write `src/controllers/ComplianceController.ts` — handlers for GET /audit/verify and GET /compliance/controls
- [ ] 6.15 Write `src/routes/compliance.ts` — mount /audit/verify (rate-limited) and /compliance/controls
- [ ] 6.16 Write `monitoring/prometheus/alerts.yml` — all 6 alerting rules: AuthFailureSpike, RateLimitExhaustion, AnomalousTokenIssuance, WebhookDeadLetterAccumulating, AuditChainIntegrityFailed, CredentialExpiryApproaching
- [ ] 6.17 Update `monitoring/prometheus/prometheus.yml` — add alerting rules file reference
- [ ] 6.18 Write compliance documentation package: `docs/compliance/soc2-controls-matrix.md` (Trust Services Criteria → controls map), `docs/compliance/encryption-runbook.md` (key rotation procedure), `docs/compliance/audit-log-runbook.md` (chain verification guide)
- [ ] 6.19 Write operational runbooks: `docs/compliance/incident-response.md` (security event procedures), `docs/compliance/secrets-rotation.md` (credential and signing key rotation guide)
- [ ] 6.20 Write unit tests for EncryptionService (encrypt/decrypt round-trip, Vault key fetch) and AuditVerificationService (intact chain, tampered chain with correct brokenAtEventId)
- [ ] 6.21 Write integration tests — TLS enforcement verified, encrypted columns not plaintext-readable in direct DB query, chain verification returns correct results
- [ ] 6.22 QA sign-off: all 5 controls pass GET /compliance/controls, all 6 Prometheus alerts valid, zero `any`, >80% coverage

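
The hash formula in task 6.10 and the verification pass in task 6.11 amount to a linear hash chain, which can be sketched in memory as follows. The field concatenation follows the task text literally; the empty-string genesis value, the interface names, and the result shape (beyond `brokenAtEventId`, which task 6.20 mentions) are assumptions.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
}

interface ChainedAuditEvent extends AuditEvent {
  previousHash: string;
  hash: string;
}

// Assumption: the first row in the chain links to an empty previous hash.
const GENESIS_PREVIOUS_HASH = "";

// hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
function computeAuditHash(event: AuditEvent, previousHash: string): string {
  return createHash("sha256")
    .update(
      event.eventId + event.timestamp + event.action + event.outcome +
        event.agentId + event.organizationId + previousHash,
    )
    .digest("hex");
}

// Returns a new chain with `event` appended and linked to the last row.
function appendAuditEvent(chain: readonly ChainedAuditEvent[], event: AuditEvent): ChainedAuditEvent[] {
  const last = chain[chain.length - 1];
  const previousHash = last ? last.hash : GENESIS_PREVIOUS_HASH;
  return [...chain, { ...event, previousHash, hash: computeAuditHash(event, previousHash) }];
}

// Walks rows in order, recomputing each hash; reports the first row that fails.
function verifyChain(chain: readonly ChainedAuditEvent[]): { valid: boolean; brokenAtEventId: string | null } {
  let previousHash = GENESIS_PREVIOUS_HASH;
  for (const row of chain) {
    if (row.previousHash !== previousHash || row.hash !== computeAuditHash(row, previousHash)) {
      return { valid: false, brokenAtEventId: row.eventId };
    }
    previousHash = row.hash;
  }
  return { valid: true, brokenAtEventId: null };
}
```
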
---

## Phase 3 Complete Criteria

All 6 workstreams done. All tasks checked. All QA gates passed. CEO reviewed. SOC 2 audit window begins.