chore(openspec): archive all completed changes, sync 14 new specs to library

Archived 4 completed OpenSpec changes (2026-04-02):
- phase-3-enterprise (100/100 tasks) — 6 Phase 3 capabilities synced
- devops-documentation (48/48 tasks) — 3 new + 1 merged capability
- bedroom-developer-docs (33/33 tasks) — 4 new capabilities synced
- engineering-docs (superseded by 2026-03-29 archive) — no tasks

Main spec library grows from 21 → 35 capabilities (+14 new):
federation, multi-tenancy, oidc, soc2, w3c-dids, webhooks,
database, operations, system-overview, api-reference, core-concepts,
developer-guides, quick-start + deployment (merged additive requirements)

Active changes: 0 — project board is clear for Phase 4 planning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SentryAgent.ai Developer
2026-04-02 03:50:47 +00:00
parent ceec22f714
commit f1fbe0e29a
53 changed files with 3019 additions and 0 deletions


@@ -0,0 +1,50 @@
## ADDED Requirements
### Requirement: API reference exists at docs/developers/api-reference.md
The system SHALL provide a human-readable API reference at `docs/developers/api-reference.md` covering all 14 endpoints across the four services: Agent Registry, OAuth 2.0 Token, Credential Management, and Audit Log.
#### Scenario: Developer finds any endpoint within 10 seconds
- **WHEN** the developer opens the API reference
- **THEN** they SHALL find a table of contents at the top linking to each of the four service sections
### Requirement: Every endpoint is documented with method, path, description, and auth requirements
For each of the 14 endpoints, the reference SHALL document: HTTP method, path, one-sentence description, and whether Bearer token auth is required.
#### Scenario: Developer knows which endpoints require authentication
- **WHEN** the developer scans the reference
- **THEN** they SHALL clearly see which endpoints require a Bearer token (all except POST /token) and which do not
### Requirement: Every endpoint includes a complete curl example
For each endpoint, the reference SHALL include at least one complete, runnable curl example with real placeholder values.
#### Scenario: Developer copies a curl example and runs it
- **WHEN** the developer copies a curl example from the reference
- **THEN** the command SHALL be complete — no ellipses, no `...`, no missing flags — requiring only substitution of their own agentId, token, and base URL
### Requirement: Every endpoint documents all request parameters and body fields
For each endpoint that accepts a request body or query parameters, the reference SHALL list every field with: name, type, required/optional, description, and validation constraints.
#### Scenario: Developer knows what fields are required for POST /agents
- **WHEN** the developer reads the POST /agents section
- **THEN** they SHALL see a table listing every field, its type, whether it is required, and any constraints (e.g. email format, max length)
### Requirement: Every endpoint documents all response codes and response body schemas
For each endpoint, the reference SHALL document every possible HTTP response code (2xx and 4xx/5xx) with a description and example response body.
#### Scenario: Developer understands a 429 response
- **WHEN** the developer reads the rate limit error documentation
- **THEN** they SHALL understand what triggered it, what the X-RateLimit-* headers mean, and when they can retry
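As a client-side illustration of the 429 scenario, the sketch below computes how long to wait before retrying. The spec names only the `X-RateLimit-*` header family, so the specific header `X-RateLimit-Reset`, its interpretation as a Unix timestamp in seconds, and the helper name `retryDelayMs` are all assumptions made for illustration.

```typescript
// Hypothetical helper: the header name and its epoch-seconds format are
// assumptions, not confirmed by the spec.
type HeaderMap = Record<string, string | undefined>;

function retryDelayMs(headers: HeaderMap, nowMs: number, fallbackMs = 1000): number {
  // Header names are case-insensitive, so normalise before lookup.
  const lower: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = value;
  }
  const resetSeconds = Number(lower["x-ratelimit-reset"]);
  if (!Number.isFinite(resetSeconds)) {
    return fallbackMs; // header missing or unparsable
  }
  // Never return a negative delay if the window has already reset.
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```

A caller would sleep for the returned number of milliseconds before re-sending the request.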
### Requirement: API reference includes a base URL and versioning section
The reference SHALL include a section at the top explaining the base URL convention, port configuration, and that all endpoints are unversioned in Phase 1.
#### Scenario: Developer knows where to send requests
- **WHEN** the developer reads the base URL section
- **THEN** they SHALL see the default base URL (http://localhost:3000), how to change the port via environment variable, and a note that versioning will be introduced in Phase 2
### Requirement: API reference includes an errors section
The reference SHALL include a dedicated errors section listing all standard error response shapes, all custom error codes, and their HTTP status code mappings.
#### Scenario: Developer handles an AgentNotFoundError
- **WHEN** the developer reads the errors section
- **THEN** they SHALL see the exact JSON shape of the error response, the error code string, and the HTTP status (404)


@@ -0,0 +1,43 @@
## ADDED Requirements
### Requirement: Core concepts guide exists at docs/developers/concepts.md
The system SHALL provide a concepts guide at `docs/developers/concepts.md` that explains the AgentIdP model in plain English with no assumed prior knowledge of AGNTCY or OAuth 2.0.
#### Scenario: Developer understands what AgentIdP is
- **WHEN** a developer reads the concepts guide
- **THEN** they SHALL be able to explain in one sentence what SentryAgent.ai AgentIdP does and why they need it
### Requirement: Concepts guide explains what an AI agent identity is
The guide SHALL explain in plain English what it means to give an AI agent an identity — how it differs from a human user account and why agents need their own identity model.
#### Scenario: Agent identity vs human identity distinction is clear
- **WHEN** the developer reads the agent identity section
- **THEN** they SHALL understand that agents are non-human, machine-operated identities that need persistent, auditable credentials — not session-based logins
### Requirement: Concepts guide explains the AGNTCY alignment
The guide SHALL explain what AGNTCY is (Linux Foundation standard), why SentryAgent.ai aligns to it, and what benefit that gives the developer — without requiring the developer to read the AGNTCY specification.
#### Scenario: Developer understands AGNTCY without external reading
- **WHEN** the developer reads the AGNTCY section
- **THEN** they SHALL understand that AGNTCY-aligned agent IDs are interoperable across the AI agent ecosystem, and that SentryAgent.ai provides this alignment at no cost to the developer
### Requirement: Concepts guide explains the agent lifecycle
The guide SHALL explain the three lifecycle states of an agent (active, suspended, decommissioned) and what each state means for credential and token behaviour.
#### Scenario: Developer understands what happens when an agent is decommissioned
- **WHEN** the developer reads the lifecycle section
- **THEN** they SHALL understand that decommissioning is irreversible, all credentials are revoked, and no new tokens can be issued
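One way to picture the lifecycle is as a small transition table. The sketch below is illustrative only: the spec states that decommissioning is irreversible, while the ability to reactivate a suspended agent and the names `TRANSITIONS` and `canTransition` are assumptions.

```typescript
type AgentStatus = "active" | "suspended" | "decommissioned";

// Allowed transitions. Reactivating a suspended agent is an assumption;
// the spec only states that decommissioning is irreversible.
const TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  active: ["suspended", "decommissioned"],
  suspended: ["active", "decommissioned"],
  decommissioned: [], // terminal state: nothing leads out of it
};

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Keeping the table as data makes the irreversibility of `decommissioned` visible at a glance: its list of next states is empty.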
### Requirement: Concepts guide explains OAuth 2.0 Client Credentials in plain English
The guide SHALL explain the Client Credentials grant in plain English — no RFC references, no formal OAuth jargon — focused on how agents use it to authenticate.
#### Scenario: Developer understands client_id and client_secret without prior OAuth knowledge
- **WHEN** the developer reads the OAuth section
- **THEN** they SHALL understand that client_id identifies the agent and client_secret proves it — analogous to a username and password for machines
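To make the analogy concrete, here is a minimal sketch of how an agent might assemble a Client Credentials request. The `POST /token` path and the `http://localhost:3000` base URL appear in the API reference requirements; sending the credentials as form fields in the body follows common OAuth 2.0 practice and is an assumption about this server, as is the helper name `buildTokenRequest`.

```typescript
interface TokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request but does not send it.
function buildTokenRequest(baseUrl: string, clientId: string, clientSecret: string): TokenRequest {
  const form = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId, // identifies the agent
    client_secret: clientSecret, // proves the caller is that agent
  });
  return {
    url: `${baseUrl}/token`,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: form.toString(),
  };
}
```

The returned object can be handed to any HTTP client; the response is expected to carry the access token used as a Bearer token on later calls.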
### Requirement: Concepts guide explains the free-tier limits
The guide SHALL document all free-tier limits (100 agents, 10,000 tokens/month, 100 req/min, 90-day audit retention) in a clear table.
#### Scenario: Developer knows the limits before hitting them
- **WHEN** the developer reads the free-tier section
- **THEN** they SHALL see a table with all four limits and a note on what happens when each is exceeded
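For readers who prefer code to tables, the four limits can be expressed as a constant with a simple usage check. The numbers come from the requirement above; the names `FREE_TIER`, `Usage`, and `exceededLimits` are hypothetical, and what the service does when a limit is exceeded is not modelled here.

```typescript
const FREE_TIER = {
  maxAgents: 100,
  maxTokensPerMonth: 10000,
  maxRequestsPerMinute: 100,
  auditRetentionDays: 90,
} as const;

interface Usage {
  agents: number;
  tokensThisMonth: number;
  requestsThisMinute: number;
}

// Returns the names of the limits that the given usage exceeds.
function exceededLimits(usage: Usage): string[] {
  const exceeded: string[] = [];
  if (usage.agents > FREE_TIER.maxAgents) exceeded.push("agents");
  if (usage.tokensThisMonth > FREE_TIER.maxTokensPerMonth) exceeded.push("tokens");
  if (usage.requestsThisMinute > FREE_TIER.maxRequestsPerMinute) exceeded.push("requests");
  return exceeded;
}
```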


@@ -0,0 +1,4 @@
## ADDED Requirements
### Requirement: Database doc exists at docs/devops/database.md
The system SHALL provide `docs/devops/database.md` documenting the 4-table schema (agents, credentials, audit_events, token_revocations), the migration runner, and exact commands to apply and verify migrations.


@@ -42,3 +42,8 @@ terraform/
- [ ] PostgreSQL and Redis not publicly accessible — VPC-internal only
- [ ] `docs/devops/deployment.md` — end-to-end deployment walkthrough for AWS and GCP
- [ ] `terraform.tfvars.example` provided for both environments — no secrets in version control
## ADDED Requirements
### Requirement: Local development guide exists at docs/devops/local-development.md
The system SHALL provide `docs/devops/local-development.md` documenting the complete local setup using docker-compose for infrastructure and npm for the application server, including all service ports, health check verification, and the Dockerfile gap note.


@@ -0,0 +1,56 @@
## ADDED Requirements
### Requirement: Developer guides index exists at docs/developers/guides/README.md
The system SHALL provide a guides index at `docs/developers/guides/README.md` listing all available guides with one-line descriptions and links.
#### Scenario: Developer finds the right guide quickly
- **WHEN** the developer opens the guides folder
- **THEN** they SHALL see a list of all guides with descriptions so they can choose the one relevant to their task
### Requirement: Agent registration guide exists at docs/developers/guides/register-an-agent.md
The system SHALL provide a step-by-step guide for registering an agent, including all required and optional fields, validation rules, and how to handle the response.
#### Scenario: Developer registers their first agent
- **WHEN** the developer follows the registration guide
- **THEN** they SHALL successfully create an agent and understand what `agentId`, `clientId`, and `status` mean in the response
#### Scenario: Developer understands registration validation errors
- **WHEN** the guide covers validation
- **THEN** it SHALL show examples of common validation errors (missing required fields, invalid email format) and how to fix them
### Requirement: Credential management guide exists at docs/developers/guides/manage-credentials.md
The system SHALL provide a guide covering all four credential operations: generate, list, rotate, and revoke — with curl examples and explanation of when to use each.
#### Scenario: Developer rotates a compromised credential
- **WHEN** the developer follows the rotation section
- **THEN** they SHALL understand that rotation replaces the secret while keeping the same `credentialId`, and the old secret is immediately invalid
#### Scenario: Developer understands credential revocation vs agent decommission
- **WHEN** the developer reads the guide
- **THEN** they SHALL understand the difference: revoking a credential leaves the agent active with other credentials; decommissioning the agent revokes everything permanently
### Requirement: Token guide exists at docs/developers/guides/issue-and-revoke-tokens.md
The system SHALL provide a guide covering token issuance, introspection, and revocation — explaining the JWT structure, expiry, and how to use the Bearer token in API requests.
#### Scenario: Developer uses a token to authenticate a request
- **WHEN** the developer follows the token guide
- **THEN** they SHALL see an example of using the issued token as a Bearer token in an Authorization header on a subsequent API call
#### Scenario: Developer introspects a token to check validity
- **WHEN** the developer reads the introspection section
- **THEN** they SHALL understand what `active: true/false` means and what fields are returned
#### Scenario: Developer revokes a token
- **WHEN** the developer follows the revocation section
- **THEN** they SHALL understand that revoked tokens are immediately invalid even if not yet expired
### Requirement: Audit log guide exists at docs/developers/guides/query-audit-logs.md
The system SHALL provide a guide for querying the audit log — covering available filters (agentId, action, outcome, date range), pagination, and how to interpret audit events.
#### Scenario: Developer queries audit events for a specific agent
- **WHEN** the developer follows the audit guide
- **THEN** they SHALL see a curl example filtering by `agentId` and understand the structure of each audit event
#### Scenario: Developer understands audit log retention
- **WHEN** the developer reads the guide
- **THEN** they SHALL understand that free-tier audit logs are retained for 90 days and what happens after that window
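The filters named in this requirement could be combined into a query string roughly as follows. This is a sketch under stated assumptions: the `/audit` path, the `from`/`to` names for the date range, and the `page`/`limit` names for pagination are not confirmed by the requirement, and `buildAuditUrl` is a hypothetical helper.

```typescript
interface AuditQuery {
  agentId?: string;
  action?: string;
  outcome?: string;
  from?: string; // assumed parameter name, ISO 8601 timestamp
  to?: string; // assumed parameter name, ISO 8601 timestamp
  page?: number; // assumed parameter name
  limit?: number; // assumed parameter name
}

function buildAuditUrl(baseUrl: string, query: AuditQuery): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    // Skip filters the caller did not set.
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  return qs ? `${baseUrl}/audit?${qs}` : `${baseUrl}/audit`;
}
```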


@@ -0,0 +1,370 @@
# AGNTCY Federation — Specification
**Workstream**: 4 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted remote AgentIdP instances as federation partners. When an agent presents a token issued by a trusted partner instance, the local AgentIdP can verify it by fetching and caching the partner's JWKS. This enables multi-organization agent identity interoperability aligned with AGNTCY standards.
Federation is opt-in per organization. Only tokens from explicitly registered, trusted partners are accepted.
---
## API Endpoints
### POST /federation/trust
Register a new federation trust partner. Requires `admin:orgs` scope.
```yaml
POST /federation/trust
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [name, issuer, jwksUri]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Human-readable name for this federation partner
        example: "Contoso AgentIdP"
      issuer:
        type: string
        format: uri
        description: OIDC issuer URL of the partner instance (must match iss claim in tokens)
        example: "https://agentidp.contoso.com"
      jwksUri:
        type: string
        format: uri
        description: URL of the partner's JWKS endpoint
        example: "https://agentidp.contoso.com/.well-known/jwks.json"
      allowedOrganizations:
        type: array
        items:
          type: string
        description: Optional list of organization IDs in the partner instance whose tokens are accepted. Empty means all partner orgs are trusted.
        example: ["org_contoso_engineering"]
      expiresAt:
        type: string
        format: date-time
        description: Optional expiry for this trust relationship. If omitted, trust does not expire automatically.

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/FederationPartner'
    example:
      partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
      name: "Contoso AgentIdP"
      issuer: "https://agentidp.contoso.com"
      jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
      status: "active"
      allowedOrganizations: []
      trustedSince: "2026-03-29T12:00:00Z"
      expiresAt: null
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      duplicate_issuer:
        code: "DUPLICATE_ISSUER"
        message: "A trust relationship with this issuer already exists"
      unreachable_jwks:
        code: "JWKS_UNREACHABLE"
        message: "Could not fetch JWKS from the provided jwksUri"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /federation/partners
List all registered federation partners for the caller's organization. Requires `admin:orgs` scope.
```yaml
GET /federation/partners
Authorization: Bearer <token with admin:orgs scope>

Query Parameters:
  status:
    type: string
    enum: [active, suspended, expired]
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/FederationPartner'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
          name: "Contoso AgentIdP"
          issuer: "https://agentidp.contoso.com"
          jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
          status: "active"
          trustedSince: "2026-03-29T12:00:00Z"
          expiresAt: null
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /federation/partners/:partnerId
Remove a federation trust relationship. Requires `admin:orgs` scope.
```yaml
DELETE /federation/partners/{partnerId}
Authorization: Bearer <token with admin:orgs scope>

Path Parameters:
  partnerId:
    type: string

Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### POST /federation/verify
Verify a token issued by a federated partner AgentIdP instance. The caller presents the token; this endpoint resolves the issuer, loads the partner's JWKS (from the cache when present, otherwise by fetching it), and verifies the signature and claims.
```yaml
POST /federation/verify
Authorization: Bearer <local access_token with agents:read scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [token]
    properties:
      token:
        type: string
        description: The JWT token issued by the remote AgentIdP instance to verify
      expectedIssuer:
        type: string
        format: uri
        description: Optional — if provided, verification fails if token issuer does not match
      expectedOrganizationId:
        type: string
        description: Optional — if provided, verification fails if token organization_id does not match

Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
        claims:
          type: object
          description: Decoded JWT claims from the verified token
          properties:
            sub:
              type: string
            iss:
              type: string
            iat:
              type: integer
            exp:
              type: integer
            agent_id:
              type: string
            agent_type:
              type: string
            organization_id:
              type: string
            capabilities:
              type: array
              items:
                type: string
            did:
              type: string
        partner:
          type: object
          description: The federation partner record that vouches for this token
          properties:
            partnerId:
              type: string
            name:
              type: string
            issuer:
              type: string
    example:
      valid: true
      claims:
        sub: "agt_contoso_abc123"
        iss: "https://agentidp.contoso.com"
        iat: 1743249600
        exp: 1743253200
        agent_id: "agt_contoso_abc123"
        agent_type: "classifier"
        organization_id: "org_contoso_engineering"
        capabilities: ["text-classification"]
        did: "did:web:agentidp.contoso.com:agents:agt_contoso_abc123"
      partner:
        partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
        name: "Contoso AgentIdP"
        issuer: "https://agentidp.contoso.com"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized (local token invalid):
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  422 Unprocessable Entity (token invalid or untrusted issuer):
    schema:
      type: object
      properties:
        valid:
          type: boolean
          example: false
        reason:
          type: string
          enum:
            - TOKEN_EXPIRED
            - INVALID_SIGNATURE
            - UNTRUSTED_ISSUER
            - JWKS_FETCH_FAILED
            - ORGANIZATION_NOT_ALLOWED
        message:
          type: string
    example:
      valid: false
      reason: "UNTRUSTED_ISSUER"
      message: "No trust relationship registered for issuer https://unknown.example.com"
```
---
## Database Schema Changes
### New Table: federation_partners
```sql
CREATE TABLE federation_partners (
  partner_id            VARCHAR(40)  PRIMARY KEY,
  organization_id       VARCHAR(40)  NOT NULL REFERENCES organizations(organization_id),
  name                  VARCHAR(100) NOT NULL,
  issuer                VARCHAR(255) NOT NULL,
  jwks_uri              VARCHAR(255) NOT NULL,
  allowed_organizations JSONB        NOT NULL DEFAULT '[]',
  status                VARCHAR(20)  NOT NULL DEFAULT 'active',
  trusted_since         TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  expires_at            TIMESTAMPTZ,
  last_jwks_fetch       TIMESTAMPTZ,
  created_at            TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  CONSTRAINT federation_partners_status_check CHECK (status IN ('active', 'suspended', 'expired')),
  UNIQUE (organization_id, issuer)
);

CREATE INDEX idx_federation_partners_org_id ON federation_partners(organization_id);
CREATE INDEX idx_federation_partners_issuer ON federation_partners(issuer);
CREATE INDEX idx_federation_partners_status ON federation_partners(status);
```
### Redis: JWKS Cache
Partner JWKS documents are cached in Redis with a TTL:
```
Key: federation:jwks:<issuer_url_sha256>
Value: JSON string of the JWKS document
TTL: 1 hour (configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS)
```
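A cache key of this shape could be derived as in the sketch below. Hashing the issuer string exactly as stored (no URL normalisation) and encoding the digest as lowercase hex are assumptions, and `jwksCacheKey` is a hypothetical helper name; `createHash` is the standard Node.js `crypto` API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derives the Redis key for a partner's cached JWKS.
function jwksCacheKey(issuer: string): string {
  // Assumption: the issuer URL is hashed verbatim and hex-encoded.
  const digest = createHash("sha256").update(issuer, "utf8").digest("hex");
  return `federation:jwks:${digest}`;
}
```

Hashing keeps the key a fixed length and free of characters such as `:` and `/` that appear in issuer URLs.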
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `FEDERATION_ENABLED` | Enable federation endpoints | `true` |
| `FEDERATION_JWKS_CACHE_TTL_SECONDS` | Redis TTL for cached partner JWKS | `3600` |
| `FEDERATION_JWKS_FETCH_TIMEOUT_MS` | HTTP timeout for fetching partner JWKS | `5000` |
| `FEDERATION_MAX_PARTNERS_PER_ORG` | Max federation partners per organization | `50` |
---
## Dependencies
No new npm packages. Federation uses `jsonwebtoken` (already present) for JWT verification and the existing HTTP client for JWKS fetches.
---
## Security Considerations
- Only tokens from explicitly registered, active federation partners are accepted in `POST /federation/verify`
- JWKS are cached to prevent JWKS endpoint hammering; cache is invalidated when a partner is updated
- Token signature verification uses the partner's JWKS; `alg: none` is always rejected
- `allowedOrganizations` field enables fine-grained trust: a partner can be trusted but only for tokens from specific organizations within that partner
- Expired federation partners (`expiresAt` in the past) are automatically treated as status `expired` — their tokens are rejected
- `POST /federation/verify` does not grant any local permissions — it is a verification-only endpoint. Callers must make their own access control decisions based on the returned claims.
- Clock skew tolerance: `exp` claim verification allows 30 seconds of clock skew (standard JWT practice)
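
The trust rules in this list can be sketched as a single decision function. The sketch deliberately omits signature verification and JWKS fetching. The order of the checks, the mapping of suspended or expired partners to `UNTRUSTED_ISSUER`, and the names `checkTrust`, `Partner`, and `Claims` are assumptions made for illustration, not the specified behaviour.

```typescript
type PartnerStatus = "active" | "suspended" | "expired";
type RejectReason = "TOKEN_EXPIRED" | "UNTRUSTED_ISSUER" | "ORGANIZATION_NOT_ALLOWED";

interface Partner {
  issuer: string;
  status: PartnerStatus;
  allowedOrganizations: string[];
  expiresAt: Date | null;
}

interface Claims {
  iss: string;
  exp: number; // seconds since the Unix epoch
  organization_id: string;
}

const CLOCK_SKEW_SECONDS = 30;

// Returns null when the claims pass every trust check, otherwise a reason code.
function checkTrust(partners: Partner[], claims: Claims, now: Date): RejectReason | null {
  const partner = partners.find((p) => p.issuer === claims.iss);
  if (!partner || partner.status !== "active") return "UNTRUSTED_ISSUER";
  // A trust relationship past its expiresAt is treated as expired.
  if (partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime()) {
    return "UNTRUSTED_ISSUER";
  }
  // An empty allow-list means every organization of the partner is trusted.
  const allowed = partner.allowedOrganizations;
  if (allowed.length > 0 && !allowed.includes(claims.organization_id)) {
    return "ORGANIZATION_NOT_ALLOWED";
  }
  const nowSeconds = Math.floor(now.getTime() / 1000);
  if (claims.exp + CLOCK_SKEW_SECONDS < nowSeconds) return "TOKEN_EXPIRED";
  return null;
}
```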
---
## Acceptance Criteria
- [ ] `POST /federation/trust` registers a partner and fetches JWKS; returns 400 if JWKS unreachable
- [ ] `POST /federation/verify` returns `valid: true` for a correctly signed token from a trusted partner
- [ ] `POST /federation/verify` returns `valid: false` with `reason: UNTRUSTED_ISSUER` for unknown issuers
- [ ] `POST /federation/verify` returns `valid: false` with `reason: TOKEN_EXPIRED` for expired tokens
- [ ] Expired trust relationships (past `expiresAt`) are rejected automatically
- [ ] JWKS cache hit is used on second verification request for same issuer (Redis key present)
- [ ] TypeScript strict, zero `any`, >80% test coverage on FederationService


@@ -0,0 +1,444 @@
# Multi-Tenancy — Specification
**Workstream**: 1 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Introduce an Organization model so a single AgentIdP instance serves multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit events, and rate limits. Row-level tenancy in PostgreSQL is enforced by both application-layer `organization_id` filtering and PostgreSQL Row-Level Security (RLS) policies.
All existing endpoints that operate on agents, credentials, or audit events are augmented to be organization-scoped. A new Admin API provides organization lifecycle management. Organization membership controls which agents a caller can manage.
---
## API Endpoints
### POST /organizations
Create a new organization. Requires system-admin scope (`admin:orgs`).
```yaml
POST /organizations
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [name, slug]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Display name of the organization
        example: "Acme AI Platform"
      slug:
        type: string
        minLength: 2
        maxLength: 50
        pattern: "^[a-z0-9-]+$"
        description: URL-safe unique identifier
        example: "acme-ai"
      planTier:
        type: string
        enum: [free, pro, enterprise]
        default: free
      maxAgents:
        type: integer
        minimum: 1
        default: 100
      maxTokensPerMonth:
        type: integer
        minimum: 1
        default: 10000

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/Organization'
    example:
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      name: "Acme AI Platform"
      slug: "acme-ai"
      planTier: "free"
      maxAgents: 100
      maxTokensPerMonth: 10000
      status: "active"
      createdAt: "2026-03-29T12:00:00Z"
      updatedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "VALIDATION_ERROR"
      message: "slug must be unique"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "INSUFFICIENT_SCOPE"
      message: "admin:orgs scope required"
```
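The `name` and `slug` constraints in the schema above translate directly into a validation routine. The following is a minimal sketch: the length bounds and the pattern are taken from the schema, while the function name `validateCreateOrg` and the error message wording are hypothetical. Uniqueness of the slug needs a database lookup and is not covered.

```typescript
interface CreateOrgInput {
  name: string;
  slug: string;
}

const SLUG_PATTERN = /^[a-z0-9-]+$/;

// Returns a list of human-readable problems; an empty list means the input is valid.
function validateCreateOrg(input: CreateOrgInput): string[] {
  const errors: string[] = [];
  if (input.name.length < 2 || input.name.length > 100) {
    errors.push("name must be 2 to 100 characters");
  }
  if (input.slug.length < 2 || input.slug.length > 50) {
    errors.push("slug must be 2 to 50 characters");
  }
  if (!SLUG_PATTERN.test(input.slug)) {
    errors.push("slug may contain only lowercase letters, digits, and hyphens");
  }
  return errors;
}
```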
---
### GET /organizations
List all organizations. Requires `admin:orgs` scope.
```yaml
GET /organizations
Authorization: Bearer <token with admin:orgs scope>

Query Parameters:
  status:
    type: string
    enum: [active, suspended, deleted]
  page:
    type: integer
    minimum: 1
    default: 1
  limit:
    type: integer
    minimum: 1
    maximum: 100
    default: 20

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/Organization'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
          name: "Acme AI Platform"
          slug: "acme-ai"
          planTier: "free"
          status: "active"
          createdAt: "2026-03-29T12:00:00Z"
          updatedAt: "2026-03-29T12:00:00Z"
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
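The `page` and `limit` parameters above map onto a SQL `LIMIT`/`OFFSET` pair. The sketch below applies the documented defaults and bounds. Clamping out-of-range values (rather than rejecting them with a 400) is an assumption, and `normalizePaging` is a hypothetical helper name.

```typescript
interface Paging {
  page: number;
  limit: number;
  offset: number;
}

// Applies the documented defaults (page 1, limit 20) and bounds (limit 1..100).
function normalizePaging(page?: number, limit?: number): Paging {
  const safePage = page !== undefined && Number.isInteger(page) && page >= 1 ? page : 1;
  const requested = limit !== undefined && Number.isInteger(limit) ? limit : 20;
  // Assumption: out-of-range limits are clamped rather than rejected.
  const safeLimit = Math.min(100, Math.max(1, requested));
  return { page: safePage, limit: safeLimit, offset: (safePage - 1) * safeLimit };
}
```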
---
### GET /organizations/:orgId
Get a single organization. Requires `admin:orgs` scope or membership in the organization.
```yaml
GET /organizations/{orgId}
Authorization: Bearer <token>

Path Parameters:
  orgId:
    type: string
    description: Organization ID (org_... prefix)

Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_NOT_FOUND"
      message: "Organization not found"
```
---
### PATCH /organizations/:orgId
Partially update an organization. Requires `admin:orgs` scope.
```yaml
PATCH /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
      planTier:
        type: string
        enum: [free, pro, enterprise]
      maxAgents:
        type: integer
        minimum: 1
      maxTokensPerMonth:
        type: integer
        minimum: 1
      status:
        type: string
        enum: [active, suspended]

Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /organizations/:orgId
Soft-delete an organization (sets status to `deleted`). Requires `admin:orgs` scope. Hard deletion is not supported — data is retained for compliance.
```yaml
DELETE /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>

Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_HAS_ACTIVE_AGENTS"
      message: "Organization has active agents; decommission all agents before deleting"
```
---
### POST /organizations/:orgId/members
Add an already-registered agent as a member of an organization. Requires `admin:orgs` scope.
```yaml
POST /organizations/{orgId}/members
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [agentId, role]
    properties:
      agentId:
        type: string
        description: ID of an already-registered agent to add as a member
      role:
        type: string
        enum: [member, admin]
        description: Role within the organization

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/OrgMember'
    example:
      memberId: "mem_01HXK7Z9P3FKWABCDEF99999"
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
      role: "member"
      joinedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ALREADY_MEMBER"
      message: "Agent is already a member of this organization"
```
---
### Modified: All /agents, /audit endpoints
All existing agent, credential, and audit endpoints now operate within the caller's organization context (extracted from `organization_id` claim in JWT). No URL changes — the scoping is transparent to callers already using the API.
---
## Database Schema Changes
### New Table: organizations
```sql
CREATE TABLE organizations (
  organization_id      VARCHAR(40)  PRIMARY KEY, -- org_... prefixed ULID
  name                 VARCHAR(100) NOT NULL,
  slug                 VARCHAR(50)  NOT NULL UNIQUE,
  plan_tier            VARCHAR(20)  NOT NULL DEFAULT 'free',
  max_agents           INTEGER      NOT NULL DEFAULT 100,
  max_tokens_per_month INTEGER      NOT NULL DEFAULT 10000,
  status               VARCHAR(20)  NOT NULL DEFAULT 'active',
  created_at           TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  updated_at           TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  CONSTRAINT organizations_status_check CHECK (status IN ('active', 'suspended', 'deleted')),
  CONSTRAINT organizations_plan_check CHECK (plan_tier IN ('free', 'pro', 'enterprise'))
);

CREATE INDEX idx_organizations_slug ON organizations(slug);
CREATE INDEX idx_organizations_status ON organizations(status);
```
### New Table: organization_members
```sql
CREATE TABLE organization_members (
  member_id       VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  agent_id        VARCHAR(40) NOT NULL REFERENCES agents(agent_id),
  role            VARCHAR(20) NOT NULL DEFAULT 'member',
  joined_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT organization_members_role_check CHECK (role IN ('member', 'admin')),
  UNIQUE (organization_id, agent_id)
);

CREATE INDEX idx_org_members_org_id ON organization_members(organization_id);
CREATE INDEX idx_org_members_agent_id ON organization_members(agent_id);
```
### Modified: agents table
```sql
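-- The 'org_system' row (see the seed below) must exist before this runs;
-- otherwise the foreign key rejects the default applied to existing rows.
ALTER TABLE agents
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';

CREATE INDEX idx_agents_organization_id ON agents(organization_id);

-- RLS: rows are visible only when app.organization_id matches.
-- With the setting absent, current_setting(..., true) yields NULL and no rows match.
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
CREATE POLICY agents_org_isolation ON agents
  USING (organization_id = current_setting('app.organization_id', true));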
```
### Modified: credentials table
```sql
ALTER TABLE credentials
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';

CREATE INDEX idx_credentials_organization_id ON credentials(organization_id);

ALTER TABLE credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY credentials_org_isolation ON credentials
  USING (organization_id = current_setting('app.organization_id', true));
```
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';

CREATE INDEX idx_audit_logs_organization_id ON audit_logs(organization_id);

ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_logs_org_isolation ON audit_logs
  USING (organization_id = current_setting('app.organization_id', true));
```
### Seed: Default system organization
```sql
INSERT INTO organizations (organization_id, name, slug, plan_tier, max_agents, max_tokens_per_month, status)
VALUES ('org_system', 'System', 'system', 'enterprise', 999999, 999999999, 'active');
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MULTI_TENANCY_ENABLED` | Enable organization enforcement (set false for single-tenant mode) | `true` |
| `DEFAULT_ORG_ID` | Organization ID to assign pre-tenancy data during migration | `org_system` |
| `MAX_ORGS_PER_INSTANCE` | Hard cap on number of organizations per instance | `1000` |
---
## Dependencies
No new npm packages. Row-level tenancy uses existing PostgreSQL client (`pg`) and query patterns.
---
## Security Considerations
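- PostgreSQL RLS is enabled as defense-in-depth: an accidental omission of the `organization_id` filter at the application layer is still caught by the database; because table owners and superusers bypass RLS unless `FORCE ROW LEVEL SECURITY` is set, the application must connect as a non-owner role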
- `SET LOCAL app.organization_id` is called at the start of every database transaction
- The `admin:orgs` scope is a new privileged scope — only system-level agent credentials carry it
- Organization slugs are public-facing, whereas organization IDs are internal; avoid exposing organization IDs in public URLs wherever possible
- `DELETE /organizations` is soft-delete only — hard deletion requires a separate admin runbook to prevent accidental data loss
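
The per-transaction tenant context described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `Queryable`, `withOrgContext`, and `RecordingClient` are hypothetical names, and the sketch assumes a client whose `query` method is compatible with the one exposed by `pg`. Because `SET LOCAL` does not accept bind parameters, the sketch uses PostgreSQL's `set_config(name, value, true)`, which has the same transaction-scoped effect.

```typescript
// Minimal structural type (assumption): a `pg` client exposes a compatible `query` method.
interface Queryable {
  query(text: string, values?: readonly unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose RLS context is pinned to one organization.
async function withOrgContext<T>(
  client: Queryable,
  organizationId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // set_config(..., true) is transaction-local, like SET LOCAL, but accepts a bind parameter.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}

// Hypothetical in-memory stand-in for a database client; it only records the SQL it receives.
class RecordingClient implements Queryable {
  readonly statements: string[] = [];

  async query(text: string): Promise<unknown> {
    this.statements.push(text);
    return { rows: [] };
  }
}
```

Passing the organization ID as a bind parameter keeps the value out of the SQL text, which matters because it originates from a JWT claim.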
---
## Acceptance Criteria
- [ ] Single AgentIdP instance can serve 2+ organizations with zero cross-organization data leakage
- [ ] All agent/credential/audit operations are scoped to caller's organization_id from JWT
- [ ] PostgreSQL RLS policies verified: direct DB query without app.organization_id setting returns 0 rows
- [ ] Organization CRUD endpoints return correct 403 when caller lacks admin:orgs scope
- [ ] Pre-existing agents assigned to default system organization without data loss
- [ ] TypeScript strict, zero `any`, >80% test coverage on OrgService

openspec/specs/oidc/spec.md

@@ -0,0 +1,366 @@
# OpenID Connect (OIDC) — Specification
**Workstream**: 3 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Add a full OIDC 1.0 layer on top of the existing OAuth 2.0 `client_credentials` implementation using the certified `oidc-provider` npm library. The OIDC layer exposes Discovery, JWKS, extends the token endpoint to return ID tokens with agent claims, and provides an `/agent-info` endpoint (the agent-identity equivalent of OIDC's `/userinfo`).
The existing `POST /oauth2/token` endpoint is extended, not replaced. Callers that do not request the `openid` scope continue to receive standard OAuth 2.0 responses unchanged.
---
## API Endpoints
### GET /.well-known/openid-configuration
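OIDC Discovery document. No authentication required. This is the standard discovery endpoint defined by OpenID Connect Discovery 1.0; RFC 8414 defines the closely related OAuth 2.0 Authorization Server Metadata format, whose fields (such as `introspection_endpoint` and `revocation_endpoint`) the document also uses.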
```yaml
GET /.well-known/openid-configuration
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: OIDC Discovery document per OpenID Connect Discovery 1.0
    example:
      issuer: "https://idp.sentryagent.ai"
      authorization_endpoint: "https://idp.sentryagent.ai/oauth2/authorize"
      token_endpoint: "https://idp.sentryagent.ai/oauth2/token"
      jwks_uri: "https://idp.sentryagent.ai/.well-known/jwks.json"
      userinfo_endpoint: "https://idp.sentryagent.ai/agent-info"
      introspection_endpoint: "https://idp.sentryagent.ai/oauth2/introspect"
      revocation_endpoint: "https://idp.sentryagent.ai/oauth2/revoke"
      response_types_supported:
        - "token"
      grant_types_supported:
        - "client_credentials"
      subject_types_supported:
        - "public"
      id_token_signing_alg_values_supported:
        - "RS256"
        - "ES256"
      scopes_supported:
        - "openid"
        - "agents:read"
        - "agents:write"
        - "tokens:read"
        - "audit:read"
      claims_supported:
        - "sub"
        - "iss"
        - "iat"
        - "exp"
        - "agent_id"
        - "agent_type"
        - "organization_id"
        - "capabilities"
        - "deployment_env"
        - "owner"
      token_endpoint_auth_methods_supported:
        - "client_secret_post"
        - "client_secret_basic"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /.well-known/jwks.json
JSON Web Key Set. Contains the public keys used to sign ID tokens and access tokens. No authentication required. Clients use this endpoint to verify token signatures.
```yaml
GET /.well-known/jwks.json
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    Cache-Control: public, max-age=3600
    schema:
      type: object
      required: [keys]
      properties:
        keys:
          type: array
          items:
            type: object
            description: JSON Web Key (RFC 7517)
            properties:
              kty:
                type: string
                example: "RSA"
              use:
                type: string
                example: "sig"
              kid:
                type: string
                description: Key ID; matches the `kid` header in issued JWTs
              alg:
                type: string
                example: "RS256"
              n:
                type: string
                description: RSA modulus (base64url)
              e:
                type: string
                description: RSA exponent (base64url)
    example:
      keys:
        - kty: "RSA"
          use: "sig"
          kid: "key-2026-03-29-01"
          alg: "RS256"
          n: "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAt..."
          e: "AQAB"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
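
A client typically matches the `kid` in a token's header against the JWKS before verifying the signature. The following is a minimal sketch of that lookup only; `readKid` and `selectSigningKey` are hypothetical helper names, and the signature check itself is left to a standard JWT library.

```typescript
import { Buffer } from "node:buffer";

interface Jwk {
  kty: string;
  kid: string;
  use?: string;
  alg?: string;
}

interface Jwks {
  keys: Jwk[];
}

// Reads the `kid` from a compact JWT's header. This does NOT verify the token.
function readKid(jwt: string): string | undefined {
  const [headerSegment] = jwt.split(".");
  if (!headerSegment) return undefined;
  const header: unknown = JSON.parse(Buffer.from(headerSegment, "base64url").toString("utf8"));
  if (typeof header === "object" && header !== null && "kid" in header) {
    const kid = (header as { kid: unknown }).kid;
    return typeof kid === "string" ? kid : undefined;
  }
  return undefined;
}

// Picks the signing key whose `kid` matches the token header, if one is published.
function selectSigningKey(jwks: Jwks, jwt: string): Jwk | undefined {
  const kid = readKid(jwt);
  if (kid === undefined) return undefined;
  return jwks.keys.find((key) => key.kid === kid && (key.use ?? "sig") === "sig");
}
```

If no key matches, a client would usually refetch the JWKS once (a rotation may have occurred) before rejecting the token.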
---
### POST /oauth2/token (extended)
The existing token endpoint is extended to return an `id_token` when the `openid` scope is requested. All existing behavior is preserved when `openid` is not in the scope list.
```yaml
POST /oauth2/token
Content-Type: application/x-www-form-urlencoded
Request Body:
  schema:
    type: object
    required: [grant_type, client_id, client_secret]
    properties:
      grant_type:
        type: string
        enum: [client_credentials]
      client_id:
        type: string
      client_secret:
        type: string
      scope:
        type: string
        description: Space-separated scopes. Include "openid" to receive an id_token.
        example: "openid agents:read"
Responses:
  200 OK (with openid scope):
    schema:
      type: object
      properties:
        access_token:
          type: string
        token_type:
          type: string
          example: "Bearer"
        expires_in:
          type: integer
        scope:
          type: string
        id_token:
          type: string
          description: Signed JWT ID token containing agent identity claims. Only present when openid scope was requested.
    example:
      access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
      token_type: "Bearer"
      expires_in: 3600
      scope: "openid agents:read"
      id_token: "eyJhbGciOiJSUzI1NiIsImtpZCI6ImtleS0yMDI2LTAzLTI5LTAxIn0..."
  200 OK (without openid scope):
    schema:
      type: object
      properties:
        access_token:
          type: string
        token_type:
          type: string
        expires_in:
          type: integer
        scope:
          type: string
    example:
      access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
      token_type: "Bearer"
      expires_in: 3600
      scope: "agents:read"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/OAuthErrorResponse'
    example:
      error: "invalid_client"
      error_description: "Invalid client credentials"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/OAuthErrorResponse'
```
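
To illustrate the request side, here is a hedged sketch of how a caller might build the form-encoded body and detect whether an `id_token` came back. `buildTokenRequestBody`, `TokenResponse`, and `hasIdToken` are hypothetical names introduced for this example; the field names follow the schema above.

```typescript
// Builds the application/x-www-form-urlencoded body for POST /oauth2/token.
function buildTokenRequestBody(
  clientId: string,
  clientSecret: string,
  scopes: readonly string[],
): string {
  const params = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  });
  if (scopes.length > 0) {
    // Scopes are space-separated; include "openid" to request an id_token.
    params.set("scope", scopes.join(" "));
  }
  return params.toString();
}

// Shape of the 200 response; id_token is present only when "openid" was requested.
interface TokenResponse {
  access_token: string;
  token_type: string;
  expires_in: number;
  scope?: string;
  id_token?: string;
}

// Type guard: narrows a response to one that carries an ID token.
function hasIdToken(response: TokenResponse): response is TokenResponse & { id_token: string } {
  return typeof response.id_token === "string";
}
```

Callers that never send `openid` can ignore `hasIdToken` entirely, which matches the backward-compatibility guarantee stated for this endpoint.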
#### ID Token Claims
When `openid` scope is requested, the ID token (a signed JWT) contains the following claims:
```json
{
  "iss": "https://idp.sentryagent.ai",
  "sub": "agt_01HXK7Z9P3FKWABCDEF67890",
  "aud": "agt_01HXK7Z9P3FKWABCDEF67890",
  "iat": 1774785600,
  "exp": 1774789200,
  "agent_id": "agt_01HXK7Z9P3FKWABCDEF67890",
  "agent_type": "orchestrator",
  "organization_id": "org_01HXK7Z9P3FKWABCDEF12345",
  "capabilities": ["task-planning", "tool-use"],
  "deployment_env": "production",
  "owner": "acme-ai",
  "did": "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
}
```
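
After verifying the signature, a relying party would normally check the registered claims as well. The sketch below covers only those claim checks and is an illustration under assumptions: `validateIdTokenClaims` is a hypothetical helper, the 60-second clock-skew allowance is an arbitrary choice, and the `sub` / `agent_id` equality check reflects the examples in this spec rather than a stated rule.

```typescript
interface IdTokenClaims {
  iss: string;
  sub: string;
  aud: string;
  iat: number;
  exp: number;
  agent_id: string;
}

// Returns a list of problems; an empty list means the claims passed these checks.
// Signature verification is a separate step and is not performed here.
function validateIdTokenClaims(
  claims: IdTokenClaims,
  expectedIssuer: string,
  clientId: string,
  nowSeconds: number,
): string[] {
  const problems: string[] = [];
  if (claims.iss !== expectedIssuer) problems.push("iss does not match the configured issuer");
  if (claims.aud !== clientId) problems.push("aud does not match the requesting client");
  if (claims.sub !== claims.agent_id) problems.push("sub and agent_id disagree");
  if (claims.exp <= nowSeconds) problems.push("token has expired");
  // Allow a small clock skew (60 s, an assumption) before treating iat as in the future.
  if (claims.iat > nowSeconds + 60) problems.push("iat is in the future");
  return problems;
}
```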
---
### GET /agent-info
Returns claims about the authenticated agent identity. This is the agent-first equivalent of the OIDC `/userinfo` endpoint. Authentication required with any valid access token.
```yaml
GET /agent-info
Authorization: Bearer <access_token>
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: Agent identity claims (subset of registered agent data)
      properties:
        sub:
          type: string
          description: Subject (the agentId)
        agent_id:
          type: string
        agent_type:
          type: string
        organization_id:
          type: string
        capabilities:
          type: array
          items:
            type: string
        deployment_env:
          type: string
        owner:
          type: string
        version:
          type: string
        status:
          type: string
        did:
          type: string
          description: W3C DID for this agent (if DID workstream is active)
        created_at:
          type: string
          format: date-time
    example:
      sub: "agt_01HXK7Z9P3FKWABCDEF67890"
      agent_id: "agt_01HXK7Z9P3FKWABCDEF67890"
      agent_type: "orchestrator"
      organization_id: "org_01HXK7Z9P3FKWABCDEF12345"
      capabilities: ["task-planning", "tool-use"]
      deployment_env: "production"
      owner: "acme-ai"
      version: "1.2.0"
      status: "active"
      did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      created_at: "2026-03-29T12:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "UNAUTHORIZED"
      message: "Invalid or expired access token"
```
---
## Database Schema Changes
### New Table: oidc_keys
Stores metadata for the RSA/EC key pairs used to sign ID tokens. Private keys are stored in Vault; only the public key JWK is stored in PostgreSQL, where it backs the JWKS endpoint.
```sql
CREATE TABLE oidc_keys (
  key_id         VARCHAR(40)  PRIMARY KEY,
  kid            VARCHAR(100) NOT NULL UNIQUE,  -- Key ID in JWKS
  algorithm      VARCHAR(10)  NOT NULL,
  use_purpose    VARCHAR(10)  NOT NULL DEFAULT 'sig',
  public_key_jwk JSONB        NOT NULL,
  vault_key_path VARCHAR(255) NOT NULL,
  is_current     BOOLEAN      NOT NULL DEFAULT TRUE,
  created_at     TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  retired_at     TIMESTAMPTZ,
  CONSTRAINT oidc_keys_alg_check CHECK (algorithm IN ('RS256', 'ES256')),
  CONSTRAINT oidc_keys_use_check CHECK (use_purpose IN ('sig', 'enc'))
);
CREATE INDEX idx_oidc_keys_is_current ON oidc_keys(is_current) WHERE is_current = TRUE;
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `OIDC_ISSUER` | OIDC issuer URL (must match token `iss` claim) | `https://${HOST}` |
| `OIDC_ID_TOKEN_TTL_SECONDS` | ID token lifetime | `3600` |
| `OIDC_SIGNING_ALG` | ID token signing algorithm | `RS256` |
| `OIDC_JWKS_CACHE_TTL_SECONDS` | JWKS response cache TTL | `3600` |
| `OIDC_KEY_ROTATION_DAYS` | Days between signing key rotations | `90` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `oidc-provider` | `^8.4.6` | Certified OIDC server library (OpenID Foundation conformant) |
---
## Security Considerations
- ID token signing keys are stored in Vault; public keys only are served via JWKS
- JWKS endpoint is cached in Redis (`OIDC_JWKS_CACHE_TTL_SECONDS`) to prevent key-hammering
- Key rotation: when a new signing key is created, the old key remains in JWKS until all tokens signed with it have expired
- The `openid` scope is only issued to callers explicitly requesting it — not included by default
- `GET /agent-info` returns the ID token's identity claims plus non-sensitive registry metadata (`version`, `status`, `created_at`); it exposes no secrets
- ID tokens for agent credentials must not contain client secrets or internal system paths
- `alg: none` is explicitly rejected — all ID tokens must be signed
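
The key-rotation rule above (a retired key stays in the JWKS until every token it signed has expired) can be expressed as a small filter. This is a sketch under assumptions: `SigningKeyRecord` loosely mirrors the `is_current` and `retired_at` columns of `oidc_keys`, and `keysToPublish` is a hypothetical helper, not a function named by the spec.

```typescript
interface SigningKeyRecord {
  kid: string;
  isCurrent: boolean;
  retiredAt: Date | null;
}

// A retired key must remain published until retiredAt + the longest token lifetime,
// because tokens signed just before retirement are still valid until then.
function keysToPublish(
  keys: readonly SigningKeyRecord[],
  now: Date,
  maxTokenTtlSeconds: number,
): SigningKeyRecord[] {
  return keys.filter((key) => {
    // Current keys, and keys with no recorded retirement time, are always published.
    if (key.isCurrent || key.retiredAt === null) return true;
    return now.getTime() < key.retiredAt.getTime() + maxTokenTtlSeconds * 1000;
  });
}
```

With the default `OIDC_ID_TOKEN_TTL_SECONDS` of 3600, a key retired 30 minutes ago would still be served, while one retired two hours ago would not.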
---
## Acceptance Criteria
- [ ] `/.well-known/openid-configuration` passes OIDC Discovery conformance validation
- [ ] `/.well-known/jwks.json` returns valid JWKS with current signing public key
- [ ] ID token returned when `openid` scope is in token request; not returned otherwise
- [ ] ID token is verifiable against JWKS endpoint using standard JWT libraries
- [ ] ID token claims match agent record (agent_type, capabilities, organization_id, did)
- [ ] `/agent-info` returns correct claims for authenticated agent
- [ ] Key rotation: old JWKS key is kept until all signed tokens expire
- [ ] TypeScript strict, zero `any`, >80% test coverage on OIDCService

@@ -0,0 +1,7 @@
## ADDED Requirements
### Requirement: Security guide exists at docs/devops/security.md
The system SHALL provide `docs/devops/security.md` documenting RSA keypair generation, key rotation procedure, CORS configuration, and secret storage guidance.
### Requirement: Operations runbook exists at docs/devops/operations.md
The system SHALL provide `docs/devops/operations.md` covering startup procedure, graceful shutdown (SIGTERM/SIGINT), log interpretation, and troubleshooting for the most common operational failures.

@@ -0,0 +1,45 @@
## ADDED Requirements
### Requirement: Quick-start guide exists at docs/developers/quick-start.md
The system SHALL provide a quick-start guide at `docs/developers/quick-start.md` that enables a bedroom developer to register their first agent and issue an OAuth 2.0 access token in under 5 minutes.
#### Scenario: Developer completes quick-start from zero
- **WHEN** a developer with no prior AgentIdP knowledge follows the quick-start guide
- **THEN** they SHALL have a registered agent, a valid credential, and a working access token by the end
### Requirement: Quick-start lists exact prerequisites
The quick-start guide SHALL list all prerequisites at the top before any steps, so the developer knows what they need before starting.
#### Scenario: Prerequisites are minimal and explicit
- **WHEN** the developer reads the prerequisites section
- **THEN** they SHALL see exactly: Docker (for running PostgreSQL and Redis) and curl (for API calls) — nothing else required
### Requirement: Quick-start provides a working docker-compose startup command
The quick-start guide SHALL include a single command to start the required infrastructure (PostgreSQL + Redis) using the project's `docker-compose.yml`.
#### Scenario: Developer starts infrastructure
- **WHEN** the developer runs the provided docker-compose command
- **THEN** the guide SHALL confirm what services are started and what ports they run on
### Requirement: Quick-start covers the full 4-step workflow
The quick-start guide SHALL cover exactly these four steps in order, each with a working curl command and the expected response:
1. Start the AgentIdP server
2. Register an agent (`POST /agents`)
3. Generate a credential (`POST /agents/{agentId}/credentials`)
4. Issue an access token (`POST /token`)
#### Scenario: Each step has a copy-pasteable curl command
- **WHEN** the developer reads any step
- **THEN** they SHALL find a complete curl command with realistic placeholder values they can substitute
#### Scenario: Each step shows the expected JSON response
- **WHEN** the developer runs a curl command from the guide
- **THEN** the guide SHALL show them what a successful response looks like so they can verify their output
### Requirement: Quick-start ends with a next-steps section
The quick-start guide SHALL end with a "What's Next" section linking to: core-concepts.md, developer-guides.md, and api-reference.md.
#### Scenario: Developer knows where to go after quick-start
- **WHEN** the developer reaches the end of the quick-start
- **THEN** they SHALL see at least 3 links to deeper documentation

openspec/specs/soc2/spec.md

@@ -0,0 +1,335 @@
# SOC 2 Type II Preparation — Specification
**Workstream**: 6 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Implement the technical controls required for SOC 2 Type II audit readiness. A SOC 2 Type II report attests that security controls operate effectively over a defined period, not merely that they exist. Controls are therefore implemented in code, not only documented.
This workstream cuts across all other Phase 3 workstreams. It delivers: encryption at rest for sensitive columns, TLS enforcement middleware, automated secrets rotation, security event alerting, and audit log immutability via a Merkle hash chain. A compliance documentation package (controls matrix and runbook) is produced for auditors.
---
## Technical Controls
### Control C1: Encryption at Rest (Column-Level Encryption)
Sensitive columns in PostgreSQL are encrypted using `pgcrypto` symmetric encryption. The encryption key is stored in Vault and fetched at application startup, never written to disk.
**Columns encrypted**:
- `credentials.secret_hash` — encrypted with AES-256-CBC
- `credentials.vault_path` — encrypted with AES-256-CBC
- `webhook_subscriptions.vault_secret_path` — encrypted with AES-256-CBC
- `agent_did_keys.vault_key_path` — encrypted with AES-256-CBC
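**Implementation**: An `EncryptionService` wraps `pgcrypto` `pgp_sym_encrypt` / `pgp_sym_decrypt`. The key is a 256-bit symmetric key stored at `secret/agentidp/encryption/column-key` in Vault. All INSERT/SELECT operations for encrypted columns go through `EncryptionService`. Note that `pgp_sym_encrypt` defaults to AES-128 and uses OpenPGP's CFB mode: AES-256 requires the `cipher-algo=aes256` option, and AES-256-CBC specifically requires pgcrypto's raw `encrypt()` function or application-side encryption (for example with `node-forge`, listed under Dependencies). Because pgcrypto runs inside the database, the key travels to PostgreSQL with each query and must be kept out of statement logs.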
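
For comparison, application-side AES-256-CBC can be sketched with Node's built-in `crypto` module. This is an illustration only: it deliberately swaps in `node:crypto` instead of `pgcrypto` or `node-forge`, and `encryptColumn` / `decryptColumn` are hypothetical names, not methods the spec defines for `EncryptionService`.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts one column value with AES-256-CBC.
// A fresh random 16-byte IV is generated per value and prepended to the ciphertext.
function encryptColumn(plaintext: string, key: Buffer): string {
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, ciphertext]).toString("base64");
}

// Reverses encryptColumn: splits off the IV, then decrypts the remainder.
function decryptColumn(stored: string, key: Buffer): string {
  const raw = Buffer.from(stored, "base64");
  const iv = raw.subarray(0, 16);
  const decipher = createDecipheriv("aes-256-cbc", key, iv);
  const plaintext = Buffer.concat([decipher.update(raw.subarray(16)), decipher.final()]);
  return plaintext.toString("utf8");
}
```

CBC on its own provides confidentiality but not integrity; if tamper detection on the ciphertext matters, an authenticated mode such as AES-256-GCM may be worth considering.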
---
### Control C2: TLS Enforcement
In production, inbound HTTP requests that did not arrive over TLS are redirected to HTTPS rather than served. This is enforced at two levels:
1. Express middleware: `TLSEnforcementMiddleware` — if `X-Forwarded-Proto` is not `https` and `NODE_ENV=production`, respond `301 Moved Permanently` to HTTPS.
2. Terraform: Load balancers (Phase 2 Terraform modules) already enforce TLS; TLS enforcement middleware provides defense-in-depth.
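
The middleware behaviour in item 1 can be sketched as follows. To keep the example runnable without Express installed, it declares minimal `RequestLike` and `ResponseLike` shapes (assumptions that mirror the `headers`, `hostname`, `originalUrl`, and `redirect` members Express provides); `tlsEnforcement` is a hypothetical factory name.

```typescript
// Minimal shapes (assumptions) so the sketch runs without Express installed.
interface RequestLike {
  headers: Record<string, string | undefined>;
  hostname: string;
  originalUrl: string;
}

interface ResponseLike {
  redirect(status: number, url: string): void;
}

type Next = () => void;

// Returns a middleware that redirects non-HTTPS requests when running in production.
function tlsEnforcement(nodeEnv: string | undefined) {
  return (req: RequestLike, res: ResponseLike, next: Next): void => {
    // Behind a proxy chain the header can hold a comma-separated list; the first entry is the client-facing hop.
    const proto = req.headers["x-forwarded-proto"]?.split(",")[0]?.trim();
    if (nodeEnv === "production" && proto !== "https") {
      res.redirect(301, `https://${req.hostname}${req.originalUrl}`);
      return;
    }
    next();
  };
}
```

The header is only trustworthy when the load balancer overwrites it, so this check complements the Terraform-level enforcement rather than replacing it.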
---
### Control C3: Automated Secrets Rotation
A scheduled job (`SecretsRotationJob`) runs on a configurable cron schedule. It:
1. Identifies credentials whose `expires_at` is within `ROTATION_WARNING_DAYS` days
2. Emits a Prometheus metric `agentidp_credentials_expiring_soon_total` (labelled by `org_id`, `days_remaining`)
3. Renews Vault leases for all active credentials
4. Sends a webhook event `credential.expiring_soon` to subscribers who have opted in
This does not automatically rotate credentials without operator action — it alerts and prepares. Forced rotation requires an operator call to the existing `POST /agents/:id/credentials/:credId/rotate` endpoint.
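
Step 1 of the job, selecting credentials that fall inside the warning window, might look like the sketch below. `CredentialRecord`, `ExpiringCredential`, and `findExpiringCredentials` are hypothetical names, and rounding `daysRemaining` down to whole days is an assumption about how the `days_remaining` label would be derived.

```typescript
interface CredentialRecord {
  credentialId: string;
  organizationId: string;
  expiresAt: Date;
}

interface ExpiringCredential extends CredentialRecord {
  daysRemaining: number;
}

const MS_PER_DAY = 86_400_000;

// Returns credentials that expire within `warningDays` days of `now`.
// Already-expired credentials (negative daysRemaining) are excluded.
function findExpiringCredentials(
  credentials: readonly CredentialRecord[],
  now: Date,
  warningDays: number,
): ExpiringCredential[] {
  return credentials
    .map((credential) => ({
      ...credential,
      daysRemaining: Math.floor((credential.expiresAt.getTime() - now.getTime()) / MS_PER_DAY),
    }))
    .filter((credential) => credential.daysRemaining >= 0 && credential.daysRemaining <= warningDays);
}
```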
---
### Control C4: Audit Log Immutability (Merkle Hash Chain)
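Every `audit_logs` row carries two new columns that together form a linear hash chain (each row commits to its predecessor; no Merkle tree is built, despite the control's name):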
- `hash`: SHA-256 of `(eventId || timestamp.toISOString() || action || outcome || agentId || organizationId || previousHash)`
- `previous_hash`: hash of the immediately preceding `audit_logs` row (by `created_at` order), or the genesis string `"GENESIS"` for the first row
A PostgreSQL trigger prevents `UPDATE` and `DELETE` on `audit_logs`.
A new admin endpoint `GET /audit/verify` runs a sequential chain verification pass and returns the integrity status.
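
The chain construction and the sequential verification pass can be sketched as follows. This is a minimal in-memory illustration, assuming the field order given above and plain string concatenation; `computeHash`, `appendEvent`, and `verifyChain` are hypothetical names, and the real endpoint would stream rows from PostgreSQL instead of taking an array.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
}

interface ChainedAuditEvent extends AuditEvent {
  hash: string;
  previousHash: string;
}

const GENESIS = "GENESIS";

// SHA-256 over the concatenated fields, in the order listed in Control C4.
function computeHash(event: AuditEvent, previousHash: string): string {
  const payload =
    event.eventId + event.timestamp.toISOString() + event.action + event.outcome +
    event.agentId + event.organizationId + previousHash;
  return createHash("sha256").update(payload, "utf8").digest("hex");
}

// Appends an event, linking it to the hash of the last row (or GENESIS for the first row).
function appendEvent(chain: readonly ChainedAuditEvent[], event: AuditEvent): ChainedAuditEvent[] {
  const last = chain[chain.length - 1];
  const previousHash = last ? last.hash : GENESIS;
  return [...chain, { ...event, previousHash, hash: computeHash(event, previousHash) }];
}

// Walks the chain in order and reports the first row whose link or hash does not match.
function verifyChain(chain: readonly ChainedAuditEvent[]): {
  valid: boolean;
  rowsVerified: number;
  brokenAtEventId: string | null;
} {
  let previousHash = GENESIS;
  let rowsVerified = 0;
  for (const row of chain) {
    if (row.previousHash !== previousHash || row.hash !== computeHash(row, previousHash)) {
      return { valid: false, rowsVerified, brokenAtEventId: row.eventId };
    }
    previousHash = row.hash;
    rowsVerified += 1;
  }
  return { valid: true, rowsVerified, brokenAtEventId: null };
}
```

Because the fields are concatenated without a delimiter, two different field combinations could in principle produce the same payload; a length-prefixed or JSON-serialized payload would avoid that, at the cost of deviating from the formula as written.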
---
### Control C5: Security Event Alerting
Prometheus alerting rules are written for the following security events:
| Alert | Condition | Severity |
|-------|-----------|---------|
| `AuthFailureSpike` | >50 `auth.failed` events in 5 minutes | Warning |
| `RateLimitExhaustion` | >80% of org rate limit consumed in 1 minute | Warning |
| `AnomalousTokenIssuance` | Token issuance rate exceeds 3x the 7-day average | Warning |
| `WebhookDeadLetterAccumulating` | `agentidp_webhook_dead_letters_total` increases by >10 in 1 hour | Warning |
| `AuditChainIntegrityFailed` | `agentidp_audit_chain_integrity` metric is 0 | Critical |
| `CredentialExpiryApproaching` | `agentidp_credentials_expiring_soon_total{days_remaining="7"}` > 0 | Info |
---
## API Endpoints
### GET /audit/verify
Verify the Merkle hash chain integrity of the audit log. Requires `admin:orgs` scope. This is a potentially expensive operation on large audit logs — it is rate-limited to once per 5 minutes per organization.
```yaml
GET /audit/verify
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
  fromDate:
    type: string
    format: date-time
    description: Start of verification range. If omitted, verifies from genesis.
  toDate:
    type: string
    format: date-time
    description: End of verification range. If omitted, verifies to the latest row.
Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
          description: True if the chain is intact across the entire range
        rowsVerified:
          type: integer
          description: Number of audit rows verified
        firstEventId:
          type: string
        lastEventId:
          type: string
        firstTimestamp:
          type: string
          format: date-time
        lastTimestamp:
          type: string
          format: date-time
        verifiedAt:
          type: string
          format: date-time
        brokenAtEventId:
          type: string
          nullable: true
          description: Present only if valid=false; the first eventId where the chain breaks
    example:
      valid: true
      rowsVerified: 15420
      firstEventId: "evt_genesis_00001"
      lastEventId: "evt_01HXK7Z9P3FKWABCDEFZZZZZ"
      firstTimestamp: "2026-01-01T00:00:00Z"
      lastTimestamp: "2026-03-29T12:00:00Z"
      verifiedAt: "2026-03-29T14:00:00Z"
      brokenAtEventId: null
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  429 Too Many Requests:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "RATE_LIMITED"
      message: "Audit verification can be run at most once per 5 minutes"
```
---
### GET /compliance/controls
Returns the current status of all SOC 2 technical controls. Requires `admin:orgs` scope. Used by auditors and compliance dashboards.
```yaml
GET /compliance/controls
Authorization: Bearer <token with admin:orgs scope>
Responses:
  200 OK:
    schema:
      type: object
      properties:
        generatedAt:
          type: string
          format: date-time
        controls:
          type: array
          items:
            type: object
            properties:
              controlId:
                type: string
              name:
                type: string
              status:
                type: string
                enum: [pass, fail, warning, not_applicable]
              description:
                type: string
              lastChecked:
                type: string
                format: date-time
    example:
      generatedAt: "2026-03-29T14:00:00Z"
      controls:
        - controlId: "C1"
          name: "Encryption at Rest"
          status: "pass"
          description: "Column-level encryption active for all sensitive columns"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C2"
          name: "TLS Enforcement"
          status: "pass"
          description: "All non-TLS requests redirected to HTTPS in production"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C3"
          name: "Secrets Rotation"
          status: "warning"
          description: "3 credentials expiring within 7 days"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C4"
          name: "Audit Log Immutability"
          status: "pass"
          description: "Merkle chain intact; last verified 2026-03-29T13:55:00Z"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C5"
          name: "Security Event Alerting"
          status: "pass"
          description: "All 6 alerting rules active in Prometheus"
          lastChecked: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
## Database Schema Changes
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
  ADD COLUMN hash VARCHAR(64),           -- SHA-256 hex string of chain node
  ADD COLUMN previous_hash VARCHAR(64);  -- Hash of preceding row, or "GENESIS"

-- Back-fill genesis hash for existing rows (one-time migration).
-- The migration script computes the chain in order of created_at and must run
-- before the trigger below is created, because the back-fill itself issues UPDATEs.

-- Prevent updates and deletes (immutability trigger)
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
  RAISE EXCEPTION 'audit_logs rows are immutable: modification is not permitted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_logs_immutability
  BEFORE UPDATE OR DELETE ON audit_logs
  FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();
```
### Modified: credentials table
```sql
-- Columns remain same type; application now stores encrypted values
-- No DDL change — encryption is transparent at application layer
-- Add comment for documentation
COMMENT ON COLUMN credentials.secret_hash IS 'AES-256-CBC encrypted via EncryptionService (pgcrypto). Not a plain bcrypt hash.';
COMMENT ON COLUMN credentials.vault_path IS 'AES-256-CBC encrypted via EncryptionService.';
```
### New Table: compliance_check_log
```sql
CREATE TABLE compliance_check_log (
  check_id        VARCHAR(40)  PRIMARY KEY,
  organization_id VARCHAR(40)  NOT NULL REFERENCES organizations(organization_id),
  control_id      VARCHAR(10)  NOT NULL,
  status          VARCHAR(20)  NOT NULL,
  details         JSONB        NOT NULL DEFAULT '{}',
  checked_at      TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_compliance_check_org ON compliance_check_log(organization_id, checked_at DESC);
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `SOC2_CONTROLS_ENABLED` | Enable SOC 2 controls enforcement | `true` |
| `TLS_ENFORCEMENT_ENABLED` | Enforce HTTPS in production | `true` in production, `false` in development |
| `COLUMN_ENCRYPTION_KEY_PATH` | Vault path for AES-256 column encryption key | `secret/agentidp/encryption/column-key` |
| `ROTATION_WARNING_DAYS` | Days before expiry to emit rotation warning | `30` |
| `SECRETS_ROTATION_CRON` | Cron schedule for rotation check job | `0 3 * * *` (daily at 3 AM UTC) |
| `AUDIT_CHAIN_VERIFY_CRON` | Cron schedule for automated chain verification | `0 2 * * *` (daily at 2 AM UTC) |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `node-forge` | `^1.3.1` | AES-256-CBC column-level encryption primitives |
Note: `pgcrypto` PostgreSQL extension must be enabled: `CREATE EXTENSION IF NOT EXISTS pgcrypto;`
---
## Compliance Documentation
The following documents are produced as part of this workstream:
| Document | Path | Description |
|----------|------|-------------|
| Controls Matrix | `docs/compliance/soc2-controls-matrix.md` | Maps SOC 2 Trust Services Criteria to implemented controls |
| Encryption Runbook | `docs/compliance/encryption-runbook.md` | Key rotation procedure, Vault key path map |
| Audit Log Runbook | `docs/compliance/audit-log-runbook.md` | How to run chain verification, interpret results |
| Incident Response | `docs/compliance/incident-response.md` | Security event response procedures |
| Secrets Rotation Guide | `docs/compliance/secrets-rotation.md` | Operator guide for credential and key rotation |
---
## Security Considerations
- Column encryption key is fetched from Vault at startup and held in process memory — never written to disk or logged
- Key rotation: a migration re-encrypts all sensitive columns under the new key; the old key is retained in Vault history
- The immutability trigger on `audit_logs` prevents application-layer modification, but a `SUPERUSER` or the table owner can disable triggers, and a row-level trigger does not fire on `TRUNCATE`; document these in the controls matrix as residual risks requiring compensating controls (e.g., read-only replica verification)
- `GET /audit/verify` is rate-limited to prevent denial-of-service via repeated expensive sequential scans
- `GET /compliance/controls` never returns raw secrets or key material — only control status
---
## Acceptance Criteria
- [ ] `pgcrypto` extension enabled; sensitive columns are encrypted at rest (verified: plaintext not visible in direct DB query)
- [ ] TLS enforcement middleware redirects HTTP to HTTPS in production; passthrough in development
- [ ] `SecretsRotationJob` runs on schedule; emits Prometheus metric for expiring credentials
- [ ] Audit log immutability trigger prevents UPDATE/DELETE on `audit_logs` table
- [ ] `GET /audit/verify` returns `valid: true` for an unmodified chain
- [ ] `GET /audit/verify` returns `valid: false` with `brokenAtEventId` after a row is manually tampered with (test scenario)
- [ ] All 6 Prometheus alerting rules are present in `monitoring/prometheus/alerts.yml`
- [ ] `GET /compliance/controls` returns correct status for all 5 controls
- [ ] Compliance documentation written and reviewed
- [ ] TypeScript strict, zero `any`, >80% test coverage on SOC2 control implementations

@@ -0,0 +1,10 @@
## ADDED Requirements
### Requirement: System overview exists at docs/devops/README.md
The system SHALL provide a `docs/devops/README.md` that serves as the entry point for DevOps engineers, including an index of all DevOps docs and a brief system overview.
### Requirement: Architecture doc exists at docs/devops/architecture.md
The system SHALL provide `docs/devops/architecture.md` documenting all components (Express server, PostgreSQL, Redis), their roles, ports, and data flow.
### Requirement: Environment variable reference exists at docs/devops/environment-variables.md
The system SHALL provide `docs/devops/environment-variables.md` documenting every environment variable with name, type, required/optional, default, and example value.

@@ -0,0 +1,353 @@
# W3C Decentralized Identifiers (DIDs) — Specification
**Workstream**: 2 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Issue a W3C `did:web` identifier for every registered agent and serve DID Documents over HTTPS. The AgentIdP instance itself has a root DID Document at `/.well-known/did.json`. Each agent has an individual DID Document at `/agents/:id/did`. A DID resolution endpoint wraps the standard resolution workflow. Agent cards in AGNTCY format are derivable from DID Documents.
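The instance DID resolves to `https://<host>/.well-known/did.json`, and each agent DID Document is served at `https://<host>/agents/<agentId>/did`. Standard `did:web` resolvers map `did:web:<host>:agents:<agentId>` to `https://<host>/agents/<agentId>/did.json`, so the per-agent document should also be reachable at that `did.json` path. All DID Documents are W3C DID Core 1.0 compliant.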
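
The mapping between a `did:web` identifier and its document URL is mechanical, and a short sketch may help when testing resolution. It follows the did:web method rules as commonly documented (colons become path separators, a bare domain resolves under `/.well-known`, a port is written as `%3A`, and the file name is `did.json`); `didWebToUrl` and `agentDid` are hypothetical helper names.

```typescript
// Maps a did:web identifier to the HTTPS URL a standard resolver would fetch.
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`Not a did:web identifier: ${did}`);
  }
  const [domain, ...pathSegments] = did.slice(prefix.length).split(":");
  if (!domain) {
    throw new Error(`Missing domain in DID: ${did}`);
  }
  // A port is percent-encoded in the DID (":" is written as %3A), so decode the domain part.
  const host = decodeURIComponent(domain);
  const location = pathSegments.length > 0 ? `/${pathSegments.join("/")}` : "/.well-known";
  return `https://${host}${location}/did.json`;
}

// Builds the per-agent DID used throughout this spec for a given host and agent ID.
function agentDid(host: string, agentId: string): string {
  return `did:web:${encodeURIComponent(host)}:agents:${agentId}`;
}
```

For example, `didWebToUrl(agentDid("idp.sentryagent.ai", "agt_01HXK7Z9P3FKWABCDEF67890"))` yields a URL ending in `/agents/agt_01HXK7Z9P3FKWABCDEF67890/did.json`.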
---
## API Endpoints
### GET /.well-known/did.json
Root DID Document for the AgentIdP instance. No authentication required — this is a public discovery endpoint.
```yaml
GET /.well-known/did.json
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant DID Document
      required: [id, "@context", verificationMethod, authentication]
      properties:
        "@context":
          type: array
          items:
            type: string
          example:
            - "https://www.w3.org/ns/did/v1"
            - "https://w3id.org/security/suites/jws-2020/v1"
        id:
          type: string
          description: DID for this AgentIdP instance
          example: "did:web:idp.sentryagent.ai"
        controller:
          type: string
          example: "did:web:idp.sentryagent.ai"
        verificationMethod:
          type: array
          items:
            $ref: '#/components/schemas/VerificationMethod'
        authentication:
          type: array
          items:
            type: string
          description: References to verification methods for authentication
        assertionMethod:
          type: array
          items:
            type: string
        service:
          type: array
          items:
            $ref: '#/components/schemas/DIDService'
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
      id: "did:web:idp.sentryagent.ai"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU"
            y: "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
      authentication:
        - "did:web:idp.sentryagent.ai#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai#agent-registry"
          type: "AgentIdentityProvider"
          serviceEndpoint: "https://idp.sentryagent.ai"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /agents/:id/did
Per-agent DID Document. No authentication required — DID Documents are public.
```yaml
GET /agents/{agentId}/did
No authentication required
Path Parameters:
  agentId:
    type: string
    description: Agent ID
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant per-agent DID Document
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
        - "https://w3id.org/agntcy/v1"
      id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "abc123"
            y: "def456"
      authentication:
        - "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#agent-card"
          type: "AgentCard"
          serviceEndpoint: "https://idp.sentryagent.ai/agents/agt_01HXK7Z9P3FKWABCDEF67890/did/card"
      agntcy:
        agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
        agentType: "orchestrator"
        capabilities:
          - "task-planning"
          - "tool-use"
        deploymentEnv: "production"
        owner: "acme-ai"
        version: "1.2.0"
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "AGENT_NOT_FOUND"
      message: "Agent not found"
  410 Gone:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "AGENT_DECOMMISSIONED"
      message: "Agent has been decommissioned; DID Document is no longer active"
```
---
### GET /agents/:id/did/resolve
DID resolution endpoint: resolves the agent's `did:web` DID and returns the result in W3C DID Resolution format. This enables external systems to use AgentIdP as a resolver for agent DIDs. Authentication required (`agents:read` scope).
```yaml
GET /agents/{agentId}/did/resolve
Authorization: Bearer <token with agents:read scope>
Path Parameters:
  agentId:
    type: string
Responses:
  200 OK:
    Content-Type: application/ld+json;profile="https://w3id.org/did-resolution"
    schema:
      type: object
      required: [didDocument, didDocumentMetadata, didResolutionMetadata]
      properties:
        didDocument:
          type: object
          description: The resolved DID Document
        didDocumentMetadata:
          type: object
          properties:
            created:
              type: string
              format: date-time
            updated:
              type: string
              format: date-time
            deactivated:
              type: boolean
        didResolutionMetadata:
          type: object
          properties:
            contentType:
              type: string
              example: "application/did+ld+json"
            retrieved:
              type: string
              format: date-time
    example:
      didDocument:
        "@context": ["https://www.w3.org/ns/did/v1"]
        id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      didDocumentMetadata:
        created: "2026-03-29T12:00:00Z"
        updated: "2026-03-29T12:00:00Z"
        deactivated: false
      didResolutionMetadata:
        contentType: "application/did+ld+json"
        retrieved: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
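The response shape above can be sketched in TypeScript as follows. This is a minimal illustration, not the service implementation: `buildResolutionResult` and `toTimestamp` are hypothetical names, the `DidDocument` type is reduced to the two fields shown in the example, and timestamps are assumed to be emitted in UTC without milliseconds, as in the example values.

```typescript
// Reduced DID Document type: only the fields shown in the example above.
interface DidDocument {
  "@context": string[];
  id: string;
}

interface DidResolutionResult {
  didDocument: DidDocument;
  didDocumentMetadata: { created: string; updated: string; deactivated: boolean };
  didResolutionMetadata: { contentType: string; retrieved: string };
}

// Formats a Date as an ISO 8601 UTC timestamp without milliseconds.
function toTimestamp(date: Date): string {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper: wraps a DID Document in the three-part resolution result.
function buildResolutionResult(
  didDocument: DidDocument,
  createdAt: Date,
  updatedAt: Date,
  deactivated: boolean,
  retrievedAt: Date,
): DidResolutionResult {
  return {
    didDocument,
    didDocumentMetadata: {
      created: toTimestamp(createdAt),
      updated: toTimestamp(updatedAt),
      deactivated,
    },
    didResolutionMetadata: {
      contentType: "application/did+ld+json",
      retrieved: toTimestamp(retrievedAt),
    },
  };
}

const result = buildResolutionResult(
  {
    "@context": ["https://www.w3.org/ns/did/v1"],
    id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890",
  },
  new Date("2026-03-29T12:00:00Z"),
  new Date("2026-03-29T12:00:00Z"),
  false,
  new Date("2026-03-29T14:00:00Z"),
);
console.log(JSON.stringify(result, null, 2));
```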
---
### GET /agents/:id/did/card
AGNTCY-format agent card derived from DID Document. Returns a JSON object representing the agent's identity and capabilities in the AGNTCY agent card format. No authentication required.
```yaml
GET /agents/{agentId}/did/card
No authentication required
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: AGNTCY-format agent card
properties:
did:
type: string
name:
type: string
agentType:
type: string
capabilities:
type: array
items:
type: string
owner:
type: string
version:
type: string
deploymentEnv:
type: string
identityProvider:
type: string
description: DID of the issuing AgentIdP instance
issuedAt:
type: string
format: date-time
example:
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
name: "acme-orchestrator"
agentType: "orchestrator"
capabilities: ["task-planning", "tool-use"]
owner: "acme-ai"
version: "1.2.0"
deploymentEnv: "production"
identityProvider: "did:web:idp.sentryagent.ai"
issuedAt: "2026-03-29T12:00:00Z"
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
## Database Schema Changes
### New Table: agent_did_keys
Stores the public key and the Vault reference for the key pair associated with each agent's DID Document. The private key lives in Vault; PostgreSQL holds only the public key JWK and the Vault path.
```sql
CREATE TABLE agent_did_keys (
key_id VARCHAR(40) PRIMARY KEY,
agent_id VARCHAR(40) NOT NULL UNIQUE REFERENCES agents(agent_id),
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
public_key_jwk JSONB NOT NULL,
vault_key_path VARCHAR(255) NOT NULL, -- Vault path where private key is stored
key_type VARCHAR(20) NOT NULL DEFAULT 'EC',
curve VARCHAR(10) NOT NULL DEFAULT 'P-256',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
rotated_at TIMESTAMPTZ,
CONSTRAINT agent_did_keys_key_type_check CHECK (key_type IN ('EC', 'RSA'))
);
-- agent_id needs no separate index: its UNIQUE constraint already creates one
CREATE INDEX idx_agent_did_keys_org_id ON agent_did_keys(organization_id);
```
### New Columns: agents.did, agents.did_created_at
```sql
ALTER TABLE agents
ADD COLUMN did VARCHAR(255),
ADD COLUMN did_created_at TIMESTAMPTZ;
-- Populated automatically on agent creation
-- Example value: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
```
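As an illustration of how the `did` value and the matching verification method could be assembled, here is a small sketch. The function names are hypothetical; the output shapes follow the examples in this spec, and the `%3A` encoding of a port separator follows the `did:web` method rules. The `#key-1` fragment is taken from the example DID Document and is assumed to be fixed.

```typescript
interface EcPublicJwk {
  kty: "EC";
  crv: "P-256";
  x: string;
  y: string;
}

interface VerificationMethod {
  id: string;
  type: "JsonWebKey2020";
  controller: string;
  publicKeyJwk: EcPublicJwk;
}

// Hypothetical helper: builds the agent DID from the configured domain.
// did:web uses ":" between path segments, so a port colon is written as %3A.
function buildAgentDid(domain: string, agentId: string): string {
  const encodedDomain = domain.replace(/:/g, "%3A");
  return `did:web:${encodedDomain}:agents:${agentId}`;
}

// Hypothetical helper: builds the verification method entry for the agent key.
function buildVerificationMethod(did: string, jwk: EcPublicJwk): VerificationMethod {
  return {
    id: `${did}#key-1`,
    type: "JsonWebKey2020",
    controller: did,
    publicKeyJwk: jwk,
  };
}

const did = buildAgentDid("idp.sentryagent.ai", "agt_01HXK7Z9P3FKWABCDEF67890");
const method = buildVerificationMethod(did, { kty: "EC", crv: "P-256", x: "abc123", y: "def456" });
console.log(did);
console.log(method.id);
```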
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `DID_WEB_DOMAIN` | Domain name for `did:web` construction | None (falls back to `HOST` when unset) |
| `DID_KEY_TYPE` | Cryptographic key type for DID keys | `EC` |
| `DID_KEY_CURVE` | Elliptic curve for EC keys | `P-256` |
| `DID_DOCUMENT_CACHE_TTL_SECONDS` | How long to cache DID Documents in Redis | `300` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `did-resolver` | `^4.1.0` | W3C DID resolution interface |
| `web-did-resolver` | `^2.0.27` | DID:WEB method resolver |
---
## Security Considerations
- The DID Document and agent card endpoints are public: they require no authentication and expose no sensitive data. The resolution endpoint requires the `agents:read` scope
- Private keys for DID signing are stored in Vault; never written to PostgreSQL
- DID Document cache in Redis has a TTL — stale documents are evicted automatically
- Decommissioned agents return HTTP 410 Gone with `deactivated: true` in DID Document metadata
- DID rotation: when a credential is rotated, the DID Document key can optionally be rotated; the old key is retained in history
- `GET /agents/:id/did/card` exposes only data already present in the agent registration — no new sensitive fields
---
## Acceptance Criteria
- [ ] Every new agent registration automatically generates a `did:web` DID and key pair
- [ ] Root DID Document at `/.well-known/did.json` is W3C DID Core 1.0 compliant (validated by `did-resolver`)
- [ ] Per-agent DID Document returns correct `did:web` identifier and public key JWK
- [ ] DID resolution endpoint returns W3C DID Resolution format
- [ ] Decommissioned agent DID Document returns 410 Gone with `deactivated: true`
- [ ] Agent card at `/agents/:id/did/card` matches AGNTCY agent card format
- [ ] Private keys never appear in any API response or log
- [ ] TypeScript strict, zero `any`, >80% test coverage on DIDService


@@ -0,0 +1,476 @@
# Webhooks and Event Streaming — Specification
**Workstream**: 5 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Real-time event notifications for agent lifecycle events via HTTP webhooks. Operators create webhook subscriptions specifying a target URL, the events they want to receive, and a secret for HMAC-SHA256 signature verification. Delivery is asynchronous via a Redis-backed `bull` queue with exponential backoff retry (max 10 attempts). All deliveries are logged for observability.
Supported events: `agent.created`, `agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `token.issued`, `token.revoked`.
An optional Kafka/NATS adapter enables high-throughput event streaming alongside webhook delivery.
---
## API Endpoints
### POST /webhooks
Create a new webhook subscription. Requires `agents:write` scope.
```yaml
POST /webhooks
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [url, events, secret]
properties:
url:
type: string
format: uri
description: HTTPS endpoint to deliver events to
example: "https://app.example.com/hooks/agentidp"
events:
type: array
items:
type: string
enum:
- agent.created
- agent.updated
- agent.suspended
- agent.reactivated
- agent.decommissioned
- credential.generated
- credential.rotated
- credential.revoked
- token.issued
- token.revoked
- "*"
minItems: 1
description: List of event types to subscribe to. Use ["*"] to subscribe to all events.
example: ["agent.created", "credential.rotated"]
secret:
type: string
minLength: 16
description: Secret used to compute the HMAC-SHA256 signature. Store it securely; it is not returned in any API response after creation.
example: "whsec_super_secret_value_here"
description:
type: string
maxLength: 255
description: Optional human-readable description for this subscription
active:
type: boolean
default: true
Responses:
201 Created:
schema:
$ref: '#/components/schemas/WebhookSubscription'
example:
subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
url: "https://app.example.com/hooks/agentidp"
events: ["agent.created", "credential.rotated"]
description: "Production event sink"
active: true
createdAt: "2026-03-29T12:00:00Z"
updatedAt: "2026-03-29T12:00:00Z"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
examples:
invalid_url:
code: "VALIDATION_ERROR"
message: "url must be a valid HTTPS URI"
invalid_event:
code: "VALIDATION_ERROR"
message: "Unknown event type: agent.unknown"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
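The validation rules in the request schema can be sketched as below. This is an assumption-laden illustration: `validateCreateWebhook` is a hypothetical name, and only the two error messages shown in the 400 examples come from this spec; the messages for an empty `events` list and a short `secret` are placeholders.

```typescript
const EVENT_TYPES = [
  "agent.created", "agent.updated", "agent.suspended", "agent.reactivated",
  "agent.decommissioned", "credential.generated", "credential.rotated",
  "credential.revoked", "token.issued", "token.revoked", "*",
] as const;

interface CreateWebhookRequest {
  url: string;
  events: string[];
  secret: string;
}

interface ValidationError {
  code: "VALIDATION_ERROR";
  message: string;
}

function validationError(message: string): ValidationError {
  return { code: "VALIDATION_ERROR", message };
}

// Hypothetical validator: returns null when the body satisfies the schema.
function validateCreateWebhook(body: CreateWebhookRequest): ValidationError | null {
  let parsed: URL;
  try {
    parsed = new URL(body.url);
  } catch {
    return validationError("url must be a valid HTTPS URI");
  }
  if (parsed.protocol !== "https:") {
    return validationError("url must be a valid HTTPS URI");
  }
  if (body.events.length < 1) {
    // Placeholder message; the spec only states minItems: 1.
    return validationError("events must contain at least one event type");
  }
  for (const eventType of body.events) {
    if ((EVENT_TYPES as readonly string[]).indexOf(eventType) === -1) {
      return validationError(`Unknown event type: ${eventType}`);
    }
  }
  if (body.secret.length < 16) {
    // Placeholder message; the spec only states minLength: 16.
    return validationError("secret must be at least 16 characters");
  }
  return null;
}

console.log(validateCreateWebhook({
  url: "https://app.example.com/hooks/agentidp",
  events: ["agent.created", "credential.rotated"],
  secret: "whsec_super_secret_value_here",
}));
```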
---
### GET /webhooks
List webhook subscriptions for the caller's organization. Requires `agents:read` scope.
```yaml
GET /webhooks
Authorization: Bearer <token with agents:read scope>
Query Parameters:
active:
type: boolean
description: Filter by active/inactive subscriptions
page:
type: integer
default: 1
limit:
type: integer
default: 20
maximum: 100
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/WebhookSubscription'
total:
type: integer
page:
type: integer
limit:
type: integer
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id
Get a single webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:read scope>
Path Parameters:
subscriptionId:
type: string
Responses:
200 OK:
schema:
$ref: '#/components/schemas/WebhookSubscription'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "WEBHOOK_NOT_FOUND"
message: "Webhook subscription not found"
```
---
### PATCH /webhooks/:id
Update a webhook subscription (e.g., pause/resume, change events). Requires `agents:write` scope.
```yaml
PATCH /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
schema:
type: object
properties:
url:
type: string
format: uri
events:
type: array
items:
type: string
description:
type: string
maxLength: 255
active:
type: boolean
Responses:
200 OK:
schema:
$ref: '#/components/schemas/WebhookSubscription'
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /webhooks/:id
Delete a webhook subscription. Requires `agents:write` scope.
```yaml
DELETE /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Responses:
204 No Content: {}
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id/deliveries
List delivery attempts for a specific webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}/deliveries
Authorization: Bearer <token with agents:read scope>
Query Parameters:
status:
type: string
enum: [pending, success, failed, dead_letter]
eventType:
type: string
description: Filter by event type
fromDate:
type: string
format: date-time
toDate:
type: string
format: date-time
page:
type: integer
default: 1
limit:
type: integer
default: 50
maximum: 200
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/WebhookDelivery'
total:
type: integer
page:
type: integer
limit:
type: integer
example:
data:
- deliveryId: "del_01HXK7Z9P3FKWABCDEF77777"
subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
eventType: "agent.created"
eventId: "evt_01HXK7Z9P3FKWABCDEF99999"
status: "success"
httpStatusCode: 200
attemptCount: 1
nextRetryAt: null
deliveredAt: "2026-03-29T12:00:05Z"
createdAt: "2026-03-29T12:00:00Z"
total: 1
page: 1
limit: 50
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
## Webhook Payload Format
Every webhook delivery uses this envelope format:
```json
{
"id": "evt_01HXK7Z9P3FKWABCDEF99999",
"type": "agent.created",
"organizationId": "org_01HXK7Z9P3FKWABCDEF12345",
"timestamp": "2026-03-29T12:00:00Z",
"data": {
"agentId": "agt_01HXK7Z9P3FKWABCDEF67890",
"agentType": "orchestrator",
"status": "active",
"owner": "acme-ai",
"version": "1.0.0",
"deploymentEnv": "production"
}
}
```
### HMAC-SHA256 Signature
Every delivery includes the following HTTP headers:
```
X-AgentIdP-Event: agent.created
X-AgentIdP-Delivery-Id: del_01HXK7Z9P3FKWABCDEF77777
X-AgentIdP-Timestamp: 1774785600
X-AgentIdP-Signature-256: sha256=<HMAC-SHA256 of timestamp.payload using subscription secret>
```
Signature computation:
```
signed_content = timestamp + "." + JSON.stringify(payload)
signature = HMAC-SHA256(secret, signed_content)
header_value = "sha256=" + hex(signature)
```
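A receiver could verify a delivery roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the timestamp header is assumed to be Unix seconds, and the receiver is assumed to have the raw request body exactly as sent (re-serializing parsed JSON may change the bytes and break the signature). The 300-second tolerance reflects the 5-minute replay guidance in this spec; rejecting timestamps too far in the future as well is an extra precaution added here.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

const TOLERANCE_SECONDS = 300; // 5-minute replay window

// Computes the header value: "sha256=" + hex(HMAC-SHA256(secret, timestamp + "." + body)).
function computeSignature(secret: string, timestamp: string, rawBody: string): string {
  const digest = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");
  return `sha256=${digest}`;
}

// Hypothetical receiver-side check: validates freshness, then the signature.
function verifyDelivery(
  secret: string,
  timestamp: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const sentAt = Number(timestamp);
  if (!Number.isInteger(sentAt) || Math.abs(nowSeconds - sentAt) > TOLERANCE_SECONDS) {
    return false;
  }
  const expected = Buffer.from(computeSignature(secret, timestamp, rawBody));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const secret = "whsec_super_secret_value_here";
const body = '{"id":"evt_01HXK7Z9P3FKWABCDEF99999","type":"agent.created"}';
const header = computeSignature(secret, "1774785600", body);
console.log(verifyDelivery(secret, "1774785600", body, header, 1774785605));
```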
---
## Database Schema Changes
### New Table: webhook_subscriptions
```sql
CREATE TABLE webhook_subscriptions (
subscription_id VARCHAR(40) PRIMARY KEY,
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
url VARCHAR(2048) NOT NULL,
events JSONB NOT NULL DEFAULT '[]',
secret_hash VARCHAR(255) NOT NULL, -- bcrypt hash of secret; plain text stored in Vault
vault_secret_path VARCHAR(255) NOT NULL,
description VARCHAR(255),
active BOOLEAN NOT NULL DEFAULT TRUE,
failure_count INTEGER NOT NULL DEFAULT 0,
last_delivery_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_webhook_subs_org_id ON webhook_subscriptions(organization_id);
CREATE INDEX idx_webhook_subs_active ON webhook_subscriptions(active) WHERE active = TRUE;
```
### New Table: webhook_deliveries
```sql
CREATE TABLE webhook_deliveries (
delivery_id VARCHAR(40) PRIMARY KEY,
subscription_id VARCHAR(40) NOT NULL REFERENCES webhook_subscriptions(subscription_id),
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
event_id VARCHAR(40) NOT NULL,
event_type VARCHAR(100) NOT NULL,
payload JSONB NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
http_status_code SMALLINT,
response_body TEXT,
attempt_count SMALLINT NOT NULL DEFAULT 0,
next_retry_at TIMESTAMPTZ,
delivered_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT webhook_deliveries_status_check CHECK (status IN ('pending', 'success', 'failed', 'dead_letter'))
);
CREATE INDEX idx_webhook_deliveries_sub_id ON webhook_deliveries(subscription_id);
CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status);
CREATE INDEX idx_webhook_deliveries_org_id ON webhook_deliveries(organization_id);
CREATE INDEX idx_webhook_deliveries_created ON webhook_deliveries(created_at);
```
---
## Retry Schedule
```
Attempt 1: immediate
Attempt 2: 1 minute after failure
Attempt 3: 5 minutes after failure
Attempt 4: 15 minutes after failure
Attempt 5: 1 hour after failure
Attempt 6: 4 hours after failure
Attempt 7: 12 hours after failure
Attempt 8: 24 hours after failure
Attempt 9: 48 hours after failure
Attempt 10: 72 hours after failure
After attempt 10: status = dead_letter; operator alerted via Prometheus metric
```
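The schedule can be expressed as a lookup table. The sketch below assumes that each listed interval is the wait between the previous attempt's failure and the next attempt; `nextStepAfterFailure` and the `NextStep` type are hypothetical names, not part of the `bull` API.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Index i holds the wait before attempt i + 1 (attempt 1 is immediate).
const RETRY_DELAYS_MS: readonly number[] = [
  0,
  1 * MINUTE_MS,
  5 * MINUTE_MS,
  15 * MINUTE_MS,
  1 * HOUR_MS,
  4 * HOUR_MS,
  12 * HOUR_MS,
  24 * HOUR_MS,
  48 * HOUR_MS,
  72 * HOUR_MS,
];

type NextStep =
  | { kind: "retry"; attempt: number; delayMs: number }
  | { kind: "dead_letter" };

// Given the 1-based number of the attempt that just failed, decide what happens next.
function nextStepAfterFailure(failedAttempt: number): NextStep {
  if (!Number.isInteger(failedAttempt) || failedAttempt < 1) {
    throw new RangeError("failedAttempt must be a positive integer");
  }
  const delayMs = RETRY_DELAYS_MS[failedAttempt];
  if (delayMs === undefined) {
    // No entry left in the schedule: the delivery is dead-lettered.
    return { kind: "dead_letter" };
  }
  return { kind: "retry", attempt: failedAttempt + 1, delayMs };
}

console.log(nextStepAfterFailure(1));
console.log(nextStepAfterFailure(10));
```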
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `WEBHOOKS_ENABLED` | Enable webhook functionality | `true` |
| `WEBHOOK_DELIVERY_TIMEOUT_MS` | HTTP delivery request timeout | `10000` |
| `WEBHOOK_MAX_RETRIES` | Maximum delivery attempts before dead-letter | `10` |
| `WEBHOOK_WORKER_CONCURRENCY` | Number of concurrent delivery workers | `5` |
| `KAFKA_BROKERS` | Comma-separated Kafka broker list (optional; activates Kafka adapter) | `""` |
| `KAFKA_TOPIC_PREFIX` | Prefix for Kafka topic names | `agentidp` |
| `NATS_URL` | NATS server URL (optional; activates NATS adapter) | `""` |
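
A minimal sketch of reading these variables with their documented defaults follows. `loadWebhookConfig` and the `WebhookConfig` shape are hypothetical; treating any value other than `true` as disabled, and treating an empty `KAFKA_BROKERS` or `NATS_URL` as "adapter off", are assumptions about the intended semantics.

```typescript
interface WebhookConfig {
  enabled: boolean;
  deliveryTimeoutMs: number;
  maxRetries: number;
  workerConcurrency: number;
  kafkaBrokers: string[];
  kafkaTopicPrefix: string;
  natsUrl: string | null;
}

type Env = Record<string, string | undefined>;

// Reads a positive integer, falling back to the documented default when unset.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadWebhookConfig(env: Env): WebhookConfig {
  const brokers = (env["KAFKA_BROKERS"] ?? "")
    .split(",")
    .map((broker) => broker.trim())
    .filter((broker) => broker.length > 0);
  const natsUrl = (env["NATS_URL"] ?? "").trim();
  return {
    enabled: (env["WEBHOOKS_ENABLED"] ?? "true").trim().toLowerCase() === "true",
    deliveryTimeoutMs: readPositiveInt(env, "WEBHOOK_DELIVERY_TIMEOUT_MS", 10000),
    maxRetries: readPositiveInt(env, "WEBHOOK_MAX_RETRIES", 10),
    workerConcurrency: readPositiveInt(env, "WEBHOOK_WORKER_CONCURRENCY", 5),
    kafkaBrokers: brokers,
    kafkaTopicPrefix: env["KAFKA_TOPIC_PREFIX"] ?? "agentidp",
    natsUrl: natsUrl === "" ? null : natsUrl,
  };
}

// In a service this would typically be called with process.env.
const config = loadWebhookConfig({ KAFKA_BROKERS: "kafka-1:9092, kafka-2:9092" });
console.log(config);
```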
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `bull` | `^4.16.3` | Redis-backed async job queue for webhook delivery |
| `kafkajs` | `^2.2.4` | Kafka producer adapter (optional) |
---
## Security Considerations
- Webhook secrets are stored in Vault; PostgreSQL holds only a bcrypt hash of each secret, used for comparison
- All deliveries must be to HTTPS endpoints — HTTP endpoints are rejected at subscription creation
- Private/internal IP ranges (RFC 1918, loopback) are blocked at delivery time to prevent SSRF
- HMAC signature allows the receiving server to verify the delivery is authentic
- Replay attacks are mitigated by including a timestamp in the signed content; receivers should reject deliveries with timestamps older than 5 minutes
- Dead-letter events generate a Prometheus metric `agentidp_webhook_dead_letters_total` for alerting
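
The private-range check from the SSRF item above could look like the following sketch. `isBlockedAddress` is a hypothetical helper that covers only the ranges this spec names (RFC 1918 and loopback) plus IPv4-mapped IPv6 forms of them; it expects an already-resolved IP address and fails closed on anything that is not an IP literal. Other special ranges and non-canonical IPv6 spellings of loopback are deliberately left out and would need separate handling.

```typescript
import { isIP } from "net";

// True for RFC 1918 ranges (10/8, 172.16/12, 192.168/16) and loopback (127/8).
function isBlockedIPv4(address: string): boolean {
  const octets = address.split(".").map((part) => Number(part));
  const first = octets[0] ?? -1;
  const second = octets[1] ?? -1;
  return (
    first === 10 ||
    first === 127 ||
    (first === 172 && second >= 16 && second <= 31) ||
    (first === 192 && second === 168)
  );
}

// Hypothetical delivery-time guard; call it with the resolved address, not the hostname.
function isBlockedAddress(address: string): boolean {
  const version = isIP(address);
  if (version === 4) {
    return isBlockedIPv4(address);
  }
  if (version === 6) {
    const lower = address.toLowerCase();
    if (lower === "::1") {
      return true; // IPv6 loopback
    }
    // IPv4-mapped IPv6 (::ffff:a.b.c.d) is checked against the IPv4 rules.
    const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(lower);
    const embedded = mapped === null ? undefined : mapped[1];
    return embedded !== undefined && isBlockedIPv4(embedded);
  }
  return true; // Not an IP literal: fail closed
}

console.log(isBlockedAddress("192.168.1.10"));
console.log(isBlockedAddress("8.8.8.8"));
```
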
---
## Acceptance Criteria
- [ ] `POST /webhooks` creates a subscription; secret stored in Vault, not returned after creation
- [ ] Webhook delivery occurs within 30 seconds of event generation for healthy subscribers
- [ ] Delivery includes correct `X-AgentIdP-Signature-256` header verifiable with provided secret
- [ ] Failed delivery is retried per schedule; status updates in `webhook_deliveries` table
- [ ] After max retries, status is `dead_letter` and metric is incremented
- [ ] Delivery to HTTP (non-HTTPS) URL is rejected at subscription creation
- [ ] Delivery to private IP range is rejected (SSRF protection)
- [ ] `GET /webhooks/:id/deliveries` returns accurate delivery history
- [ ] TypeScript strict, zero `any`, >80% test coverage on WebhookService