chore(openspec): archive all completed changes, sync 14 new specs to library

Archived 4 completed OpenSpec changes (2026-04-02):
- phase-3-enterprise (100/100 tasks): 6 Phase 3 capabilities synced
- devops-documentation (48/48 tasks): 3 new + 1 merged capability
- bedroom-developer-docs (33/33 tasks): 4 new capabilities synced
- engineering-docs (superseded by 2026-03-29 archive): no tasks

Main spec library grows from 21 → 35 capabilities (+14 new):
federation, multi-tenancy, oidc, soc2, w3c-dids, webhooks,
database, operations, system-overview, api-reference, core-concepts,
developer-guides, quick-start + deployment (merged additive requirements)

Active changes: 0. The project board is clear for Phase 4 planning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-28
@@ -0,0 +1,63 @@
## Context

Phase 1 MVP is complete: 46 source files, 14 API endpoints across 4 OpenAPI 3.0 specs, 244 passing tests. The implementation is production-grade and live on `git.sentryagent.ai`. However, the developer experience stops at the code. There is no entry point for a bedroom developer who has never heard of AgentIdP, AGNTCY, or the OAuth 2.0 client credentials grant.

The documentation must be written, owned, and maintained as a first-class deliverable, not an afterthought. It is produced by a Virtual Technical Writer subagent with full access to the codebase and OpenAPI specs.

**Constraints:**
- Audience: bedroom developers. Assume competence with HTTP and basic programming; assume no prior knowledge of AgentIdP or AGNTCY
- Format: Markdown only; renders on GitHub, no external tooling required
- No build step: docs are static `.md` files in `docs/developers/`
- All code examples must be real, runnable, and copy-pasteable
- Tone: direct, practical, no enterprise jargon

## Goals / Non-Goals

**Goals:**
- Bedroom developer can register their first agent and issue a token in under 5 minutes using only the quick-start guide
- Every API endpoint is documented in plain English with at least one working curl example
- Core concepts are explained without assuming prior knowledge of OAuth 2.0 or AGNTCY
- All four P0 workflows (register, credential, token, audit) have step-by-step guides
- FAQ covers the most likely failure points and free-tier limits

**Non-Goals:**
- No web-rendered documentation site (Phase 2, out of scope)
- No SDK documentation (Node.js SDK not yet built; Phase 1 P1 remaining)
- No video tutorials or interactive demos
- No multi-language code examples (Node.js + curl only for now)
- No enterprise deployment documentation (separate from bedroom developer focus)

## Decisions

**Decision 1: Single flat folder vs nested structure**
Chosen: flat `docs/developers/` with a `guides/` subfolder only for multi-step guides.
Alternative considered: deep nesting by category. Rejected: adds navigation friction for a small doc set.

**Decision 2: Raw OpenAPI YAML as API reference vs human-written reference**
Chosen: human-written `api-reference.md` alongside the existing OpenAPI specs.
Alternative considered: link to raw YAML only. Rejected: YAML is not readable for bedroom developers; the whole point is accessibility.

**Decision 3: Standalone docs vs inline code comments**
Chosen: standalone Markdown files in `docs/developers/`.
Alternative considered: JSDoc-generated docs. Rejected: JSDoc is for library consumers, not REST API users.

**Decision 4: Who writes the docs**
Chosen: Virtual Technical Writer subagent, spawned by CTO with full codebase + OpenAPI spec context.
Alternative considered: Virtual Principal Developer writes docs. Rejected: developer time should stay on code; writing accessible prose for developers new to the domain is a distinct skill warranting a dedicated role.

**Decision 5: Versioning**
Chosen: docs live in the same repo as code, versioned together via git. No separate docs versioning scheme in Phase 1.

## Risks / Trade-offs

- **[Risk] Docs drift from implementation** → Mitigation: Virtual QA Engineer verifies API reference examples against actual endpoints before sign-off; curl examples are tested against a running instance
- **[Risk] Tone inconsistency across docs** → Mitigation: Technical Writer receives a unified style brief in the subagent prompt (plain English, second person, imperative voice, no jargon)
- **[Risk] Quick-start prerequisites unclear** → Mitigation: Quick-start lists exact prerequisites (Docker, curl, nothing else) and links to docker-compose.yml

## Migration Plan

Documentation only; no migration required. Files are added to `docs/developers/` and committed to `develop`. No rollback needed.

## Open Questions

*(none: scope is fully defined)*
@@ -0,0 +1,34 @@
## Why

SentryAgent.ai AgentIdP Phase 1 MVP is fully implemented, tested, and live, but there is zero human-readable documentation for the developers we are building this for. A bedroom developer landing on this repo today cannot register their first agent without reading raw OpenAPI YAML or diving into source code. We fix that now.

## What Changes

- New `docs/developers/` folder containing a complete, self-contained documentation set for bedroom developers
- Quick-start guide: first agent registered and authenticated in under 5 minutes
- Core concepts doc: plain-English explanation of AgentIdP, AGNTCY alignment, and the agent identity model
- Step-by-step guides: agent registration, credential management, token issuance, audit log queries
- Human-friendly API reference: every endpoint documented with real curl examples and response samples
- FAQ: common errors, gotchas, and free-tier limits explained
- All docs written for a bedroom developer audience: no enterprise jargon, no assumed knowledge of AgentIdP, AGNTCY, or OAuth 2.0

## Capabilities

### New Capabilities

- `quick-start`: 5-minute guide from zero to first authenticated agent request (install, register, credential, token, done)
- `core-concepts`: Plain-English explanation of what AgentIdP is, how it relates to AGNTCY, the agent identity lifecycle, and why it matters
- `developer-guides`: Step-by-step tutorials for the four core workflows: registering an agent, managing credentials, issuing and revoking tokens, querying the audit log
- `api-reference`: Human-friendly API reference covering all 14 endpoints with real examples, field descriptions, error codes, and rate limit notes

### Modified Capabilities

*(none: this change introduces documentation only; no existing API specs are modified)*

## Impact

- New folder: `docs/developers/` (7 markdown files)
- No code changes; documentation only
- No new dependencies
- No API changes
- Existing `docs/openapi/` specs are reference material for the Technical Writer but are not modified
@@ -0,0 +1,50 @@
## ADDED Requirements

### Requirement: API reference exists at docs/developers/api-reference.md
The system SHALL provide a human-readable API reference at `docs/developers/api-reference.md` covering all 14 endpoints across the four services: Agent Registry, OAuth 2.0 Token, Credential Management, and Audit Log.

#### Scenario: Developer finds any endpoint within 10 seconds
- **WHEN** the developer opens the API reference
- **THEN** they SHALL find a table of contents at the top linking to each of the four service sections

### Requirement: Every endpoint is documented with method, path, description, and auth requirements
For each of the 14 endpoints, the reference SHALL document: HTTP method, path, one-sentence description, and whether Bearer token auth is required.

#### Scenario: Developer knows which endpoints require authentication
- **WHEN** the developer scans the reference
- **THEN** they SHALL clearly see which endpoints require a Bearer token (all except POST /token) and which do not

### Requirement: Every endpoint includes a complete curl example
For each endpoint, the reference SHALL include at least one complete, runnable curl example with realistic placeholder values.

#### Scenario: Developer copies a curl example and runs it
- **WHEN** the developer copies a curl example from the reference
- **THEN** the command SHALL be complete (no ellipses, no `...`, no missing flags), requiring only substitution of their own agentId, token, and base URL
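
The completeness rule in this scenario lends itself to a small automated check. The sketch below is one possible way to flag curl examples that contain an elision. The helper name `findElidedCurlLines` is hypothetical and not part of the project, and the check covers only elisions, not missing flags.

```typescript
// Hypothetical QA helper, not part of the project. It returns the 1-based
// line numbers of curl command lines inside fenced code blocks that contain
// an elision: three dots or the single-character ellipsis.
function findElidedCurlLines(markdown: string): number[] {
  const fence = "`".repeat(3); // the Markdown code fence marker
  const flagged: number[] = [];
  let inFence = false; // true between an opening and a closing fence
  let inCurl = false; // true while inside a (possibly multi-line) curl command

  markdown.split("\n").forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed.startsWith(fence)) {
      inFence = !inFence;
      inCurl = false;
      return;
    }
    if (!inFence) return;
    if (trimmed.startsWith("curl ")) inCurl = true;
    if (!inCurl) return;
    if (trimmed.includes("...") || trimmed.includes("\u2026")) {
      flagged.push(index + 1);
    }
    // A trailing backslash continues the command on the next line.
    inCurl = trimmed.endsWith("\\");
  });
  return flagged;
}
```

A check like this could run over `api-reference.md` during review; an empty result means no curl example contains an elision.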

### Requirement: Every endpoint documents all request parameters and body fields
For each endpoint that accepts a request body or query parameters, the reference SHALL list every field with: name, type, required/optional, description, and validation constraints.

#### Scenario: Developer knows what fields are required for POST /agents
- **WHEN** the developer reads the POST /agents section
- **THEN** they SHALL see a table listing every field, its type, whether it is required, and any constraints (e.g. email format, max length)

### Requirement: Every endpoint documents all response codes and response body schemas
For each endpoint, the reference SHALL document every possible HTTP response code (2xx and 4xx/5xx) with a description and example response body.

#### Scenario: Developer understands a 429 response
- **WHEN** the developer reads the rate limit error documentation
- **THEN** they SHALL understand what triggered it, what the X-RateLimit-* headers mean, and when they can retry
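
To make the "when they can retry" part of this scenario concrete, here is a minimal sketch of how a client might turn rate limit headers into a wait time. The exact header names are an assumption: the spec only says `X-RateLimit-*`, so `X-RateLimit-Remaining` and `X-RateLimit-Reset` (taken here as a Unix time in seconds) are illustrative and should be checked against the real responses.

```typescript
// Assumed header names and semantics (not confirmed by the spec above):
//   X-RateLimit-Remaining: requests left in the current window
//   X-RateLimit-Reset:     Unix time in seconds when the window resets
type HeaderMap = { [name: string]: string | undefined };

// Returns how many whole seconds a client should wait before retrying,
// or 0 if requests remain or the reset header is missing or malformed.
function secondsUntilRetry(headers: HeaderMap, nowEpochSeconds: number): number {
  // Header names are case-insensitive, so normalise the keys first.
  const lower: HeaderMap = {};
  Object.keys(headers).forEach((key) => {
    lower[key.toLowerCase()] = headers[key];
  });

  const remaining = Number(lower["x-ratelimit-remaining"]);
  if (!isNaN(remaining) && remaining > 0) return 0;

  const reset = Number(lower["x-ratelimit-reset"]);
  if (isNaN(reset)) return 0;
  return Math.max(0, Math.ceil(reset - nowEpochSeconds));
}
```

Passing the current time in as a parameter keeps the function deterministic, which makes it easy to unit test.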

### Requirement: API reference includes a base URL and versioning section
The reference SHALL include a section at the top explaining the base URL convention, port configuration, and that all endpoints are unversioned in Phase 1.

#### Scenario: Developer knows where to send requests
- **WHEN** the developer reads the base URL section
- **THEN** they SHALL see the default base URL (http://localhost:3000), how to change the port via environment variable, and a note that versioning will be introduced in Phase 2

### Requirement: API reference includes an errors section
The reference SHALL include a dedicated errors section listing all standard error response shapes, all custom error codes, and their HTTP status code mappings.

#### Scenario: Developer handles an AgentNotFoundError
- **WHEN** the developer reads the errors section
- **THEN** they SHALL see the exact JSON shape of the error response, the error code string, and the HTTP status (404)
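
As an illustration of what a documented error shape lets client code do, the sketch below checks whether a parsed response body looks like a known error. The flat `{ code, message }` shape is an assumption, since the exact JSON shape is not given here. The code strings are the ones listed in the task list for the errors reference, and only the `AgentNotFoundError` to 404 mapping is stated in this spec.

```typescript
// Error code strings named in the task list for the errors reference.
const ERROR_CODES = [
  "ValidationError",
  "AgentNotFoundError",
  "AgentAlreadyExistsError",
  "CredentialError",
  "AuthenticationError",
  "AuthorizationError",
  "RateLimitError",
  "FreeTierLimitError",
] as const;

type ErrorCode = typeof ERROR_CODES[number];

// Assumed response shape: a flat object with a code and a message.
interface ApiError {
  code: ErrorCode;
  message: string;
}

// The only status mapping stated in the spec; others are left out on purpose.
const KNOWN_STATUS: Partial<Record<ErrorCode, number>> = {
  AgentNotFoundError: 404,
};

// Type guard: true when the value has the assumed shape and a known code.
function isApiError(value: unknown): value is ApiError {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { code?: unknown; message?: unknown };
  return (
    typeof candidate.message === "string" &&
    typeof candidate.code === "string" &&
    (ERROR_CODES as readonly string[]).indexOf(candidate.code) !== -1
  );
}
```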
@@ -0,0 +1,43 @@
## ADDED Requirements

### Requirement: Core concepts guide exists at docs/developers/concepts.md
The system SHALL provide a concepts guide at `docs/developers/concepts.md` that explains the AgentIdP model in plain English with no assumed prior knowledge of AGNTCY or OAuth 2.0.

#### Scenario: Developer understands what AgentIdP is
- **WHEN** a developer reads the concepts guide
- **THEN** they SHALL be able to explain in one sentence what SentryAgent.ai AgentIdP does and why they need it

### Requirement: Concepts guide explains what an AI agent identity is
The guide SHALL explain in plain English what it means to give an AI agent an identity: how it differs from a human user account and why agents need their own identity model.

#### Scenario: Agent identity vs human identity distinction is clear
- **WHEN** the developer reads the agent identity section
- **THEN** they SHALL understand that agents are non-human, machine-operated identities that need persistent, auditable credentials, not session-based logins

### Requirement: Concepts guide explains the AGNTCY alignment
The guide SHALL explain what AGNTCY is (Linux Foundation standard), why SentryAgent.ai aligns to it, and what benefit that gives the developer, without requiring the developer to read the AGNTCY specification.

#### Scenario: Developer understands AGNTCY without external reading
- **WHEN** the developer reads the AGNTCY section
- **THEN** they SHALL understand that AGNTCY-aligned agent IDs are interoperable across the AI agent ecosystem, and that SentryAgent.ai implements this for free

### Requirement: Concepts guide explains the agent lifecycle
The guide SHALL explain the lifecycle states of an agent (active, suspended, decommissioned) and what each state means for credential and token behaviour.

#### Scenario: Developer understands what happens when an agent is decommissioned
- **WHEN** the developer reads the lifecycle section
- **THEN** they SHALL understand that decommissioning is irreversible, all credentials are revoked, and no new tokens can be issued
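
The lifecycle rules can be pictured as a small state machine. The sketch below is illustrative only. The spec confirms that decommissioning is irreversible and blocks new tokens; the `active` to `suspended` transitions in both directions, and the choice to block token issuance for suspended agents, are assumptions.

```typescript
type AgentStatus = "active" | "suspended" | "decommissioned";

// Assumed transition table. Only the terminal nature of "decommissioned"
// comes from the spec; the active <-> suspended moves are an assumption.
const ALLOWED_TRANSITIONS: Record<AgentStatus, AgentStatus[]> = {
  active: ["suspended", "decommissioned"],
  suspended: ["active", "decommissioned"],
  decommissioned: [], // terminal: decommissioning is irreversible
};

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Decommissioned agents can never get new tokens (from the spec).
// Treating suspended agents the same way is an assumption.
function canIssueToken(status: AgentStatus): boolean {
  return status === "active";
}
```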

### Requirement: Concepts guide explains OAuth 2.0 Client Credentials in plain English
The guide SHALL explain the Client Credentials grant in plain English (no RFC references, no formal OAuth jargon), focused on how agents use it to authenticate.

#### Scenario: Developer understands client_id and client_secret without prior OAuth knowledge
- **WHEN** the developer reads the OAuth section
- **THEN** they SHALL understand that client_id identifies the agent and client_secret proves that identity, analogous to a username and password for machines
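
For readers who want to see what the grant looks like on the wire, here is a minimal sketch that builds the form-encoded body for `POST /token`. The field names `grant_type`, `client_id`, and `client_secret` are the standard OAuth 2.0 parameter names. Whether AgentIdP expects further fields such as `scope` is not stated here, so none are included, and the function name is hypothetical.

```typescript
// Builds an application/x-www-form-urlencoded body for the client
// credentials grant. The function name is illustrative, not a project API.
function buildTokenRequestBody(clientId: string, clientSecret: string): string {
  const fields: [string, string][] = [
    ["grant_type", "client_credentials"],
    ["client_id", clientId],
    ["client_secret", clientSecret],
  ];
  // Percent-encode keys and values so characters such as "&" and "="
  // inside a secret cannot break the body apart.
  return fields
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}
```

The resulting string is sent with the header `Content-Type: application/x-www-form-urlencoded`.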

### Requirement: Concepts guide explains the free-tier limits
The guide SHALL document all free-tier limits (100 agents, 10,000 tokens/month, 100 req/min, 90-day audit retention) in a clear table.

#### Scenario: Developer knows the limits before hitting them
- **WHEN** the developer reads the free-tier section
- **THEN** they SHALL see a table with all four limits and a note on what happens when each is exceeded
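
The four limits above can also be expressed as data. This sketch is an illustration of the numbers in the requirement, not the server's enforcement logic; the type and function names are hypothetical, and "exceeded" is taken to mean strictly greater than the limit.

```typescript
// Limits copied from the requirement above.
const FREE_TIER_LIMITS = {
  agents: 100,
  tokensPerMonth: 10000,
  requestsPerMinute: 100,
  auditRetentionDays: 90,
};

// Hypothetical usage snapshot for one account.
interface UsageSnapshot {
  agents: number;
  tokensThisMonth: number;
  requestsThisMinute: number;
}

// Returns the names of the limits the snapshot exceeds.
function exceededLimits(usage: UsageSnapshot): string[] {
  const exceeded: string[] = [];
  if (usage.agents > FREE_TIER_LIMITS.agents) exceeded.push("agents");
  if (usage.tokensThisMonth > FREE_TIER_LIMITS.tokensPerMonth) exceeded.push("tokensPerMonth");
  if (usage.requestsThisMinute > FREE_TIER_LIMITS.requestsPerMinute) exceeded.push("requestsPerMinute");
  return exceeded;
}

// True while an audit event of the given age is still inside the window.
function isWithinAuditRetention(eventAgeDays: number): boolean {
  return eventAgeDays <= FREE_TIER_LIMITS.auditRetentionDays;
}
```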
@@ -0,0 +1,56 @@
## ADDED Requirements

### Requirement: Developer guides index exists at docs/developers/guides/README.md
The system SHALL provide a guides index at `docs/developers/guides/README.md` listing all available guides with one-line descriptions and links.

#### Scenario: Developer finds the right guide quickly
- **WHEN** the developer opens the guides folder
- **THEN** they SHALL see a list of all guides with descriptions so they can choose the one relevant to their task

### Requirement: Agent registration guide exists at docs/developers/guides/register-an-agent.md
The system SHALL provide a step-by-step guide for registering an agent, including all required and optional fields, validation rules, and how to handle the response.

#### Scenario: Developer registers their first agent
- **WHEN** the developer follows the registration guide
- **THEN** they SHALL successfully create an agent and understand what `agentId`, `clientId`, and `status` mean in the response

#### Scenario: Developer understands registration validation errors
- **WHEN** the guide covers validation
- **THEN** it SHALL show examples of common validation errors (missing required fields, invalid email format) and how to fix them

### Requirement: Credential management guide exists at docs/developers/guides/manage-credentials.md
The system SHALL provide a guide covering all four credential operations (generate, list, rotate, and revoke), with curl examples and an explanation of when to use each.

#### Scenario: Developer rotates a compromised credential
- **WHEN** the developer follows the rotation section
- **THEN** they SHALL understand that rotation replaces the secret while keeping the same `credentialId`, and the old secret is immediately invalid

#### Scenario: Developer understands credential revocation vs agent decommission
- **WHEN** the developer reads the guide
- **THEN** they SHALL understand the difference: revoking a credential leaves the agent active with other credentials; decommissioning the agent revokes everything permanently
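
The rotation and revocation semantics described in these scenarios can be sketched with a small in-memory model. This is an illustration of the rules only: the type and function names are hypothetical, secrets are compared in plain text for brevity (a real service would store a hash), and refusing to rotate a revoked credential is an assumption.

```typescript
// Hypothetical in-memory model of one credential.
interface AgentCredential {
  credentialId: string;
  secret: string; // plain text here for brevity only
  revoked: boolean;
}

// Rotation keeps the credentialId and swaps the secret, so the old
// secret stops working as soon as the rotated record replaces the original.
function rotateCredential(credential: AgentCredential, newSecret: string): AgentCredential {
  if (credential.revoked) {
    throw new Error("cannot rotate a revoked credential"); // assumption
  }
  return { credentialId: credential.credentialId, secret: newSecret, revoked: false };
}

// Revocation disables this one credential; other credentials are untouched.
function revokeCredential(credential: AgentCredential): AgentCredential {
  return { credentialId: credential.credentialId, secret: credential.secret, revoked: true };
}

function verifySecret(credential: AgentCredential, presented: string): boolean {
  return !credential.revoked && credential.secret === presented;
}
```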

### Requirement: Token guide exists at docs/developers/guides/issue-and-revoke-tokens.md
The system SHALL provide a guide covering token issuance, introspection, and revocation, explaining the JWT structure, expiry, and how to use the Bearer token in API requests.

#### Scenario: Developer uses a token to authenticate a request
- **WHEN** the developer follows the token guide
- **THEN** they SHALL see an example of using the issued token as a Bearer token in an Authorization header on a subsequent API call

#### Scenario: Developer introspects a token to check validity
- **WHEN** the developer reads the introspection section
- **THEN** they SHALL understand what `active: true/false` means and what fields are returned

#### Scenario: Developer revokes a token
- **WHEN** the developer follows the revocation section
- **THEN** they SHALL understand that revoked tokens are immediately invalid even if not yet expired
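
Two ideas from these scenarios are easy to show in code: how a token is attached to a request, and why a revoked token reports `active: false` before it expires. The sketch below is a simplified model under stated assumptions. Identifying tokens by a `jti` value and tracking revocations in a set are illustrative choices, not a description of how AgentIdP stores them.

```typescript
// Builds the Authorization header for a subsequent API call.
function bearerHeader(accessToken: string): { Authorization: string } {
  return { Authorization: "Bearer " + accessToken };
}

// Hypothetical minimal view of an issued token.
interface IssuedToken {
  jti: string; // token identifier (assumed)
  expiresAt: number; // expiry as Unix time in seconds
}

// Mirrors the meaning of the introspection `active` flag: a token is
// active only if it is not revoked AND not yet expired.
function isTokenActive(token: IssuedToken, revokedIds: Set<string>, nowEpochSeconds: number): boolean {
  if (revokedIds.has(token.jti)) return false; // revocation wins over expiry
  return nowEpochSeconds < token.expiresAt;
}
```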

### Requirement: Audit log guide exists at docs/developers/guides/query-audit-logs.md
The system SHALL provide a guide for querying the audit log, covering available filters (agentId, action, outcome, date range), pagination, and how to interpret audit events.

#### Scenario: Developer queries audit events for a specific agent
- **WHEN** the developer follows the audit guide
- **THEN** they SHALL see a curl example filtering by `agentId` and understand the structure of each audit event

#### Scenario: Developer understands audit log retention
- **WHEN** the developer reads the guide
- **THEN** they SHALL understand that free-tier audit logs are retained for 90 days and what happens after that window
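
As a sketch of how the filters might combine into a request, the function below builds a URL for `GET /audit`. The parameter names `agentId`, `action`, and `outcome` come from the filter list above. The names used for the date range (`from`, `to`) and pagination (`page`, `limit`) are hypothetical placeholders, since the real query parameter names are not given here.

```typescript
// Filter names agentId, action, and outcome come from the spec; from, to,
// page, and limit are assumed names for the date range and pagination.
interface AuditQuery {
  agentId?: string;
  action?: string;
  outcome?: string;
  from?: string;
  to?: string;
  page?: number;
  limit?: number;
}

function buildAuditUrl(baseUrl: string, query: AuditQuery): string {
  const pairs: string[] = [];
  const add = (key: string, value: string | number | undefined): void => {
    // Skip filters that were not supplied; encode the rest.
    if (value !== undefined) pairs.push(key + "=" + encodeURIComponent(String(value)));
  };
  add("agentId", query.agentId);
  add("action", query.action);
  add("outcome", query.outcome);
  add("from", query.from);
  add("to", query.to);
  add("page", query.page);
  add("limit", query.limit);
  return baseUrl + "/audit" + (pairs.length > 0 ? "?" + pairs.join("&") : "");
}
```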
@@ -0,0 +1,45 @@
## ADDED Requirements

### Requirement: Quick-start guide exists at docs/developers/quick-start.md
The system SHALL provide a quick-start guide at `docs/developers/quick-start.md` that enables a bedroom developer to register their first agent and issue an OAuth 2.0 access token in under 5 minutes.

#### Scenario: Developer completes quick-start from zero
- **WHEN** a developer with no prior AgentIdP knowledge follows the quick-start guide
- **THEN** they SHALL have a registered agent, a valid credential, and a working access token by the end

### Requirement: Quick-start lists exact prerequisites
The quick-start guide SHALL list all prerequisites at the top before any steps, so the developer knows what they need before starting.

#### Scenario: Prerequisites are minimal and explicit
- **WHEN** the developer reads the prerequisites section
- **THEN** they SHALL see exactly: Docker (for running PostgreSQL and Redis) and curl (for API calls); nothing else is required

### Requirement: Quick-start provides a working docker-compose startup command
The quick-start guide SHALL include a single command to start the required infrastructure (PostgreSQL + Redis) using the project's `docker-compose.yml`.

#### Scenario: Developer starts infrastructure
- **WHEN** the developer runs the provided docker-compose command
- **THEN** the guide SHALL confirm what services are started and what ports they run on

### Requirement: Quick-start covers the full 4-step workflow
The quick-start guide SHALL cover exactly these four steps in order, each with a working command (curl for the API calls) and the expected response:

1. Start the AgentIdP server
2. Register an agent (`POST /agents`)
3. Generate a credential (`POST /agents/{agentId}/credentials`)
4. Issue an access token (`POST /token`)

#### Scenario: Each step has a copy-pasteable curl command
- **WHEN** the developer reads any step
- **THEN** they SHALL find a complete curl command with realistic placeholder values they can substitute

#### Scenario: Each step shows the expected JSON response
- **WHEN** the developer runs a curl command from the guide
- **THEN** the guide SHALL show them what a successful response looks like so they can verify their output

### Requirement: Quick-start ends with a next-steps section
The quick-start guide SHALL end with a "What's Next" section linking to: concepts.md, guides/README.md, and api-reference.md.

#### Scenario: Developer knows where to go after quick-start
- **WHEN** the developer reaches the end of the quick-start
- **THEN** they SHALL see at least 3 links to deeper documentation
@@ -0,0 +1,50 @@
## 1. Folder Structure & Setup

- [x] 1.1 Create `docs/developers/` directory
- [x] 1.2 Create `docs/developers/guides/` subdirectory
- [x] 1.3 Create `docs/developers/README.md`: index page listing all docs with one-line descriptions and links

## 2. Quick-Start Guide

- [x] 2.1 Create `docs/developers/quick-start.md`: prerequisites section (Docker + curl only)
- [x] 2.2 Write Step 1: start infrastructure with docker-compose command + confirmation of services and ports
- [x] 2.3 Write Step 2: start AgentIdP server with npm command + expected startup output
- [x] 2.4 Write Step 3: register an agent (complete curl for `POST /agents` with example request body and expected JSON response)
- [x] 2.5 Write Step 4: generate a credential (complete curl for `POST /agents/{agentId}/credentials` with example response showing `clientId` and `clientSecret`)
- [x] 2.6 Write Step 5: issue an access token (complete curl for `POST /token` with form-encoded body and example JWT response)
- [x] 2.7 Write "What's Next" section linking to concepts.md, guides/README.md, and api-reference.md

## 3. Core Concepts Guide

- [x] 3.1 Create `docs/developers/concepts.md`: intro section explaining what AgentIdP is in one paragraph
- [x] 3.2 Write "What is an AI Agent Identity" section: plain-English explanation of agent identities vs human identities
- [x] 3.3 Write "AGNTCY Alignment" section: what AGNTCY is, why it matters, benefit to the developer (no external reading required)
- [x] 3.4 Write "Agent Lifecycle" section: the lifecycle states (active, suspended, decommissioned) and what each means for credentials and tokens, including irreversibility of decommission
- [x] 3.5 Write "OAuth 2.0 Client Credentials" section: plain-English explanation of client_id, client_secret, and how agents use them; no RFC jargon
- [x] 3.6 Write "Free Tier Limits" section: table of all four limits (100 agents, 10k tokens/month, 100 req/min, 90-day audit) with notes on what happens when each is exceeded

## 4. Developer Guides

- [x] 4.1 Create `docs/developers/guides/README.md`: index listing all four guides with descriptions and links
- [x] 4.2 Create `docs/developers/guides/register-an-agent.md`: step-by-step registration guide with all required/optional fields, validation rules, and example success + error responses (including common validation errors and fixes)
- [x] 4.3 Create `docs/developers/guides/manage-credentials.md`: guide covering all four credential operations: generate (with secret handling note), list (with pagination), rotate (explaining same credentialId, old secret immediately invalid), revoke (with comparison to agent decommission)
- [x] 4.4 Create `docs/developers/guides/issue-and-revoke-tokens.md`: token guide covering issuance with form-encoded body, JWT structure explanation, using token as Bearer in subsequent requests, introspection (`active` field), revocation and immediate invalidation
- [x] 4.5 Create `docs/developers/guides/query-audit-logs.md`: audit log guide covering available filters (agentId, action, outcome, date range), pagination params, audit event structure, 90-day retention behaviour

## 5. API Reference

- [x] 5.1 Create `docs/developers/api-reference.md`: top section with base URL, port config via env var, versioning note (Phase 1 unversioned)
- [x] 5.2 Write table of contents linking to all four service sections
- [x] 5.3 Write errors reference section: all error response shapes, all custom error codes (ValidationError, AgentNotFoundError, AgentAlreadyExistsError, CredentialError, AuthenticationError, AuthorizationError, RateLimitError, FreeTierLimitError), HTTP status mappings
- [x] 5.4 Document Agent Registry endpoints (5): `POST /agents`, `GET /agents`, `GET /agents/{agentId}`, `PATCH /agents/{agentId}`, `DELETE /agents/{agentId}`; each with method, path, auth requirement, request fields table, response codes table, and complete curl example
- [x] 5.5 Document OAuth 2.0 Token endpoints (3): `POST /token`, `POST /token/introspect`, `POST /token/revoke`; each with method, path, auth requirement, request fields table (noting form-encoded for /token), response codes table, curl example, and X-RateLimit header documentation for 429s
- [x] 5.6 Document Credential Management endpoints (4): `POST /agents/{agentId}/credentials`, `GET /agents/{agentId}/credentials`, `POST /agents/{agentId}/credentials/{credentialId}/rotate`, `DELETE /agents/{agentId}/credentials/{credentialId}`; each with method, path, auth requirement, request fields table, response codes table, and complete curl example
- [x] 5.7 Document Audit Log endpoints (2): `GET /audit`, `GET /audit/{eventId}`; each with method, path, auth requirement, query parameter table (including all filter options), response codes table, and complete curl example

## 6. QA & Review

- [x] 6.1 Verify all curl examples are syntactically correct and complete (no ellipses, no missing flags)
- [x] 6.2 Verify all 14 endpoints from the OpenAPI specs are covered in api-reference.md
- [x] 6.3 Verify all internal links (cross-references between docs) resolve correctly
- [x] 6.4 Verify free-tier limits in concepts.md match README.md Section 3.3
- [x] 6.5 Verify quick-start guide is self-contained: a developer can complete it using only that file
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-28
@@ -0,0 +1,48 @@
## Context

Phase 1 MVP is complete and live on `develop`. The bedroom developer docs cover the API surface. DevOps engineers, who are responsible for deployment, configuration, and operations, have no documentation. This gap creates operational risk: misconfigured environment variables, missed migration steps, and no recovery path when services fail.

**Audience**: Engineers who deploy and operate the AgentIdP infrastructure. Assumed knowledge: Linux shell, Docker, PostgreSQL basics, Node.js process management.

**Constraints:**
- Markdown only; renders on GitHub, no build step
- All commands are exact and runnable, with no placeholders
- Honest about Phase 1 P1 gaps: Dockerfile does not exist yet; document what works now and mark pending items clearly
- Files live in `docs/devops/`, separate from `docs/developers/`

## Goals / Non-Goals

**Goals:**
- DevOps engineer can stand up a working local environment from scratch using only these docs
- Every environment variable is documented with type, requirement, and example
- Database schema and migration procedure are fully documented
- Security setup (JWT keys, CORS, secrets) is step-by-step
- Operations runbook covers the most likely failure scenarios

**Non-Goals:**
- Container deployment guide (Dockerfile is Phase 1 P1, not built yet)
- Cloud/Kubernetes deployment (Phase 2)
- Monitoring/alerting setup (Phase 2)
- Multi-region or HA configuration (Phase 2)

## Decisions

**Decision 1: Separate folder vs subdirectory of docs/developers/**
Chosen: `docs/devops/` as a peer of `docs/developers/`.
Reason: Different audiences, no shared content, prevents confusion.

**Decision 2: Mark Dockerfile gap explicitly**
Chosen: `local-development.md` documents working `docker-compose` + `npm` path; `Dockerfile` noted as Phase 1 P1 pending with a placeholder section.
Reason: Honest documentation prevents broken deployments.

**Decision 3: Operations and security as separate files**
Chosen: `security.md` and `operations.md` are separate.
Reason: DevOps engineers frequently consult these independently: security during setup, operations during incidents.

## Migration Plan

Documentation only. No code changes. No rollback needed.

## Open Questions

*(none: scope fully defined)*
@@ -0,0 +1,19 @@
## Why

SentryAgent.ai AgentIdP Phase 1 MVP is complete and `docs/developers/` covers API consumers. However, there is no documentation for the engineers who deploy, configure, and operate the infrastructure. A DevOps engineer joining the project today has no reference for environment variables, database schema, deployment procedure, security configuration, or operational runbook. We fix that now.

## What Changes

- New `docs/devops/` folder, fully separate from `docs/developers/`, containing a complete operational reference for DevOps engineers
- System architecture overview: components, ports, dependencies, data flow
- Complete environment variable reference: every variable, required vs optional, format, examples
- Database documentation: 4-table schema, migration runner, how to apply/verify migrations
- Local development guide: docker-compose infrastructure setup, service ports, health checks
- Security guide: RSA keypair generation and rotation, CORS config, secret storage
- Operations runbook: startup procedure, graceful shutdown (SIGTERM/SIGINT), logging, common failures and fixes

## What Does Not Change

- `docs/developers/`: not touched
- Source code: documentation only
- No new dependencies
@@ -0,0 +1,4 @@
## ADDED Requirements

### Requirement: Database doc exists at docs/devops/database.md
The system SHALL provide `docs/devops/database.md` documenting the 4-table schema (agents, credentials, audit_events, token_revocations), the migration runner, and exact commands to apply and verify migrations.
@@ -0,0 +1,4 @@
## ADDED Requirements

### Requirement: Local development guide exists at docs/devops/local-development.md
The system SHALL provide `docs/devops/local-development.md` documenting the complete local setup using docker-compose for infrastructure and npm for the application server, including all service ports, health check verification, and the Dockerfile gap note.
@@ -0,0 +1,7 @@
## ADDED Requirements

### Requirement: Security guide exists at docs/devops/security.md
The system SHALL provide `docs/devops/security.md` documenting RSA keypair generation, key rotation procedure, CORS configuration, and secret storage guidance.

### Requirement: Operations runbook exists at docs/devops/operations.md
The system SHALL provide `docs/devops/operations.md` covering startup procedure, graceful shutdown (SIGTERM/SIGINT), log interpretation, and troubleshooting for the most common operational failures.
@@ -0,0 +1,10 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: System overview exists at docs/devops/README.md
|
||||
The system SHALL provide a `docs/devops/README.md` that serves as the entry point for DevOps engineers, including an index of all DevOps docs and a brief system overview.
|
||||
|
||||
### Requirement: Architecture doc exists at docs/devops/architecture.md
|
||||
The system SHALL provide `docs/devops/architecture.md` documenting all components (Express server, PostgreSQL, Redis), their roles, ports, and data flow.
|
||||
|
||||
### Requirement: Environment variable reference exists at docs/devops/environment-variables.md
|
||||
The system SHALL provide `docs/devops/environment-variables.md` documenting every environment variable with name, type, required/optional, default, and example value.
|
||||
@@ -0,0 +1,71 @@
|
||||
## 1. Folder Structure & Index
|
||||
|
||||
- [x] 1.1 Create `docs/devops/` directory
|
||||
- [x] 1.2 Create `docs/devops/README.md` — index + system overview (what AgentIdP is, what this folder covers, links to all docs)
|
||||
|
||||
## 2. Architecture
|
||||
|
||||
- [x] 2.1 Create `docs/devops/architecture.md` — component diagram (Express, PostgreSQL, Redis) with roles and responsibilities
|
||||
- [x] 2.2 Document all service ports (app: 3000, PostgreSQL: 5432, Redis: 6379)
|
||||
- [x] 2.3 Document data flow: request → auth middleware → rate limit → controller → service → repository → PostgreSQL/Redis
|
||||
- [x] 2.4 Document Redis usage: token revocation keys, rate limit counters, monthly token counts
|
||||
- [x] 2.5 Document graceful shutdown: SIGTERM/SIGINT handling, server.close(), process.exit(0)
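
The shutdown sequence in 2.5 can be sketched as follows. This is a minimal illustration, not the project's actual code: `installGracefulShutdown`, `ClosableServer`, and `SignalSource` are hypothetical names, and the two interfaces stand in for the Node.js HTTP server and `process` so the logic can be read and exercised in isolation.

```typescript
// Minimal structural types so the sketch runs without a real HTTP server.
// In the application these roles would be played by the HTTP server and `process`.
interface ClosableServer {
  close(callback: (err?: Error) => void): void;
}

interface SignalSource {
  on(signal: "SIGTERM" | "SIGINT", handler: () => void): void;
  exit(code: number): void;
}

// Hypothetical helper: the real shutdown code in the source tree may differ.
function installGracefulShutdown(server: ClosableServer, proc: SignalSource): void {
  let shuttingDown = false;

  const shutdown = (): void => {
    if (shuttingDown) return; // ignore repeated signals while draining
    shuttingDown = true;
    // close() stops accepting new connections and calls back once
    // in-flight requests have finished.
    server.close((err) => {
      proc.exit(err ? 1 : 0);
    });
  };

  proc.on("SIGTERM", shutdown);
  proc.on("SIGINT", shutdown);
}
```

Guarding with `shuttingDown` keeps a second Ctrl+C or a repeated SIGTERM from calling `close()` twice while requests are still draining.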
|
||||
|
||||
## 3. Environment Variables
|
||||
|
||||
- [x] 3.1 Create `docs/devops/environment-variables.md` — complete reference table
|
||||
- [x] 3.2 Document required vars: DATABASE_URL, REDIS_URL, JWT_PRIVATE_KEY, JWT_PUBLIC_KEY
|
||||
- [x] 3.3 Document optional vars: PORT (default 3000), NODE_ENV, CORS_ORIGIN (default *)
|
||||
- [x] 3.4 Add format notes: DATABASE_URL connection string format, REDIS_URL format, PEM key format
|
||||
- [x] 3.5 Add `.env` file example with all vars populated
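
The required/optional split in 3.2 and 3.3 can be expressed as a small validation step at startup. The sketch below is an assumption about how such a loader might look; `loadConfig` and `AppConfig` are hypothetical names, and only the variable names and the `PORT` and `CORS_ORIGIN` defaults come from the task list above.

```typescript
type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  redisUrl: string;
  jwtPrivateKey: string;
  jwtPublicKey: string;
  port: number;
  nodeEnv: string | undefined;
  corsOrigin: string;
}

const REQUIRED_VARS = ["DATABASE_URL", "REDIS_URL", "JWT_PRIVATE_KEY", "JWT_PUBLIC_KEY"];

// Hypothetical loader: validates the environment once and returns typed config.
function loadConfig(env: Env): AppConfig {
  // Report every missing variable at once rather than failing one at a time.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("PORT must be an integer between 1 and 65535");
  }

  return {
    // The casts are safe because the missing-variable check above has passed.
    databaseUrl: env["DATABASE_URL"] as string,
    redisUrl: env["REDIS_URL"] as string,
    jwtPrivateKey: env["JWT_PRIVATE_KEY"] as string,
    jwtPublicKey: env["JWT_PUBLIC_KEY"] as string,
    port,
    nodeEnv: env["NODE_ENV"],
    corsOrigin: env["CORS_ORIGIN"] ?? "*",
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)` before the server starts, so a misconfigured deployment fails immediately with a message naming every missing variable.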
|
||||
|
||||
## 4. Database
|
||||
|
||||
- [x] 4.1 Create `docs/devops/database.md` — schema overview section
|
||||
- [x] 4.2 Document `agents` table: all columns, types, constraints, indexes
|
||||
- [x] 4.3 Document `credentials` table: all columns, types, constraints, indexes, FK to agents
|
||||
- [x] 4.4 Document `audit_events` table: all columns, types, constraints, indexes, append-only design
|
||||
- [x] 4.5 Document `token_revocations` table: all columns, types, indexes, dual-store design (Redis + PG)
|
||||
- [x] 4.6 Document migration runner: how it works, commands to run, how to verify applied migrations
|
||||
- [x] 4.7 Document `schema_migrations` tracking table
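
The role of the `schema_migrations` tracking table can be illustrated with the core decision a migration runner makes: which files have not been applied yet. This is a sketch under stated assumptions, not the project's runner. It assumes migration files are `.sql` files with zero-padded numeric prefixes and that the tracking table records applied file names; `pendingMigrations` and the example file names are hypothetical.

```typescript
// Hypothetical helper: given the files on disk and the names recorded in
// schema_migrations, return the migrations still to apply, in order.
function pendingMigrations(files: string[], applied: ReadonlySet<string>): string[] {
  return files
    .filter((file) => file.endsWith(".sql") && !applied.has(file))
    .sort(); // plain string sort is sufficient for zero-padded prefixes
}

// Example: one migration already recorded, two still pending.
const pending = pendingMigrations(
  ["002_credentials.sql", "001_agents.sql", "003_audit_events.sql"],
  new Set(["001_agents.sql"]),
);
```

Verifying applied migrations is then a matter of comparing the rows in `schema_migrations` with the files on disk: an empty pending list means the schema is up to date.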
|
||||
|
||||
## 5. Local Development
|
||||
|
||||
- [x] 5.1 Create `docs/devops/local-development.md` — prerequisites (Docker, Node.js 18+)
|
||||
- [x] 5.2 Document infrastructure-only docker-compose startup (postgres + redis only, not app service)
|
||||
- [x] 5.3 Document service ports and health check verification commands
|
||||
- [x] 5.4 Document migration step: exact `npm run db:migrate` command and expected output
|
||||
- [x] 5.5 Document application startup: `npm run dev` vs `npm start` (compiled), expected log output
|
||||
- [x] 5.6 Note Dockerfile gap: app service in docker-compose.yml requires Dockerfile (Phase 1 P1 pending)
|
||||
- [x] 5.7 Document full docker-compose stack startup (for when Dockerfile is available)
|
||||
- [x] 5.8 Document stopping and cleaning up: `docker-compose down` and volume removal
|
||||
|
||||
## 6. Security
|
||||
|
||||
- [x] 6.1 Create `docs/devops/security.md` — JWT key management section
|
||||
- [x] 6.2 Document RSA-2048 keypair generation using openssl (exact commands)
|
||||
- [x] 6.3 Document PEM format for env vars (newlines escaped as `\n` in a single-line env value, or the file path approach)
|
||||
- [x] 6.4 Document key rotation procedure: generate new pair, update env, restart server, old tokens expire naturally
|
||||
- [x] 6.5 Document CORS configuration: CORS_ORIGIN env var, wildcard vs specific origin
|
||||
- [x] 6.6 Document secret storage guidance: never commit .env, use secrets manager in production
|
||||
- [x] 6.7 Document bcrypt: credentials are stored as bcrypt hashes, plaintext never persisted
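
Item 6.3 describes storing a PEM key in a single-line environment variable with escaped newlines. A minimal sketch of the decoding step is shown below; `normalizePem` is a hypothetical helper, and whether the application performs this conversion itself is an assumption.

```typescript
// Hypothetical helper: turn a single-line env value with literal "\n"
// sequences back into a multi-line PEM block.
function normalizePem(value: string): string {
  // Replace the two-character sequence backslash + "n" with a real newline.
  const pem = value.replace(/\\n/g, "\n").trim();
  if (!/^-----BEGIN [A-Z ]+-----\n[\s\S]+\n-----END [A-Z ]+-----$/.test(pem)) {
    throw new Error("Value does not look like a PEM block");
  }
  return pem;
}
```

Validating the header and footer at load time turns a malformed key into a clear startup error instead of a signing failure on the first token request.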
|
||||
|
||||
## 7. Operations
|
||||
|
||||
- [x] 7.1 Create `docs/devops/operations.md` — startup checklist
|
||||
- [x] 7.2 Document startup order: PostgreSQL → Redis → run migrations → start app
|
||||
- [x] 7.3 Document graceful shutdown: send SIGTERM, server drains in-flight requests, exits 0
|
||||
- [x] 7.4 Document log output format: what each startup log line means
|
||||
- [x] 7.5 Document troubleshooting: DATABASE_URL not set, REDIS_URL not set, JWT keys not set
|
||||
- [x] 7.6 Document troubleshooting: PostgreSQL connection refused (service not ready)
|
||||
- [x] 7.7 Document troubleshooting: Redis connection error (service not ready)
|
||||
- [x] 7.8 Document troubleshooting: migration fails (connection issue vs SQL error)
|
||||
- [x] 7.9 Document Redis key patterns used by the application (rate:, revoked:, monthly:)
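
The three key prefixes in 7.9 can be centralised in one place so that operators and code agree on the layout. In the sketch below only the `rate:`, `revoked:`, and `monthly:` prefixes come from the task list; the suffix layout (client ID, window start, `jti`, `YYYY-MM`) and the `RedisKeys` name are assumptions for illustration.

```typescript
// Hypothetical key builders. Only the prefixes are taken from the task list.
const RedisKeys = {
  // Rate limit counter for one client in one time window.
  rateLimit: (clientId: string, windowStart: number): string =>
    `rate:${clientId}:${windowStart}`,

  // Marker for a revoked token, keyed by its unique token ID.
  revokedToken: (jti: string): string => `revoked:${jti}`,

  // Monthly token count for one client, bucketed by UTC year and month.
  monthlyTokens: (clientId: string, date: Date): string => {
    const year = date.getUTCFullYear();
    const month = String(date.getUTCMonth() + 1).padStart(2, "0");
    return `monthly:${clientId}:${year}-${month}`;
  },
};
```

With a layout like this, an operator can inspect one category at a time, for example by scanning for keys that match `revoked:*`.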
|
||||
|
||||
## 8. QA & Review
|
||||
|
||||
- [x] 8.1 Verify all commands are exact and runnable (no placeholders in shell commands)
|
||||
- [x] 8.2 Verify all env var names match source code exactly
|
||||
- [x] 8.3 Verify all table/column names match migration SQL exactly
|
||||
- [x] 8.4 Verify all port numbers match docker-compose.yml
|
||||
- [x] 8.5 Verify all internal links resolve
|
||||
@@ -0,0 +1,2 @@
|
||||
schema: spec-driven
|
||||
created: 2026-03-29
|
||||
105
openspec/changes/archive/2026-04-02-engineering-docs/design.md
Normal file
@@ -0,0 +1,105 @@
|
||||
## Context
|
||||
|
||||
SentryAgent.ai has completed Phase 1 (MVP) and Phase 2 (Production-Ready), producing a fully implemented AgentIdP with 12 capabilities across ~150 source files, 4 language SDKs, Terraform infrastructure, and a React web dashboard. The codebase is mature but undocumented at the engineering level — there are bedroom developer guides (`docs/developers/`) and DevOps guides (`docs/devops/`), but no structured internal engineering knowledge base.
|
||||
|
||||
New hires arrive with a BSc in Computer Science and one year of industry experience. They understand programming fundamentals and have worked on codebases before, but they have no context on: what SentryAgent.ai is building, why architectural decisions were made, how the codebase is structured, how to navigate the services, how to contribute per our standards, or how the OpenSpec workflow operates. Without documentation, onboarding is fragmented and relies entirely on the CTO's time.
|
||||
|
||||
The goal is a `docs/engineering/` directory that a new engineer can read sequentially from top to bottom and arrive ready to contribute within their first week.
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
**Goals:**
|
||||
- Produce a complete top-down engineering knowledge base readable in sequence
|
||||
- Cover all 10 capability areas identified in the proposal
|
||||
- Calibrate depth for BSc + 1yr experience — assume programming competence, explain domain and architectural decisions
|
||||
- Every document is self-contained with internal cross-links where needed
|
||||
- All code examples are complete and runnable (no ellipses, no `// ... rest of code`)
|
||||
- Development environment setup is achievable in under 30 minutes following the guide alone
|
||||
- Annotated walkthroughs trace the three critical flows through every layer of code with file:line references
|
||||
|
||||
**Non-Goals:**
|
||||
- Not a replacement for `docs/developers/` (end-user API reference) or `docs/devops/` (operator runbooks)
|
||||
- Not a tutorial for learning TypeScript, React, or Terraform — assumes language competence
|
||||
- Not a complete API reference — `docs/developers/api-reference.md` already covers that
|
||||
- Not roadmap documentation — focuses on what is built, not what is planned
|
||||
|
||||
## Decisions
|
||||
|
||||
### D1: Location — `docs/engineering/` as a flat directory with an index
|
||||
|
||||
**Decision**: All engineering docs live in `docs/engineering/` as flat markdown files with a `README.md` index.
|
||||
|
||||
**Rationale**: Deep nested directory structures create navigation friction. Flat layout with numbered filenames (`01-overview.md`, `02-architecture.md`) ensures reading order is obvious without needing a build tool. Gitea renders markdown natively, so no documentation site tooling is required.
|
||||
|
||||
**Alternatives considered**:
|
||||
- `docs/engineering/<subdirs>/` — rejected: adds navigation complexity with no benefit at our current document count
|
||||
- Docusaurus site — rejected: adds build infrastructure overhead; plain markdown in-repo is sufficient and always in sync with code
|
||||
|
||||
---
|
||||
|
||||
### D2: Numbered file naming for enforced reading order
|
||||
|
||||
**Decision**: Files are named `01-overview.md` through `10-sdk-guide.md`.
|
||||
|
||||
**Rationale**: New engineers need a guided path, not a reference library. Numbers make the intended reading sequence unambiguous without any tooling. The `README.md` index maps numbers to sections.
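
As an illustration of why numbered names need no tooling, the reading order can be recovered with a plain string sort. The sketch below is hypothetical and not part of the change; `buildReadingOrder` and its link format are assumptions.

```typescript
// Matches names such as "01-overview.md": two digits, a dash, a kebab-case slug.
const NUMBERED_DOC = /^(\d{2})-([a-z0-9-]+)\.md$/;

// Hypothetical helper: produce markdown index lines in reading order.
function buildReadingOrder(files: string[]): string[] {
  return files
    .filter((file) => NUMBERED_DOC.test(file))
    .sort() // zero-padded prefixes sort correctly as plain strings
    .map((file) => `- [${file}](${file})`);
}
```

The zero padding matters: without it, `10-...` would sort before `2-...`.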
|
||||
|
||||
---
|
||||
|
||||
### D3: Annotated walkthroughs use file:line references
|
||||
|
||||
**Decision**: Code walkthrough documents reference actual source files with line numbers (e.g., `src/controllers/agentController.ts:45`).
|
||||
|
||||
**Rationale**: Engineers with 1yr experience learn fastest by reading real code, not simplified pseudocode. File:line references let them jump directly to the relevant section in their editor or on Gitea.
|
||||
|
||||
**Trade-off**: Line numbers drift as code changes. Mitigation: walkthrough documents include a "last verified" version comment and note which commit they were verified against. The CTO adds walkthrough review to the Phase 3 change process as a maintenance item.
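
One cheap way to detect the drift described above is to parse each `file:line` reference and check that the line still exists. The sketch below is an assumption about how such a check could look; `parseSourceRef` and `isStale` are hypothetical helpers, and the check only catches references that point past the end of a file, not content that has moved.

```typescript
interface SourceRef {
  file: string;
  line: number;
}

// Hypothetical helper: parse "src/controllers/agentController.ts:45".
function parseSourceRef(ref: string): SourceRef {
  const match = /^(.+):(\d+)$/.exec(ref);
  if (match === null) {
    throw new Error(`Not a file:line reference: ${ref}`);
  }
  const line = Number(match[2]);
  if (line < 1) {
    throw new Error(`Line numbers start at 1: ${ref}`);
  }
  return { file: match[1], line };
}

// A reference past the end of the file is a sure sign of drift.
function isStale(ref: SourceRef, currentLineCount: number): boolean {
  return ref.line > currentLineCount;
}
```

A check like this complements the "last verified" commit note rather than replacing it, since a reference can stay in range and still point at the wrong code.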
|
||||
|
||||
---
|
||||
|
||||
### D4: Three walkthroughs selected by criticality and complexity
|
||||
|
||||
**Decision**: Walkthroughs cover: (1) OAuth 2.0 token issuance, (2) agent registration, (3) credential rotation.
|
||||
|
||||
**Rationale**:
|
||||
- Token issuance is the highest-traffic path and touches the most layers (controller → service → repository → Redis → JWT signing)
|
||||
- Agent registration is the entry point for all users and demonstrates the full validation + persistence + audit pattern
|
||||
- Credential rotation demonstrates the Vault integration path and shows how Phase 2 extended Phase 1 patterns
|
||||
|
||||
These three flows collectively exercise every architectural layer and every major design pattern in the codebase.
|
||||
|
||||
---
|
||||
|
||||
### D5: Service deep-dives use a consistent template
|
||||
|
||||
**Decision**: Each service deep-dive follows the structure: Purpose → Responsibility boundary → Interface → Key methods → Database schema (if applicable) → Error types → Configuration.
|
||||
|
||||
**Rationale**: Consistency reduces cognitive load. An engineer who has read the AgentService deep-dive knows exactly where to look for the same information in the OAuth2Service deep-dive. The template mirrors SOLID's Single Responsibility — each section answers one question.
|
||||
|
||||
---
|
||||
|
||||
### D6: Engineering workflow doc is prescriptive, not descriptive
|
||||
|
||||
**Decision**: The workflow guide tells engineers exactly what to do step by step, not just what the process is.
|
||||
|
||||
**Rationale**: Engineers with 1yr experience have worked in teams but may not have used a spec-first workflow before. A prescriptive guide ("Step 1: run `openspec new change <name>`") reduces ambiguity and enforces our standards from day one.
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
**[Line numbers drift as code evolves]** → Walkthroughs include a "last verified against commit X" header. The CTO assigns a quarterly walkthrough review task in each Phase change.
|
||||
|
||||
**[Docs can become stale if not maintained]** → Each document has a "Last updated" field in its header. The engineering workflow guide explicitly requires updating relevant engineering docs as part of any PR that changes architecture or public service interfaces.
|
||||
|
||||
**[Scope is large — ~15 documents, ~10,000 lines]** → Tasks are broken into discrete documents, each independently completable. No document depends on another being written first (only the index depends on all others).
|
||||
|
||||
## Migration Plan
|
||||
|
||||
1. Create `docs/engineering/` directory
|
||||
2. Write the ~15 documents (10 capability areas, some split across multiple files)
|
||||
3. Write `docs/engineering/README.md` index with links and reading order
|
||||
4. Commit all to `develop` in a single commit
|
||||
5. No existing documentation is modified or removed
|
||||
|
||||
No rollback required — this is additive only.
|
||||
|
||||
## Open Questions
|
||||
|
||||
_(none — all decisions made above; scope fully defined in proposal)_
|
||||
@@ -0,0 +1,42 @@
|
||||
## Why
|
||||
|
||||
SentryAgent.ai is growing and hiring engineers with a BSc in Computer Science and one year of industry experience. There are currently no internal engineering documents that explain how the system works from the top down: new engineers have no structured path from product vision to running code, and no reference for how to contribute correctly. This gap slows onboarding, increases mistakes, and risks divergence from our architecture and standards.
|
||||
|
||||
## What Changes
|
||||
|
||||
- New `docs/engineering/` directory added to the repository as the canonical engineering knowledge base
|
||||
- Top-down documentation suite covering all layers of the system: company vision → architecture → codebase → services → workflows → operations
|
||||
- Annotated code walkthroughs for the three most critical system flows (token issuance, agent registration, credential rotation)
|
||||
- Development environment setup guide targeting < 30 minutes from clone to running local stack
|
||||
- Engineering workflow guide covering the full OpenSpec → Architect → Developer → QA → merge cycle
|
||||
- Service deep-dive documents for all 8 core services/components
|
||||
- SDK integration guide covering all four language SDKs
|
||||
- Testing strategy and quality gate reference
|
||||
- Deployment and operations reference covering Docker, Terraform, and monitoring
|
||||
|
||||
## Capabilities
|
||||
|
||||
### New Capabilities
|
||||
|
||||
- `engineering-overview`: Company mission, product vision, system purpose, and how the engineering team operates — the entry point for all new hires
|
||||
- `architecture-guide`: System architecture including component diagram, data flow diagrams, deployment topology, and technology stack rationale (ADRs)
|
||||
- `codebase-structure`: Annotated directory map explaining every top-level directory and key file, what lives where and why
|
||||
- `service-deep-dives`: Per-service documentation for AgentService, OAuth2Service, CredentialService, AuditService, VaultClient, OPA policy engine, Web Dashboard, and Prometheus/Grafana monitoring
|
||||
- `code-walkthroughs`: Step-by-step annotated traces of the three critical flows: token issuance end-to-end, agent registration end-to-end, credential rotation end-to-end
|
||||
- `dev-environment-setup`: Local development environment setup — prerequisites, clone, configure, Docker Compose up, smoke test — targeting < 30 minutes
|
||||
- `engineering-workflow`: How to contribute — OpenSpec spec-first workflow, branching strategy, PR standards, quality gates, and the role of each virtual engineering team member
|
||||
- `testing-strategy`: Test framework, test types (unit vs integration), coverage gates, how to run tests, and how to write new tests following project conventions
|
||||
- `deployment-operations`: Docker build and run, Terraform multi-region deployment, environment configuration, Prometheus/Grafana monitoring, and operational runbooks
|
||||
- `sdk-guide`: Integration guide for Node.js, Python, Go, and Java SDKs — installation, authentication, all major operations, error handling
|
||||
|
||||
### Modified Capabilities
|
||||
|
||||
_(none — this change adds documentation only; no existing spec-level behavior changes)_
|
||||
|
||||
## Impact
|
||||
|
||||
- **Repository**: New `docs/engineering/` directory (~15 documents, ~10,000 lines of markdown)
|
||||
- **No code changes**: Documentation only — zero impact on `src/`, `tests/`, `sdk/`, or infrastructure
|
||||
- **Dependencies**: None — no new packages required
|
||||
- **APIs**: No API changes
|
||||
- **Existing docs**: `docs/developers/` (bedroom developer guide) and `docs/devops/` (operations) remain unchanged; this is an additive engineering-internal knowledge base
|
||||
@@ -0,0 +1,35 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: System architecture document
|
||||
The system SHALL include a document (`docs/engineering/02-architecture.md`) that describes the full system architecture: components, their responsibilities, how they communicate, and the deployment topology.
|
||||
|
||||
#### Scenario: Component diagram present
|
||||
- **WHEN** a new engineer reads 02-architecture.md
|
||||
- **THEN** they SHALL find an ASCII or Mermaid component diagram showing all major components (API server, PostgreSQL, Redis, Vault, OPA, Web Dashboard, Prometheus, Grafana) and their connections
|
||||
|
||||
#### Scenario: Request lifecycle explained
|
||||
- **WHEN** a new engineer reads 02-architecture.md
|
||||
- **THEN** they SHALL understand how an incoming HTTP request flows from client → Express router → middleware chain → controller → service → repository → database and back
|
||||
|
||||
#### Scenario: Data flow for authentication described
|
||||
- **WHEN** a new engineer reads 02-architecture.md
|
||||
- **THEN** they SHALL understand the OAuth 2.0 Client Credentials flow: client presents credentials → token service validates → Redis checked for existing token → JWT signed and returned
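
The flow in this scenario (validate credentials, check the cache, sign only on a miss) can be sketched in a few lines. This is an illustration under assumptions, not the service's implementation: `createTokenIssuer` is a hypothetical name, an in-memory `Map` stands in for Redis, the verifier and signer are injected placeholders, and token expiry is omitted.

```typescript
type VerifyCredentials = (clientId: string, clientSecret: string) => boolean;
type SignToken = (clientId: string) => string;

// Hypothetical factory: returns a function that issues tokens for a client.
function createTokenIssuer(verify: VerifyCredentials, sign: SignToken) {
  // Stand-in for Redis. A real cache entry would carry a TTL no longer
  // than the token lifetime.
  const cache = new Map<string, string>();

  return (clientId: string, clientSecret: string): string => {
    // 1. Credentials are always validated, even when a token is cached.
    if (!verify(clientId, clientSecret)) {
      throw new Error("invalid_client");
    }
    // 2. Reuse an existing token where one is available.
    const cached = cache.get(clientId);
    if (cached !== undefined) {
      return cached;
    }
    // 3. Sign and remember a new token only on a cache miss.
    const token = sign(clientId);
    cache.set(clientId, token);
    return token;
  };
}
```

Checking the cache after credential validation, not before, means a cached token is never handed to a caller who presents the wrong secret.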
|
||||
|
||||
#### Scenario: Deployment topology covered
|
||||
- **WHEN** a new engineer reads 02-architecture.md
|
||||
- **THEN** they SHALL understand the multi-region deployment model (US, EU, APAC) and how Terraform provisions it
|
||||
|
||||
### Requirement: Technology stack and ADR document
|
||||
The system SHALL include a document (`docs/engineering/03-tech-stack.md`) that lists every technology in the stack and explains why it was chosen over alternatives.
|
||||
|
||||
#### Scenario: Every major technology documented with rationale
|
||||
- **WHEN** a new engineer reads 03-tech-stack.md
|
||||
- **THEN** they SHALL find an entry for each technology (Node.js 18, TypeScript 5.3, Express 4.18, PostgreSQL 14, Redis 7, HashiCorp Vault, OPA, React 18, Vite 5, Prometheus, Grafana, Terraform) with: what it does in the system, why it was chosen, and what was considered but rejected
|
||||
|
||||
#### Scenario: TypeScript strict mode rationale explained
|
||||
- **WHEN** a new engineer reads 03-tech-stack.md
|
||||
- **THEN** they SHALL understand why strict mode is mandatory (safety, correctness, no implicit any) and what the consequences of violating it are
|
||||
|
||||
#### Scenario: PostgreSQL vs Redis responsibility boundary clear
|
||||
- **WHEN** a new engineer reads 03-tech-stack.md
|
||||
- **THEN** they SHALL understand what is stored in PostgreSQL (persistent state: agents, credentials, audit logs) vs Redis (ephemeral state: active tokens, rate limit counters)
|
||||
@@ -0,0 +1,27 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Annotated code walkthrough documents
|
||||
The system SHALL include a document (`docs/engineering/06-walkthroughs.md`) containing three annotated end-to-end walkthroughs of the system's critical flows, with file:line references to actual source code.
|
||||
|
||||
#### Scenario: Token issuance walkthrough complete
|
||||
- **WHEN** a new engineer reads the token issuance walkthrough
|
||||
- **THEN** they SHALL be guided step by step from: HTTP POST /oauth2/token → Express router → auth middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt credential check → Redis token cache check → JWT signing (src/utils/jwt.ts) → AuditService.logEvent → HTTP 200 response
|
||||
- **AND** every step SHALL reference the actual file and line number where it occurs
|
||||
|
||||
#### Scenario: Agent registration walkthrough complete
|
||||
- **WHEN** a new engineer reads the agent registration walkthrough
|
||||
- **THEN** they SHALL be guided step by step from: HTTP POST /agents → auth middleware → validation middleware → AgentController → AgentService.createAgent → input validation (src/utils/validators.ts) → AgentRepository.create → PostgreSQL INSERT → AuditService.logEvent → HTTP 201 response with agent object
|
||||
- **AND** every step SHALL reference the actual file and line number
|
||||
|
||||
#### Scenario: Credential rotation walkthrough complete
|
||||
- **WHEN** a new engineer reads the credential rotation walkthrough
|
||||
- **THEN** they SHALL be guided step by step from: HTTP POST /agents/:id/credentials/:credId/rotate → auth middleware → CredentialController → CredentialService.rotateCredential → old credential revocation → new secret generation (src/utils/crypto.ts) → Vault write or bcrypt hash → CredentialRepository.update → token revocation for old credentials → AuditService.logEvent → HTTP 200 response
|
||||
- **AND** every step SHALL reference the actual file and line number
|
||||
|
||||
#### Scenario: Walkthroughs include version reference
|
||||
- **WHEN** a new engineer reads any walkthrough
|
||||
- **THEN** the document SHALL include a header stating the commit hash it was last verified against, so engineers know if the walkthrough may have drifted from the current code
|
||||
|
||||
#### Scenario: Each walkthrough annotates why, not just what
|
||||
- **WHEN** a new engineer reads a walkthrough step
|
||||
- **THEN** each step SHALL explain not just what the code does but WHY — e.g., why Redis is checked before signing a new JWT, why constant-time comparison is used for credential verification, why audit logging happens after persistence not before
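
The constant-time comparison mentioned above is a good candidate for a "why" annotation. The sketch below shows the idea only: a naive `===` can return as soon as it finds a mismatch, which leaks how many leading characters were correct, whereas this loop always inspects every position. `constantTimeEqual` is a hypothetical helper for illustration, and JavaScript engines give no hard timing guarantees; in Node.js production code the built-in `crypto.timingSafeEqual` is the usual choice.

```typescript
// Illustrative only: compares two strings without exiting early on mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  // Iterate over the longer length so the loop count does not depend on
  // where the first difference occurs.
  const length = Math.max(a.length, b.length);
  // A length difference is folded into the result instead of returning early.
  let diff = a.length ^ b.length;
  for (let i = 0; i < length; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : 0;
    const cb = i < b.length ? b.charCodeAt(i) : 0;
    diff |= ca ^ cb; // accumulates any differing bits
  }
  return diff === 0;
}
```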
|
||||
@@ -0,0 +1,24 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Codebase structure document
|
||||
The system SHALL include a document (`docs/engineering/04-codebase-structure.md`) that provides an annotated map of every top-level directory and key file in the repository, explaining what lives where and why.
|
||||
|
||||
#### Scenario: Full directory tree annotated
|
||||
- **WHEN** a new engineer reads 04-codebase-structure.md
|
||||
- **THEN** they SHALL find an annotated directory tree covering: `src/`, `tests/`, `docs/`, `sdk/`, `sdk-python/`, `sdk-go/`, `sdk-java/`, `terraform/`, `dashboard/`, `migrations/`, `openspec/`, `scripts/`
|
||||
|
||||
#### Scenario: src/ subdirectory roles explained
|
||||
- **WHEN** a new engineer reads 04-codebase-structure.md
|
||||
- **THEN** they SHALL understand the role of each `src/` subdirectory: `controllers/` (HTTP layer), `services/` (business logic), `repositories/` (data access), `middleware/` (cross-cutting concerns), `utils/` (shared utilities), `types/` (TypeScript interfaces), `routes/` (Express router definitions)
|
||||
|
||||
#### Scenario: Where to add new code explained
|
||||
- **WHEN** a new engineer needs to add a new feature
|
||||
- **THEN** the document SHALL tell them exactly where each type of code belongs: new endpoint → controller + route; new business logic → service; new DB query → repository; new shared utility → utils/
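
The placement rules above can be made concrete with a compressed, single-file sketch of the three layers. The class names echo those used elsewhere in this change (`AgentService.createAgent`, `AgentRepository.create`), but the bodies are assumptions: an in-memory `Map` replaces PostgreSQL, a plain function replaces the Express controller, and the 400 and 409 responses are illustrative choices.

```typescript
interface Agent {
  id: string;
  name: string;
}

// Repository layer: data access only. A Map stands in for PostgreSQL here.
class AgentRepository {
  private readonly rows = new Map<string, Agent>();

  create(agent: Agent): Agent {
    this.rows.set(agent.id, agent);
    return agent;
  }

  findByName(name: string): Agent | undefined {
    return Array.from(this.rows.values()).find((a) => a.name === name);
  }
}

// Service layer: business rules, with no HTTP details and no SQL.
class AgentService {
  private readonly repo: AgentRepository;
  private nextId = 1;

  constructor(repo: AgentRepository) {
    this.repo = repo;
  }

  createAgent(name: string): Agent {
    if (this.repo.findByName(name) !== undefined) {
      throw new Error("duplicate_agent");
    }
    const agent: Agent = { id: `agent-${this.nextId}`, name };
    this.nextId += 1;
    return this.repo.create(agent);
  }
}

interface HttpResult {
  status: number;
  body: unknown;
}

// Controller layer: translates a request body into a service call and a status code.
function createAgentController(service: AgentService, body: { name?: unknown }): HttpResult {
  if (typeof body.name !== "string" || body.name.trim() === "") {
    return { status: 400, body: { error: "name is required" } };
  }
  try {
    return { status: 201, body: service.createAgent(body.name) };
  } catch (err) {
    if (err instanceof Error && err.message === "duplicate_agent") {
      return { status: 409, body: { error: "agent already exists" } };
    }
    throw err;
  }
}
```

Each layer knows only about the one below it: the controller never touches the `Map`, and the repository never decides what counts as a duplicate.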
|
||||
|
||||
#### Scenario: Key files identified and explained
|
||||
- **WHEN** a new engineer reads 04-codebase-structure.md
|
||||
- **THEN** they SHALL find explanations of: `src/app.ts` (Express app setup), `src/server.ts` (entry point), `src/types/index.ts` (canonical type definitions), `src/utils/errors.ts` (error hierarchy), `docker-compose.yml` (local dev stack), `tsconfig.json` (TypeScript config)
|
||||
|
||||
#### Scenario: DRY principle mapped to structure
|
||||
- **WHEN** a new engineer reads 04-codebase-structure.md
|
||||
- **THEN** they SHALL understand how the directory structure enforces DRY: one location for types, one for crypto utilities, one for JWT utilities, one for validators — and why duplication across these is a blocking PR issue
|
||||
@@ -0,0 +1,28 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Deployment and operations guide
|
||||
The system SHALL include a document (`docs/engineering/10-deployment.md`) that explains how the application is built, deployed, and operated — covering Docker, Terraform, environment configuration, and monitoring.
|
||||
|
||||
#### Scenario: Docker build and run documented
|
||||
- **WHEN** a new engineer reads 10-deployment.md
|
||||
- **THEN** they SHALL understand the multi-stage Dockerfile (builder stage compiles TypeScript, production stage runs compiled JS with node:18-alpine and non-root USER node), how to build the image, and how to run it with the required environment variables
|
||||
|
||||
#### Scenario: Environment variables fully documented
|
||||
- **WHEN** a new engineer needs to configure the application
|
||||
- **THEN** the guide SHALL provide a complete table of all environment variables: name, purpose, required/optional, example value — covering database, Redis, JWT signing key, Vault, OPA, and rate limiting config
|
||||
|
||||
#### Scenario: Database migrations documented
|
||||
- **WHEN** a new engineer needs to run or write migrations
|
||||
- **THEN** the guide SHALL explain: where migration files live (`migrations/`), the naming convention, how to run them (`npm run migrate`), and how to write a new migration following the existing pattern
|
||||
|
||||
#### Scenario: Terraform multi-region deployment explained
|
||||
- **WHEN** a new engineer reads 10-deployment.md
|
||||
- **THEN** they SHALL understand the Terraform structure: what modules exist, what the three regions (US, EU, APAC) deploy, how to run `terraform plan` and `terraform apply`, and what AWS/GCP resources are provisioned
|
||||
|
||||
#### Scenario: Prometheus metrics and Grafana explained
|
||||
- **WHEN** a new engineer reads 10-deployment.md
|
||||
- **THEN** they SHALL find: which endpoint exposes metrics (`/metrics`), the key metrics tracked, how to access the Grafana dashboard locally (port, login), and how to add a new metric counter or histogram to the API server
|
||||
|
||||
#### Scenario: Operational runbook for common tasks
|
||||
- **WHEN** a new engineer is on-call or supporting operations
|
||||
- **THEN** the guide SHALL include a runbook covering: how to check application health, how to rotate the JWT signing key, how to revoke all tokens for a compromised agent, and how to read audit logs for an incident
|
||||
@@ -0,0 +1,32 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Development environment setup guide
|
||||
The system SHALL include a document (`docs/engineering/07-dev-setup.md`) that takes a new engineer from zero to a fully running local stack in under 30 minutes, with no prior knowledge of the project assumed.
|
||||
|
||||
#### Scenario: Prerequisites listed completely
|
||||
- **WHEN** a new engineer reads 07-dev-setup.md
|
||||
- **THEN** they SHALL find a complete prerequisites list: Node.js 18+, Docker Desktop, Git, a PostgreSQL client (optional), and links to install each — with no undocumented dependencies
|
||||
|
||||
#### Scenario: Repository clone and setup steps complete
|
||||
- **WHEN** a new engineer follows the clone and setup steps
|
||||
- **THEN** they SHALL be able to: clone the repo, copy `.env.example` to `.env`, run `npm install`, and have all dependencies installed with zero manual configuration
|
||||
|
||||
#### Scenario: Docker Compose local stack starts successfully
|
||||
- **WHEN** a new engineer runs `docker-compose up -d`
|
||||
- **THEN** all services (PostgreSQL, Redis, API server) SHALL start, migrations SHALL run automatically, and the guide SHALL show how to verify each service is healthy
|
||||
|
||||
#### Scenario: Smoke test confirms working stack
|
||||
- **WHEN** a new engineer follows the smoke test section
|
||||
- **THEN** they SHALL run a curl command to POST /oauth2/token with the seed credentials and receive a valid JWT — confirming the full stack is operational
|
||||
|
||||
#### Scenario: Common setup errors documented
|
||||
- **WHEN** a new engineer encounters a setup error
|
||||
- **THEN** the guide SHALL include a troubleshooting section covering the 5 most common errors: port already in use, migration failure, Node version mismatch, Docker not running, and missing .env variables
|
||||
|
||||
#### Scenario: Running tests locally documented
|
||||
- **WHEN** a new engineer wants to run the test suite
|
||||
- **THEN** the guide SHALL show: `npm test` (unit tests only, no services needed), `npm run test:integration` (requires Docker stack), and how to run a single test file
|
||||
|
||||
#### Scenario: Web dashboard local development documented
|
||||
- **WHEN** a new engineer wants to run the web dashboard
|
||||
- **THEN** the guide SHALL show how to start the Vite dev server (`npm run dev` in `dashboard/`) and which port it runs on, and confirm it connects to the local API server
|
||||
@@ -0,0 +1,28 @@
|
||||
## ADDED Requirements

### Requirement: Company and product overview document
The system SHALL include a document (`docs/engineering/01-overview.md`) that explains SentryAgent.ai's mission, the AgentIdP product, target users, and why the product exists — providing new engineers with business and product context before they read any technical content.

#### Scenario: Mission and vision covered
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what SentryAgent.ai builds, why it exists, and what problem it solves for AI developers

#### Scenario: AGNTCY alignment explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what AGNTCY is, why SentryAgent.ai aligns to it, and what "first-class agent identity" means

#### Scenario: Product features listed
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see a summary of all product capabilities: agent registry, OAuth 2.0 auth, credential management, audit logs, SDKs, web dashboard, policy engine, and monitoring

#### Scenario: Phase roadmap visible
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand which capabilities belong to Phase 1, Phase 2, and Phase 3

#### Scenario: Engineering team structure explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand the Virtual Engineering Team model (CTO → Architect → Developer → QA) and how Claude operates as the engineering partner

#### Scenario: Free tier limits documented
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see the free tier limits (100 agents, 10,000 token requests/month, 90-day audit retention, 100 req/min) and understand the product's positioning
|
||||
@@ -0,0 +1,32 @@
|
||||
## ADDED Requirements

### Requirement: Engineering workflow and contribution guide
The system SHALL include a document (`docs/engineering/08-workflow.md`) that prescribes the exact steps an engineer MUST follow to contribute any new feature or change, from idea to merged code.

#### Scenario: OpenSpec spec-first workflow explained
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand that NO implementation begins without an approved OpenAPI spec — and the exact sequence: CEO approves → Architect writes spec → CTO reviews → Developer implements → QA signs off → CEO approves merge

#### Scenario: OpenSpec CLI commands documented
- **WHEN** a new engineer wants to start a new change
- **THEN** the guide SHALL provide the exact commands: `openspec new change <name>`, `openspec status --change <name>`, `openspec instructions <artifact> --change <name>`, and what each command does

#### Scenario: Branching strategy documented
- **WHEN** a new engineer creates a branch
- **THEN** the guide SHALL prescribe: feature branches from `develop`, naming convention `feature/<change-name>`, PR targets `develop`, `develop` → `main` requires CTO + CEO approval

#### Scenario: TypeScript and code standards enforced in workflow
- **WHEN** a new engineer writes code
- **THEN** the guide SHALL state the non-negotiable standards: strict mode, no `any`, DRY, SOLID, JSDoc on all public methods — and that PRs violating these are blocked by the CTO regardless of functionality

#### Scenario: PR checklist documented
- **WHEN** a new engineer opens a PR
- **THEN** the guide SHALL provide a PR checklist: TypeScript compiles with zero errors, ESLint passes with zero warnings, unit tests pass, coverage gate met (>80%), integration tests pass, OpenAPI spec updated if endpoint changed, engineering docs updated if architecture changed

#### Scenario: Virtual engineering team roles explained for contributors
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand the role separation: they contribute as the Principal Developer role, the CTO reviews all PRs, the Architect owns spec changes, and QA owns the test sign-off — and how to interact with each role in practice

#### Scenario: Commit message conventions documented
- **WHEN** a new engineer writes a commit message
- **THEN** the guide SHALL prescribe the Conventional Commits format: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` prefixes — with examples for each
|
||||
@@ -0,0 +1,28 @@
|
||||
## ADDED Requirements

### Requirement: SDK integration guide
The system SHALL include a document (`docs/engineering/11-sdk-guide.md`) that explains how each of the four language SDKs is structured, how to use them, and how to contribute to or extend them.

#### Scenario: SDK architecture overview present
- **WHEN** a new engineer reads 11-sdk-guide.md
- **THEN** they SHALL understand that all four SDKs (Node.js, Python, Go, Java) implement the same API surface (14 endpoints, 4 service clients, 1 TokenManager, 1 error type) with identical semantics, and why consistency across SDKs is a non-negotiable standard

#### Scenario: Node.js SDK documented
- **WHEN** a new engineer reads the Node.js SDK section
- **THEN** they SHALL find: installation (`npm install @sentryagent/idp-sdk`), the AgentIdPClient constructor, all 4 service clients (agents, credentials, tokens, audit), TokenManager auto-refresh behaviour, AgentIdPError structure, and a complete working code example for the most common flow (register agent → generate credential → issue token)

#### Scenario: Python SDK documented
- **WHEN** a new engineer reads the Python SDK section
- **THEN** they SHALL find: installation (`pip install sentryagent-idp`), both sync (AgentIdPClient) and async (AsyncAgentIdPClient) variants, TokenManager and AsyncTokenManager auto-refresh, AgentIdPError, and a complete working example for sync and async usage

#### Scenario: Go SDK documented
- **WHEN** a new engineer reads the Go SDK section
- **THEN** they SHALL find: installation (`go get github.com/sentryagent/idp-sdk-go`), AgentIdPClient construction, goroutine-safe TokenManager, context.Context usage pattern, AgentIdPError with Code/HTTPStatus/Details, and a complete working example

#### Scenario: Java SDK documented
- **WHEN** a new engineer reads the Java SDK section
- **THEN** they SHALL find: Maven/Gradle dependency snippet, AgentIdPClient construction with builder pattern, sync methods and CompletableFuture async counterparts, thread-safe TokenManager, AgentIdPException, and a complete working example

#### Scenario: SDK contribution guide included
- **WHEN** a new engineer needs to add a new endpoint to all SDKs
- **THEN** the guide SHALL provide a step-by-step checklist for adding a new method to all four SDKs consistently: where to add the method, what the signature pattern is, how to write the corresponding test, and how to verify it compiles/passes in each language
|
||||
@@ -0,0 +1,40 @@
|
||||
## ADDED Requirements

### Requirement: Service deep-dive documents
The system SHALL include a document (`docs/engineering/05-services.md`) providing a deep-dive reference for every core service and component, following a consistent template: Purpose → Responsibility boundary → Public interface → Key methods → Database schema (if applicable) → Error types → Configuration.

#### Scenario: AgentService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AgentService section covering: responsibility (agent CRUD only), public methods (createAgent, getAgent, listAgents, updateAgent, deleteAgent), the `agents` table schema, AgentNotFoundError and AgentAlreadyExistsError, and what AgentService does NOT do (no auth, no credentials — Single Responsibility)

#### Scenario: OAuth2Service documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OAuth2Service section covering: responsibility (token issuance and revocation only), public methods (issueToken, validateToken, revokeToken), Redis token storage schema, JWT payload structure, token TTL configuration, and the Vault credential verification path vs bcrypt path

#### Scenario: CredentialService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the CredentialService section covering: responsibility (credential lifecycle only), public methods (generateCredential, rotateCredential, revokeCredential, listCredentials), the `credentials` table schema, bcrypt vs Vault storage decision, and the `vault_path` column purpose

#### Scenario: AuditService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AuditService section covering: responsibility (immutable audit logging only), public methods (logEvent, queryLogs), the `audit_logs` table schema, event types enum, 90-day retention policy, and why audit records are never updated or deleted

#### Scenario: VaultClient documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the VaultClient section covering: purpose (wraps node-vault for KV v2 operations), public methods (writeSecret, readSecret, verifySecret, deleteSecret), the opt-in configuration (VAULT_ADDR env var), and the constant-time comparison in verifySecret and why it matters (timing attack prevention)
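The constant-time comparison named in the VaultClient scenario can be illustrated with a minimal sketch. It is not the project's `verifySecret` implementation: the function name `secretsMatch` is hypothetical, and hashing both inputs before comparing is one common way (an assumption here) to satisfy the equal-length requirement of Node's `crypto.timingSafeEqual`.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

/**
 * Hypothetical helper: compares a presented secret with a stored one
 * without an early exit on the first differing byte.
 */
function secretsMatch(presented: string, stored: string): boolean {
  // timingSafeEqual throws if the buffers differ in length, so both
  // inputs are first reduced to fixed-length SHA-256 digests.
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(stored, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A plain `===` on strings may return as soon as one character differs, which can leak how much of a guess was correct through response timing. A comparison whose duration does not depend on the position of the mismatch avoids that.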
|
||||
|
||||
#### Scenario: OPA policy engine documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OPA section covering: purpose (dynamic access control beyond static OAuth scopes), how policies are loaded, how authorization decisions are made, the policy file locations, and how to write and test a new policy

#### Scenario: Web Dashboard documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the Web Dashboard section covering: React 18 + Vite 5 + TypeScript structure, how it authenticates against the AgentIdP API, the main views (agent list, credential management, audit log viewer, policy editor), and how to run it locally

#### Scenario: Monitoring stack documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the monitoring section covering: Prometheus metrics exposed by the API server (`/metrics`), the key metrics (request count, latency histograms, active tokens, agent count), Grafana dashboard structure, and how to add a new metric to the API server

#### Scenario: Consistent template enforced
- **WHEN** a new engineer looks up any service
- **THEN** every service section SHALL follow the same template so the engineer knows exactly where to find each type of information
|
||||
@@ -0,0 +1,32 @@
|
||||
## ADDED Requirements

### Requirement: Testing strategy document
The system SHALL include a document (`docs/engineering/09-testing.md`) that explains the test architecture, how to run tests, coverage requirements, and how to write new tests following project conventions.

#### Scenario: Test types and their purposes explained
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL understand the distinction between: unit tests (test one service/util in isolation, mock all dependencies, no running services needed) and integration tests (test full HTTP request/response cycle with real PostgreSQL + Redis)

#### Scenario: Test framework stack documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL find the test stack listed and explained: Jest 29.7 (test runner + assertions), ts-jest (TypeScript compilation), Supertest 6.3 (HTTP integration testing), and how each is configured

#### Scenario: Coverage gates documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL know the mandatory gates: >80% statements, >80% branches, >80% functions, >80% lines — and that PRs below these thresholds are blocked
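As an illustration of how such gates are commonly expressed, here is a sketch of a Jest configuration object using the `coverageThreshold` option. The file layout and the other options are assumptions, not the project's actual `jest.config.ts`. Note that Jest treats each number as a minimum, so `80` passes at exactly 80%; a strictly "greater than 80%" gate would need a slightly higher value.

```typescript
// Sketch of a jest.config.ts; option values other than the thresholds are assumed.
const config = {
  preset: "ts-jest",
  testEnvironment: "node",
  collectCoverage: true,
  coverageThreshold: {
    // Jest fails the run when any global metric falls below these minimums.
    global: { statements: 80, branches: 80, functions: 80, lines: 80 },
  },
};

export default config;
```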
|
||||
|
||||
#### Scenario: How to run the test suite documented
- **WHEN** a new engineer wants to run tests
- **THEN** the guide SHALL show: `npm test` (unit tests, no services), `npm run test:coverage` (unit tests + coverage report), `npm run test:integration` (requires Docker stack), and `npx jest src/services/agentService.test.ts` (single file)

#### Scenario: Unit test writing conventions shown
- **WHEN** a new engineer writes a new unit test
- **THEN** the guide SHALL show a complete example: how to mock a repository with `jest.mock()`, how to structure `describe`/`it` blocks, how to assert on thrown errors, and how to verify mock calls — using an actual test from the codebase as the example

#### Scenario: Integration test writing conventions shown
- **WHEN** a new engineer writes a new integration test
- **THEN** the guide SHALL show a complete example using Supertest: how to boot the Express app, how to seed test data, how to make authenticated requests (including getting a JWT first), and how to clean up after the test

#### Scenario: OWASP security testing reference included
- **WHEN** a new engineer writes security-relevant code
- **THEN** the guide SHALL include a reference to the OWASP Top 10 checks that are verified in QA sign-off and what each means in the context of this codebase (SQL injection, JWT attacks, credential exposure, etc.)
|
||||
@@ -0,0 +1,2 @@
|
||||
schema: spec-driven
created: 2026-03-29
|
||||
269
openspec/changes/archive/2026-04-02-phase-3-enterprise/design.md
Normal file
@@ -0,0 +1,269 @@
|
||||
# Phase 3: Enterprise — Technical Design

**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Draft — pending CEO approval of proposal

---

## Architecture Overview

Phase 3 transforms AgentIdP from a single-tenant OAuth 2.0 server into a multi-tenant, W3C DID-issuing, OIDC-compliant, federated enterprise identity platform. The architecture remains monolithic Express (no microservices split) to avoid operational complexity, but clear service boundaries are enforced internally.
|
||||
|
||||
```
┌──────────────────────────────────────────────────────┐
│              AgentIdP Server (Express)               │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  Middleware Stack (ordered)                    │  │
│  │  TLS Enforcement → Auth → Org Context → OPA    │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐ ┌───────┐  │
│  │ OrgSvc   │  │ DIDSvc   │  │OIDCSvc   │ │FedSvc │  │
│  └──────────┘  └──────────┘  └──────────┘ └───────┘  │
│  ┌──────────┐  ┌──────────┐                          │
│  │ WebhookQ │  │ SOC2Ctrl │                          │
│  └──────────┘  └──────────┘                          │
└──────────────────────────────────────────────────────┘
           │              │            │
  ┌────────▼──┐     ┌─────▼───┐     ┌──▼──────────┐
  │PostgreSQL │     │  Redis  │     │   Vault     │
  │(org rows) │     │(webhook │     │  (secrets)  │
  └───────────┘     │ queue)  │     └─────────────┘
                    └─────────┘
```
|
||||
|
||||
---

## Architectural Decision Records

---

### D1: Multi-Tenancy Model

**Status**: Accepted

**Decision**: Row-level tenancy — add `organization_id` (UUID, NOT NULL) to every domain table. No schema-per-tenant, no database-per-tenant.

**Rationale**: Row-level tenancy is operationally the simplest approach: a single database, a single schema, a single connection pool. All queries are augmented with an `organization_id` filter extracted from the authenticated JWT. PostgreSQL Row-Level Security (RLS) is enabled on all tenant-scoped tables as a defense-in-depth measure — even if the application filter is accidentally omitted, the database enforces isolation.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Schema-per-tenant | Strong isolation, independent migrations | Complex migration tooling, connection pool explosion at scale | Operational overhead exceeds threat model requirement |
| Database-per-tenant | Maximum isolation | Separate connection pool, backup, monitoring per tenant | Prohibitive at 100+ orgs; overkill for our threat model |
| Row-level (chosen) | Simple, fast, single migration path | RLS must be enforced consistently | Chosen — enforce via both application and RLS |

**Consequences**:
- Every domain table gets an `organization_id` column and a corresponding index
- All service methods accept `organizationId: string` as a required parameter
- JWT payload extended to include `organization_id` claim
- Existing single-tenant data migrated to a default `system` organization
- PostgreSQL RLS policies written for all tenant tables
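The application-side half of this decision can be sketched as a query builder that refuses to run without an organization. This is an illustration under assumptions: the helper name `listAgentsQuery`, the selected columns, and the policy shown in the comment are hypothetical, and `QueryConfig` is declared locally to mirror the `{ text, values }` object that node-postgres accepts.

```typescript
/** Local stand-in for the `{ text, values }` query object accepted by node-postgres. */
interface QueryConfig {
  text: string;
  values: unknown[];
}

/**
 * Hypothetical repository helper: every read carries the caller's
 * organization as a bound parameter, never as interpolated SQL.
 */
function listAgentsQuery(organizationId: string, limit = 50): QueryConfig {
  if (!organizationId) {
    throw new Error("organizationId is required for tenant-scoped queries");
  }
  return {
    text: "SELECT id, name FROM agents WHERE organization_id = $1 ORDER BY id LIMIT $2",
    values: [organizationId, limit],
  };
}

// Defense in depth on the database side (assumed policy shape, not taken from the design):
//   ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY org_isolation ON agents
//     USING (organization_id = current_setting('app.current_org_id')::uuid);
```

Making `organizationId` a required leading parameter means a forgotten filter surfaces as a compile-time or runtime error in the service layer, while RLS covers any query path that bypasses the helper.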
|
||||
|
||||
---

### D2: DID Method Selection

**Status**: Accepted

**Decision**: `did:web` — DID Documents served over HTTPS at well-known and per-agent URLs.

**Rationale**: `did:web` requires no blockchain, no ledger, and no external infrastructure beyond the HTTPS server already running. It is W3C DID Core 1.0 compliant, supported by all major DID resolvers, and is the preferred method for enterprise deployments where an organization controls its own domain. It aligns directly with the `did:web` identifier scheme used in AGNTCY agent card specifications.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `did:web` (chosen) | No blockchain, HTTPS-based, enterprise-friendly | DID tied to domain; moving the domain invalidates DIDs | Accepted tradeoff — enterprise deployments have stable domains |
| `did:key` | Self-contained, no infrastructure | Not anchored — anyone can generate any `did:key`; no discovery | No trust anchor; not suitable for enterprise identity |
| `did:ethr` | Ethereum-anchored, decentralized | Blockchain dependency, gas costs, not enterprise-standard | Blockchain dependency is a non-starter for regulated enterprises |

**Consequences**:
- DID for the AgentIdP instance: `did:web:<hostname>`
- DID for an agent: `did:web:<hostname>:agents:<agentId>`
- DID Documents served at `/.well-known/did.json` and `/agents/:id/did`
- Domain change requires DID migration — document this in ops runbook
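The identifier formats listed above can be sketched as small pure functions. The function names are hypothetical. The URL mapping follows the did:web method specification as I understand it: a bare domain resolves to `/.well-known/did.json`, path segments after the domain become URL path segments with `/did.json` appended, and a port's colon is percent-encoded as `%3A`. Under that mapping a generic resolver would request `/agents/<agentId>/did.json`, so the per-agent route in the design may need to answer at that path as well.

```typescript
/** Builds the instance-level DID, e.g. did:web:idp.example.com */
function instanceDid(host: string): string {
  // did:web requires the colon before a port to be percent-encoded.
  return `did:web:${host.replace(":", "%3A")}`;
}

/** Builds a per-agent DID in the form did:web:<hostname>:agents:<agentId>. */
function agentDid(host: string, agentId: string): string {
  return `${instanceDid(host)}:agents:${encodeURIComponent(agentId)}`;
}

/** Maps a did:web identifier to the HTTPS URL a resolver would fetch. */
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`not a did:web identifier: ${did}`);
  }
  const [domain, ...path] = did.slice(prefix.length).split(":");
  const host = decodeURIComponent(domain);
  if (path.length === 0) {
    return `https://${host}/.well-known/did.json`;
  }
  const segments = path.map((segment) => decodeURIComponent(segment));
  return `https://${host}/${segments.join("/")}/did.json`;
}
```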
|
||||
|
||||
---

### D3: OIDC Library Selection

**Status**: Accepted

**Decision**: `oidc-provider` npm package — a certified, RFC-compliant OIDC server library.

**Rationale**: `oidc-provider` is the most widely deployed Node.js OIDC library, passing the OpenID Foundation's official conformance test suite. Building OIDC from scratch on top of our existing JWT infrastructure would require implementing Discovery, JWKS rotation, ID token construction, and claim aggregation correctly against multiple RFCs. The certified library eliminates that risk and reduces implementation surface area. It integrates cleanly with Express as a mounted middleware.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `oidc-provider` (chosen) | Certified, RFC-complete, actively maintained | Adds a significant dependency | Risk of non-compliance from custom implementation outweighs dependency cost |
| Custom JWT extension | Full control, no new dependency | High risk of spec deviation; ID token, Discovery, JWKS are complex | RFC compliance cannot be self-certified |
| `keycloak` sidecar | Battle-tested, full-featured | Heavyweight Java service; architectural mismatch | Not Node.js; adds operational complexity |

**Consequences**:
- `oidc-provider` is mounted at `/oidc` in Express
- OIDC Discovery served at `/.well-known/openid-configuration` (proxied from oidc-provider)
- JWKS served at `/.well-known/jwks.json`
- Adapter written to store OIDC sessions in Redis (oidc-provider's adapter interface)
- Existing `POST /oauth2/token` route extended, not replaced — maintains backward compatibility
|
||||
|
||||
---

### D4: Federation Protocol

**Status**: Accepted

**Decision**: Signed JWT assertions — remote AgentIdP instances present a signed JWT; the receiving instance verifies the signature against the registered JWKS of the issuing instance.

**Rationale**: JWT assertion federation reuses the existing JWT infrastructure (`jsonwebtoken`, JWKS endpoint from OIDC workstream). No new protocol is introduced. The trust model is explicit: operators register partner instances with their JWKS URL. This aligns with RFC 7523 (JWT Profile for OAuth 2.0 Client Authentication) and the AGNTCY inter-agent trust model.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Signed JWT assertions (chosen) | Uses existing JWT infra, explicit trust registry, RFC-aligned | JWKS URL must be reachable at verification time | Acceptable operational constraint; JWKS can be cached |
| mTLS | Strong cryptographic identity | Certificate management overhead, PKI required per partner | Cert management complexity not justified when JWT assertions suffice |
| AGNTCY-specific protocol | Native alignment | Spec still evolving; risk of churn | Build on stable JWT base; adapt to AGNTCY extensions as spec matures |

**Consequences**:
- New `federation_partners` table: `id`, `name`, `jwks_url`, `issuer`, `trusted_since`, `organization_id`
- JWKS of partner instances cached in Redis with TTL
- `POST /federation/verify` accepts a bearer token from a remote instance and returns verification result
- Federation tokens are not accepted for agent management endpoints — only for identity assertion
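The JWKS caching consequence can be sketched as a small TTL cache. This is a simplification under stated assumptions: an in-memory `Map` stands in for Redis, the fetch function and clock are injected so the sketch stays self-contained, and the class name `JwksCache` is hypothetical. Signature verification itself is out of scope here.

```typescript
type Jwks = { keys: Array<Record<string, unknown>> };

/** Hypothetical TTL cache for partner JWKS documents; a Map stands in for Redis. */
class JwksCache {
  private readonly entries = new Map<string, { jwks: Jwks; expiresAt: number }>();
  private readonly fetchJwks: (url: string) => Promise<Jwks>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    fetchJwks: (url: string) => Promise<Jwks>,
    ttlMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.fetchJwks = fetchJwks;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Returns the cached JWKS while it is fresh; otherwise refetches and stores it. */
  async get(jwksUrl: string): Promise<Jwks> {
    const hit = this.entries.get(jwksUrl);
    if (hit && hit.expiresAt > this.now()) {
      return hit.jwks;
    }
    const jwks = await this.fetchJwks(jwksUrl);
    this.entries.set(jwksUrl, { jwks, expiresAt: this.now() + this.ttlMs });
    return jwks;
  }
}
```

A short TTL limits how long a rotated or revoked partner key stays trusted, while still removing the partner's JWKS endpoint from the hot path of most verifications.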
|
||||
|
||||
---

### D5: Webhook Delivery Architecture

**Status**: Accepted

**Decision**: Async delivery via Redis-backed `bull` queue with exponential backoff retry (max 10 attempts over 24 hours).

**Rationale**: Synchronous webhook delivery from within a request handler would add latency and create tight coupling between event generation and delivery outcome. The Redis queue (`bull`) decouples delivery: events are enqueued immediately, a background worker delivers them. `bull` provides built-in retry, delay, and failure tracking without introducing a new infrastructure component (Redis is already present). HMAC-SHA256 signing on every delivery allows recipients to verify authenticity.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Redis queue via `bull` (chosen) | Reuses existing Redis, retry built-in, low operational overhead | Delivery tied to Redis availability | Acceptable — Redis is already a required dependency |
| Synchronous in-request delivery | Simplest implementation | Adds latency to event-generating requests; failure blocks response | Unacceptable latency and coupling |
| Dedicated message broker (RabbitMQ) | Robust, durable | New infrastructure dependency | Operational overhead; Redis already present |
| Kafka (primary) | High-throughput, durable | Overkill for webhook delivery; complex operations | Optional adapter only; not primary delivery mechanism |

**Consequences**:
- New `webhook_subscriptions` and `webhook_deliveries` tables
- `bull` worker process runs in same Node.js instance (separate worker thread via `bull`)
- Retry schedule: 1m, 5m, 15m, 1h, 4h, 12h, 24h (exponential backoff)
- Failed delivery after 10 attempts moves to dead-letter; operator alerted
- Optional Kafka adapter: if `KAFKA_BROKERS` env var is set, events are also produced to Kafka
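The HMAC-SHA256 signing mentioned in the rationale can be sketched with Node's built-in `crypto` module. The function names and the hex encoding of the signature are assumptions; the design does not specify a header name or encoding.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** Sender side: signs the exact request body with the subscription's shared secret. */
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

/** Receiver side: recomputes the signature and compares it in constant time. */
function verifySignature(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so lengths are checked first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Recipients should verify against the raw bytes they received rather than a re-serialized JSON object, since any change in key order or whitespace produces a different signature.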
|
||||
|
||||
---

### D6: SOC 2 Scope

**Status**: Accepted

**Decision**: Target SOC 2 Type II (operational, not just design). All controls implemented in code. Audit period: 6 months post-Phase 3 launch.

**Rationale**: SOC 2 Type I certifies that controls are designed correctly. SOC 2 Type II certifies that they operate continuously over a period of time. Enterprise customers in regulated industries (finance, healthcare, government) require Type II. Implementing the controls now, with the 6-month operational window beginning at Phase 3 launch, puts us on the fastest possible path to Type II certification.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Type II from launch (chosen) | Satisfies enterprise requirements | Requires 6-month operation window | Accepted — the controls are implemented in Phase 3; audit window starts after launch |
| Type I only | Faster to certify | Not accepted by most enterprise procurement | Insufficient for target customers |
| ISO 27001 instead | International standard | Larger scope, longer implementation | SOC 2 is standard for US market; add ISO 27001 in Phase 4 |

**Consequences**:
- Encryption at rest: `pgcrypto` extension for column-level encryption on `credentials.secret_hash` and `credentials.vault_path`
- TLS enforcement: Express middleware rejects HTTP requests (not HTTPS) in production
- Secrets rotation: cron-based job that triggers credential rotation reminders and Vault lease renewals
- Security alerting: Prometheus alerting rules for auth failure spikes, rate limit exhaustion, anomalous token issuance
- Audit log immutability: Merkle hash chain (each row's hash includes the previous row's hash)
|
||||
|
||||
---

### D7: Audit Log Immutability — Merkle Hash Chain

**Status**: Accepted

**Decision**: Each `audit_logs` row carries a `hash` field: `SHA-256(eventId + timestamp + action + outcome + agentId + previousHash)`. The chain starts with a genesis hash. Verification is a sequential pass over all rows in insertion order.

**Rationale**: Append-only logs in PostgreSQL can be altered by a DBA with sufficient access. A Merkle-style hash chain makes tampering detectable without requiring a blockchain. Any modification to a historical row breaks the chain from that point forward. Verification is a simple sequential computation that can be run on demand or as a scheduled integrity check.

**Alternatives Considered**:

| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Merkle hash chain in PostgreSQL (chosen) | No new infra, tamper-evident, verifiable | DBA can re-compute hashes after tampering if they control the algorithm | Acceptable — threat model is accidental/low-sophistication modification; cryptographic chain deters opportunistic tampering |
| Blockchain anchor | Cryptographically immutable | Blockchain dependency, cost, latency | Excessive for current threat model |
| Write-once S3/GCS export | External immutability | Delayed; operational complexity | Added complexity; hash chain provides continuous coverage |

**Consequences**:
- New `hash` (VARCHAR 64) and `previous_hash` (VARCHAR 64) columns on `audit_logs`
- `AuditService.create()` computes hash before insert — adds ~1ms latency per audit event
- New `GET /audit/verify` endpoint: returns chain integrity status (admin only)
- `audit_logs` table has an `INSERT`-only trigger that prevents `UPDATE` and `DELETE` via PostgreSQL trigger
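The hash computation and the sequential verification pass can be sketched as follows. Two details are assumptions not fixed by the design: the genesis value (64 zeros here) and the `|` delimiter between fields, which is added so that adjacent field values cannot be shifted across a boundary without changing the hash. The type and function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
}

interface ChainedAuditEvent extends AuditEvent {
  previousHash: string;
  hash: string;
}

// Assumption: the design only says "a genesis hash"; 64 zeros is a placeholder.
const GENESIS_HASH = "0".repeat(64);

/** SHA-256 over the event fields plus the previous row's hash, hex-encoded (64 chars). */
function computeHash(e: AuditEvent, previousHash: string): string {
  const material = [e.eventId, e.timestamp, e.action, e.outcome, e.agentId, previousHash].join("|");
  return createHash("sha256").update(material, "utf8").digest("hex");
}

/** Returns a new chain with the event appended and linked to the current tail. */
function appendEvent(chain: ChainedAuditEvent[], e: AuditEvent): ChainedAuditEvent[] {
  const previousHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS_HASH;
  return [...chain, { ...e, previousHash, hash: computeHash(e, previousHash) }];
}

/** Sequential pass in insertion order; returns the first broken index, or -1 if intact. */
function firstBrokenIndex(chain: ChainedAuditEvent[]): number {
  let previousHash = GENESIS_HASH;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.previousHash !== previousHash || row.hash !== computeHash(row, previousHash)) {
      return i;
    }
    previousHash = row.hash;
  }
  return -1;
}
```

Because each hash covers the previous one, editing any historical row forces every later hash to be recomputed, which is what a check such as `GET /audit/verify` would detect.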
|
||||
|
||||
---

### D8: Organization Context in JWT

**Status**: Accepted

**Decision**: Add `organization_id` claim to JWT access tokens issued by `POST /oauth2/token`. All downstream middleware extracts `organization_id` from the token — no separate lookup required.

**Rationale**: Including `organization_id` in the JWT keeps the middleware stack stateless. The alternative — looking up the organization from the database on every request — adds latency and a database round-trip to every authenticated call. The JWT is already signed; adding a claim costs nothing cryptographically.

**Consequences**:
- `ITokenPayload` interface extended: `organization_id: string`
- All service methods receive `organizationId` from `req.user.organization_id`
- Token introspection response includes `organization_id`
- Agents registered before multi-tenancy belong to the default `system` organization
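A framework-free sketch of the Org Context step follows. It assumes the token has already been verified by the auth middleware and that a token without the claim is rejected rather than defaulted; both the rejection behaviour and the names `resolveOrganizationId` and `orgContext` are assumptions. The request type is a local stand-in, not Express's `Request`.

```typescript
/** Minimal stand-in for the verified token payload described in the design. */
interface TokenPayload {
  sub: string;
  organization_id?: string;
}

/** Local stand-in for a request object after the auth step has populated `user`. */
interface AuthedRequest {
  user?: TokenPayload;
  organizationId?: string;
}

/** Extracts the tenant from the verified payload; no database lookup is involved. */
function resolveOrganizationId(payload: TokenPayload): string {
  const orgId = payload.organization_id;
  if (typeof orgId !== "string" || orgId.length === 0) {
    throw new Error("token is missing the organization_id claim");
  }
  return orgId;
}

/** Middleware-shaped wrapper: attaches the tenant to the request, then continues. */
function orgContext(req: AuthedRequest, next: (err?: Error) => void): void {
  try {
    if (!req.user) {
      throw new Error("request is not authenticated");
    }
    req.organizationId = resolveOrganizationId(req.user);
    next();
  } catch (err) {
    next(err instanceof Error ? err : new Error(String(err)));
  }
}
```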
|
||||
|
||||
---

## Component Interaction Map (Phase 3)
|
||||
|
||||
```
              ┌──────────────────────┐
              │   Web Dashboard      │
              │  (+ Org Mgmt pages)  │
              └──────────┬───────────┘
                         │ HTTPS
┌────────────────────────▼─────────────────────────────┐
│                   AgentIdP Server                    │
│                                                      │
│  TLS MW → Auth MW → OrgContext MW → OPA MW           │
│                                                      │
│  ┌───────────┐ ┌───────────┐ ┌───────────────────┐   │
│  │ OrgService│ │DIDService │ │  OIDCProvider     │   │
│  └───────────┘ └───────────┘ │  (oidc-provider)  │   │
│  ┌───────────┐ ┌───────────┐ └───────────────────┘   │
│  │ FedService│ │WebhookSvc │                         │
│  └───────────┘ └───────────┘                         │
│  ┌─────────────────────────┐                         │
│  │ SOC2Controls (cross-cut)│                         │
│  └─────────────────────────┘                         │
└──────────┬──────────────┬──────────────┬─────────────┘
           │              │              │
  ┌────────▼──┐   ┌───────▼──┐    ┌──────▼──────┐
  │PostgreSQL │   │  Redis   │    │   Vault     │
  │ + RLS     │   │ +bull Q  │    │  (secrets)  │
  └───────────┘   └──────────┘    └─────────────┘
           │
  ┌────────▼──────┐
  │  Prometheus   │
  │  + Alerting   │
  └────────┬──────┘
           │
  ┌────────▼──────┐
  │   Grafana     │
  └───────────────┘
```
|
||||
@@ -0,0 +1,165 @@
|
||||
# Phase 3: Enterprise — Change Proposal

**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Proposed — awaiting CEO approval

---

## Summary

Phase 1 delivered a complete, working AgentIdP MVP. Phase 2 made it production-ready: Vault-backed secrets, multi-language SDKs, OPA policy engine, React dashboard, Prometheus/Grafana observability, and multi-region Terraform deployment. Phase 3 makes AgentIdP enterprise-grade: the platform moves from a single-tenant developer tool to a multi-tenant enterprise identity platform with W3C DID support, OIDC compliance, AGNTCY federation, real-time event streaming, and SOC 2 Type II controls.

---

## Problem Statement

Phase 1 and Phase 2 are functional and production-ready but have the following enterprise gaps:

| Gap | Risk |
|-----|------|
| Single-tenant architecture | Cannot serve enterprise customers with isolated data requirements |
| No W3C DID support | Not fully AGNTCY-compliant; agents lack interoperable decentralized identifiers |
| OAuth 2.0 only, no OIDC | Cannot integrate with standard enterprise identity ecosystems (SSO, SCIM) |
| No cross-instance federation | Multi-organization agent identity cannot be verified across AgentIdP deployments |
| No webhook/event streaming | Operators cannot react to agent lifecycle events in real time |
| No SOC 2 controls | Cannot pass enterprise security reviews; blocks revenue from regulated industries |
|
||||
|
||||
---
|
||||
|
||||
## Proposed Changes
|
||||
|
||||
### 1. Multi-Tenancy
|
||||
Introduce an Organization model so a single AgentIdP instance can serve multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit log, and rate limits. A new Admin API provides organization lifecycle management. All existing agent, credential, and audit endpoints become organization-scoped.
|
||||
|
||||
### 2. W3C Decentralized Identifiers (DIDs)
|
||||
Issue a W3C `did:web` identifier for every registered agent. Serve DID Documents at `/.well-known/did.json` (instance root) and `/agents/:id/did` (per-agent). Expose a DID resolution endpoint. Produce AGNTCY-format agent cards from DID Documents.
|
||||
|
||||
### 3. AGNTCY Federation
|
||||
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted federation partners. Tokens issued by a trusted remote AgentIdP instance can be verified locally, enabling multi-organization and cross-enterprise agent identity interoperability aligned with AGNTCY standards.
|
||||
|
||||
### 4. OpenID Connect (OIDC)
|
||||
Add a full OIDC layer on top of the existing OAuth 2.0 implementation using the certified `oidc-provider` library. The layer exposes OIDC Discovery, JWKS, ID tokens with agent claims, and an `/agent-info` endpoint (the agent-identity equivalent of the OIDC `/userinfo` endpoint).
|
||||
|
||||
### 5. Webhooks and Event Streaming
|
||||
Deliver real-time notifications for all agent lifecycle events: agent created, suspended, revoked, credential rotated, token issued. Operators create webhook subscriptions, and each delivery is signed with HMAC-SHA256. Delivery is asynchronous via a Redis-backed queue with exponential-backoff retry. An optional Kafka/NATS adapter is available for high-throughput environments.
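The signing and retry behaviour described above can be sketched as follows. This is a minimal illustration using Node's built-in `crypto` module, not the actual WebhookService: the function names, the hex encoding of the signature, and the backoff base and cap values are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: sign the raw request body with the subscription's shared secret.
export function signWebhookPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
export function verifyWebhookSignature(
  secret: string,
  rawBody: string,
  signatureHex: string,
): boolean {
  const expected = Buffer.from(signWebhookPayload(secret, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Exponential backoff: base * 2^attempt, capped. `attempt` is 0-based.
// The 1 s base and 60 s cap are illustrative defaults, not specified values.
export function retryDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Signing the raw body rather than a re-serialized object avoids mismatches caused by key ordering or whitespace differences on the receiving side.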
|
||||
|
||||
### 6. SOC 2 Type II Preparation
|
||||
Implement the technical controls required for SOC 2 Type II audit: encryption at rest via PostgreSQL column-level encryption for secrets, TLS enforcement on all inbound connections, automated secrets rotation, security event alerting via Prometheus alerting rules, and audit log immutability proof using a Merkle hash chain appended to each `audit_logs` row.
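To make the immutability proof concrete, here is a minimal sketch of a linear hash chain in which every audit row stores the hash of the previous row. The field names (`id`, `action`, `occurredAt`, `prevHash`, `hash`), the all-zero genesis value, and the JSON serialization are assumptions for illustration; the real `audit_logs` columns may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of an audit row; the real columns may differ.
interface AuditEntry {
  id: string;
  action: string;
  occurredAt: string;
}

interface ChainedEntry extends AuditEntry {
  prevHash: string;
  hash: string;
}

// Assumed starting value for the first row in the chain.
const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, e: AuditEntry): string {
  // Fixed field order keeps the serialization deterministic.
  const payload = JSON.stringify([prevHash, e.id, e.action, e.occurredAt]);
  return createHash("sha256").update(payload, "utf8").digest("hex");
}

// Append a row, linking it to the hash of the current last row.
export function appendEntry(chain: readonly ChainedEntry[], e: AuditEntry): ChainedEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { ...e, prevHash, hash: entryHash(prevHash, e) }];
}

// Returns the index of the first row that fails verification, or -1 if intact.
export function firstBrokenIndex(chain: readonly ChainedEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.prevHash !== prev || row.hash !== entryHash(prev, row)) return i;
    prev = row.hash;
  }
  return -1;
}
```

Because each hash covers the previous one, editing any historical row changes its recomputed hash and the verifier reports that row's index.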
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope for Phase 3
|
||||
|
||||
- Rust/C++ SDKs (Phase 4)
- Azure Terraform module (Phase 4)
- SCIM provisioning (Phase 4)
- End-user (human operator) identity management (out of product scope; AgentIdP is agent-first)
|
||||
|
||||
---
|
||||
|
||||
## Capabilities Table
|
||||
|
||||
### New Capabilities
|
||||
|
||||
| Workstream | Capability | Type |
|-----------|-----------|------|
| Multi-Tenancy | Organization model with isolated agent namespaces | New |
| Multi-Tenancy | Admin API: create, list, update, delete organizations | New |
| Multi-Tenancy | Per-organization rate limits and audit logs | New |
| Multi-Tenancy | Organization member management | New |
| W3C DIDs | `did:web` identifier on every registered agent | New |
| W3C DIDs | DID Document endpoint per agent | New |
| W3C DIDs | Instance-level root DID Document | New |
| W3C DIDs | DID resolution endpoint | New |
| W3C DIDs | AGNTCY-format agent card from DID Document | New |
| OIDC | OIDC Discovery endpoint (`/.well-known/openid-configuration`) | New |
| OIDC | JWKS endpoint (`/.well-known/jwks.json`) | New |
| OIDC | ID token with agent claims in token response | Modified |
| OIDC | `/agent-info` endpoint (agent claims) | New |
| Federation | Trust registry: register and list federation partners | New |
| Federation | Cross-instance token verification endpoint | New |
| Federation | Signed JWT assertion inter-IdP protocol | New |
| Webhooks | Webhook subscription management (CRUD) | New |
| Webhooks | HMAC-SHA256 signed delivery with retry | New |
| Webhooks | Delivery history log | New |
| Webhooks | Kafka/NATS adapter (optional) | New |
| SOC 2 | PostgreSQL column-level encryption for secrets at rest | New |
| SOC 2 | TLS enforcement middleware (reject non-TLS) | New |
| SOC 2 | Automated secrets rotation schedule | New |
| SOC 2 | Security event alerting (Prometheus alerting rules) | New |
| SOC 2 | Merkle hash chain on `audit_logs` for immutability proof | New |
| SOC 2 | Compliance documentation (controls matrix, runbook) | New |
|
||||
|
||||
### Modified Capabilities
|
||||
|
||||
| Workstream | Capability | Change |
|-----------|-----------|--------|
| Multi-Tenancy | `POST /agents` | Now scoped to `organizationId` |
| Multi-Tenancy | `GET /agents` | Filters restricted to caller's organization |
| Multi-Tenancy | `GET /audit` | Restricted to caller's organization by default |
| Multi-Tenancy | Rate limiting | Per-organization limits in addition to global |
| OIDC | `POST /oauth2/token` | Returns `id_token` in addition to `access_token` |
| SOC 2 | Audit log write path | Computes and appends Merkle hash on insert |
|
||||
|
||||
---
|
||||
|
||||
## Repository Impact
|
||||
|
||||
| Area | Impact |
|------|--------|
| `src/` | New services: OrgService, DIDService, OIDCService, FederationService, WebhookService, SOC2Controls |
| `src/db/migrations/` | 8–10 new migration files |
| `src/types/index.ts` | ~80 new interfaces/types |
| `src/middleware/` | New TLS enforcement middleware, updated auth middleware for org context |
| `src/routes/` | 6 new route files |
| `/.well-known/` | 3 new well-known endpoints |
| `policies/` | Updated Rego policies for org-scoped permissions |
| `dashboard/` | New Organization management pages |
| `monitoring/` | New alerting rules for SOC 2 security events |
| `docs/` | Compliance documentation, federation setup guide, webhook integration guide |
|
||||
|
||||
---
|
||||
|
||||
## New Dependencies
|
||||
|
||||
| Workstream | Package | Purpose | CEO Approval Required |
|-----------|---------|---------|----------------------|
| Multi-Tenancy | None | Row-level tenancy in existing PostgreSQL | No |
| W3C DIDs | `did-resolver` | W3C DID resolution | Yes |
| W3C DIDs | `web-did-resolver` | DID:WEB method resolver | Yes |
| OIDC | `oidc-provider` | Certified OIDC server library | Yes |
| Federation | None | Signed JWT assertions use existing `jsonwebtoken` | No |
| Webhooks | `bull` (Redis-backed queue) | Async webhook delivery queue | Yes |
| Webhooks | `kafkajs` (optional, Kafka adapter) | Kafka event streaming | Yes |
| SOC 2 | `node-forge` | Column-level encryption primitives | Yes |
|
||||
|
||||
---
|
||||
|
||||
## Delivery Sequence
|
||||
|
||||
Multi-tenancy is a prerequisite for all enterprise customer work — it must land first. DID support and OIDC are independent and can proceed in parallel. Federation depends on DIDs being in place. Webhooks are standalone. SOC 2 controls cut across the entire codebase and are implemented last to ensure all features they protect are already present.
|
||||
|
||||
```
1. Multi-Tenancy  (prerequisite: all enterprise features assume org context)
2. W3C DIDs       (parallel)
   OIDC           (parallel)
3. Federation     (depends on DIDs)
4. Webhooks       (standalone)
5. SOC 2          (cuts across all workstreams; implemented after all features are stable)
```
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- All new dependencies CEO-approved before implementation begins
- All new API endpoints have OpenAPI 3.0 specs before implementation
- Multi-tenancy isolation verified: no cross-organization data leakage
- DID Documents are W3C DID Core 1.0 compliant and resolve correctly
- OIDC Discovery passes `oidc-provider` conformance test suite
- Federation token verification rejects tampered assertions
- Webhook delivery achieves >99.9% success rate with retry logic
- SOC 2 controls pass independent technical review
- TypeScript strict mode + zero `any` maintained throughout
- >80% test coverage on all new services
|
||||
@@ -0,0 +1,370 @@
|
||||
# AGNTCY Federation — Specification
|
||||
|
||||
**Workstream**: 4 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted remote AgentIdP instances as federation partners. When an agent presents a token issued by a trusted partner instance, the local AgentIdP can verify it by fetching and caching the partner's JWKS. This enables multi-organization agent identity interoperability aligned with AGNTCY standards.
|
||||
|
||||
Federation is opt-in per organization. Only tokens from explicitly registered, trusted partners are accepted.
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### POST /federation/trust
|
||||
|
||||
Register a new federation trust partner. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
POST /federation/trust
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [name, issuer, jwksUri]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Human-readable name for this federation partner
        example: "Contoso AgentIdP"
      issuer:
        type: string
        format: uri
        description: OIDC issuer URL of the partner instance (must match iss claim in tokens)
        example: "https://agentidp.contoso.com"
      jwksUri:
        type: string
        format: uri
        description: URL of the partner's JWKS endpoint
        example: "https://agentidp.contoso.com/.well-known/jwks.json"
      allowedOrganizations:
        type: array
        items:
          type: string
        description: Optional list of organization IDs in the partner instance whose tokens are accepted. Empty means all partner orgs are trusted.
        example: ["org_contoso_engineering"]
      expiresAt:
        type: string
        format: date-time
        description: Optional expiry for this trust relationship. If omitted, trust does not expire automatically.

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/FederationPartner'
    example:
      partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
      name: "Contoso AgentIdP"
      issuer: "https://agentidp.contoso.com"
      jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
      status: "active"
      allowedOrganizations: []
      trustedSince: "2026-03-29T12:00:00Z"
      expiresAt: null
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      duplicate_issuer:
        code: "DUPLICATE_ISSUER"
        message: "A trust relationship with this issuer already exists"
      unreachable_jwks:
        code: "JWKS_UNREACHABLE"
        message: "Could not fetch JWKS from the provided jwksUri"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### GET /federation/partners
|
||||
|
||||
List all registered federation partners for the caller's organization. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
GET /federation/partners
Authorization: Bearer <token with admin:orgs scope>

Query Parameters:
  status:
    type: string
    enum: [active, suspended, expired]
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/FederationPartner'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
          name: "Contoso AgentIdP"
          issuer: "https://agentidp.contoso.com"
          jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
          status: "active"
          trustedSince: "2026-03-29T12:00:00Z"
          expiresAt: null
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### DELETE /federation/partners/:partnerId
|
||||
|
||||
Remove a federation trust relationship. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
DELETE /federation/partners/{partnerId}
Authorization: Bearer <token with admin:orgs scope>

Path Parameters:
  partnerId:
    type: string

Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### POST /federation/verify
|
||||
|
||||
Verify a token issued by a federated partner AgentIdP instance. The caller presents the token; this endpoint resolves the issuer, fetches (or cache-hits) the partner's JWKS, and verifies the signature and claims.
|
||||
|
||||
```yaml
POST /federation/verify
Authorization: Bearer <local access_token with agents:read scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [token]
    properties:
      token:
        type: string
        description: The JWT token issued by the remote AgentIdP instance to verify
      expectedIssuer:
        type: string
        format: uri
        description: Optional; if provided, verification fails if token issuer does not match
      expectedOrganizationId:
        type: string
        description: Optional; if provided, verification fails if token organization_id does not match

Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
        claims:
          type: object
          description: Decoded JWT claims from the verified token
          properties:
            sub:
              type: string
            iss:
              type: string
            iat:
              type: integer
            exp:
              type: integer
            agent_id:
              type: string
            agent_type:
              type: string
            organization_id:
              type: string
            capabilities:
              type: array
              items:
                type: string
            did:
              type: string
        partner:
          type: object
          description: The federation partner record that vouches for this token
          properties:
            partnerId:
              type: string
            name:
              type: string
            issuer:
              type: string
    example:
      valid: true
      claims:
        sub: "agt_contoso_abc123"
        iss: "https://agentidp.contoso.com"
        iat: 1743249600
        exp: 1743253200
        agent_id: "agt_contoso_abc123"
        agent_type: "classifier"
        organization_id: "org_contoso_engineering"
        capabilities: ["text-classification"]
        did: "did:web:agentidp.contoso.com:agents:agt_contoso_abc123"
      partner:
        partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
        name: "Contoso AgentIdP"
        issuer: "https://agentidp.contoso.com"

  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'

  401 Unauthorized (local token invalid):
    schema:
      $ref: '#/components/schemas/ErrorResponse'

  422 Unprocessable Entity (token invalid or untrusted issuer):
    schema:
      type: object
      properties:
        valid:
          type: boolean
          example: false
        reason:
          type: string
          enum:
            - TOKEN_EXPIRED
            - INVALID_SIGNATURE
            - UNTRUSTED_ISSUER
            - JWKS_FETCH_FAILED
            - ORGANIZATION_NOT_ALLOWED
        message:
          type: string
    example:
      valid: false
      reason: "UNTRUSTED_ISSUER"
      message: "No trust relationship registered for issuer https://unknown.example.com"
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: federation_partners
|
||||
|
||||
```sql
CREATE TABLE federation_partners (
  partner_id VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  name VARCHAR(100) NOT NULL,
  issuer VARCHAR(255) NOT NULL,
  jwks_uri VARCHAR(255) NOT NULL,
  allowed_organizations JSONB NOT NULL DEFAULT '[]',
  status VARCHAR(20) NOT NULL DEFAULT 'active',
  trusted_since TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ,
  last_jwks_fetch TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT federation_partners_status_check CHECK (status IN ('active', 'suspended', 'expired')),
  UNIQUE (organization_id, issuer)
);

CREATE INDEX idx_federation_partners_org_id ON federation_partners(organization_id);
CREATE INDEX idx_federation_partners_issuer ON federation_partners(issuer);
CREATE INDEX idx_federation_partners_status ON federation_partners(status);
```
|
||||
|
||||
### Redis: JWKS Cache
|
||||
|
||||
Partner JWKS documents are cached in Redis with a TTL:
|
||||
|
||||
```
Key:   federation:jwks:<issuer_url_sha256>
Value: JSON string of the JWKS document
TTL:   1 hour (configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS)
```
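The cache layout above can be sketched as a read-through lookup. This is an illustration only: `KeyValueStore` is a hypothetical stand-in for whichever Redis client calls the service uses, `getPartnerJwks` and `fetchJwks` are made-up names, and hashing the issuer string exactly as registered (with no URL normalization) is an assumption.

```typescript
import { createHash } from "node:crypto";

// Build the cache key: federation:jwks:<sha256 of the issuer URL, hex-encoded>.
export function jwksCacheKey(issuer: string): string {
  const digest = createHash("sha256").update(issuer, "utf8").digest("hex");
  return `federation:jwks:${digest}`;
}

// Hypothetical minimal interface covering the two cache operations needed.
export interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Read-through cache: return the cached JWKS if present, otherwise fetch it
// and store it with the configured TTL (3600 seconds by default).
export async function getPartnerJwks(
  store: KeyValueStore,
  issuer: string,
  fetchJwks: () => Promise<string>,
  ttlSeconds = 3600,
): Promise<string> {
  const key = jwksCacheKey(issuer);
  const cached = await store.get(key);
  if (cached !== null) return cached;
  const fresh = await fetchJwks();
  await store.set(key, fresh, ttlSeconds);
  return fresh;
}
```

Hashing the issuer keeps the key a fixed length and free of characters such as `:` and `/` that would otherwise collide with the key's own separators.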
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `FEDERATION_ENABLED` | Enable federation endpoints | `true` |
| `FEDERATION_JWKS_CACHE_TTL_SECONDS` | Redis TTL for cached partner JWKS | `3600` |
| `FEDERATION_JWKS_FETCH_TIMEOUT_MS` | HTTP timeout for fetching partner JWKS | `5000` |
| `FEDERATION_MAX_PARTNERS_PER_ORG` | Max federation partners per organization | `50` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
No new npm packages. Federation uses `jsonwebtoken` (already present) for JWT verification and the existing HTTP client for JWKS fetches.
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Only tokens from explicitly registered, active federation partners are accepted in `POST /federation/verify`
- JWKS are cached to prevent JWKS endpoint hammering; cache is invalidated when a partner is updated
- Token signature verification uses the partner's JWKS; `alg: none` is always rejected
- The `allowedOrganizations` field enables fine-grained trust: a partner can be trusted but only for tokens from specific organizations within that partner
- Expired federation partners (`expiresAt` in the past) are automatically treated as status `expired`, and their tokens are rejected
- `POST /federation/verify` does not grant any local permissions; it is a verification-only endpoint. Callers must make their own access control decisions based on the returned claims.
- Clock skew tolerance: `exp` claim verification allows 30 seconds of clock skew (standard JWT practice)
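The claim-level rules in this list (trusted and unexpired partner, 30-second skew on `exp`, optional organization allow-list) can be sketched as a pure function that runs after signature verification. The type and function names are hypothetical, and mapping a suspended or lapsed trust relationship to `UNTRUSTED_ISSUER` is an assumption; the specification does not say which reason code applies in that case.

```typescript
export type RejectReason = "TOKEN_EXPIRED" | "UNTRUSTED_ISSUER" | "ORGANIZATION_NOT_ALLOWED";

// Hypothetical in-memory view of a federation_partners row.
export interface PartnerRecord {
  issuer: string;
  status: "active" | "suspended" | "expired";
  allowedOrganizations: string[];
  expiresAt: Date | null;
}

export interface FederatedClaims {
  iss: string;
  exp: number; // seconds since the Unix epoch
  organization_id: string;
}

const CLOCK_SKEW_SECONDS = 30;

// Returns null when the claims pass, otherwise the rejection reason.
// Signature verification is assumed to have happened before this call.
export function checkFederatedClaims(
  claims: FederatedClaims,
  partners: readonly PartnerRecord[],
  now: Date,
): RejectReason | null {
  const partner = partners.find((p) => p.issuer === claims.iss);
  if (!partner || partner.status !== "active") return "UNTRUSTED_ISSUER";
  // Assumption: a trust relationship past its expiresAt is reported as untrusted.
  if (partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime()) {
    return "UNTRUSTED_ISSUER";
  }
  const nowSeconds = Math.floor(now.getTime() / 1000);
  if (claims.exp + CLOCK_SKEW_SECONDS < nowSeconds) return "TOKEN_EXPIRED";
  // An empty allow-list means every organization of the partner is trusted.
  if (
    partner.allowedOrganizations.length > 0 &&
    !partner.allowedOrganizations.includes(claims.organization_id)
  ) {
    return "ORGANIZATION_NOT_ALLOWED";
  }
  return null;
}
```

Keeping these checks free of I/O makes each rejection path easy to unit-test with a fixed `now`.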
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `POST /federation/trust` registers a partner and fetches JWKS; returns 400 if JWKS unreachable
- [ ] `POST /federation/verify` returns `valid: true` for a correctly signed token from a trusted partner
- [ ] `POST /federation/verify` returns `valid: false` with `reason: UNTRUSTED_ISSUER` for unknown issuers
- [ ] `POST /federation/verify` returns `valid: false` with `reason: TOKEN_EXPIRED` for expired tokens
- [ ] Expired trust relationships (past `expiresAt`) are rejected automatically
- [ ] JWKS cache hit is used on second verification request for same issuer (Redis key present)
- [ ] TypeScript strict, zero `any`, >80% test coverage on FederationService
|
||||
@@ -0,0 +1,444 @@
|
||||
# Multi-Tenancy — Specification
|
||||
|
||||
**Workstream**: 1 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Introduce an Organization model so a single AgentIdP instance serves multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit events, and rate limits. Row-level tenancy in PostgreSQL is enforced by both application-layer `organization_id` filtering and PostgreSQL Row-Level Security (RLS) policies.
|
||||
|
||||
All existing endpoints that operate on agents, credentials, or audit events are augmented to be organization-scoped. A new Admin API provides organization lifecycle management. Organization membership controls which agents a caller can manage.
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### POST /organizations
|
||||
|
||||
Create a new organization. Requires system-admin scope (`admin:orgs`).
|
||||
|
||||
```yaml
POST /organizations
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [name, slug]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Display name of the organization
        example: "Acme AI Platform"
      slug:
        type: string
        minLength: 2
        maxLength: 50
        pattern: "^[a-z0-9-]+$"
        description: URL-safe unique identifier
        example: "acme-ai"
      planTier:
        type: string
        enum: [free, pro, enterprise]
        default: free
      maxAgents:
        type: integer
        minimum: 1
        default: 100
      maxTokensPerMonth:
        type: integer
        minimum: 1
        default: 10000

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/Organization'
    example:
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      name: "Acme AI Platform"
      slug: "acme-ai"
      planTier: "free"
      maxAgents: 100
      maxTokensPerMonth: 10000
      status: "active"
      createdAt: "2026-03-29T12:00:00Z"
      updatedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "VALIDATION_ERROR"
      message: "slug must be unique"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "INSUFFICIENT_SCOPE"
      message: "admin:orgs scope required"
```
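The request constraints above (length limits, slug pattern, integer minimums, defaults) could be enforced with a small validation step before the insert. The sketch below is illustrative: `normalizeCreateOrg`, the result shape, and the error messages are assumptions, and slug uniqueness is left to the database's `UNIQUE` constraint.

```typescript
const SLUG_PATTERN = /^[a-z0-9-]+$/;

type PlanTier = "free" | "pro" | "enterprise";

interface CreateOrgInput {
  name: string;
  slug: string;
  planTier?: PlanTier;
  maxAgents?: number;
  maxTokensPerMonth?: number;
}

type CreateOrgResult =
  | { ok: true; value: Required<CreateOrgInput> }
  | { ok: false; errors: string[] };

// Hypothetical helper: validate the body and fill in the documented defaults.
export function normalizeCreateOrg(input: CreateOrgInput): CreateOrgResult {
  const errors: string[] = [];
  if (input.name.length < 2 || input.name.length > 100) {
    errors.push("name must be 2-100 characters");
  }
  if (input.slug.length < 2 || input.slug.length > 50 || !SLUG_PATTERN.test(input.slug)) {
    errors.push("slug must be 2-50 characters matching ^[a-z0-9-]+$");
  }
  const maxAgents = input.maxAgents ?? 100;
  const maxTokensPerMonth = input.maxTokensPerMonth ?? 10000;
  if (!Number.isInteger(maxAgents) || maxAgents < 1) {
    errors.push("maxAgents must be an integer >= 1");
  }
  if (!Number.isInteger(maxTokensPerMonth) || maxTokensPerMonth < 1) {
    errors.push("maxTokensPerMonth must be an integer >= 1");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      name: input.name,
      slug: input.slug,
      planTier: input.planTier ?? "free",
      maxAgents,
      maxTokensPerMonth,
    },
  };
}
```

Collecting every violation before returning lets a single 400 response list all problems at once.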
|
||||
|
||||
---
|
||||
|
||||
### GET /organizations
|
||||
|
||||
List all organizations. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
GET /organizations
Authorization: Bearer <token with admin:orgs scope>

Query Parameters:
  status:
    type: string
    enum: [active, suspended, deleted]
  page:
    type: integer
    minimum: 1
    default: 1
  limit:
    type: integer
    minimum: 1
    maximum: 100
    default: 20

Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/Organization'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
          name: "Acme AI Platform"
          slug: "acme-ai"
          planTier: "free"
          status: "active"
          createdAt: "2026-03-29T12:00:00Z"
          updatedAt: "2026-03-29T12:00:00Z"
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### GET /organizations/:orgId
|
||||
|
||||
Get a single organization. Requires `admin:orgs` scope or membership in the organization.
|
||||
|
||||
```yaml
GET /organizations/{orgId}
Authorization: Bearer <token>

Path Parameters:
  orgId:
    type: string
    description: Organization ID (org_... prefix)

Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_NOT_FOUND"
      message: "Organization not found"
```
|
||||
|
||||
---
|
||||
|
||||
### PATCH /organizations/:orgId
|
||||
|
||||
Partially update an organization. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
PATCH /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
      planTier:
        type: string
        enum: [free, pro, enterprise]
      maxAgents:
        type: integer
        minimum: 1
      maxTokensPerMonth:
        type: integer
        minimum: 1
      status:
        type: string
        enum: [active, suspended]

Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### DELETE /organizations/:orgId
|
||||
|
||||
Soft-delete an organization (sets status to `deleted`). Requires `admin:orgs` scope. Hard deletion is not supported — data is retained for compliance.
|
||||
|
||||
```yaml
DELETE /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>

Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_HAS_ACTIVE_AGENTS"
      message: "Organization has active agents; decommission all agents before deleting"
```
|
||||
|
||||
---
|
||||
|
||||
### POST /organizations/:orgId/members
|
||||
|
||||
Add a member (an already-registered agent, identified by `agentId`) to an organization. Requires `admin:orgs` scope.
|
||||
|
||||
```yaml
POST /organizations/{orgId}/members
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json

Request Body:
  schema:
    type: object
    required: [agentId, role]
    properties:
      agentId:
        type: string
        description: ID of an already-registered agent to add as a member
      role:
        type: string
        enum: [member, admin]
        description: Role within the organization

Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/OrgMember'
    example:
      memberId: "mem_01HXK7Z9P3FKWABCDEF99999"
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
      role: "member"
      joinedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ALREADY_MEMBER"
      message: "Agent is already a member of this organization"
```
|
||||
|
||||
---
|
||||
|
||||
### Modified: All /agents, /audit endpoints
|
||||
|
||||
All existing agent, credential, and audit endpoints now operate within the caller's organization context, which is taken from the `organization_id` claim in the JWT. URLs do not change, so the scoping is transparent to callers already using the API.
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: organizations
|
||||
|
||||
```sql
CREATE TABLE organizations (
  organization_id VARCHAR(40) PRIMARY KEY, -- org_... prefixed ULID
  name VARCHAR(100) NOT NULL,
  slug VARCHAR(50) NOT NULL UNIQUE,
  plan_tier VARCHAR(20) NOT NULL DEFAULT 'free',
  max_agents INTEGER NOT NULL DEFAULT 100,
  max_tokens_per_month INTEGER NOT NULL DEFAULT 10000,
  status VARCHAR(20) NOT NULL DEFAULT 'active',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT organizations_status_check CHECK (status IN ('active', 'suspended', 'deleted')),
  CONSTRAINT organizations_plan_check CHECK (plan_tier IN ('free', 'pro', 'enterprise'))
);

CREATE INDEX idx_organizations_slug ON organizations(slug);
CREATE INDEX idx_organizations_status ON organizations(status);
```
|
||||
|
||||
### New Table: organization_members
|
||||
|
||||
```sql
CREATE TABLE organization_members (
  member_id VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  agent_id VARCHAR(40) NOT NULL REFERENCES agents(agent_id),
  role VARCHAR(20) NOT NULL DEFAULT 'member',
  joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT organization_members_role_check CHECK (role IN ('member', 'admin')),
  UNIQUE (organization_id, agent_id)
);

CREATE INDEX idx_org_members_org_id ON organization_members(organization_id);
CREATE INDEX idx_org_members_agent_id ON organization_members(agent_id);
```
|
||||
|
||||
### Modified: agents table
|
||||
|
||||
```sql
ALTER TABLE agents
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';

CREATE INDEX idx_agents_organization_id ON agents(organization_id);

-- RLS
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
CREATE POLICY agents_org_isolation ON agents
  USING (organization_id = current_setting('app.organization_id', true));
```
|
||||
|
||||
### Modified: credentials table
|
||||
|
||||
```sql
ALTER TABLE credentials
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';

CREATE INDEX idx_credentials_organization_id ON credentials(organization_id);
ALTER TABLE credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY credentials_org_isolation ON credentials
  USING (organization_id = current_setting('app.organization_id', true));
```
|
||||
|
||||
### Modified: audit_logs table
|
||||
|
||||
```sql
ALTER TABLE audit_logs
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';

CREATE INDEX idx_audit_logs_organization_id ON audit_logs(organization_id);
ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_logs_org_isolation ON audit_logs
  USING (organization_id = current_setting('app.organization_id', true));
```
|
||||
|
||||
### Seed: Default system organization
|
||||
|
||||
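```sql
-- Run this insert before the ALTER TABLE statements that add organization_id:
-- existing rows receive DEFAULT 'org_system', and the foreign key check fails
-- unless this row already exists.
INSERT INTO organizations (organization_id, name, slug, plan_tier, max_agents, max_tokens_per_month, status)
VALUES ('org_system', 'System', 'system', 'enterprise', 999999, 999999999, 'active');
```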
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MULTI_TENANCY_ENABLED` | Enable organization enforcement (set false for single-tenant mode) | `true` |
| `DEFAULT_ORG_ID` | Organization ID to assign pre-tenancy data during migration | `org_system` |
| `MAX_ORGS_PER_INSTANCE` | Hard cap on number of organizations per instance | `1000` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
No new npm packages. Row-level tenancy uses existing PostgreSQL client (`pg`) and query patterns.
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
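- PostgreSQL RLS is enabled as defense-in-depth: even if the application layer accidentally omits an `organization_id` filter, the database still restricts rows to the current organization. This holds only when the application connects as a role that is not a superuser, does not have `BYPASSRLS`, and is not the table owner (or the tables use `FORCE ROW LEVEL SECURITY`)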
|
||||
- `SET LOCAL app.organization_id` is called at the start of every database transaction
|
||||
- The `admin:orgs` scope is a new privileged scope — only system-level agent credentials carry it
|
||||
- Organization slugs are public-facing but organization IDs are internal — never expose organization IDs in public URLs where avoidable
|
||||
- `DELETE /organizations` is soft-delete only — hard deletion requires a separate admin runbook to prevent accidental data loss
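
The per-transaction tenant context described above could be sketched as follows. This is a minimal illustration, not the project's implementation: `withOrgContext` and the `Queryable` interface are hypothetical names, and the sketch uses PostgreSQL's `set_config(name, value, true)`, which has the same transaction-local effect as `SET LOCAL` but accepts a bind parameter.

```typescript
// Minimal structural type for a database client. A `pg` client exposes a
// compatible `query` method, but nothing here imports `pg`.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Hypothetical helper: runs `work` inside a transaction whose RLS context
// is pinned to a single organization.
async function withOrgContext<T>(
  client: Queryable,
  organizationId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // The third argument (is_local = true) limits the setting to this
    // transaction, so it cannot leak to the next user of a pooled connection.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, a query issued outside `withOrgContext` sees no `app.organization_id` and the policies shown earlier match no rows.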
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Single AgentIdP instance can serve 2+ organizations with zero cross-organization data leakage
|
||||
- [ ] All agent/credential/audit operations are scoped to caller's organization_id from JWT
|
||||
- [ ] PostgreSQL RLS policies verified: direct DB query without app.organization_id setting returns 0 rows
|
||||
- [ ] Organization CRUD endpoints return correct 403 when caller lacks admin:orgs scope
|
||||
- [ ] Pre-existing agents assigned to default system organization without data loss
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on OrgService
|
||||
@@ -0,0 +1,366 @@
|
||||
# OpenID Connect (OIDC) — Specification
|
||||
|
||||
**Workstream**: 3 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Add a full OIDC 1.0 layer on top of the existing OAuth 2.0 `client_credentials` implementation using the certified `oidc-provider` npm library. The OIDC layer exposes Discovery, JWKS, extends the token endpoint to return ID tokens with agent claims, and provides an `/agent-info` endpoint (the agent-identity equivalent of OIDC's `/userinfo`).
|
||||
|
||||
The existing `POST /oauth2/token` endpoint is extended, not replaced. Callers that do not request the `openid` scope continue to receive standard OAuth 2.0 responses unchanged.
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /.well-known/openid-configuration
|
||||
|
||||
OIDC Discovery document. No authentication required. This is the standard discovery endpoint defined by OpenID Connect Discovery 1.0; RFC 8414 defines the closely related OAuth 2.0 Authorization Server Metadata format.
|
||||
|
||||
```yaml
GET /.well-known/openid-configuration
No authentication required

Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: OIDC Discovery document per OpenID Connect Discovery 1.0
    example:
      issuer: "https://idp.sentryagent.ai"
      authorization_endpoint: "https://idp.sentryagent.ai/oauth2/authorize"
      token_endpoint: "https://idp.sentryagent.ai/oauth2/token"
      jwks_uri: "https://idp.sentryagent.ai/.well-known/jwks.json"
      userinfo_endpoint: "https://idp.sentryagent.ai/agent-info"
      introspection_endpoint: "https://idp.sentryagent.ai/oauth2/introspect"
      revocation_endpoint: "https://idp.sentryagent.ai/oauth2/revoke"
      response_types_supported:
        - "token"
      grant_types_supported:
        - "client_credentials"
      subject_types_supported:
        - "public"
      id_token_signing_alg_values_supported:
        - "RS256"
        - "ES256"
      scopes_supported:
        - "openid"
        - "agents:read"
        - "agents:write"
        - "tokens:read"
        - "audit:read"
      claims_supported:
        - "sub"
        - "iss"
        - "iat"
        - "exp"
        - "agent_id"
        - "agent_type"
        - "organization_id"
        - "capabilities"
        - "deployment_env"
        - "owner"
      token_endpoint_auth_methods_supported:
        - "client_secret_post"
        - "client_secret_basic"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
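
Every endpoint URL in the example above shares the issuer as its prefix. The sketch below illustrates that relationship only; the spec delegates discovery to `oidc-provider`, and `buildDiscoveryDocument` is a hypothetical helper that covers a subset of the fields.

```typescript
// Subset of the discovery fields shown in the example response.
interface DiscoveryDocument {
  issuer: string;
  authorization_endpoint: string;
  token_endpoint: string;
  jwks_uri: string;
  userinfo_endpoint: string;
  introspection_endpoint: string;
  revocation_endpoint: string;
  grant_types_supported: string[];
  id_token_signing_alg_values_supported: string[];
}

// Hypothetical helper: derives endpoint URLs from the configured issuer.
function buildDiscoveryDocument(issuer: string): DiscoveryDocument {
  // Strip trailing slashes only for path joining; `issuer` itself is
  // returned unchanged because it must match the token `iss` claim exactly.
  const base = issuer.replace(/\/+$/, "");
  return {
    issuer,
    authorization_endpoint: `${base}/oauth2/authorize`,
    token_endpoint: `${base}/oauth2/token`,
    jwks_uri: `${base}/.well-known/jwks.json`,
    userinfo_endpoint: `${base}/agent-info`,
    introspection_endpoint: `${base}/oauth2/introspect`,
    revocation_endpoint: `${base}/oauth2/revoke`,
    grant_types_supported: ["client_credentials"],
    id_token_signing_alg_values_supported: ["RS256", "ES256"],
  };
}
```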
|
||||
|
||||
---
|
||||
|
||||
### GET /.well-known/jwks.json
|
||||
|
||||
JSON Web Key Set. Contains the public keys used to sign ID tokens and access tokens. No authentication required. Clients use this endpoint to verify token signatures.
|
||||
|
||||
```yaml
GET /.well-known/jwks.json
No authentication required

Responses:
  200 OK:
    Content-Type: application/json
    Cache-Control: public, max-age=3600
    schema:
      type: object
      required: [keys]
      properties:
        keys:
          type: array
          items:
            type: object
            description: JSON Web Key (RFC 7517)
            properties:
              kty:
                type: string
                example: "RSA"
              use:
                type: string
                example: "sig"
              kid:
                type: string
                description: Key ID — matches `kid` header in issued JWTs
              alg:
                type: string
                example: "RS256"
              n:
                type: string
                description: RSA modulus (base64url)
              e:
                type: string
                description: RSA exponent (base64url)
    example:
      keys:
        - kty: "RSA"
          use: "sig"
          kid: "key-2026-03-29-01"
          alg: "RS256"
          n: "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAt..."
          e: "AQAB"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
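
A client-side check against this key set might look like the sketch below. It is an illustration under assumptions: it uses only Node's built-in `crypto` module, handles RS256 only, and skips the `iss`, `aud` and `exp` checks that a real client also needs. `verifyRs256` and `demoRoundTrip` are hypothetical names; the demo signs a throwaway token with a locally generated key so the example is self-contained.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify } from "node:crypto";

type Jwk = { kty?: string; kid?: string; alg?: string; use?: string; n?: string; e?: string; [key: string]: unknown };

// Verifies the signature of an RS256 JWT against a JWKS and returns its payload.
function verifyRs256(token: string, jwks: { keys: Jwk[] }): Record<string, unknown> {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed JWT");
  const [headerB64, payloadB64, signatureB64] = parts as [string, string, string];
  const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8")) as { alg?: string; kid?: string };
  // Anything other than RS256 is refused here, which also rejects `alg: none`.
  if (header.alg !== "RS256") throw new Error(`unsupported alg: ${String(header.alg)}`);
  // Select the key whose `kid` matches the JWT header.
  const jwk = jwks.keys.find((candidate) => candidate.kid === header.kid);
  if (!jwk) throw new Error("no JWKS key matches the token kid");
  const publicKey = createPublicKey({ key: jwk, format: "jwk" });
  const valid = verify(
    "RSA-SHA256",
    Buffer.from(`${headerB64}.${payloadB64}`, "utf8"),
    publicKey,
    Buffer.from(signatureB64, "base64url"),
  );
  if (!valid) throw new Error("signature verification failed");
  return JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8")) as Record<string, unknown>;
}

// Demo: generate a key pair, publish the public half as a JWKS, sign, verify.
function demoRoundTrip(): Record<string, unknown> {
  const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
  const kid = "key-2026-03-29-01";
  const jwks = { keys: [{ ...publicKey.export({ format: "jwk" }), kid, alg: "RS256", use: "sig" }] };
  const encode = (value: object): string => Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
  const signingInput = `${encode({ alg: "RS256", kid })}.${encode({ sub: "agt_example" })}`;
  const signature = sign("RSA-SHA256", Buffer.from(signingInput, "utf8"), privateKey).toString("base64url");
  return verifyRs256(`${signingInput}.${signature}`, jwks);
}
```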
|
||||
|
||||
---
|
||||
|
||||
### POST /oauth2/token (extended)
|
||||
|
||||
The existing token endpoint is extended to return an `id_token` when the `openid` scope is requested. All existing behavior is preserved when `openid` is not in the scope list.
|
||||
|
||||
```yaml
POST /oauth2/token
Content-Type: application/x-www-form-urlencoded

Request Body:
  schema:
    type: object
    required: [grant_type, client_id, client_secret]
    properties:
      grant_type:
        type: string
        enum: [client_credentials]
      client_id:
        type: string
      client_secret:
        type: string
      scope:
        type: string
        description: Space-separated scopes. Include "openid" to receive an id_token.
        example: "openid agents:read"

Responses:
  200 OK (with openid scope):
    schema:
      type: object
      properties:
        access_token:
          type: string
        token_type:
          type: string
          example: "Bearer"
        expires_in:
          type: integer
        scope:
          type: string
        id_token:
          type: string
          description: Signed JWT ID token containing agent identity claims. Only present when openid scope was requested.
    example:
      access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
      token_type: "Bearer"
      expires_in: 3600
      scope: "openid agents:read"
      id_token: "eyJhbGciOiJSUzI1NiIsImtpZCI6ImtleS0yMDI2LTAzLTI5LTAxIn0..."

  200 OK (without openid scope):
    schema:
      type: object
      properties:
        access_token:
          type: string
        token_type:
          type: string
        expires_in:
          type: integer
        scope:
          type: string
    example:
      access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
      token_type: "Bearer"
      expires_in: 3600
      scope: "agents:read"

  400 Bad Request:
    schema:
      $ref: '#/components/schemas/OAuthErrorResponse'
    example:
      error: "invalid_client"
      error_description: "Invalid client credentials"

  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/OAuthErrorResponse'
```
|
||||
|
||||
#### ID Token Claims
|
||||
|
||||
When `openid` scope is requested, the ID token (a signed JWT) contains the following claims:
|
||||
|
||||
```json
{
  "iss": "https://idp.sentryagent.ai",
  "sub": "agt_01HXK7Z9P3FKWABCDEF67890",
  "aud": "agt_01HXK7Z9P3FKWABCDEF67890",
  "iat": 1743249600,
  "exp": 1743253200,
  "agent_id": "agt_01HXK7Z9P3FKWABCDEF67890",
  "agent_type": "orchestrator",
  "organization_id": "org_01HXK7Z9P3FKWABCDEF12345",
  "capabilities": ["task-planning", "tool-use"],
  "deployment_env": "production",
  "owner": "acme-ai",
  "did": "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
}
```
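
The claim set above can be assembled from an agent record roughly as follows. This is a sketch under assumptions: `AgentRecord` and `buildIdTokenClaims` are hypothetical names, the `aud` value mirrors the example (the agent's own ID), and the `did` claim is derived from the issuer host rather than read from storage.

```typescript
// Hypothetical shape of the stored agent fields that feed the ID token.
interface AgentRecord {
  agentId: string;
  agentType: string;
  organizationId: string;
  capabilities: string[];
  deploymentEnv: string;
  owner: string;
}

interface IdTokenClaims {
  iss: string;
  sub: string;
  aud: string;
  iat: number;
  exp: number;
  agent_id: string;
  agent_type: string;
  organization_id: string;
  capabilities: string[];
  deployment_env: string;
  owner: string;
  did: string;
}

function buildIdTokenClaims(
  agent: AgentRecord,
  issuer: string,
  nowSeconds: number,
  ttlSeconds: number = 3600, // mirrors the OIDC_ID_TOKEN_TTL_SECONDS default
): IdTokenClaims {
  // did:web encodes a port separator as %3A; a bare host name is unchanged.
  const host = encodeURIComponent(new URL(issuer).host);
  return {
    iss: issuer,
    sub: agent.agentId,
    aud: agent.agentId,
    iat: nowSeconds,
    exp: nowSeconds + ttlSeconds,
    agent_id: agent.agentId,
    agent_type: agent.agentType,
    organization_id: agent.organizationId,
    capabilities: [...agent.capabilities],
    deployment_env: agent.deploymentEnv,
    owner: agent.owner,
    did: `did:web:${host}:agents:${agent.agentId}`,
  };
}
```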
|
||||
|
||||
---
|
||||
|
||||
### GET /agent-info
|
||||
|
||||
Returns claims about the authenticated agent identity. This is the agent-first equivalent of the OIDC `/userinfo` endpoint. Authentication required with any valid access token.
|
||||
|
||||
```yaml
GET /agent-info
Authorization: Bearer <access_token>

Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: Agent identity claims (subset of registered agent data)
      properties:
        sub:
          type: string
          description: Subject — agentId
        agent_id:
          type: string
        agent_type:
          type: string
        organization_id:
          type: string
        capabilities:
          type: array
          items:
            type: string
        deployment_env:
          type: string
        owner:
          type: string
        version:
          type: string
        status:
          type: string
        did:
          type: string
          description: W3C DID for this agent (if DID workstream is active)
        created_at:
          type: string
          format: date-time
    example:
      sub: "agt_01HXK7Z9P3FKWABCDEF67890"
      agent_id: "agt_01HXK7Z9P3FKWABCDEF67890"
      agent_type: "orchestrator"
      organization_id: "org_01HXK7Z9P3FKWABCDEF12345"
      capabilities: ["task-planning", "tool-use"]
      deployment_env: "production"
      owner: "acme-ai"
      version: "1.2.0"
      status: "active"
      did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      created_at: "2026-03-29T12:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "UNAUTHORIZED"
      message: "Invalid or expired access token"
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: oidc_keys
|
||||
|
||||
Stores metadata for the RSA/EC key pairs used for ID token signing. Private keys are stored in Vault and referenced by `vault_key_path`; only the public key JWK is stored in PostgreSQL, where it backs the JWKS endpoint.
|
||||
|
||||
```sql
CREATE TABLE oidc_keys (
  key_id          VARCHAR(40)  PRIMARY KEY,
  kid             VARCHAR(100) NOT NULL UNIQUE,  -- Key ID in JWKS
  algorithm       VARCHAR(10)  NOT NULL,
  use_purpose     VARCHAR(10)  NOT NULL DEFAULT 'sig',
  public_key_jwk  JSONB        NOT NULL,
  vault_key_path  VARCHAR(255) NOT NULL,
  is_current      BOOLEAN      NOT NULL DEFAULT TRUE,
  created_at      TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
  retired_at      TIMESTAMPTZ,
  CONSTRAINT oidc_keys_alg_check CHECK (algorithm IN ('RS256', 'ES256')),
  CONSTRAINT oidc_keys_use_check CHECK (use_purpose IN ('sig', 'enc'))
);

CREATE INDEX idx_oidc_keys_is_current ON oidc_keys(is_current) WHERE is_current = TRUE;
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `OIDC_ISSUER` | OIDC issuer URL (must match token `iss` claim) | `https://${HOST}` |
| `OIDC_ID_TOKEN_TTL_SECONDS` | ID token lifetime | `3600` |
| `OIDC_SIGNING_ALG` | ID token signing algorithm | `RS256` |
| `OIDC_JWKS_CACHE_TTL_SECONDS` | JWKS response cache TTL | `3600` |
| `OIDC_KEY_ROTATION_DAYS` | Days between signing key rotations | `90` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|---------|---------|---------|
| `oidc-provider` | `^8.4.6` | Certified OIDC server library (OpenID Foundation conformant) |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- ID token signing keys are stored in Vault; public keys only are served via JWKS
|
||||
- JWKS endpoint is cached in Redis (`OIDC_JWKS_CACHE_TTL_SECONDS`) to prevent key-hammering
|
||||
- Key rotation: when a new signing key is created, the old key remains in JWKS until all tokens signed with it have expired
|
||||
- The `openid` scope is only issued to callers explicitly requesting it — not included by default
|
||||
- `GET /agent-info` returns the ID token's identity claims plus non-sensitive registration metadata (`version`, `status`, `created_at`); it exposes no additional sensitive data
|
||||
- ID tokens for agent credentials must not contain client secrets or internal system paths
|
||||
- `alg: none` is explicitly rejected — all ID tokens must be signed
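
The key-rotation rule above (an old key stays in the JWKS until every token it signed has expired) can be expressed as a small filter. This is a sketch under assumptions: `SigningKeyRow` mirrors a few `oidc_keys` columns, `keysToPublish` is a hypothetical helper, and the retention window is taken to be the longest token lifetime issued under the key.

```typescript
// Subset of the oidc_keys columns relevant to JWKS publication.
interface SigningKeyRow {
  kid: string;
  isCurrent: boolean;
  retiredAt: Date | null;
}

// Returns the keys whose public halves should still appear in the JWKS.
function keysToPublish(rows: SigningKeyRow[], now: Date, maxTokenTtlSeconds: number): SigningKeyRow[] {
  return rows.filter((row) => {
    // Current keys, and keys with no recorded retirement, are always published.
    if (row.isCurrent || row.retiredAt === null) return true;
    // A token signed just before retirement expires at retiredAt + TTL,
    // so the key must remain verifiable until then.
    const lastPossibleExpiryMs = row.retiredAt.getTime() + maxTokenTtlSeconds * 1000;
    return now.getTime() < lastPossibleExpiryMs;
  });
}
```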
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `/.well-known/openid-configuration` passes OIDC Discovery conformance validation
|
||||
- [ ] `/.well-known/jwks.json` returns valid JWKS with current signing public key
|
||||
- [ ] ID token returned when `openid` scope is in token request; not returned otherwise
|
||||
- [ ] ID token is verifiable against JWKS endpoint using standard JWT libraries
|
||||
- [ ] ID token claims match agent record (agent_type, capabilities, organization_id, did)
|
||||
- [ ] `/agent-info` returns correct claims for authenticated agent
|
||||
- [ ] Key rotation: old JWKS key is kept until all signed tokens expire
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on OIDCService
|
||||
@@ -0,0 +1,335 @@
|
||||
# SOC 2 Type II Preparation — Specification
|
||||
|
||||
**Workstream**: 6 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Implement the technical controls required for SOC 2 Type II audit readiness. SOC 2 Type II certifies that security controls operate continuously over a defined period — not just that they exist. Controls are implemented in code, not just documented.
|
||||
|
||||
This workstream cuts across all other Phase 3 workstreams. It delivers: encryption at rest for sensitive columns, TLS enforcement middleware, automated secrets rotation, security event alerting, and audit log immutability via a Merkle hash chain. A compliance documentation package (controls matrix and runbook) is produced for auditors.
|
||||
|
||||
---
|
||||
|
||||
## Technical Controls
|
||||
|
||||
### Control C1: Encryption at Rest (Column-Level Encryption)
|
||||
|
||||
Sensitive columns in PostgreSQL are encrypted using `pgcrypto` symmetric encryption. The encryption key is stored in Vault and fetched at application startup, never written to disk.
|
||||
|
||||
**Columns encrypted**:
|
||||
- `credentials.secret_hash` — encrypted with AES-256-CBC
|
||||
- `credentials.vault_path` — encrypted with AES-256-CBC
|
||||
- `webhook_subscriptions.vault_secret_path` — encrypted with AES-256-CBC
|
||||
- `agent_did_keys.vault_key_path` — encrypted with AES-256-CBC
|
||||
|
||||
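**Implementation**: An `EncryptionService` wraps `pgcrypto` `pgp_sym_encrypt` / `pgp_sym_decrypt`. The key is a 256-bit symmetric key stored at `secret/agentidp/encryption/column-key` in Vault. All INSERT/SELECT operations for encrypted columns go through `EncryptionService`. Note that `pgp_sym_encrypt` defaults to AES-128 and uses OpenPGP's CFB mode; AES-256 requires passing the `cipher-algo=aes256` option, so the AES-256-CBC labels above hold only if the cipher is applied in the application layer (for example with `node-forge`) rather than by `pgp_sym_encrypt`.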
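
For orientation, an application-layer AES-256-CBC round trip could look like the sketch below. It deliberately swaps in Node's built-in `crypto` module instead of `pgcrypto` or `node-forge`, so it shows the shape of the operation rather than the spec's implementation. `encryptColumn`, `decryptColumn` and the `iv:ciphertext` storage format are assumptions made for illustration. CBC on its own provides confidentiality only, not tamper detection.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts one column value; the random IV is stored alongside the ciphertext.
function encryptColumn(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(16); // fresh IV per value so equal plaintexts differ
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return `${iv.toString("base64")}:${ciphertext.toString("base64")}`;
}

// Reverses encryptColumn; throws if the stored value or key is wrong.
function decryptColumn(stored: string, key: Buffer): string {
  const [ivB64, dataB64] = stored.split(":");
  if (!ivB64 || !dataB64) throw new Error("malformed encrypted column value");
  const decipher = createDecipheriv("aes-256-cbc", key, Buffer.from(ivB64, "base64"));
  const plaintext = Buffer.concat([decipher.update(Buffer.from(dataB64, "base64")), decipher.final()]);
  return plaintext.toString("utf8");
}
```

In the design described above, the 32-byte key would come from Vault at startup rather than being generated locally.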
|
||||
|
||||
---
|
||||
|
||||
### Control C2: TLS Enforcement
|
||||
|
||||
In production, inbound HTTP requests that did not arrive over TLS are not served; they are redirected to HTTPS. This is enforced at two levels:
|
||||
1. Express middleware: `TLSEnforcementMiddleware` — if `X-Forwarded-Proto` is not `https` and `NODE_ENV=production`, respond `301 Moved Permanently` to HTTPS.
|
||||
2. Terraform: Load balancers (Phase 2 Terraform modules) already enforce TLS; TLS enforcement middleware provides defense-in-depth.
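
The middleware in item 1 could be sketched as below. The request and response types are minimal structural stand-ins for the Express objects (assumed, not imported), and `tlsEnforcement` is a hypothetical factory name. `NODE_ENV` is passed in as an argument to keep the sketch testable.

```typescript
// Structural stand-ins for the parts of Express's req/res used here.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  hostname: string;
  url: string;
}
interface ResponseLike {
  redirect(status: number, location: string): void;
}

function tlsEnforcement(nodeEnv: string | undefined) {
  return (req: RequestLike, res: ResponseLike, next: () => void): void => {
    const raw = req.headers["x-forwarded-proto"];
    // Proxies may send a comma-separated list; the first entry is the client-facing hop.
    const first = (Array.isArray(raw) ? raw[0] : raw)?.split(",")[0]?.trim();
    if (nodeEnv === "production" && first !== "https") {
      res.redirect(301, `https://${req.hostname}${req.url}`);
      return;
    }
    next();
  };
}
```

The header is only trustworthy when the service sits behind a proxy that sets it, which matches the load-balancer arrangement in item 2.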
|
||||
|
||||
---
|
||||
|
||||
### Control C3: Automated Secrets Rotation
|
||||
|
||||
A scheduled job (`SecretsRotationJob`) runs on a configurable cron schedule. It:
|
||||
1. Identifies credentials whose `expires_at` is within `ROTATION_WARNING_DAYS` days
|
||||
2. Emits a Prometheus metric `agentidp_credentials_expiring_soon_total` (labelled by `org_id`, `days_remaining`)
|
||||
3. Renews Vault leases for all active credentials
|
||||
4. Sends a webhook event `credential.expiring_soon` to subscribers who have opted in
|
||||
|
||||
This does not automatically rotate credentials without operator action — it alerts and prepares. Forced rotation requires an operator call to the existing `POST /agents/:id/credentials/:credId/rotate` endpoint.
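
Step 1 of the job, selecting credentials inside the warning window, might be sketched like this. `CredentialRow` and `findExpiringSoon` are hypothetical names, and the rule that already-expired credentials are excluded is an assumption, since the spec does not say how they are treated.

```typescript
interface CredentialRow {
  credentialId: string;
  organizationId: string;
  expiresAt: Date;
}

interface ExpiringCredential {
  credentialId: string;
  organizationId: string;
  daysRemaining: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns credentials that expire within `warningDays` of `now`.
function findExpiringSoon(rows: CredentialRow[], now: Date, warningDays: number = 30): ExpiringCredential[] {
  const expiring: ExpiringCredential[] = [];
  for (const row of rows) {
    const msLeft = row.expiresAt.getTime() - now.getTime();
    // Skip credentials that have already expired or are outside the window.
    if (msLeft > 0 && msLeft <= warningDays * MS_PER_DAY) {
      expiring.push({
        credentialId: row.credentialId,
        organizationId: row.organizationId,
        daysRemaining: Math.ceil(msLeft / MS_PER_DAY),
      });
    }
  }
  return expiring;
}
```

The `organizationId` and `daysRemaining` fields line up with the `org_id` and `days_remaining` metric labels named in step 2.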
|
||||
|
||||
---
|
||||
|
||||
### Control C4: Audit Log Immutability (Merkle Hash Chain)
|
||||
|
||||
Every `audit_logs` row carries two new columns:
|
||||
- `hash`: SHA-256 of `(eventId || timestamp.toISOString() || action || outcome || agentId || organizationId || previousHash)`
|
||||
- `previous_hash`: hash of the immediately preceding `audit_logs` row (by `created_at` order), or the genesis string `"GENESIS"` for the first row
|
||||
|
||||
A PostgreSQL trigger prevents `UPDATE` and `DELETE` on `audit_logs`.
|
||||
|
||||
A new admin endpoint `GET /audit/verify` runs a sequential chain verification pass and returns the integrity status.
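
The hash construction and the sequential verification pass can be sketched as follows. Assumptions are labeled: `||` in the formula above is read as plain string concatenation with no delimiter, the rows are already sorted by `created_at`, and `AuditRow`, `appendRow` and `verifyChain` are hypothetical names. Verification here always starts from genesis; a ranged check would instead seed the expected value from the row preceding the range.

```typescript
import { createHash } from "node:crypto";

interface AuditRow {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

const GENESIS = "GENESIS";

// SHA-256 over the concatenated fields, in the order given by the spec.
function computeHash(row: Omit<AuditRow, "hash">): string {
  const material =
    row.eventId + row.timestamp.toISOString() + row.action + row.outcome +
    row.agentId + row.organizationId + row.previousHash;
  return createHash("sha256").update(material, "utf8").digest("hex");
}

// Appends a row, linking it to the hash of the current tail (or GENESIS).
function appendRow(chain: AuditRow[], fields: Omit<AuditRow, "hash" | "previousHash">): AuditRow {
  const tail = chain[chain.length - 1];
  const previousHash = tail ? tail.hash : GENESIS;
  const row: AuditRow = { ...fields, previousHash, hash: computeHash({ ...fields, previousHash }) };
  chain.push(row);
  return row;
}

// Walks the chain in order and reports the first row that fails to link or re-hash.
function verifyChain(rows: AuditRow[]): { valid: boolean; rowsVerified: number; brokenAtEventId: string | null } {
  let expectedPrevious: string = GENESIS;
  let rowsVerified = 0;
  for (const row of rows) {
    if (row.previousHash !== expectedPrevious || computeHash(row) !== row.hash) {
      return { valid: false, rowsVerified, brokenAtEventId: row.eventId };
    }
    expectedPrevious = row.hash;
    rowsVerified += 1;
  }
  return { valid: true, rowsVerified, brokenAtEventId: null };
}
```

Editing any field of a stored row changes its recomputed hash, so the check fails at that row even when later rows are left untouched.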
|
||||
|
||||
---
|
||||
|
||||
### Control C5: Security Event Alerting
|
||||
|
||||
Prometheus alerting rules are written for the following security events:
|
||||
|
||||
| Alert | Condition | Severity |
|-------|-----------|---------|
| `AuthFailureSpike` | >50 `auth.failed` events in 5 minutes | Warning |
| `RateLimitExhaustion` | >80% of org rate limit consumed in 1 minute | Warning |
| `AnomalousTokenIssuance` | Token issuance rate 3x 7-day average | Warning |
| `WebhookDeadLetterAccumulating` | `agentidp_webhook_dead_letters_total` increases by >10 in 1 hour | Warning |
| `AuditChainIntegrityFailed` | `agentidp_audit_chain_integrity` metric is 0 | Critical |
| `CredentialExpiryApproaching` | `agentidp_credentials_expiring_soon_total{days_remaining="7"}` > 0 | Info |
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /audit/verify
|
||||
|
||||
Verify the Merkle hash chain integrity of the audit log. Requires `admin:orgs` scope. This is a potentially expensive operation on large audit logs — it is rate-limited to once per 5 minutes per organization.
|
||||
|
||||
```yaml
GET /audit/verify
Authorization: Bearer <token with admin:orgs scope>

Query Parameters:
  fromDate:
    type: string
    format: date-time
    description: Start of verification range. If omitted, verifies from genesis.
  toDate:
    type: string
    format: date-time
    description: End of verification range. If omitted, verifies to the latest row.

Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
          description: True if the chain is intact across the entire range
        rowsVerified:
          type: integer
          description: Number of audit rows verified
        firstEventId:
          type: string
        lastEventId:
          type: string
        firstTimestamp:
          type: string
          format: date-time
        lastTimestamp:
          type: string
          format: date-time
        verifiedAt:
          type: string
          format: date-time
        brokenAtEventId:
          type: string
          nullable: true
          description: Present only if valid=false — the first eventId where the chain breaks
    example:
      valid: true
      rowsVerified: 15420
      firstEventId: "evt_genesis_00001"
      lastEventId: "evt_01HXK7Z9P3FKWABCDEFZZZZZ"
      firstTimestamp: "2026-01-01T00:00:00Z"
      lastTimestamp: "2026-03-29T12:00:00Z"
      verifiedAt: "2026-03-29T14:00:00Z"
      brokenAtEventId: null
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  429 Too Many Requests:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "RATE_LIMITED"
      message: "Audit verification can be run at most once per 5 minutes"
```
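
The once-per-5-minutes limit behind the `429` response could be sketched with a per-organization timestamp map. This is a single-process illustration only: `createVerifyRateLimiter` is a hypothetical name, and a multi-instance deployment would need shared state, which the spec does not describe for this endpoint.

```typescript
// Returns a function that admits at most one call per window per organization.
function createVerifyRateLimiter(windowMs: number = 5 * 60 * 1000) {
  const lastAllowedAt = new Map<string, number>();
  return function tryAcquire(organizationId: string, nowMs: number = Date.now()): boolean {
    const last = lastAllowedAt.get(organizationId);
    if (last !== undefined && nowMs - last < windowMs) {
      return false; // caller should respond 429 with code RATE_LIMITED
    }
    lastAllowedAt.set(organizationId, nowMs);
    return true;
  };
}
```

Rejected attempts do not reset the window, so a caller that retries every minute is admitted again five minutes after its last successful run.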
|
||||
|
||||
---
|
||||
|
||||
### GET /compliance/controls
|
||||
|
||||
Returns the current status of all SOC 2 technical controls. Requires `admin:orgs` scope. Used by auditors and compliance dashboards.
|
||||
|
||||
```yaml
GET /compliance/controls
Authorization: Bearer <token with admin:orgs scope>

Responses:
  200 OK:
    schema:
      type: object
      properties:
        generatedAt:
          type: string
          format: date-time
        controls:
          type: array
          items:
            type: object
            properties:
              controlId:
                type: string
              name:
                type: string
              status:
                type: string
                enum: [pass, fail, warning, not_applicable]
              description:
                type: string
              lastChecked:
                type: string
                format: date-time
    example:
      generatedAt: "2026-03-29T14:00:00Z"
      controls:
        - controlId: "C1"
          name: "Encryption at Rest"
          status: "pass"
          description: "Column-level encryption active for all sensitive columns"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C2"
          name: "TLS Enforcement"
          status: "pass"
          description: "All non-TLS requests redirected to HTTPS in production"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C3"
          name: "Secrets Rotation"
          status: "warning"
          description: "3 credentials expiring within 7 days"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C4"
          name: "Audit Log Immutability"
          status: "pass"
          description: "Merkle chain intact — last verified 2026-03-29T13:55:00Z"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C5"
          name: "Security Event Alerting"
          status: "pass"
          description: "All 6 alerting rules active in Prometheus"
          lastChecked: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### Modified: audit_logs table
|
||||
|
||||
```sql
ALTER TABLE audit_logs
  ADD COLUMN hash VARCHAR(64),           -- SHA-256 hex string of chain node
  ADD COLUMN previous_hash VARCHAR(64);  -- Hash of preceding row, or "GENESIS"

-- Back-fill genesis hash for existing rows (one-time migration)
-- Migration script computes chain in order of created_at

-- Prevent updates and deletes (immutability trigger)
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
  RAISE EXCEPTION 'audit_logs rows are immutable — modification is not permitted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_logs_immutability
  BEFORE UPDATE OR DELETE ON audit_logs
  FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();
```
|
||||
|
||||
### Modified: credentials table
|
||||
|
||||
```sql
-- Columns remain same type; application now stores encrypted values
-- No DDL change — encryption is transparent at application layer
-- Add comment for documentation
COMMENT ON COLUMN credentials.secret_hash IS 'AES-256-CBC encrypted via EncryptionService (pgcrypto). Not a plain bcrypt hash.';
COMMENT ON COLUMN credentials.vault_path IS 'AES-256-CBC encrypted via EncryptionService.';
```
|
||||
|
||||
### New Table: compliance_check_log
|
||||
|
||||
```sql
CREATE TABLE compliance_check_log (
  check_id         VARCHAR(40) PRIMARY KEY,
  organization_id  VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  control_id       VARCHAR(10) NOT NULL,
  status           VARCHAR(20) NOT NULL,
  details          JSONB       NOT NULL DEFAULT '{}',
  checked_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_compliance_check_org ON compliance_check_log(organization_id, checked_at DESC);
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `SOC2_CONTROLS_ENABLED` | Enable SOC 2 controls enforcement | `true` |
| `TLS_ENFORCEMENT_ENABLED` | Enforce HTTPS in production | `true` in production, `false` in development |
| `COLUMN_ENCRYPTION_KEY_PATH` | Vault path for AES-256 column encryption key | `secret/agentidp/encryption/column-key` |
| `ROTATION_WARNING_DAYS` | Days before expiry to emit rotation warning | `30` |
| `SECRETS_ROTATION_CRON` | Cron schedule for rotation check job | `0 3 * * *` (daily at 3 AM UTC) |
| `AUDIT_CHAIN_VERIFY_CRON` | Cron schedule for automated chain verification | `0 2 * * *` (daily at 2 AM UTC) |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|---------|---------|---------|
| `node-forge` | `^1.3.1` | AES-256-CBC column-level encryption primitives |
|
||||
|
||||
Note: `pgcrypto` PostgreSQL extension must be enabled: `CREATE EXTENSION IF NOT EXISTS pgcrypto;`
|
||||
|
||||
---
|
||||
|
||||
## Compliance Documentation
|
||||
|
||||
The following documents are produced as part of this workstream:
|
||||
|
||||
| Document | Path | Description |
|----------|------|-------------|
| Controls Matrix | `docs/compliance/soc2-controls-matrix.md` | Maps SOC 2 Trust Services Criteria to implemented controls |
| Encryption Runbook | `docs/compliance/encryption-runbook.md` | Key rotation procedure, Vault key path map |
| Audit Log Runbook | `docs/compliance/audit-log-runbook.md` | How to run chain verification, interpret results |
| Incident Response | `docs/compliance/incident-response.md` | Security event response procedures |
| Secrets Rotation Guide | `docs/compliance/secrets-rotation.md` | Operator guide for credential and key rotation |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Column encryption key is fetched from Vault at startup and held in process memory — never written to disk or logged
|
||||
- Key rotation: a migration re-encrypts all sensitive columns under the new encryption key; the old key is retained in Vault history
|
||||
- The immutability trigger on `audit_logs` prevents application-layer modification; a `SUPERUSER` can still bypass triggers — document this in the controls matrix as a residual risk requiring compensating controls (e.g., read-only replica verification)
|
||||
- `GET /audit/verify` is rate-limited to prevent denial-of-service via repeated expensive sequential scans
|
||||
- `GET /compliance/controls` never returns raw secrets or key material — only control status
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `pgcrypto` extension enabled; sensitive columns are encrypted at rest (verified: plaintext not visible in direct DB query)
|
||||
- [ ] TLS enforcement middleware redirects HTTP to HTTPS in production; passthrough in development
|
||||
- [ ] `SecretsRotationJob` runs on schedule; emits Prometheus metric for expiring credentials
|
||||
- [ ] Audit log immutability trigger prevents UPDATE/DELETE on `audit_logs` table
|
||||
- [ ] `GET /audit/verify` returns `valid: true` for an unmodified chain
|
||||
- [ ] `GET /audit/verify` returns `valid: false` with `brokenAtEventId` after a row is manually tampered with (test scenario)
|
||||
- [ ] All 6 Prometheus alerting rules are present in `monitoring/prometheus/alerts.yml`
|
||||
- [ ] `GET /compliance/controls` returns correct status for all 5 controls
|
||||
- [ ] Compliance documentation written and reviewed
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on SOC2 control implementations
|
||||
@@ -0,0 +1,353 @@
|
||||
# W3C Decentralized Identifiers (DIDs) — Specification
|
||||
|
||||
**Workstream**: 2 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Issue a W3C `did:web` identifier for every registered agent and serve DID Documents over HTTPS. The AgentIdP instance itself has a root DID Document at `/.well-known/did.json`. Each agent has an individual DID Document at `/agents/:id/did`. A DID resolution endpoint wraps the standard resolution workflow. Agent cards in AGNTCY format are derivable from DID Documents.
|
||||
|
||||
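The instance DID `did:web:<host>` resolves to `https://<host>/.well-known/did.json`, and per-agent documents are served at `https://<host>/agents/<agentId>/did`. Under the `did:web` method, a path-based DID such as `did:web:<host>:agents:<agentId>` maps to `https://<host>/agents/<agentId>/did.json`, so external resolvers will find per-agent documents only if that path is also served (for example as an alias of `/agents/<agentId>/did`). All DID Documents are W3C DID Core 1.0 compliant.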
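
The mapping between a `did:web` identifier and the URL a generic resolver fetches can be sketched as below. `agentDid` and `didWebToUrl` are hypothetical helper names. The transformation follows the `did:web` method's rules: colons after the host become path separators, a bare host maps to `/.well-known/did.json`, and a path-based DID maps to `<path>/did.json`.

```typescript
// Builds the per-agent DID for a host; a port separator must be percent-encoded.
function agentDid(host: string, agentId: string): string {
  return `did:web:${encodeURIComponent(host)}:agents:${agentId}`;
}

// Converts a did:web identifier into the HTTPS URL of its DID Document.
function didWebToUrl(did: string): string {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) throw new Error("not a did:web identifier");
  const segments = did
    .slice(prefix.length)
    .split(":")
    .map((segment) => decodeURIComponent(segment));
  const host = segments[0];
  if (!host) throw new Error("did:web identifier has no host");
  const path = segments.slice(1);
  return path.length === 0
    ? `https://${host}/.well-known/did.json`
    : `https://${host}/${path.join("/")}/did.json`;
}
```

For example, `didWebToUrl(agentDid("idp.sentryagent.ai", "agt_01HXK7Z9P3FKWABCDEF67890"))` yields a URL ending in `/agents/agt_01HXK7Z9P3FKWABCDEF67890/did.json`.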
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /.well-known/did.json
|
||||
|
||||
Root DID Document for the AgentIdP instance. No authentication required — this is a public discovery endpoint.
|
||||
|
||||
```yaml
GET /.well-known/did.json
No authentication required

Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant DID Document
      required: [id, "@context", verificationMethod, authentication]
      properties:
        "@context":
          type: array
          items:
            type: string
          example:
            - "https://www.w3.org/ns/did/v1"
            - "https://w3id.org/security/suites/jws-2020/v1"
        id:
          type: string
          description: DID for this AgentIdP instance
          example: "did:web:idp.sentryagent.ai"
        controller:
          type: string
          example: "did:web:idp.sentryagent.ai"
        verificationMethod:
          type: array
          items:
            $ref: '#/components/schemas/VerificationMethod'
        authentication:
          type: array
          items:
            type: string
          description: References to verification methods for authentication
        assertionMethod:
          type: array
          items:
            type: string
        service:
          type: array
          items:
            $ref: '#/components/schemas/DIDService'
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
      id: "did:web:idp.sentryagent.ai"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU"
            y: "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
      authentication:
        - "did:web:idp.sentryagent.ai#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai#agent-registry"
          type: "AgentIdentityProvider"
          serviceEndpoint: "https://idp.sentryagent.ai"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
|
||||
|
||||
---
|
||||
|
||||
### GET /agents/:id/did
|
||||
|
||||
Per-agent DID Document. No authentication required — DID Documents are public.
|
||||
|
||||
```yaml
|
||||
GET /agents/{agentId}/did
|
||||
No authentication required
|
||||
|
||||
Path Parameters:
|
||||
agentId:
|
||||
type: string
|
||||
description: Agent ID
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: W3C DID Core 1.0 compliant per-agent DID Document
|
||||
example:
|
||||
"@context":
|
||||
- "https://www.w3.org/ns/did/v1"
|
||||
- "https://w3id.org/agntcy/v1"
|
||||
id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
controller: "did:web:idp.sentryagent.ai"
|
||||
verificationMethod:
|
||||
- id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
|
||||
type: "JsonWebKey2020"
|
||||
controller: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
publicKeyJwk:
|
||||
kty: "EC"
|
||||
crv: "P-256"
|
||||
x: "abc123"
|
||||
y: "def456"
|
||||
authentication:
|
||||
- "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
|
||||
service:
|
||||
- id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#agent-card"
|
||||
type: "AgentCard"
|
||||
serviceEndpoint: "https://idp.sentryagent.ai/agents/agt_01HXK7Z9P3FKWABCDEF67890/did/card"
|
||||
agntcy:
|
||||
agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
agentType: "orchestrator"
|
||||
capabilities:
|
||||
- "task-planning"
|
||||
- "tool-use"
|
||||
deploymentEnv: "production"
|
||||
owner: "acme-ai"
|
||||
version: "1.2.0"
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "AGENT_NOT_FOUND"
|
||||
message: "Agent not found"
|
||||
410 Gone:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "AGENT_DECOMMISSIONED"
|
||||
message: "Agent has been decommissioned — DID Document is no longer active"
|
||||
```
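
The per-agent identifiers in the example above follow the `did:web` pattern of a domain followed by colon-separated path segments. A minimal sketch of how such an identifier might be assembled is shown here. The helper names `buildAgentDid` and `buildKeyId` are hypothetical and not taken from DIDService, and the domain is assumed to come from `DID_WEB_DOMAIN`.

```typescript
// Hypothetical helpers; names are illustrative and not taken from DIDService.
// Builds a did:web identifier of the form did:web:<domain>:agents:<agentId>.
function buildAgentDid(domain: string, agentId: string): string {
  // did:web requires a port separator to be percent-encoded,
  // because ":" is reserved as the path delimiter inside the DID.
  const encodedDomain = domain.replace(":", "%3A");
  return `did:web:${encodedDomain}:agents:${encodeURIComponent(agentId)}`;
}

// Verification method identifiers are the DID plus a fragment, e.g. "#key-1".
function buildKeyId(did: string, fragment: string = "key-1"): string {
  return `${did}#${fragment}`;
}
```

Keeping the key identifier derived from the DID means the `verificationMethod` and `authentication` entries cannot drift apart.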
|
||||
|
||||
---
|
||||
|
||||
### GET /agents/:id/did/resolve
|
||||
|
||||
DID resolution endpoint: resolves the agent's `did:web` DID and returns the result in W3C DID Resolution format. This lets external systems use AgentIdP as a resolver for agent DIDs. Authentication required (`agents:read` scope).
|
||||
|
||||
```yaml
|
||||
GET /agents/{agentId}/did/resolve
|
||||
Authorization: Bearer <token with agents:read scope>
|
||||
|
||||
Path Parameters:
|
||||
agentId:
|
||||
type: string
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/ld+json;profile="https://w3id.org/did-resolution"
|
||||
schema:
|
||||
type: object
|
||||
required: [didDocument, didDocumentMetadata, didResolutionMetadata]
|
||||
properties:
|
||||
didDocument:
|
||||
type: object
|
||||
description: The resolved DID Document
|
||||
didDocumentMetadata:
|
||||
type: object
|
||||
properties:
|
||||
created:
|
||||
type: string
|
||||
format: date-time
|
||||
updated:
|
||||
type: string
|
||||
format: date-time
|
||||
deactivated:
|
||||
type: boolean
|
||||
didResolutionMetadata:
|
||||
type: object
|
||||
properties:
|
||||
contentType:
|
||||
type: string
|
||||
example: "application/did+ld+json"
|
||||
retrieved:
|
||||
type: string
|
||||
format: date-time
|
||||
example:
|
||||
didDocument:
|
||||
"@context": ["https://www.w3.org/ns/did/v1"]
|
||||
id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
didDocumentMetadata:
|
||||
created: "2026-03-29T12:00:00Z"
|
||||
updated: "2026-03-29T12:00:00Z"
|
||||
deactivated: false
|
||||
didResolutionMetadata:
|
||||
contentType: "application/did+ld+json"
|
||||
retrieved: "2026-03-29T14:00:00Z"
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
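
The response schema above wraps the DID Document in a resolution envelope with two metadata objects. The following is a rough sketch of that wrapping step, assuming the DID Document has already been built. The type and function names are illustrative and are not the project's actual `IDIDResolutionResult` or `buildResolutionResult`.

```typescript
// Illustrative types mirroring the response schema above.
interface DidDocumentMetadata {
  created: string;
  updated: string;
  deactivated: boolean;
}

interface DidResolutionResult {
  didDocument: Record<string, unknown>;
  didDocumentMetadata: DidDocumentMetadata;
  didResolutionMetadata: { contentType: string; retrieved: string };
}

// Wraps an already-built DID Document in the resolution envelope.
// `now` is injectable so the output is deterministic in tests.
function wrapResolutionResult(
  didDocument: Record<string, unknown>,
  created: Date,
  updated: Date,
  deactivated: boolean,
  now: Date = new Date(),
): DidResolutionResult {
  return {
    didDocument,
    didDocumentMetadata: {
      created: created.toISOString(),
      updated: updated.toISOString(),
      deactivated,
    },
    didResolutionMetadata: {
      contentType: "application/did+ld+json",
      retrieved: now.toISOString(),
    },
  };
}
```

Note that `toISOString()` includes milliseconds (for example `2026-03-29T12:00:00.000Z`), which is still a valid `date-time` value.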
|
||||
|
||||
---
|
||||
|
||||
### GET /agents/:id/did/card
|
||||
|
||||
AGNTCY-format agent card derived from DID Document. Returns a JSON object representing the agent's identity and capabilities in the AGNTCY agent card format. No authentication required.
|
||||
|
||||
```yaml
|
||||
GET /agents/{agentId}/did/card
|
||||
No authentication required
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
Content-Type: application/json
|
||||
schema:
|
||||
type: object
|
||||
description: AGNTCY-format agent card
|
||||
properties:
|
||||
did:
|
||||
type: string
|
||||
name:
|
||||
type: string
|
||||
agentType:
|
||||
type: string
|
||||
capabilities:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
owner:
|
||||
type: string
|
||||
version:
|
||||
type: string
|
||||
deploymentEnv:
|
||||
type: string
|
||||
identityProvider:
|
||||
type: string
|
||||
description: DID of the issuing AgentIdP instance
|
||||
issuedAt:
|
||||
type: string
|
||||
format: date-time
|
||||
example:
|
||||
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
name: "acme-orchestrator"
|
||||
agentType: "orchestrator"
|
||||
capabilities: ["task-planning", "tool-use"]
|
||||
owner: "acme-ai"
|
||||
version: "1.2.0"
|
||||
deploymentEnv: "production"
|
||||
identityProvider: "did:web:idp.sentryagent.ai"
|
||||
issuedAt: "2026-03-29T12:00:00Z"
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: agent_did_keys
|
||||
|
||||
Tracks the key pair used to sign each agent's DID Document. The private key is stored in Vault; PostgreSQL holds only the public key JWK and the Vault path to the private key.
|
||||
|
||||
```sql
|
||||
CREATE TABLE agent_did_keys (
|
||||
key_id VARCHAR(40) PRIMARY KEY,
|
||||
agent_id VARCHAR(40) NOT NULL UNIQUE REFERENCES agents(agent_id),
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
public_key_jwk JSONB NOT NULL,
|
||||
vault_key_path VARCHAR(255) NOT NULL, -- Vault path where private key is stored
|
||||
key_type VARCHAR(20) NOT NULL DEFAULT 'EC',
|
||||
curve VARCHAR(10) NOT NULL DEFAULT 'P-256',
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
rotated_at TIMESTAMPTZ,
|
||||
CONSTRAINT agent_did_keys_key_type_check CHECK (key_type IN ('EC', 'RSA'))
|
||||
);
|
||||
|
||||
CREATE INDEX idx_agent_did_keys_agent_id ON agent_did_keys(agent_id);
|
||||
CREATE INDEX idx_agent_did_keys_org_id ON agent_did_keys(organization_id);
|
||||
```
|
||||
|
||||
### New Columns: agents.did, agents.did_created_at
|
||||
|
||||
```sql
|
||||
ALTER TABLE agents
|
||||
ADD COLUMN did VARCHAR(255),
|
||||
ADD COLUMN did_created_at TIMESTAMPTZ;
|
||||
|
||||
-- Populated automatically on agent creation
|
||||
-- Example value: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `DID_WEB_DOMAIN` | Domain name for `did:web` construction | None (required); falls back to `HOST` if unset |
|
||||
| `DID_KEY_TYPE` | Cryptographic key type for DID keys | `EC` |
|
||||
| `DID_KEY_CURVE` | Elliptic curve for EC keys | `P-256` |
|
||||
| `DID_DOCUMENT_CACHE_TTL_SECONDS` | How long to cache DID Documents in Redis | `300` |
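
One way the variables in this table could be read at startup is sketched below, with the documented defaults applied. The `DidConfig` interface and `loadDidConfig` function are hypothetical names introduced for illustration; passing the environment in as an argument is a design choice that keeps the loader testable.

```typescript
// Illustrative config loader; interface and function names are assumptions.
interface DidConfig {
  webDomain: string;
  keyType: string;
  keyCurve: string;
  cacheTtlSeconds: number;
}

function loadDidConfig(env: Record<string, string | undefined>): DidConfig {
  // DID_WEB_DOMAIN falls back to HOST, as described in the table.
  const webDomain = env["DID_WEB_DOMAIN"] || env["HOST"];
  if (!webDomain) {
    throw new Error("DID_WEB_DOMAIN (or HOST) must be set");
  }
  const ttl = Number(env["DID_DOCUMENT_CACHE_TTL_SECONDS"] ?? "300");
  if (!Number.isInteger(ttl) || ttl <= 0) {
    throw new Error("DID_DOCUMENT_CACHE_TTL_SECONDS must be a positive integer");
  }
  return {
    webDomain,
    keyType: env["DID_KEY_TYPE"] ?? "EC",
    keyCurve: env["DID_KEY_CURVE"] ?? "P-256",
    cacheTtlSeconds: ttl,
  };
}
```

In a Node.js process this would typically be called as `loadDidConfig(process.env)`.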
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|
||||
|---------|---------|---------|
|
||||
| `did-resolver` | `^4.1.0` | W3C DID resolution interface |
|
||||
| `web-did-resolver` | `^2.0.27` | DID:WEB method resolver |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- DID Document endpoints are public: they require no authentication and expose no sensitive data
|
||||
- Private keys for DID signing are stored in Vault; never written to PostgreSQL
|
||||
- DID Document cache in Redis has a TTL — stale documents are evicted automatically
|
||||
- Decommissioned agents return HTTP 410 Gone with `deactivated: true` in DID Document metadata
|
||||
- DID rotation: when a credential is rotated, the DID Document key can optionally be rotated; the old key is retained in history
|
||||
- `GET /agents/:id/did/card` exposes only data already present in the agent registration — no new sensitive fields
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Every new agent registration automatically generates a `did:web` DID and key pair
|
||||
- [ ] Root DID Document at `/.well-known/did.json` is W3C DID Core 1.0 compliant (validated by `did-resolver`)
|
||||
- [ ] Per-agent DID Document returns correct `did:web` identifier and public key JWK
|
||||
- [ ] DID resolution endpoint returns W3C DID Resolution format
|
||||
- [ ] Decommissioned agent DID Document returns 410 Gone with `deactivated: true`
|
||||
- [ ] Agent card at `/agents/:id/did/card` matches AGNTCY agent card format
|
||||
- [ ] Private keys never appear in any API response or log
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on DIDService
|
||||
@@ -0,0 +1,476 @@
|
||||
# Webhooks and Event Streaming — Specification
|
||||
|
||||
**Workstream**: 5 of 6
|
||||
**Phase**: 3 — Enterprise
|
||||
**Author**: Virtual Architect
|
||||
**Date**: 2026-03-29
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Real-time notifications for agent, credential, and token events, delivered via HTTP webhooks. Operators create webhook subscriptions specifying a target URL, the events they want to receive, and a secret for HMAC-SHA256 signature verification. Delivery is asynchronous via a Redis-backed `bull` queue and is retried with increasing backoff (max 10 attempts; see Retry Schedule). All deliveries are logged for observability.
|
||||
|
||||
Supported events: `agent.created`, `agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `token.issued`, `token.revoked`.
|
||||
|
||||
An optional Kafka/NATS adapter enables high-throughput event streaming alongside webhook delivery.
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### POST /webhooks
|
||||
|
||||
Create a new webhook subscription. Requires `agents:write` scope.
|
||||
|
||||
```yaml
|
||||
POST /webhooks
|
||||
Authorization: Bearer <token with agents:write scope>
|
||||
Content-Type: application/json
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
required: [url, events, secret]
|
||||
properties:
|
||||
url:
|
||||
type: string
|
||||
format: uri
|
||||
description: HTTPS endpoint to deliver events to
|
||||
example: "https://app.example.com/hooks/agentidp"
|
||||
events:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
enum:
|
||||
- agent.created
|
||||
- agent.updated
|
||||
- agent.suspended
|
||||
- agent.reactivated
|
||||
- agent.decommissioned
|
||||
- credential.generated
|
||||
- credential.rotated
|
||||
- credential.revoked
|
||||
- token.issued
|
||||
- token.revoked
|
||||
- "*"
|
||||
minItems: 1
|
||||
description: List of event types to subscribe to. Use ["*"] to subscribe to all events.
|
||||
example: ["agent.created", "credential.rotated"]
|
||||
secret:
|
||||
type: string
|
||||
minLength: 16
|
||||
        description: Secret used to compute the HMAC-SHA256 signature. Store it securely; the API does not return it after creation.
|
||||
example: "whsec_super_secret_value_here"
|
||||
description:
|
||||
type: string
|
||||
maxLength: 255
|
||||
description: Optional human-readable description for this subscription
|
||||
active:
|
||||
type: boolean
|
||||
default: true
|
||||
|
||||
Responses:
|
||||
201 Created:
|
||||
schema:
|
||||
$ref: '#/components/schemas/WebhookSubscription'
|
||||
example:
|
||||
subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
|
||||
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
|
||||
url: "https://app.example.com/hooks/agentidp"
|
||||
events: ["agent.created", "credential.rotated"]
|
||||
description: "Production event sink"
|
||||
active: true
|
||||
createdAt: "2026-03-29T12:00:00Z"
|
||||
updatedAt: "2026-03-29T12:00:00Z"
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
examples:
|
||||
invalid_url:
|
||||
code: "VALIDATION_ERROR"
|
||||
message: "url must be a valid HTTPS URI"
|
||||
invalid_event:
|
||||
code: "VALIDATION_ERROR"
|
||||
message: "Unknown event type: agent.unknown"
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
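
The request constraints above (HTTPS URL, known event types, at least one event, secret of 16 or more characters) could be enforced with a validator along these lines. This is a sketch only: `validateCreateWebhook` and `subscriptionMatches` are hypothetical names, and the message for a short secret is invented for illustration.

```typescript
// Illustrative validator for the POST /webhooks body; names are assumptions.
const EVENT_TYPES = [
  "agent.created", "agent.updated", "agent.suspended", "agent.reactivated",
  "agent.decommissioned", "credential.generated", "credential.rotated",
  "credential.revoked", "token.issued", "token.revoked", "*",
] as const;

interface CreateWebhookBody {
  url: string;
  events: string[];
  secret: string;
}

// Returns validation messages; an empty list means the body is valid.
function validateCreateWebhook(body: CreateWebhookBody): string[] {
  const errors: string[] = [];
  let parsed: URL | undefined;
  try {
    parsed = new URL(body.url);
  } catch {
    parsed = undefined; // not parseable as a URL at all
  }
  if (!parsed || parsed.protocol !== "https:") {
    errors.push("url must be a valid HTTPS URI");
  }
  if (body.events.length < 1) {
    errors.push("events must contain at least one event type");
  }
  for (const event of body.events) {
    if (!(EVENT_TYPES as readonly string[]).includes(event)) {
      errors.push(`Unknown event type: ${event}`);
    }
  }
  if (body.secret.length < 16) {
    errors.push("secret must be at least 16 characters");
  }
  return errors;
}

// A subscription matches an event if it lists the type or the "*" wildcard.
function subscriptionMatches(subscribed: string[], eventType: string): boolean {
  return subscribed.includes("*") || subscribed.includes(eventType);
}
```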
|
||||
|
||||
---
|
||||
|
||||
### GET /webhooks
|
||||
|
||||
List webhook subscriptions for the caller's organization. Requires `agents:read` scope.
|
||||
|
||||
```yaml
|
||||
GET /webhooks
|
||||
Authorization: Bearer <token with agents:read scope>
|
||||
|
||||
Query Parameters:
|
||||
active:
|
||||
type: boolean
|
||||
description: Filter by active/inactive subscriptions
|
||||
page:
|
||||
type: integer
|
||||
default: 1
|
||||
limit:
|
||||
type: integer
|
||||
default: 20
|
||||
maximum: 100
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
data:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/WebhookSubscription'
|
||||
total:
|
||||
type: integer
|
||||
page:
|
||||
type: integer
|
||||
limit:
|
||||
type: integer
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /webhooks/:id
|
||||
|
||||
Get a single webhook subscription. Requires `agents:read` scope.
|
||||
|
||||
```yaml
|
||||
GET /webhooks/{subscriptionId}
|
||||
Authorization: Bearer <token with agents:read scope>
|
||||
|
||||
Path Parameters:
|
||||
subscriptionId:
|
||||
type: string
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
$ref: '#/components/schemas/WebhookSubscription'
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
example:
|
||||
code: "WEBHOOK_NOT_FOUND"
|
||||
message: "Webhook subscription not found"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### PATCH /webhooks/:id
|
||||
|
||||
Update a webhook subscription (e.g., pause/resume, change events). Requires `agents:write` scope.
|
||||
|
||||
```yaml
|
||||
PATCH /webhooks/{subscriptionId}
|
||||
Authorization: Bearer <token with agents:write scope>
|
||||
Content-Type: application/json
|
||||
|
||||
Request Body:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
url:
|
||||
type: string
|
||||
format: uri
|
||||
events:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description:
|
||||
type: string
|
||||
maxLength: 255
|
||||
active:
|
||||
type: boolean
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
$ref: '#/components/schemas/WebhookSubscription'
|
||||
400 Bad Request:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### DELETE /webhooks/:id
|
||||
|
||||
Delete a webhook subscription. Requires `agents:write` scope.
|
||||
|
||||
```yaml
|
||||
DELETE /webhooks/{subscriptionId}
|
||||
Authorization: Bearer <token with agents:write scope>
|
||||
|
||||
Responses:
|
||||
204 No Content: {}
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
403 Forbidden:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GET /webhooks/:id/deliveries
|
||||
|
||||
List delivery attempts for a specific webhook subscription. Requires `agents:read` scope.
|
||||
|
||||
```yaml
|
||||
GET /webhooks/{subscriptionId}/deliveries
|
||||
Authorization: Bearer <token with agents:read scope>
|
||||
|
||||
Query Parameters:
|
||||
status:
|
||||
type: string
|
||||
enum: [pending, success, failed, dead_letter]
|
||||
eventType:
|
||||
type: string
|
||||
description: Filter by event type
|
||||
fromDate:
|
||||
type: string
|
||||
format: date-time
|
||||
toDate:
|
||||
type: string
|
||||
format: date-time
|
||||
page:
|
||||
type: integer
|
||||
default: 1
|
||||
limit:
|
||||
type: integer
|
||||
default: 50
|
||||
maximum: 200
|
||||
|
||||
Responses:
|
||||
200 OK:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
data:
|
||||
type: array
|
||||
items:
|
||||
$ref: '#/components/schemas/WebhookDelivery'
|
||||
total:
|
||||
type: integer
|
||||
page:
|
||||
type: integer
|
||||
limit:
|
||||
type: integer
|
||||
example:
|
||||
data:
|
||||
- deliveryId: "del_01HXK7Z9P3FKWABCDEF77777"
|
||||
subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
|
||||
eventType: "agent.created"
|
||||
eventId: "evt_01HXK7Z9P3FKWABCDEF99999"
|
||||
status: "success"
|
||||
httpStatusCode: 200
|
||||
attemptCount: 1
|
||||
nextRetryAt: null
|
||||
deliveredAt: "2026-03-29T12:00:05Z"
|
||||
createdAt: "2026-03-29T12:00:00Z"
|
||||
total: 1
|
||||
page: 1
|
||||
limit: 50
|
||||
401 Unauthorized:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
404 Not Found:
|
||||
schema:
|
||||
$ref: '#/components/schemas/ErrorResponse'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Webhook Payload Format
|
||||
|
||||
Every webhook delivery uses this envelope format:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "evt_01HXK7Z9P3FKWABCDEF99999",
|
||||
"type": "agent.created",
|
||||
"organizationId": "org_01HXK7Z9P3FKWABCDEF12345",
|
||||
"timestamp": "2026-03-29T12:00:00Z",
|
||||
"data": {
|
||||
"agentId": "agt_01HXK7Z9P3FKWABCDEF67890",
|
||||
"agentType": "orchestrator",
|
||||
"status": "active",
|
||||
"owner": "acme-ai",
|
||||
"version": "1.0.0",
|
||||
"deploymentEnv": "production"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### HMAC-SHA256 Signature
|
||||
|
||||
Every delivery includes the following HTTP headers:
|
||||
|
||||
```
|
||||
X-AgentIdP-Event: agent.created
|
||||
X-AgentIdP-Delivery-Id: del_01HXK7Z9P3FKWABCDEF77777
|
||||
X-AgentIdP-Timestamp: 1774785600
|
||||
X-AgentIdP-Signature-256: sha256=<HMAC-SHA256 of timestamp.payload using subscription secret>
|
||||
```
|
||||
|
||||
Signature computation:
|
||||
```
|
||||
signed_content = timestamp + "." + JSON.stringify(payload)
|
||||
signature = HMAC-SHA256(secret, signed_content)
|
||||
header_value = "sha256=" + hex(signature)
|
||||
```
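
The signature computation above can be sketched with Node's built-in `crypto` module. The function names `signDelivery` and `verifyDelivery` are hypothetical, and the 300-second tolerance is an assumption based on the 5-minute replay guidance elsewhere in this spec.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: "sha256=" + hex(HMAC-SHA256(secret, timestamp + "." + body)).
function signDelivery(secret: string, timestamp: number, rawBody: string): string {
  const digest = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");
  return `sha256=${digest}`;
}

// Receiver side: rejects stale timestamps, then compares signatures
// in constant time. Names and default tolerance are assumptions.
function verifyDelivery(
  secret: string,
  timestamp: number,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds: number = 300,
): boolean {
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) {
    return false; // outside the tolerance window: possible replay
  }
  const expected = Buffer.from(signDelivery(secret, timestamp, rawBody));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Receivers should verify against the raw request body as received rather than re-serializing parsed JSON, since a second `JSON.stringify` pass is not guaranteed to reproduce the same bytes.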
|
||||
|
||||
---
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### New Table: webhook_subscriptions
|
||||
|
||||
```sql
|
||||
CREATE TABLE webhook_subscriptions (
|
||||
subscription_id VARCHAR(40) PRIMARY KEY,
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
url VARCHAR(2048) NOT NULL,
|
||||
events JSONB NOT NULL DEFAULT '[]',
|
||||
secret_hash VARCHAR(255) NOT NULL, -- bcrypt hash of secret; plain text stored in Vault
|
||||
vault_secret_path VARCHAR(255) NOT NULL,
|
||||
description VARCHAR(255),
|
||||
active BOOLEAN NOT NULL DEFAULT TRUE,
|
||||
failure_count INTEGER NOT NULL DEFAULT 0,
|
||||
last_delivery_at TIMESTAMPTZ,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
CREATE INDEX idx_webhook_subs_org_id ON webhook_subscriptions(organization_id);
|
||||
CREATE INDEX idx_webhook_subs_active ON webhook_subscriptions(active) WHERE active = TRUE;
|
||||
```
|
||||
|
||||
### New Table: webhook_deliveries
|
||||
|
||||
```sql
|
||||
CREATE TABLE webhook_deliveries (
|
||||
delivery_id VARCHAR(40) PRIMARY KEY,
|
||||
subscription_id VARCHAR(40) NOT NULL REFERENCES webhook_subscriptions(subscription_id),
|
||||
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
|
||||
event_id VARCHAR(40) NOT NULL,
|
||||
event_type VARCHAR(100) NOT NULL,
|
||||
payload JSONB NOT NULL,
|
||||
status VARCHAR(20) NOT NULL DEFAULT 'pending',
|
||||
http_status_code SMALLINT,
|
||||
response_body TEXT,
|
||||
attempt_count SMALLINT NOT NULL DEFAULT 0,
|
||||
next_retry_at TIMESTAMPTZ,
|
||||
delivered_at TIMESTAMPTZ,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
CONSTRAINT webhook_deliveries_status_check CHECK (status IN ('pending', 'success', 'failed', 'dead_letter'))
|
||||
);
|
||||
|
||||
CREATE INDEX idx_webhook_deliveries_sub_id ON webhook_deliveries(subscription_id);
|
||||
CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status);
|
||||
CREATE INDEX idx_webhook_deliveries_org_id ON webhook_deliveries(organization_id);
|
||||
CREATE INDEX idx_webhook_deliveries_created ON webhook_deliveries(created_at);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Retry Schedule
|
||||
|
||||
```
|
||||
Attempt 1: immediate
|
||||
Attempt 2: 1 minute after failure
|
||||
Attempt 3: 5 minutes after failure
|
||||
Attempt 4: 15 minutes after failure
|
||||
Attempt 5: 1 hour after failure
|
||||
Attempt 6: 4 hours after failure
|
||||
Attempt 7: 12 hours after failure
|
||||
Attempt 8: 24 hours after failure
|
||||
Attempt 9: 48 hours after failure
|
||||
Attempt 10: 72 hours after failure
|
||||
After attempt 10: status = dead_letter; operator alerted via Prometheus metric
|
||||
```
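
The schedule above can be expressed as a lookup table. The sketch below assumes each delay is measured from the failure of the preceding attempt, which the schedule does not state explicitly; `nextRetryDelayMs` is a hypothetical helper and does not use any `bull` API.

```typescript
// Delay before each retry, in minutes, taken from the schedule above.
// Index 0 is the delay before attempt 2; index 8 is the delay before attempt 10.
const RETRY_DELAYS_MINUTES: number[] = [1, 5, 15, 60, 240, 720, 1440, 2880, 4320];

// Returns the delay in milliseconds before the next attempt, or null when
// attempts are exhausted and the delivery should be marked dead_letter.
// `failedAttempt` is the 1-based number of the attempt that just failed.
function nextRetryDelayMs(failedAttempt: number): number | null {
  const minutes = RETRY_DELAYS_MINUTES[failedAttempt - 1];
  return minutes === undefined ? null : minutes * 60 * 1000;
}
```

A worker could use the returned value to set `next_retry_at`, and treat `null` as the signal to set `status = dead_letter`.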
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
| Environment Variable | Description | Default |
|
||||
|---------------------|-------------|---------|
|
||||
| `WEBHOOKS_ENABLED` | Enable webhook functionality | `true` |
|
||||
| `WEBHOOK_DELIVERY_TIMEOUT_MS` | HTTP delivery request timeout | `10000` |
|
||||
| `WEBHOOK_MAX_RETRIES` | Maximum delivery attempts before dead-letter | `10` |
|
||||
| `WEBHOOK_WORKER_CONCURRENCY` | Number of concurrent delivery workers | `5` |
|
||||
| `KAFKA_BROKERS` | Comma-separated Kafka broker list (optional; activates Kafka adapter) | `""` |
|
||||
| `KAFKA_TOPIC_PREFIX` | Prefix for Kafka topic names | `agentidp` |
|
||||
| `NATS_URL` | NATS server URL (optional; activates NATS adapter) | `""` |
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Package | Version | Purpose |
|
||||
|---------|---------|---------|
|
||||
| `bull` | `^4.16.3` | Redis-backed async job queue for webhook delivery |
|
||||
| `kafkajs` | `^2.2.4` | Kafka producer adapter (optional) |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Webhook secrets are stored in Vault; only a bcrypt hash is in PostgreSQL for in-memory comparison
|
||||
- All deliveries must be to HTTPS endpoints — HTTP endpoints are rejected at subscription creation
|
||||
- Private/internal IP ranges (RFC 1918, loopback) are blocked at delivery time to prevent SSRF
|
||||
- HMAC signature allows the receiving server to verify the delivery is authentic
|
||||
- Replay attacks are mitigated by including a timestamp in the signed content; receivers should reject deliveries with timestamps older than 5 minutes
|
||||
- Dead-letter events generate a Prometheus metric `agentidp_webhook_dead_letters_total` for alerting
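
The SSRF rule in the list above can be illustrated with a check on the address a webhook URL resolves to. This is a deliberately narrow sketch covering only IPv4 loopback and the RFC 1918 ranges named in this spec; `isBlockedIPv4` is a hypothetical name, and a production guard would also need to handle IPv6, link-local ranges, and DNS rebinding.

```typescript
import { isIP } from "node:net";

// Illustrative check for IPv4 loopback and RFC 1918 private ranges only.
function isBlockedIPv4(address: string): boolean {
  if (isIP(address) !== 4) {
    return false; // not an IPv4 literal; hostnames must be resolved first
  }
  const [a, b] = address.split(".").map(Number);
  if (a === undefined || b === undefined) {
    return false;
  }
  return (
    a === 127 ||                          // 127.0.0.0/8 loopback
    a === 10 ||                           // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}
```

Because the spec blocks these ranges at delivery time, the check would apply to the address obtained from DNS resolution immediately before the request, not only to the URL as submitted.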
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `POST /webhooks` creates a subscription; secret stored in Vault, not returned after creation
|
||||
- [ ] Webhook delivery occurs within 30 seconds of event generation for healthy subscribers
|
||||
- [ ] Delivery includes correct `X-AgentIdP-Signature-256` header verifiable with provided secret
|
||||
- [ ] Failed delivery is retried per schedule; status updates in `webhook_deliveries` table
|
||||
- [ ] After max retries, status is `dead_letter` and metric is incremented
|
||||
- [ ] Delivery to HTTP (non-HTTPS) URL is rejected at subscription creation
|
||||
- [ ] Delivery to private IP range is rejected (SSRF protection)
|
||||
- [ ] `GET /webhooks/:id/deliveries` returns accurate delivery history
|
||||
- [ ] TypeScript strict, zero `any`, >80% test coverage on WebhookService
|
||||
142
openspec/changes/archive/2026-04-02-phase-3-enterprise/tasks.md
Normal file
@@ -0,0 +1,142 @@
|
||||
# Phase 3: Enterprise — Tasks
|
||||
|
||||
**Status**: COMPLETE — All 6 workstreams done ✅
|
||||
|
||||
## CEO Approval Gates (required before implementation)
|
||||
|
||||
- [x] A0.1 Approve dependency: `did-resolver` + `web-did-resolver` (W3C DID support)
|
||||
- [x] A0.2 Approve dependency: `oidc-provider` (certified OIDC server library)
|
||||
- [x] A0.3 Approve dependency: `bull` (Redis-backed webhook delivery queue)
|
||||
- [x] A0.4 Approve dependency: `kafkajs` (optional Kafka adapter for webhooks)
|
||||
- [x] A0.5 Approve dependency: `node-forge` (column-level encryption for SOC 2)
|
||||
|
||||
---
|
||||
|
||||
## Workstream 1: Multi-Tenancy
|
||||
|
||||
- [x] 1.1 Write `src/db/migrations/006_create_organizations_table.sql` — organizations table with slug, plan_tier, max_agents, max_tokens_per_month, status
|
||||
- [x] 1.2 Write `src/db/migrations/007_create_organization_members_table.sql` — organization_members with agent_id FK and role
|
||||
- [x] 1.3 Write `src/db/migrations/008_add_organization_id_to_agents.sql` — add organization_id column + index + RLS policy on agents
|
||||
- [x] 1.4 Write `src/db/migrations/009_add_organization_id_to_credentials.sql` — add organization_id column + index + RLS policy on credentials
|
||||
- [x] 1.5 Write `src/db/migrations/010_add_organization_id_to_audit_logs.sql` — add organization_id column + index + RLS policy on audit_logs
|
||||
- [x] 1.6 Write `src/db/migrations/011_seed_system_organization.sql` — insert default system org and backfill existing rows
|
||||
- [x] 1.7 Write `src/types/organization.ts` — IOrganization, ICreateOrgRequest, IUpdateOrgRequest, IOrgMember, IPaginatedOrgsResponse, OrgStatus, PlanTier interfaces
|
||||
- [x] 1.8 Write `src/services/OrgService.ts` — createOrg, listOrgs, getOrg, updateOrg, deleteOrg, addMember; all methods accept organizationId context
|
||||
- [x] 1.9 Write `src/controllers/OrgController.ts` — request parsing and validation for all 6 org endpoints
|
||||
- [x] 1.10 Write `src/routes/organizations.ts` — mount all 6 org endpoints with admin:orgs scope guard
|
||||
- [x] 1.11 Write `src/middleware/orgContext.ts` — OrgContextMiddleware: extracts organization_id from JWT and calls SET app.organization_id before each DB query
|
||||
- [x] 1.12 Update `src/middleware/auth.ts` — extend ITokenPayload with organization_id claim; backfill from DEFAULT_ORG_ID for backward compat
|
||||
- [x] 1.13 Update `src/services/AgentService.ts` — organizationId propagated via RLS session variable (orgContext middleware)
|
||||
- [x] 1.14 Update `src/services/CredentialService.ts` — organizationId propagated via RLS session variable
|
||||
- [x] 1.15 Update `src/services/AuditService.ts` — organizationId propagated via RLS session variable
|
||||
- [x] 1.16 Update `src/services/OAuth2Service.ts` — include organization_id claim in issued JWT payload
|
||||
- [x] 1.17 Update `src/types/index.ts` — extend ITokenPayload with organization_id field, admin:orgs scope, org audit actions
|
||||
- [x] 1.18 Update OPA policy `policies/authz.rego` + `policies/data/scopes.json` — 6 new org endpoint → admin:orgs mappings
|
||||
- [x] 1.19 Write unit tests for OrgService (CRUD, member management, org isolation)
|
||||
- [x] 1.20 Write integration tests — all 6 /organizations endpoints, cross-org isolation via RLS
|
||||
- [x] 1.21 QA sign-off: 373 tests passing, 80.64% branch coverage, zero `any`, TypeScript clean
|
||||
|
||||
---
|
||||
|
||||
## Workstream 2: W3C DIDs
|
||||
|
||||
- [x] 2.1 Write `src/db/migrations/012_create_agent_did_keys_table.sql` — agent_did_keys table with public_key_jwk JSONB and vault_key_path
|
||||
- [x] 2.2 Write `src/db/migrations/013_add_did_columns_to_agents.sql` — add did and did_created_at columns to agents
|
||||
- [x] 2.3 Write `src/types/did.ts` — IDIDDocument, IVerificationMethod, IDIDService, IDIDResolutionResult, IAgentCard interfaces
|
||||
- [x] 2.4 Write `src/services/DIDService.ts` — generateDIDForAgent (EC P-256 key pair, Vault storage, public key in DB), buildInstanceDIDDocument, buildAgentDIDDocument, buildAgentCard, buildResolutionResult
|
||||
- [x] 2.5 Update `src/services/AgentService.ts` — call DIDService.generateDIDForAgent on every new agent registration
|
||||
- [x] 2.6 Write `src/controllers/DIDController.ts` — handlers for root DID Document, per-agent DID Document (410 for decommissioned), resolution endpoint, agent card
|
||||
- [x] 2.7 Write `src/routes/did.ts` — createDIDRouter for `/agents/:id/did`, `/did/resolve`, `/did/card`; `/.well-known/did.json` registered in app.ts
|
||||
- [x] 2.8 Implement Redis caching in DIDService — cache DID Documents with TTL from DID_DOCUMENT_CACHE_TTL_SECONDS (default 300s)
|
||||
- [x] 2.9 Handle decommissioned agents — deactivated: true in metadata; HTTP 410 Gone from DIDController
|
||||
- [x] 2.10 Write unit tests for DIDService — 39 tests, 98.93% coverage; private key security asserted
|
||||
- [x] 2.11 Write integration tests — all 4 DID endpoints; 22 tests
|
||||
- [x] 2.12 QA sign-off: 429 tests passing, 98.93% DIDService coverage, private key never in response, zero `any`
|
||||
|
||||
---
|
||||
|
||||
## Workstream 3: OpenID Connect (OIDC)
|
||||
|
||||
- [x] 3.1 Write `src/db/migrations/014_create_oidc_keys_table.sql` — oidc_keys table with kid, public_key_jwk, vault_key_path, is_current
|
||||
- [x] 3.2 Write `src/services/OIDCKeyService.ts` — generateSigningKeyPair (RSA-2048 or EC P-256), storeKeyInVault, getPublicJWKS, getCurrentKeyId, rotateKey
|
||||
- [x] 3.3 Write `src/services/IDTokenService.ts` — buildIDTokenClaims (agent claims), signIDToken using current Vault-stored key, verifyIDToken
|
||||
- [x] 3.4 Write `src/types/oidc.ts` — IIDTokenClaims, IJWKSResponse, IOIDCDiscoveryDocument, IAgentInfoResponse interfaces
|
||||
- [x] 3.5 Write `src/controllers/OIDCController.ts` — handlers for discovery, JWKS, agent-info
|
||||
- [x] 3.6 Write `src/routes/oidc.ts` — mount `/.well-known/openid-configuration`, `/.well-known/jwks.json`, `/agent-info`
|
||||
- [x] 3.7 Update `src/services/OAuth2Service.ts` — when `openid` scope is present in request, generate and append `id_token` to token response
|
||||
- [x] 3.8 Implement JWKS caching — cache JWKS in Redis with TTL; invalidate on key rotation
|
||||
- [x] 3.9 Implement key rotation logic — on rotation, old key remains in JWKS until all tokens signed with it have expired
|
||||
- [x] 3.10 Write unit tests for OIDCKeyService and IDTokenService — key generation, token signing, JWKS format
|
||||
- [x] 3.11 Write integration tests — POST /oauth2/token with `openid` scope returns id_token; validate id_token against JWKS; GET /agent-info returns correct claims
|
||||
- [x] 3.12 QA sign-off: OIDC discovery document passes conformance checks, id_token verifiable, `alg: none` rejected, zero `any`, >80% coverage
|
||||
|
||||
---
|
||||
|
||||
## Workstream 4: AGNTCY Federation
|
||||
|
||||
- [x] 4.1 Write `src/db/migrations/015_create_federation_partners_table.sql` — federation_partners table with issuer, jwks_uri, allowed_organizations JSONB, status, expires_at
- [x] 4.2 Write `src/types/federation.ts` — IFederationPartner, ICreatePartnerRequest, IVerifyFederatedTokenRequest, IFederationVerifyResult interfaces
- [x] 4.3 Write `src/services/FederationService.ts` — registerPartner (validates by fetching JWKS), listPartners, deletePartner, verifyFederatedToken (fetch-or-cache JWKS, verify signature, validate claims)
- [x] 4.4 Implement JWKS caching in FederationService — store partner JWKS in Redis with TTL configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS
- [x] 4.5 Write `src/controllers/FederationController.ts` — handlers for POST /federation/trust, GET /federation/partners, DELETE /federation/partners/:id, POST /federation/verify
- [x] 4.6 Write `src/routes/federation.ts` — mount all 4 federation endpoints
- [x] 4.7 Implement partner expiry check — partners past `expires_at` are treated as status `expired`; their tokens rejected
- [x] 4.8 Implement `allowedOrganizations` filter — reject tokens whose `organization_id` is not in the allow list (if list is non-empty)
- [x] 4.9 Write unit tests for FederationService — trust registration, token verification (valid/expired/untrusted/tampered), JWKS cache behavior
- [x] 4.10 Write integration tests — end-to-end: register partner, verify a valid token from that partner, verify rejection for unknown issuer
- [x] 4.11 QA sign-off: tampered token rejected, expired partner rejected, JWKS cache verified, zero `any`, >80% coverage
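
The two policy checks in tasks 4.7 and 4.8 can be sketched as follows. This is an illustration under assumptions, not the actual `FederationService`: the `PartnerPolicy` shape, the `PolicyResult` type, and the reason strings are hypothetical, and the function assumes the token's signature has already been verified. It shows that a partner past `expires_at` is rejected even if its stored status is still active, and that an empty allow list means no organization restriction.

```typescript
// Hypothetical, trimmed-down partner shape; the real IFederationPartner likely has more fields.
interface PartnerPolicy {
  status: string;
  expiresAt: Date | null;
  allowedOrganizations: string[];
}

// Hypothetical result type for this sketch.
type PolicyResult = { accepted: true } | { accepted: false; reason: string };

// Applies the expiry check (4.7) and the allow-list filter (4.8) to a verified token's organization.
function checkPartnerPolicy(
  partner: PartnerPolicy,
  organizationId: string | undefined,
  now: Date,
): PolicyResult {
  // 4.7: past expires_at counts as expired regardless of the stored status.
  const pastExpiry =
    partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime();
  if (partner.status === "expired" || pastExpiry) {
    return { accepted: false, reason: "partner_expired" };
  }
  // 4.8: only enforce the allow list when it is non-empty.
  const allowList = partner.allowedOrganizations;
  if (
    allowList.length > 0 &&
    (organizationId === undefined || !allowList.includes(organizationId))
  ) {
    return { accepted: false, reason: "organization_not_allowed" };
  }
  return { accepted: true };
}

const checkTime = new Date("2026-04-02T00:00:00Z");
const partner: PartnerPolicy = {
  status: "active",
  expiresAt: new Date("2026-12-31T00:00:00Z"),
  allowedOrganizations: ["org-a"],
};

const allowedOrg = checkPartnerPolicy(partner, "org-a", checkTime);
const wrongOrg = checkPartnerPolicy(partner, "org-b", checkTime);
const lapsed = checkPartnerPolicy(
  { ...partner, expiresAt: new Date("2026-01-01T00:00:00Z") },
  "org-a",
  checkTime,
);
const openList = checkPartnerPolicy({ ...partner, allowedOrganizations: [] }, "org-z", checkTime);
console.log(allowedOrg, wrongOrg, lapsed, openList);
```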

---

## Workstream 5: Webhooks and Event Streaming

- [x] 5.1 Write `src/db/migrations/016_create_webhook_subscriptions_table.sql` — webhook_subscriptions with url, events JSONB, secret_hash, vault_secret_path, active, failure_count
- [x] 5.2 Write `src/db/migrations/017_create_webhook_deliveries_table.sql` — webhook_deliveries with status, http_status_code, attempt_count, next_retry_at
- [x] 5.3 Write `src/types/webhook.ts` — IWebhookSubscription, ICreateWebhookRequest, IWebhookDelivery, IWebhookPayload, WebhookEventType interfaces
- [x] 5.4 Write `src/services/WebhookService.ts` — createSubscription (store secret in Vault), listSubscriptions, getSubscription, updateSubscription, deleteSubscription, listDeliveries
- [x] 5.5 Write `src/workers/WebhookDeliveryWorker.ts` — bull queue worker: fetch subscription, compute HMAC-SHA256 signature, POST to URL with headers, update delivery status, schedule retry on failure
- [x] 5.6 Write `src/services/EventPublisher.ts` — buildEventPayload, publishEvent (enqueues to bull queue; also produces to Kafka if KAFKA_BROKERS is set)
- [x] 5.7 Update `src/services/AgentService.ts` — call EventPublisher.publishEvent for: agent.created, agent.updated, agent.suspended, agent.reactivated, agent.decommissioned
- [x] 5.8 Update `src/services/CredentialService.ts` — call EventPublisher.publishEvent for: credential.generated, credential.rotated, credential.revoked
- [x] 5.9 Update `src/services/OAuth2Service.ts` — call EventPublisher.publishEvent for: token.issued, token.revoked
- [x] 5.10 Write `src/controllers/WebhookController.ts` — handlers for all 6 webhook endpoints
- [x] 5.11 Write `src/routes/webhooks.ts` — mount all 6 webhook endpoints with correct scope guards
- [x] 5.12 Implement SSRF protection in WebhookDeliveryWorker — reject delivery to RFC 1918 addresses, loopback, and link-local ranges
- [x] 5.13 Implement dead-letter handling — after max retries, set status to dead_letter and increment `agentidp_webhook_dead_letters_total` Prometheus metric
- [x] 5.14 Write `src/adapters/KafkaAdapter.ts` — optional Kafka producer; activated only when KAFKA_BROKERS env var is set
- [x] 5.15 Write unit tests for WebhookService, WebhookDeliveryWorker, EventPublisher — HMAC computation, retry schedule, dead-letter logic
- [x] 5.16 Write integration tests — create subscription, trigger an event, verify delivery; verify SSRF rejection; verify retry on 5xx response
- [x] 5.17 QA sign-off: HMAC verifiable, SSRF protection active, retry schedule correct, dead-letter metric fires, zero `any`, >80% coverage
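
The HMAC-SHA256 signing step from task 5.5 can be sketched with Node's built-in `crypto` module. This is a minimal sketch, not the actual `WebhookDeliveryWorker`: the function names `signPayload` and `verifySignature`, the hex encoding, and the choice to sign the raw JSON body are assumptions, and the header that carries the signature is deliberately left out because the task list does not name it. The receiver side compares in constant time so that the comparison itself does not leak how many leading bytes matched.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: hex-encoded HMAC-SHA256 over the exact bytes of the request body.
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
function verifySignature(secret: string, rawBody: string, receivedHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws when lengths differ, so check the length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Example values are made up for illustration.
const webhookSecret = "example-webhook-secret";
const body = JSON.stringify({ event: "agent.created", agentId: "agent-1" });
const signature = signPayload(webhookSecret, body);

const validCheck = verifySignature(webhookSecret, body, signature);
const tamperedCheck = verifySignature(webhookSecret, body + " ", signature);
const wrongSecretCheck = verifySignature("other-secret", body, signature);
console.log(validCheck, tamperedCheck, wrongSecretCheck);
```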
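
The address ranges named in task 5.12 can be checked with a small pure function. This is a sketch under assumptions, not the project's SSRF guard: `parseIPv4` and `isBlockedIPv4` are hypothetical names, and the sketch covers IPv4 literals only. A real guard would also need to resolve the webhook hostname and check the resolved address, and handle IPv6 equivalents. Unparseable input is rejected (fail closed).

```typescript
// Parses a dotted-quad IPv4 string into four octets, or returns null if malformed.
function parseIPv4(address: string): number[] | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  return octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255) ? octets : null;
}

// True when the address is in a range that task 5.12 says to reject:
// RFC 1918 private ranges, loopback (127.0.0.0/8), or link-local (169.254.0.0/16).
function isBlockedIPv4(address: string): boolean {
  const octets = parseIPv4(address);
  if (octets === null) return true; // fail closed on anything unparseable
  const [a, b] = octets;
  return (
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    a === 127 || // loopback
    (a === 169 && b === 254) // link-local
  );
}

const samples = ["10.0.0.5", "172.16.0.1", "172.32.0.1", "192.168.1.1", "127.0.0.1", "169.254.169.254", "8.8.8.8"];
const blocked = samples.filter(isBlockedIPv4);
console.log(blocked);
```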

---

## Workstream 6: SOC 2 Type II Preparation

- [x] 6.1 Enable `pgcrypto` PostgreSQL extension in `src/db/migrations/018_enable_pgcrypto.sql`
- [x] 6.2 Write `src/services/EncryptionService.ts` — AES-256-CBC encrypt/decrypt using key from Vault; methods: encryptColumn, decryptColumn, isEncrypted
- [x] 6.3 Write `src/db/migrations/019_encrypt_sensitive_columns.sql` — re-encrypt existing credentials.secret_hash and credentials.vault_path values using EncryptionService (migration script)
- [x] 6.4 Update `src/services/CredentialService.ts` — all reads/writes of secret_hash and vault_path go through EncryptionService
- [x] 6.5 Update `src/services/WebhookService.ts` — vault_secret_path column encrypted via EncryptionService
- [x] 6.6 Update `src/services/DIDService.ts` — vault_key_path in agent_did_keys encrypted via EncryptionService
- [x] 6.7 Write `src/middleware/TLSEnforcementMiddleware.ts` — redirect HTTP to HTTPS in production using X-Forwarded-Proto header; passthrough in development
- [x] 6.8 Register TLSEnforcementMiddleware in `src/app.ts` — first in middleware stack
- [x] 6.9 Write `src/db/migrations/020_add_audit_chain_columns.sql` — add hash and previous_hash columns to audit_logs; add immutability trigger; backfill chain for existing rows
- [x] 6.10 Update `src/services/AuditService.ts` — compute Merkle hash on every insert: hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
- [x] 6.11 Write `src/services/AuditVerificationService.ts` — verifyChain(fromDate?, toDate?): reads rows in order, recomputes hashes, returns IChainVerificationResult
- [x] 6.12 Write `src/jobs/SecretsRotationJob.ts` — cron job: identify expiring credentials, emit `agentidp_credentials_expiring_soon_total` metric, renew Vault leases
- [x] 6.13 Write `src/jobs/AuditChainVerificationJob.ts` — cron job: runs verifyChain on a schedule, sets `agentidp_audit_chain_integrity` Prometheus gauge to 1 (pass) or 0 (fail)
- [x] 6.14 Write `src/controllers/ComplianceController.ts` — handlers for GET /audit/verify and GET /compliance/controls
- [x] 6.15 Write `src/routes/compliance.ts` — mount /audit/verify (rate-limited) and /compliance/controls
- [x] 6.16 Write `monitoring/prometheus/alerts.yml` — all 6 alerting rules: AuthFailureSpike, RateLimitExhaustion, AnomalousTokenIssuance, WebhookDeadLetterAccumulating, AuditChainIntegrityFailed, CredentialExpiryApproaching
- [x] 6.17 Update `monitoring/prometheus/prometheus.yml` — add alerting rules file reference
- [x] 6.18 Write compliance documentation package: `docs/compliance/soc2-controls-matrix.md` (Trust Services Criteria → controls map), `docs/compliance/encryption-runbook.md` (key rotation procedure), `docs/compliance/audit-log-runbook.md` (chain verification guide)
- [x] 6.19 Write operational runbooks: `docs/compliance/incident-response.md` (security event procedures), `docs/compliance/secrets-rotation.md` (credential and signing key rotation guide)
- [x] 6.20 Write unit tests for EncryptionService (encrypt/decrypt round-trip, Vault key fetch) and AuditVerificationService (intact chain, tampered chain with correct brokenAtEventId)
- [x] 6.21 Write integration tests — TLS enforcement verified, encrypted columns not plaintext-readable in direct DB query, chain verification returns correct results
- [x] 6.22 QA sign-off: all 5 controls pass GET /compliance/controls, all 6 Prometheus alerts valid, zero `any`, >80% coverage
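
The redirect decision described in task 6.7 can be sketched as a pure function, separate from any web framework. This is an illustration under assumptions, not the actual `TLSEnforcementMiddleware`: `decideTls` and `TlsDecision` are hypothetical names, `idp.example.com` is a placeholder host, and treating a missing `X-Forwarded-Proto` header in production as plain HTTP is an assumption of this sketch.

```typescript
// Hypothetical decision type: either let the request through or redirect it.
type TlsDecision = { action: "pass" } | { action: "redirect"; location: string };

// Decides how to handle a request based on the environment and the X-Forwarded-Proto header.
function decideTls(
  nodeEnv: string,
  forwardedProto: string | undefined,
  host: string,
  originalUrl: string,
): TlsDecision {
  // Passthrough outside production, as task 6.7 describes for development.
  if (nodeEnv !== "production") return { action: "pass" };
  // The header may hold a comma-separated list when proxies are chained;
  // the first entry is the protocol the client used.
  const proto = (forwardedProto ?? "").split(",")[0].trim().toLowerCase();
  if (proto === "https") return { action: "pass" };
  return { action: "redirect", location: `https://${host}${originalUrl}` };
}

const devRequest = decideTls("development", undefined, "idp.example.com", "/health");
const plainProd = decideTls("production", "http", "idp.example.com", "/oauth2/token");
const secureProd = decideTls("production", "https, http", "idp.example.com", "/oauth2/token");
console.log(devRequest, plainProd, secureProd);
```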
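
The hash formula in task 6.10 and the verification pass in task 6.11 can be sketched together. Each row's hash covers the previous row's hash, so the rows form a linear chain: changing any row invalidates its own hash and every link after it. This is a minimal in-memory sketch, not the actual `AuditService` or `AuditVerificationService`: the `AuditRow` shape, the empty string used as the first row's `previousHash`, plain string concatenation without separators, and the return shape of `verifyChain` are assumptions (only the field list and the name `brokenAtEventId` come from the task list).

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; field names follow the formula in task 6.10.
interface AuditRow {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

type AuditFields = Omit<AuditRow, "hash" | "previousHash">;

// hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
function computeHash(fields: AuditFields, previousHash: string): string {
  const input =
    fields.eventId + fields.timestamp + fields.action + fields.outcome +
    fields.agentId + fields.organizationId + previousHash;
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Appends a row linked to the last row's hash (empty string for the first row: an assumption).
function appendRow(chain: AuditRow[], fields: AuditFields): AuditRow[] {
  const previousHash = chain.length > 0 ? chain[chain.length - 1].hash : "";
  return [...chain, { ...fields, previousHash, hash: computeHash(fields, previousHash) }];
}

// Walks the rows in order, recomputing each hash; reports the first row that fails.
function verifyChain(chain: AuditRow[]): { valid: boolean; brokenAtEventId: string | null } {
  let previousHash = "";
  for (const row of chain) {
    if (row.previousHash !== previousHash || row.hash !== computeHash(row, previousHash)) {
      return { valid: false, brokenAtEventId: row.eventId };
    }
    previousHash = row.hash;
  }
  return { valid: true, brokenAtEventId: null };
}

const base = { timestamp: "2026-04-02T00:00:00Z", outcome: "success", agentId: "agent-1", organizationId: "org-1" };
let auditChain: AuditRow[] = [];
auditChain = appendRow(auditChain, { ...base, eventId: "e1", action: "agent.created" });
auditChain = appendRow(auditChain, { ...base, eventId: "e2", action: "credential.generated" });
auditChain = appendRow(auditChain, { ...base, eventId: "e3", action: "token.issued" });

// Simulate tampering with the middle row without recomputing its hash.
const tamperedChain = auditChain.map((row) =>
  row.eventId === "e2" ? { ...row, action: "credential.revoked" } : row,
);

const intactResult = verifyChain(auditChain);
const tamperedResult = verifyChain(tamperedChain);
console.log(intactResult, tamperedResult);
```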

---

## Phase 3 Complete Criteria

All 6 workstreams done. All tasks checked. All QA gates passed. CEO reviewed. SOC 2 audit window begins.