Compare commits

17 Commits

Author SHA1 Message Date
SentryAgent.ai Developer
831e91c467 chore(openspec): archive phase-4-developer-growth change
All 90 tasks complete. Phase 4 — Developer Growth & Go-to-Market
fully delivered and archived per OpenSpec protocol.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 15:17:18 +00:00
SentryAgent.ai Developer
af630b43d4 chore(phase-4): QA fixes + gitignore portal build artifacts
- Fix 7 test fixtures missing isPublic field added in WS4 Marketplace
- Add portal/.next/ to .gitignore (build artifacts should not be tracked)
- Mark all Phase 4 tasks 11.1-11.11 complete in tasks.md

QA results: 611/611 tests pass, tsc zero errors, portal build OK, CLI build OK

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:59:11 +00:00
SentryAgent.ai Developer
26a56f84e1 feat(phase-4): WS6 — Billing & Usage Metering (Stripe, free tier enforcement)
- DB migration 023: tenant_subscriptions and usage_events tables
- UsageMeteringMiddleware: in-memory counters, 60s flush to DB via UPSERT
- FreeTierEnforcementMiddleware: 10 agents / 1,000 calls/day limits, Redis cache
- UsageService: getDailyUsage and getActiveAgentCount
- BillingService: Stripe checkout sessions, webhook verification, subscription status
- POST /billing/checkout, POST /billing/webhook, GET /billing/usage endpoints
- BILLING_ENABLED=false disables enforcement without breaking metering
- Dashboard: Usage tab with Free Tier/Pro badges and metric cards
- 19 unit tests passing across billing services and middleware

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:51:36 +00:00
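
Reviewer note on 26a56f84e1: the commit describes in-memory counters flushed to the database every 60 seconds. The pattern can be sketched roughly as below. The `UsageCounter` class, the row shape, and the `upsert` callback are hypothetical names chosen for illustration and are not taken from this change.

```javascript
// Hypothetical sketch: per-tenant counters held in memory and drained on a timer.
// UsageCounter, drain(), and the row shape are assumptions for illustration only.
class UsageCounter {
  constructor() {
    this.counts = new Map(); // key: `${tenantId}|${day}` -> number of calls
  }

  // Record one API call for a tenant on the given UTC day (YYYY-MM-DD).
  increment(tenantId, day = new Date().toISOString().slice(0, 10)) {
    const key = `${tenantId}|${day}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Return the accumulated rows and reset, so each flush writes only the delta.
  drain() {
    const rows = [...this.counts].map(([key, calls]) => {
      const [tenantId, day] = key.split('|');
      return { tenantId, day, calls };
    });
    this.counts.clear();
    return rows;
  }
}

// Flush on an interval. `upsert` stands in for a promise-returning DB write
// that adds each delta to the stored count for (tenant, day).
function startFlushing(counter, upsert, intervalMs = 60_000) {
  const timer = setInterval(() => {
    const rows = counter.drain();
    if (rows.length > 0) {
      upsert(rows).catch((err) => console.error('usage flush failed', err));
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for metering
  return timer;
}
```

In this sketch a failed flush drops the drained delta. A real implementation would probably re-queue the rows or accept the small loss explicitly.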
SentryAgent.ai Developer
fefbf1e3ea feat(phase-4): WS5 — GitHub Actions OIDC token exchange and trust policies
- POST /oidc/token: GitHub OIDC JWT exchange (bootstrap + agent-scoped modes)
- POST/GET/DELETE /oidc/trust-policies: trust policy CRUD with enforcement
- DB migration 022: oidc_trust_policies table with provider/repo/branch/agent_id
- GitHub Actions: register-agent and issue-token actions with full READMEs
- Trust policy enforcement rejects token exchanges not matching registered policies
- Bootstrap mode issues agents:write token for new agent registration without agentId

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:37:39 +00:00
SentryAgent.ai Developer
89c99b666d feat(phase-4): WS4 — Agent Marketplace (public registry, pagination, filters)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:17:51 +00:00
SentryAgent.ai Developer
d1e6af25aa feat(phase-4): WS2 + WS3 — Developer Portal (Next.js 14) and CLI tool (sentryagent)
WS2: Developer Portal (portal/)
- Standalone Next.js 14 + Tailwind CSS app — independent deployment
- Home page: hero, feature grid, CTA to /get-started
- /pricing: free tier limits table (10 agents, 1k calls/day) + paid tier CTA
- /sdks: all 4 SDKs (Node.js, Python, Go, Java) with install + code examples
- /api-explorer: Swagger UI from NEXT_PUBLIC_API_URL/openapi.json, persistAuthorization
- /get-started: 4-step wizard (setup → register agent → credentials → SDK snippet)
- Shared Nav component with active-link highlighting
- Build: 8/8 static pages, zero TypeScript errors

WS3: CLI Tool (cli/ — npm package: sentryagent)
- configure, register-agent, list-agents, issue-token, rotate-credentials, tail-audit-log
- Auto OAuth2 token fetch + 30s-buffer cache via client_credentials flow
- chalk-formatted table output, confirmation prompts, bounded audit log dedup
- bash + zsh shell completion scripts
- README with installation, all commands, and completion setup
- Build: tsc clean, node dist/index.js --help verified

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 04:29:50 +00:00
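
Reviewer note on d1e6af25aa: the CLI is described as caching its OAuth2 token with a 30-second buffer. A minimal sketch of that idea follows, assuming a hypothetical `fetchToken` function that performs the client_credentials request and resolves to `{ access_token, expires_in }`. None of these names come from the CLI source.

```javascript
// Hypothetical sketch: cache a token and refresh it shortly before it expires.
// fetchToken and the option names are assumptions for illustration.
function createTokenCache(fetchToken, { bufferMs = 30_000, now = Date.now } = {}) {
  let cached = null; // { token, expiresAtMs }

  return async function getToken() {
    // Reuse the cached token only while it is valid for longer than the buffer.
    if (cached !== null && now() < cached.expiresAtMs - bufferMs) {
      return cached.token;
    }
    const { access_token, expires_in } = await fetchToken();
    cached = { token: access_token, expiresAtMs: now() + expires_in * 1000 };
    return cached.token;
  };
}
```

Injecting `now` keeps the expiry logic testable without waiting on a real clock.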
SentryAgent.ai Developer
1b682c22b2 feat(phase-4): WS1 — Production Hardening (Redis rate limiting, DB pool, health endpoint, k6)
Rate limiting:
- Replace in-memory express-rate-limit with ioredis + rate-limiter-flexible (sliding window)
- Graceful fallback to RateLimiterMemory when Redis unreachable
- RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS env var config
- Retry-After header on 429 responses
- agentidp_rate_limit_hits_total Prometheus counter

Database pool:
- Explicit pg.Pool config via DB_POOL_MAX/MIN/IDLE_TIMEOUT_MS/CONNECTION_TIMEOUT_MS
- Defaults: max=20, min=2, idle=30s, conn timeout=5s
- agentidp_db_pool_active_connections + agentidp_db_pool_waiting_requests gauges

Health endpoint:
- GET /health/detailed — per-service status (database, Redis, Vault, OPA)
- healthy / degraded (>1000ms) / unreachable classification
- HTTP 200 (all healthy) / 207 (any degraded) / 503 (any unreachable)

Load tests:
- tests/load/ with k6 scenarios for agent registration (100 VUs), token issuance (1000 VUs), credential rotation (50 VUs)
- npm run load-test script

Tests: 586 passing, zero TypeScript errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 04:20:37 +00:00
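
Reviewer note on 1b682c22b2: the health endpoint maps per-service probes to healthy / degraded / unreachable and then to HTTP 200 / 207 / 503. The roll-up can be sketched as below. The probe shape `{ ok, latencyMs }` and the function names are assumptions. The 1000 ms threshold and the status codes come from the commit message. Letting 503 win when a degraded and an unreachable service coexist is also an assumption.

```javascript
// Hypothetical sketch of the /health/detailed status roll-up.
const DEGRADED_THRESHOLD_MS = 1000;

// Classify one probe result: { ok: boolean, latencyMs: number }.
function classify(probe) {
  if (!probe.ok) return 'unreachable';
  return probe.latencyMs > DEGRADED_THRESHOLD_MS ? 'degraded' : 'healthy';
}

// Any unreachable service yields 503, otherwise any degraded service yields 207.
function overallHttpStatus(statuses) {
  if (statuses.includes('unreachable')) return 503;
  if (statuses.includes('degraded')) return 207;
  return 200;
}

// probes: { database: {...}, redis: {...}, ... } keyed by service name.
function detailedHealth(probes) {
  const services = Object.fromEntries(
    Object.entries(probes).map(([name, probe]) => [name, classify(probe)]),
  );
  return { httpStatus: overallHttpStatus(Object.values(services)), services };
}
```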
SentryAgent.ai Developer
b0f70b7ac4 feat(openspec): Phase 4 Developer Growth & Go-to-Market Readiness
OpenSpec change: phase-4-developer-growth (spec-driven, 4/4 artifacts)

6 workstreams, 90 implementation tasks, delivery sequence:
WS1 → WS2 + WS3 (parallel) → WS4 → WS5 → WS6

Workstreams:
1. Production Hardening — ioredis rate limiting, DB pool tuning, /health/detailed, k6 load tests
2. Developer Portal — Next.js 14, Swagger UI explorer, onboarding wizard, pricing/SDK pages
3. CLI Tool — sentryagent npm CLI, 5 commands, shell completion
4. Agent Marketplace — public searchable registry powered by existing agent/DID infrastructure
5. GitHub Actions — register-agent + issue-token Actions via OIDC (no stored secrets)
6. Billing & Usage Metering — Stripe Checkout, webhook-driven state, free tier enforcement

New capabilities (8 specs): production-hardening, developer-portal, cli-tool,
agent-marketplace, github-actions, billing-metering (+delta: web-dashboard, monitoring)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 04:00:34 +00:00
SentryAgent.ai Developer
f1fbe0e29a chore(openspec): archive all completed changes, sync 14 new specs to library
Archived 4 completed OpenSpec changes (2026-04-02):
- phase-3-enterprise (100/100 tasks) — 6 Phase 3 capabilities synced
- devops-documentation (48/48 tasks) — 3 new + 1 merged capability
- bedroom-developer-docs (33/33 tasks) — 4 new capabilities synced
- engineering-docs (superseded by 2026-03-29 archive) — no tasks

Main spec library grows from 21 → 35 capabilities (+14 new):
federation, multi-tenancy, oidc, soc2, w3c-dids, webhooks,
database, operations, system-overview, api-reference, core-concepts,
developer-guides, quick-start + deployment (merged additive requirements)

Active changes: 0 — project board is clear for Phase 4 planning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 03:50:47 +00:00
SentryAgent.ai Developer
ceec22f714 chore(phase-3): mark WS6 tasks complete — Phase 3 Enterprise DONE
All 100/100 tasks checked. All 6 workstreams complete. QA-approved.
SOC 2 audit window can begin.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 00:42:29 +00:00
SentryAgent.ai Developer
fd90b2acd1 feat(phase-3): workstream 6 — SOC 2 Type II Preparation
Implements all 22 WS6 tasks completing Phase 3 Enterprise.

Column-level encryption (AES-256-CBC, Vault-backed key) via EncryptionService
applied to credentials.secret_hash, credentials.vault_path,
webhook_subscriptions.vault_secret_path, and agent_did_keys.vault_key_path.
Backward-compatible: isEncrypted() guard skips decryption for existing
plaintext rows until next read-write cycle.

Audit chain integrity (CC7.2): AuditRepository computes SHA-256 Merkle hash
on every INSERT (hash = SHA-256(eventId+timestamp+action+outcome+agentId+orgId+prevHash)).
AuditVerificationService walks the full chain verifying hash continuity.
AuditChainVerificationJob runs hourly; sets agentidp_audit_chain_integrity
Prometheus gauge to 1 (pass) or 0 (fail).

TLS enforcement (CC6.7): TLSEnforcementMiddleware registered as first
middleware in Express stack; 301 redirect on non-https X-Forwarded-Proto
in production.

SecretsRotationJob (CC9.2): hourly scan for credentials expiring within 7
days; increments agentidp_credentials_expiring_soon_total.

ComplianceController + routes: GET /audit/verify (auth+audit:read scope,
30/min rate-limit); GET /compliance/controls (public, Cache-Control 60s).
ComplianceStatusStore: module-level map updated by jobs, consumed by controller.

Prometheus: 2 new metrics (agentidp_credentials_expiring_soon_total,
agentidp_audit_chain_integrity); 6 alerting rules in alerts.yml.

Compliance docs: soc2-controls-matrix.md, encryption-runbook.md,
audit-log-runbook.md, incident-response.md, secrets-rotation.md.

Tests: 557 unit tests passing (35 suites); 26 new tests (EncryptionService,
AuditVerificationService); 19 compliance integration tests. TypeScript clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 00:41:53 +00:00
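
Reviewer note on fd90b2acd1: each audit record's hash covers the previous record's hash, which makes the log a linear hash chain. A minimal sketch of computing and verifying such a chain follows. The field order is taken from the commit message. The empty-string genesis value, plain string concatenation, and all function names are assumptions, not the AuditRepository code.

```javascript
const { createHash } = require('node:crypto');

// hash = SHA-256(eventId + timestamp + action + outcome + agentId + orgId + prevHash)
function eventHash(e, prevHash) {
  return createHash('sha256')
    .update(e.eventId + e.timestamp + e.action + e.outcome + e.agentId + e.orgId + prevHash)
    .digest('hex');
}

// Append an event, linking it to the hash of the last entry ('' for the first; assumed).
function appendEvent(chain, event) {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : '';
  return [...chain, { ...event, prevHash, hash: eventHash(event, prevHash) }];
}

// Walk the chain; return the index of the first inconsistent entry, or -1 if intact.
function firstBrokenIndex(chain) {
  let prevHash = '';
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    if (entry.prevHash !== prevHash || entry.hash !== eventHash(entry, prevHash)) {
      return i;
    }
    prevHash = entry.hash;
  }
  return -1;
}
```

Because every hash depends on its predecessor, editing one record invalidates that record and requires recomputing every later hash to hide the change.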
SentryAgent.ai Developer
272b69f18d feat(phase-3): workstream 5 — Webhooks & Event Streaming
- DB migrations 016/017: webhook_subscriptions and webhook_deliveries tables
- WebhookService: CRUD for subscriptions, Vault-backed secret storage, delivery history
- WebhookDeliveryWorker: Bull queue, HMAC-SHA256 signatures, exponential backoff,
  SSRF protection (RFC 1918 + loopback + link-local rejection), dead-letter handling
- EventPublisher: publishes 10 event types (agent/credential/token lifecycle);
  optional Kafka adapter activated via KAFKA_BROKERS env var
- AgentService, CredentialService, OAuth2Service: wired to EventPublisher
- WebhookController + routes: 6 endpoints with webhooks:read / webhooks:write scope guards
- KafkaAdapter: optional Kafka producer (kafkajs), no-op when KAFKA_BROKERS unset
- OAuthScope extended: webhooks:read, webhooks:write
- AuditAction extended: webhook.created, webhook.updated, webhook.deleted
- Metrics: agentidp_webhook_dead_letters_total counter added to registry
- 523 unit tests passing; TypeScript strict throughout, zero `any`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 00:07:41 +00:00
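
Reviewer note on 272b69f18d: webhook deliveries are signed with HMAC-SHA256 and retried with exponential backoff. A sketch of both pieces using Node's built-in crypto module follows. The hex encoding, the base and maximum delays, and the function names are assumptions. The actual worker may sign a different canonical string or use Bull's own backoff settings.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Sign the raw request body with the subscription secret (hex output is assumed).
function signPayload(secret, rawBody) {
  return createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Receiver side: recompute the signature and compare in constant time.
function verifySignature(secret, rawBody, receivedHex) {
  const expected = Buffer.from(signPayload(secret, rawBody), 'hex');
  const received = Buffer.from(receivedHex, 'hex');
  // timingSafeEqual throws when lengths differ, so check the length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Exponential backoff: the delay doubles per attempt up to a cap (values assumed).
function retryDelayMs(attempt, baseMs = 1000, maxMs = 60_000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```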
SentryAgent.ai Developer
03b5de300c feat(phase-3): workstream 4 — AGNTCY Federation
Implements cross-IdP token verification for the AGNTCY ecosystem:

- Migration 015: federation_partners table (issuer, jwks_uri,
  allowed_organizations JSONB, status, expires_at)
- FederationService: registerPartner (JWKS validation at registration),
  listPartners, getPartner, updatePartner, deletePartner,
  verifyFederatedToken (alg:none rejected, RS256/ES256 only,
  allowedOrganizations filter, expiry enforcement)
- JWKS caching in Redis (TTL: FEDERATION_JWKS_CACHE_TTL_SECONDS);
  cache invalidated on partner delete and jwks_uri change
- FederationController + routes: 5 admin:orgs endpoints +
  POST /federation/verify (agents:read)
- OPA policy: 5 federation admin endpoint → admin:orgs mappings
- 499 unit tests passing; 94.69% statement coverage on FederationService

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 10:13:49 +00:00
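
Reviewer note on 03b5de300c: `verifyFederatedToken` is described as rejecting `alg: none` and accepting only RS256 and ES256. An allowlist check on the JWT header, run before any signature verification, might look like the sketch below. The function name is hypothetical, and this shows only the header check, not JWKS lookup or signature validation.

```javascript
// Hypothetical sketch: allowlist the JWT signing algorithm before verifying.
const ALLOWED_ALGS = new Set(['RS256', 'ES256']);

function assertAllowedAlgorithm(jwt) {
  const [headerB64] = jwt.split('.');
  let header;
  try {
    header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
  } catch {
    throw new Error('Malformed JWT header');
  }
  const alg = header !== null && typeof header === 'object' ? header.alg : undefined;
  // 'none', HS256, and any unknown or missing value are rejected by the allowlist.
  if (!ALLOWED_ALGS.has(alg)) {
    throw new Error(`Rejected JWT algorithm: ${String(alg)}`);
  }
  return alg;
}
```

An allowlist is safer than a denylist here because case variants such as `None` never need to be enumerated.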
SentryAgent.ai Developer
5e465e596a feat(phase-3): workstream 3 — OpenID Connect (OIDC) Provider
Implements full OIDC layer on top of the existing OAuth 2.0 token service:

- Migration 014: oidc_keys table (RSA/EC key pairs, is_current flag, expires_at
  for rotation grace period)
- OIDCKeyService: key generation (RS256/ES256), Vault storage, JWKS with Redis
  cache, key rotation with grace period, pruneExpiredKeys
- IDTokenService: buildIDTokenClaims (agent claims, nonce, DID), signIDToken
  (kid in JWT header), verifyIDToken (alg:none rejected, RS256/ES256 only)
- OIDCController: discovery document, JWKS (Cache-Control), /agent-info
- OIDC routes mounted at / — /.well-known/openid-configuration,
  /.well-known/jwks.json, /agent-info
- OAuth2Service: id_token appended to token response when openid scope requested
- 473 unit tests passing (100% OIDCKeyService stmts, 95.91% IDTokenService stmts)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 09:54:26 +00:00
SentryAgent.ai Developer
3d1fff15f6 feat(phase-3): workstream 2 — W3C DIDs
Implements W3C DID Core 1.0 per-agent identity for every registered agent:

Schema:
- agent_did_keys table: stores EC P-256 public key JWK + Vault path for private key
- agents.did + agents.did_created_at columns

Key management:
- EC P-256 key pair generated on every agent registration via Node.js crypto
- Private key stored in Vault KV v2 (dev:no-vault marker when Vault not configured)
- Public key JWK stored in PostgreSQL agent_did_keys table

API (4 new endpoints):
- GET /.well-known/did.json — instance DID Document (public, cached)
- GET /api/v1/agents/:id/did — per-agent DID Document (public, 410 for decommissioned)
- GET /api/v1/agents/:id/did/resolve — W3C DID Resolution result (agents:read scope)
- GET /api/v1/agents/:id/did/card — AGNTCY agent card (public)

Implementation:
- DIDService: DID construction, key generation, Redis caching (TTL configurable)
- DIDController: 410 Gone for decommissioned agents, correct Content-Type on resolve
- AgentService: calls DIDService.generateDIDForAgent on every new registration

Tests: 429 passing, DIDService 98.93% coverage, private key absence verified in all responses

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 00:47:59 +00:00
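
Reviewer note on 3d1fff15f6: an EC P-256 key pair is generated per agent with Node.js crypto, and only the public JWK is stored in PostgreSQL. The key generation step could be sketched as follows. The function names and the `agents` path segment in the DID are assumptions. The earlier proposal commit mentions DID:WEB, so the sketch uses that method's colon-separated form.

```javascript
const { generateKeyPairSync } = require('node:crypto');

// Generate a P-256 key pair; export the public half as a JWK and the private
// half as PKCS#8 PEM (which a real service would hand to its secret store).
function generateAgentKeyPair() {
  const { publicKey, privateKey } = generateKeyPairSync('ec', { namedCurve: 'P-256' });
  return {
    publicJwk: publicKey.export({ format: 'jwk' }),
    privatePem: privateKey.export({ type: 'pkcs8', format: 'pem' }),
  };
}

// did:web identifiers percent-encode a port colon in the host; path segments
// are joined with ':'. The "agents" segment is an assumed layout.
function buildAgentDid(host, agentId) {
  return `did:web:${encodeURIComponent(host)}:agents:${agentId}`;
}
```

The exported public JWK contains `kty`, `crv`, `x`, and `y` but no private `d` member, which is the property the commit's tests check for in API responses.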
SentryAgent.ai Developer
d252097f71 feat(phase-3): workstream 1 — Multi-Tenancy
Introduces full multi-tenant organization model to AgentIdP:

Schema:
- 6 migrations: organizations + organization_members tables; organization_id FK
  added to agents, credentials, audit_logs; PostgreSQL RLS policies on all three
  tables; system org seed + backfill

API:
- 6 new /api/v1/organizations endpoints (CRUD + members) gated by admin:orgs scope
- OPA scopes.json updated with 6 new org endpoint → admin:orgs mappings

Implementation:
- OrgRepository, OrgService, OrgController, createOrgsRouter
- OrgContextMiddleware: sets app.organization_id session variable so RLS enforces
  per-request org isolation at the database layer
- JWT payload extended with organization_id claim; auth.ts backfills org_system
  for backward-compatible tokens
- New error classes: OrgNotFoundError, OrgHasActiveAgentsError, AlreadyMemberError

Tests: 373 passing, 80.64% branch coverage, zero any types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 00:29:32 +00:00
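
Reviewer note on d252097f71: OrgContextMiddleware sets `app.organization_id` so PostgreSQL RLS policies can filter rows per request. One way to do this is sketched below, assuming a pg-style client with a `query(text, params)` method. The helper name is hypothetical. Scoping the setting to a transaction with `set_config(..., true)` is an assumption here, since the commit only says "session variable".

```javascript
// Hypothetical sketch: run `work` with app.organization_id set for RLS.
// `client` is any object exposing a pg-style async query(text, params).
async function withOrgContext(client, organizationId, work) {
  await client.query('BEGIN');
  try {
    // The third argument `true` limits the setting to the current transaction,
    // so a pooled connection cannot leak one tenant's context to the next request.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await work(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

An RLS policy can then compare a row's `organization_id` with `current_setting('app.organization_id')`.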
SentryAgent.ai Developer
cb7d079ef6 feat(openspec): Phase 3 Enterprise — proposal, design, specs, and tasks
Scaffolds the phase-3-enterprise OpenSpec change (proposal only — awaiting CEO
approval before implementation). 6 workstreams, 95 implementation tasks:

WS1: Multi-Tenancy (21 tasks) — org model, RLS, admin API
WS2: W3C DIDs (12 tasks) — DID:WEB, agent DID documents, AGNTCY cards
WS3: OIDC (12 tasks) — oidc-provider, ID tokens, JWKS, discovery
WS4: Federation (11 tasks) — cross-instance trust, JWT assertions
WS5: Webhooks (17 tasks) — subscriptions, Bull queue, HMAC, retry
WS6: SOC2 (22 tasks) — encryption at rest, Merkle audit chain, controls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 12:53:31 +00:00
235 changed files with 33552 additions and 125 deletions

.github/actions/issue-token/README.md (new file, +110 lines)

@@ -0,0 +1,110 @@
# sentryagent/issue-token
Issues a SentryAgent.ai OAuth2 Bearer token for an existing agent from a GitHub
Actions workflow.
No long-lived API credentials are required. The action uses a GitHub-issued OIDC
token to authenticate with the SentryAgent.ai AgentIdP via `POST /oidc/token`.
The returned access token is automatically masked with `core.setSecret()` so it
never appears in plaintext in workflow logs.
## Prerequisites
### 1. Register the agent
The agent must already exist in SentryAgent.ai. If you need to create the agent
in CI, use [`sentryagent/register-agent@v1`](../register-agent/README.md) first.
### 2. Configure an OIDC Trust Policy for the agent
A trust policy linking the repository to the specific agent must be registered:
```bash
curl -X POST https://idp.sentryagent.ai/api/v1/oidc/trust-policies \
  -H "Authorization: Bearer <your-admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "github",
    "repository": "org/your-repo",
    "branch": "main",
    "agentId": "<agent-uuid>"
  }'
```
Omit `branch` to allow any branch to issue tokens for this agent.
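
To make the matching rules concrete, here is a rough sketch of how a trust policy could be compared with the claims in a GitHub OIDC token. It is illustrative only: the server's real matching logic is not shown in this README, `policyMatches` is a hypothetical name, and treating a policy without `agentId` as valid only for requests without an agent is an assumption. The `repository` and `ref` claim names are the ones GitHub includes in its OIDC tokens.

```javascript
// Hypothetical sketch: does a stored trust policy permit this token exchange?
// policy:  { provider, repository, branch?, agentId? }
// claims:  decoded GitHub OIDC claims, e.g. { repository: 'org/repo', ref: 'refs/heads/main' }
// agentId: the agent requested in the exchange (undefined when none is requested)
function policyMatches(policy, claims, agentId) {
  if (policy.provider !== 'github') return false;
  if (policy.repository !== claims.repository) return false;
  // A policy without `branch` allows any branch.
  if (policy.branch && claims.ref !== `refs/heads/${policy.branch}`) return false;
  // Assumed rule: an agent-scoped policy only matches that agent, and a policy
  // without agentId only matches requests that name no agent.
  return (policy.agentId ?? null) === (agentId ?? null);
}
```
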
### 3. Grant `id-token: write` permission
The workflow must have permission to request a GitHub OIDC token:
```yaml
permissions:
  id-token: write
  contents: read
```
## Inputs
| Input | Required | Description |
|-------|----------|-------------|
| `api-url` | Yes | Base URL of the SentryAgent.ai API (e.g. `https://idp.sentryagent.ai`) |
| `agent-id` | Yes | UUID of the agent for which to issue an access token |
## Outputs
| Output | Description |
|--------|-------------|
| `access-token` | Short-lived Bearer token. Masked in all log output. |
| `expires-at` | ISO 8601 timestamp indicating when the token expires. |
## Example workflow
```yaml
name: Deploy with Agent Token

on:
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Issue SentryAgent access token
        id: token
        uses: sentryagent/issue-token@v1
        with:
          api-url: https://idp.sentryagent.ai
          agent-id: ${{ vars.SENTRY_AGENT_ID }}

      - name: Call authenticated API
        run: |
          curl -H "Authorization: Bearer ${{ steps.token.outputs.access-token }}" \
            https://my-service.example.com/deploy
```
## Troubleshooting
**HTTP 403 — Trust policy violation**
No trust policy exists for this repository + agent combination. Register a trust
policy using the Prerequisites steps above.
**HTTP 403 — Branch not permitted**
A trust policy exists but specifies a branch constraint that does not match the
current workflow's branch. Add a policy for the current branch, or remove the
branch constraint to allow all branches.
**Failed to obtain a GitHub OIDC token**
Ensure `id-token: write` is set in the workflow's `permissions` block.
**Token expires too quickly**
The default token TTL is set by the SentryAgent.ai server configuration. Check
`expires-at` and re-issue a token before it expires if your workflow is long-running.
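
A long-running step could compare the `expires-at` output with the current time before deciding to re-issue. The helper below is a sketch under that assumption. `needsRefresh` and the 60-second margin are hypothetical and are not part of the action.

```javascript
// Hypothetical helper: true when the token expires within `marginSeconds`.
// expiresAt is the ISO 8601 string from the action's `expires-at` output.
function needsRefresh(expiresAt, marginSeconds = 60, nowMs = Date.now()) {
  const expiresMs = Date.parse(expiresAt);
  if (Number.isNaN(expiresMs)) {
    throw new Error(`Invalid expires-at value: ${expiresAt}`);
  }
  return expiresMs - nowMs <= marginSeconds * 1000;
}
```
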
## Full documentation
[https://docs.sentryagent.ai/github-actions](https://docs.sentryagent.ai/github-actions)

.github/actions/issue-token/action.js (new file, +153 lines)

@@ -0,0 +1,153 @@
/**
 * issue-token GitHub Action script.
 *
 * Flow:
 * 1. Request a GitHub OIDC token via @actions/core.getIDToken()
 * 2. Exchange the OIDC token for a SentryAgent.ai access token via POST /oidc/token
 * 3. Set outputs: access-token (masked) and expires-at (ISO 8601)
 *
 * The access token is immediately registered with core.setSecret() so it never
 * appears in plaintext in workflow logs.
 *
 * Error handling:
 * - OIDC exchange failures emit a clear message with a link to the trust policy setup docs
 */
'use strict';

const core = require('@actions/core');
const { HttpClient } = require('@actions/http-client');

/**
 * Exchanges a GitHub OIDC JWT for a SentryAgent.ai access token for a specific agent.
 *
 * @param {string} apiUrl - Base URL of the SentryAgent.ai AgentIdP API.
 * @param {string} oidcToken - GitHub OIDC JWT obtained from core.getIDToken().
 * @param {string} agentId - UUID of the agent for which to issue a token.
 * @returns {Promise<{ accessToken: string; expiresIn: number }>} The access token and its TTL in seconds.
 * @throws {Error} If the exchange fails, with a message including trust policy setup instructions.
 */
async function exchangeOIDCToken(apiUrl, oidcToken, agentId) {
  const client = new HttpClient('sentryagent-issue-token/1.0');
  const url = `${apiUrl}/api/v1/oidc/token`;
  const body = JSON.stringify({
    provider: 'github',
    token: oidcToken,
    agentId,
  });

  let response;
  try {
    response = await client.post(url, body, {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    });
  } catch (err) {
    throw new Error(
      `Failed to reach the SentryAgent.ai OIDC token endpoint at ${url}. ` +
        `Check that the api-url input is correct and the API is reachable.\n` +
        `Underlying error: ${err instanceof Error ? err.message : String(err)}`,
    );
  }

  const rawBody = await response.readBody();
  const statusCode = response.message.statusCode ?? 0;

  if (statusCode === 403) {
    // The trust policy endpoint lives under /api/v1, like the token endpoint above.
    throw new Error(
      'GitHub OIDC token exchange was rejected with HTTP 403 (Forbidden). ' +
        'This usually means no trust policy has been registered for this repository.\n\n' +
        'To fix this, register a trust policy by calling:\n' +
        ` POST ${apiUrl}/api/v1/oidc/trust-policies\n` +
        ' Body: { "provider": "github", "repository": "org/repo", "agentId": "<agent-id>" }\n\n' +
        'For full setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  if (statusCode < 200 || statusCode >= 300) {
    let detail = rawBody;
    try {
      const parsed = JSON.parse(rawBody);
      detail = parsed.message ?? parsed.error_description ?? rawBody;
    } catch {
      // use rawBody as-is
    }
    throw new Error(
      `OIDC token exchange failed with HTTP ${statusCode}: ${detail}\n` +
        'For trust policy setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  let tokenData;
  try {
    tokenData = JSON.parse(rawBody);
  } catch {
    throw new Error(`OIDC token exchange returned non-JSON response: ${rawBody}`);
  }

  if (typeof tokenData.access_token !== 'string' || tokenData.access_token.length === 0) {
    throw new Error('OIDC token exchange response did not include an access_token.');
  }

  // Fall back to one hour when the server omits expires_in.
  const expiresIn = typeof tokenData.expires_in === 'number' ? tokenData.expires_in : 3600;
  return { accessToken: tokenData.access_token, expiresIn };
}

/**
 * Computes an ISO 8601 expiry timestamp from a TTL in seconds.
 *
 * @param {number} expiresInSeconds - Number of seconds until the token expires.
 * @returns {string} ISO 8601 timestamp string.
 */
function computeExpiresAt(expiresInSeconds) {
  return new Date(Date.now() + expiresInSeconds * 1000).toISOString();
}

/**
 * Main entry point for the issue-token GitHub Action.
 *
 * @returns {Promise<void>}
 */
async function run() {
  try {
    // Read inputs
    const apiUrl = core.getInput('api-url', { required: true }).replace(/\/$/, '');
    const agentId = core.getInput('agent-id', { required: true });

    core.info(`Requesting GitHub OIDC token for audience: ${apiUrl}`);
    let oidcToken;
    try {
      oidcToken = await core.getIDToken(apiUrl);
    } catch (err) {
      throw new Error(
        'Failed to obtain a GitHub OIDC token. ' +
          "Ensure the workflow has 'id-token: write' permission in its permissions block.\n\n" +
          'Example:\n' +
          'permissions:\n' +
          ' id-token: write\n' +
          ' contents: read\n\n' +
          `Underlying error: ${err instanceof Error ? err.message : String(err)}\n` +
          'For setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
      );
    }

    core.info(`Exchanging GitHub OIDC token for SentryAgent.ai access token (agent: ${agentId})...`);
    const { accessToken, expiresIn } = await exchangeOIDCToken(apiUrl, oidcToken, agentId);

    // Mask the token immediately; this must happen before any logging or output
    core.setSecret(accessToken);

    const expiresAt = computeExpiresAt(expiresIn);
    core.setOutput('access-token', accessToken);
    core.setOutput('expires-at', expiresAt);
    core.info(`Access token issued successfully. Expires at: ${expiresAt}`);
  } catch (err) {
    core.setFailed(err instanceof Error ? err.message : String(err));
  }
}

run();

.github/actions/issue-token/action.yml (new file, +37 lines)

@@ -0,0 +1,37 @@
name: 'SentryAgent Issue Token'
description: >
  Issues a SentryAgent.ai OAuth2 access token for an agent using GitHub OIDC
  token exchange. No long-lived API credentials required. The issued access
  token is automatically masked in GitHub Actions logs via core.setSecret().
author: 'SentryAgent.ai'

branding:
  icon: 'key'
  color: 'blue'

inputs:
  api-url:
    description: >
      Base URL of the SentryAgent.ai AgentIdP API.
      Example: https://idp.sentryagent.ai
    required: true
  agent-id:
    description: >
      The UUID of the agent for which to issue an access token.
      Obtain this from the register-agent action output or from the API.
    required: true

outputs:
  access-token:
    description: >
      A short-lived Bearer access token for the specified agent.
      The token value is masked in all GitHub Actions log output.
  expires-at:
    description: >
      ISO 8601 timestamp indicating when the access token expires.
      Use this to decide when to re-issue a fresh token.

runs:
  using: 'node20'
  main: 'action.js'

.github/actions/register-agent/README.md (new file, +96 lines)

@@ -0,0 +1,96 @@
# sentryagent/register-agent
Registers a new AI agent in SentryAgent.ai from a GitHub Actions workflow.
No long-lived API credentials are required. The action uses a GitHub-issued OIDC
token to authenticate with the SentryAgent.ai AgentIdP via `POST /oidc/token`, then
calls `POST /agents` to create the agent.
## Prerequisites
### 1. Configure an OIDC Trust Policy
Before this action can exchange tokens, a trust policy must be registered in
SentryAgent.ai for the repository that will run the workflow.
```bash
curl -X POST https://idp.sentryagent.ai/api/v1/oidc/trust-policies \
  -H "Authorization: Bearer <your-admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "github",
    "repository": "org/your-repo",
    "branch": "main"
  }'
```
Omit `branch` to allow any branch to register agents from this repository.
### 2. Grant `id-token: write` permission
The workflow must have permission to request a GitHub OIDC token:
```yaml
permissions:
  id-token: write
  contents: read
```
## Inputs
| Input | Required | Description |
|-------|----------|-------------|
| `api-url` | Yes | Base URL of the SentryAgent.ai API (e.g. `https://idp.sentryagent.ai`) |
| `agent-name` | Yes | Unique name (email format) for the new agent |
| `agent-description` | No | Human-readable description of the agent's purpose |
## Outputs
| Output | Description |
|--------|-------------|
| `agent-id` | UUID of the newly registered agent. Use in subsequent steps to issue tokens or manage credentials. |
## Example workflow
```yaml
name: Register Agent

on:
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  register:
    runs-on: ubuntu-latest
    steps:
      - name: Register SentryAgent
        id: register
        uses: sentryagent/register-agent@v1
        with:
          api-url: https://idp.sentryagent.ai
          agent-name: my-ci-agent@acme.com
          agent-description: CI agent for the acme/my-repo build pipeline

      - name: Print agent ID
        run: echo "Registered agent ${{ steps.register.outputs.agent-id }}"
```
## Troubleshooting
**HTTP 403 — Trust policy not configured**
Register a trust policy for this repository first. See the Prerequisites section above.
**Failed to obtain a GitHub OIDC token**
Ensure `id-token: write` is set in the workflow's `permissions` block.
**Agent registration failed with HTTP 401**
The OIDC token exchange succeeded but the returned access token was rejected by
`POST /agents`. Check that the SentryAgent.ai API version matches and the
bootstrap token has `agents:write` scope.
## Full documentation
[https://docs.sentryagent.ai/github-actions](https://docs.sentryagent.ai/github-actions)

.github/actions/register-agent/action.js (new file, +200 lines)

@@ -0,0 +1,200 @@
/**
 * register-agent GitHub Action script.
 *
 * Flow:
 * 1. Request a GitHub OIDC token via @actions/core.getIDToken()
 * 2. Exchange the OIDC token for a SentryAgent.ai access token via POST /oidc/token
 * 3. Register a new agent via POST /agents using the access token
 * 4. Set the `agent-id` output
 *
 * Error handling:
 * - OIDC exchange failures emit a clear message with a link to the trust policy setup docs
 * - Agent registration failures surface the API error message
 */
'use strict';

const core = require('@actions/core');
const { HttpClient, BearerCredentialHandler } = require('@actions/http-client');

/**
 * Exchanges a GitHub OIDC JWT for a SentryAgent.ai access token.
 *
 * @param {string} apiUrl - Base URL of the SentryAgent.ai AgentIdP API.
 * @param {string} oidcToken - GitHub OIDC JWT obtained from core.getIDToken().
 * @returns {Promise<string>} The SentryAgent.ai access token.
 * @throws {Error} If the exchange fails, with a message including trust policy setup instructions.
 */
async function exchangeOIDCToken(apiUrl, oidcToken) {
  const client = new HttpClient('sentryagent-register-agent/1.0');
  const url = `${apiUrl}/api/v1/oidc/token`;
  // No agentId is sent: this is the bootstrap exchange used before the agent exists.
  const body = JSON.stringify({
    provider: 'github',
    token: oidcToken,
  });

  let response;
  try {
    response = await client.post(url, body, {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    });
  } catch (err) {
    throw new Error(
      `Failed to reach the SentryAgent.ai OIDC token endpoint at ${url}. ` +
        `Check that the api-url input is correct and the API is reachable.\n` +
        `Underlying error: ${err instanceof Error ? err.message : String(err)}`,
    );
  }

  const rawBody = await response.readBody();
  const statusCode = response.message.statusCode ?? 0;

  if (statusCode === 403) {
    // The trust policy endpoint lives under /api/v1, and a bootstrap policy
    // carries no agentId (the agent does not exist yet).
    throw new Error(
      'GitHub OIDC token exchange was rejected with HTTP 403 (Forbidden). ' +
        'This usually means no trust policy has been registered for this repository.\n\n' +
        'To fix this, register a trust policy by calling:\n' +
        ` POST ${apiUrl}/api/v1/oidc/trust-policies\n` +
        ' Body: { "provider": "github", "repository": "org/repo" }\n\n' +
        'For full setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  if (statusCode < 200 || statusCode >= 300) {
    let detail = rawBody;
    try {
      const parsed = JSON.parse(rawBody);
      detail = parsed.message ?? parsed.error_description ?? rawBody;
    } catch {
      // use rawBody as-is
    }
    throw new Error(
      `OIDC token exchange failed with HTTP ${statusCode}: ${detail}\n` +
        'For trust policy setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
    );
  }

  let tokenData;
  try {
    tokenData = JSON.parse(rawBody);
  } catch {
    throw new Error(`OIDC token exchange returned non-JSON response: ${rawBody}`);
  }

  if (typeof tokenData.access_token !== 'string' || tokenData.access_token.length === 0) {
    throw new Error('OIDC token exchange response did not include an access_token.');
  }

  return tokenData.access_token;
}

/**
 * Registers a new agent via POST /agents.
 *
 * @param {string} apiUrl - Base URL of the SentryAgent.ai AgentIdP API.
 * @param {string} accessToken - A valid SentryAgent.ai Bearer access token.
 * @param {string} agentName - Email (unique name) for the new agent.
 * @param {string} agentDescription - Optional description stored as the owner field.
 * @returns {Promise<string>} The UUID of the newly registered agent.
 * @throws {Error} If the API returns a non-2xx response.
 */
async function registerAgent(apiUrl, accessToken, agentName, agentDescription) {
  const auth = new BearerCredentialHandler(accessToken);
  const client = new HttpClient('sentryagent-register-agent/1.0', [auth]);
  const url = `${apiUrl}/api/v1/agents`;
  const payload = {
    email: agentName,
    agentType: 'custom',
    version: '1.0.0',
    capabilities: [],
    owner: agentDescription || agentName,
    deploymentEnv: 'production',
  };

  let response;
  try {
    response = await client.post(url, JSON.stringify(payload), {
      'Content-Type': 'application/json',
      Accept: 'application/json',
    });
  } catch (err) {
    throw new Error(
      `Failed to reach the SentryAgent.ai agents endpoint at ${url}.\n` +
        `Underlying error: ${err instanceof Error ? err.message : String(err)}`,
    );
  }

  const rawBody = await response.readBody();
  const statusCode = response.message.statusCode ?? 0;

  if (statusCode < 200 || statusCode >= 300) {
    let detail = rawBody;
    try {
      const parsed = JSON.parse(rawBody);
      detail = parsed.message ?? parsed.error ?? rawBody;
    } catch {
      // use rawBody as-is
    }
    throw new Error(`Agent registration failed with HTTP ${statusCode}: ${detail}`);
  }

  let agentData;
  try {
    agentData = JSON.parse(rawBody);
  } catch {
    throw new Error(`Agent registration returned non-JSON response: ${rawBody}`);
  }

  if (typeof agentData.agentId !== 'string' || agentData.agentId.length === 0) {
    throw new Error('Agent registration response did not include an agentId.');
  }

  return agentData.agentId;
}
/**
* Main entry point for the register-agent GitHub Action.
*
* @returns {Promise<void>}
*/
async function run() {
try {
// Read inputs
const apiUrl = core.getInput('api-url', { required: true }).replace(/\/$/, '');
const agentName = core.getInput('agent-name', { required: true });
const agentDescription = core.getInput('agent-description') || '';
core.info(`Requesting GitHub OIDC token for audience: ${apiUrl}`);
let oidcToken;
try {
oidcToken = await core.getIDToken(apiUrl);
} catch (err) {
throw new Error(
'Failed to obtain a GitHub OIDC token. ' +
"Ensure the workflow has 'id-token: write' permission in its permissions block.\n\n" +
'Example:\n' +
'permissions:\n' +
' id-token: write\n' +
' contents: read\n\n' +
`Underlying error: ${err instanceof Error ? err.message : String(err)}\n` +
'For setup instructions, visit: https://docs.sentryagent.ai/github-actions#trust-policy',
);
}
core.info('Exchanging GitHub OIDC token for SentryAgent.ai access token...');
const accessToken = await exchangeOIDCToken(apiUrl, oidcToken);
core.info(`Registering agent: ${agentName}`);
const agentId = await registerAgent(apiUrl, accessToken, agentName, agentDescription);
core.setOutput('agent-id', agentId);
core.info(`Agent registered successfully. agent-id: ${agentId}`);
} catch (err) {
core.setFailed(err instanceof Error ? err.message : String(err));
}
}
run();


@@ -0,0 +1,39 @@
name: 'SentryAgent Register Agent'
description: >
  Registers a new agent in SentryAgent.ai using GitHub OIDC token exchange.
  No long-lived API credentials required: the GitHub Actions OIDC token is
  exchanged for a short-lived SentryAgent.ai access token to call POST /agents.
author: 'SentryAgent.ai'
branding:
  icon: 'shield'
  color: 'blue'
inputs:
  api-url:
    description: >
      Base URL of the SentryAgent.ai AgentIdP API.
      Example: https://idp.sentryagent.ai
    required: true
  agent-name:
    description: >
      Unique name (email) for the agent being registered.
      Must be a valid email address format used as the agent identity.
    required: true
  agent-description:
    description: >
      Optional human-readable description of the agent's purpose.
      Stored as the agent owner field.
    required: false
    default: ''
outputs:
  agent-id:
    description: >
      The UUID of the newly registered agent.
      Use in subsequent steps to issue tokens or manage credentials.
runs:
  using: 'node20'
  main: 'action.js'

.gitignore

@@ -5,3 +5,7 @@ coverage/
.env.*
*.log
.DS_Store
# Next.js build output
portal/.next/
portal/node_modules/

cli/README.md

@@ -0,0 +1,348 @@
# sentryagent CLI
The official command-line interface for [SentryAgent.ai](https://sentryagent.ai) — manage agents, issue OAuth2 tokens, rotate credentials, and stream audit logs from your terminal.
---
## Installation
### From npm (once published)
```bash
npm install -g sentryagent
```
### From source
```bash
cd cli/
npm install
npm run build
npm install -g .
```
---
## Configuration
Before using any command, configure the CLI with your API endpoint and credentials:
```bash
sentryagent configure
```
You will be prompted for:
| Field | Description |
|---------------|--------------------------------------------------|
| API URL | The SentryAgent.ai API base URL (e.g. `https://api.sentryagent.ai`) |
| Client ID | Your tenant client ID |
| Client Secret | Your tenant client secret |
Configuration is stored at `~/.sentryagent/config.json` with permissions `0600`.
If any command is run before `sentryagent configure` has been called, the CLI exits with:
```
Not configured. Run `sentryagent configure` first.
```
---
## Commands
### `sentryagent --version` / `-v`
Output the installed CLI version.
```bash
sentryagent --version
# 1.0.0
```
### `sentryagent --help` / `-h`
Show all available commands and global options.
```bash
sentryagent --help
```
---
### `sentryagent configure`
Interactively configure the CLI.
```bash
sentryagent configure
```
**Prompts:**
```
SentryAgent CLI Configuration
────────────────────────────────────────
API URL (e.g. https://api.sentryagent.ai): https://api.sentryagent.ai
Client ID: tenant_01ABC...
Client Secret: ****
✓ Configuration saved to ~/.sentryagent/config.json
```
---
### `sentryagent register-agent`
Register a new agent with the identity provider.
```bash
sentryagent register-agent --name <name> [--description <desc>]
```
**Options:**
| Flag | Required | Description |
|-------------------|----------|---------------------|
| `--name <name>` | Yes | Agent display name |
| `--description` | No | Agent description |
**Example:**
```bash
sentryagent register-agent --name "billing-agent" --description "Handles billing workflows"
```
**Output:**
```
✓ Agent registered successfully
Agent ID: 01ARZ3NDEKTSV4RRFFQ69G5FAV
Name: billing-agent
Description: Handles billing workflows
Status: active
```
---
### `sentryagent list-agents`
List all agents registered for your tenant, displayed as a formatted table.
```bash
sentryagent list-agents
```
**Output:**
```
AGENT ID                    NAME                      STATUS      CREATED AT
────────────────────────────────────────────────────────────────────────────
01ARZ3NDEKTSV4RRFFQ69G5FAV  billing-agent             active      4/2/2026, 9:00:00 AM
01ARZ3NDEKTSV4RRFFQ69G5FAX  auth-agent                active      4/1/2026, 3:00:00 PM
────────────────────────────────────────────────────────────────────────────
Total: 2
```
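
A fixed-width table like the one above can be produced by truncating and padding each cell. The following is a minimal sketch rather than the CLI's exact code: `truncate` and `formatRow` are illustrative helper names, and the column widths are assumptions.

```typescript
// Illustrative helpers; names and column widths are assumptions.

// Shorten text to maxLen characters, marking the cut with an ellipsis.
function truncate(text: string, maxLen: number): string {
  return text.length <= maxLen ? text : text.slice(0, maxLen - 1) + '…';
}

// Each column is a [text, width] pair; cells are truncated, padded to
// their width, and joined with a two-space gutter.
function formatRow(columns: Array<[string, number]>): string {
  return columns
    .map(([text, width]) => truncate(text, width).padEnd(width, ' '))
    .join('  ');
}

console.log(
  formatRow([
    ['AGENT ID', 26],
    ['NAME', 24],
    ['STATUS', 10],
    ['CREATED AT', 20],
  ]),
);
console.log(
  formatRow([
    ['01ARZ3NDEKTSV4RRFFQ69G5FAV', 26],
    ['billing-agent', 24],
    ['active', 10],
    ['4/2/2026, 9:00:00 AM', 20],
  ]),
);
```

Truncating before padding keeps every row the same width even when a name or date is longer than its column.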
---
### `sentryagent issue-token`
Issue an OAuth2 `client_credentials` access token for a specific agent.
```bash
sentryagent issue-token --agent-id <id>
```
**Options:**
| Flag | Required | Description |
|--------------------|----------|-------------------------|
| `--agent-id <id>` | Yes | Target agent ID |
**Example:**
```bash
sentryagent issue-token --agent-id 01ARZ3NDEKTSV4RRFFQ69G5FAV
```
**Output:**
```
✓ Token issued successfully
Access Token:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
Token Type: Bearer
Expires In: 3600s
Expires At: 2026-04-02T10:00:00.000Z
```
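
The `Expires At` value follows from the token's `expires_in` lifetime in seconds. A small sketch of that arithmetic, where `expiresAtIso` is an illustrative name and the issue time is passed in explicitly so the result is reproducible:

```typescript
// Illustrative helper: converts an issue time (ms since epoch) and a
// lifetime in seconds into an ISO 8601 expiry timestamp.
function expiresAtIso(issuedAtMs: number, expiresInSeconds: number): string {
  return new Date(issuedAtMs + expiresInSeconds * 1000).toISOString();
}

// 2 April 2026, 09:00:00 UTC (Date.UTC months are zero-based, so 3 = April).
const issuedAt = Date.UTC(2026, 3, 2, 9, 0, 0);
console.log(expiresAtIso(issuedAt, 3600)); // 2026-04-02T10:00:00.000Z
```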
---
### `sentryagent rotate-credentials`
Rotate the client secret for an agent. Prompts for confirmation before proceeding.
```bash
sentryagent rotate-credentials --agent-id <id>
```
**Options:**
| Flag | Required | Description |
|--------------------|----------|-------------------------|
| `--agent-id <id>` | Yes | Target agent ID |
**Example:**
```bash
sentryagent rotate-credentials --agent-id 01ARZ3NDEKTSV4RRFFQ69G5FAV
```
**Output:**
```
⚠ This will invalidate the current secret for agent 01ARZ3NDEKTSV4RRFFQ69G5FAV
This will invalidate the current secret. Continue? [y/N] y
✓ Credentials rotated successfully
Client ID: 01ARZ3NDEKTSV4RRFFQ69G5FAV
Client Secret: cs_new_secret_value_here
Store the new client secret securely — it will not be shown again.
```
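
The `[y/N]` prompt defaults to "no": only an explicit `y` or `yes` proceeds. A minimal sketch of that check, with `isConfirmed` as an illustrative name:

```typescript
// Illustrative helper: treat only "y" or "yes" (any case, surrounding
// whitespace ignored) as confirmation. Everything else, including an
// empty answer, aborts.
function isConfirmed(answer: string): boolean {
  const normalized = answer.trim().toLowerCase();
  return normalized === 'y' || normalized === 'yes';
}

console.log(isConfirmed('y'));   // true
console.log(isConfirmed('YES')); // true
console.log(isConfirmed(''));    // false
```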
---
### `sentryagent tail-audit-log`
Poll the audit log API every 5 seconds and stream new events to stdout. Press **Ctrl+C** to stop.
```bash
sentryagent tail-audit-log [--agent-id <id>]
```
**Options:**
| Flag | Required | Description |
|--------------------|----------|------------------------------------|
| `--agent-id <id>` | No | Filter events for a specific agent |
**Example (all events):**
```bash
sentryagent tail-audit-log
```
**Example (filtered by agent):**
```bash
sentryagent tail-audit-log --agent-id 01ARZ3NDEKTSV4RRFFQ69G5FAV
```
**Output:**
```
Tailing audit log — press Ctrl+C to stop
────────────────────────────────────────────────────────────
4/2/2026, 9:05:00 AM agent.token.issued outcome=success agent=01ARZ3NDEKTSV... id=evt_01...
4/2/2026, 9:10:03 AM agent.registered outcome=success id=evt_02...
^C
Stopped.
```
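
Because the command polls repeatedly, the same event may be returned by more than one request. One way to print each event only once is to remember the IDs already shown. The sketch below assumes every event carries a unique `id`; `selectUnseen` and the sample events are hypothetical, not the CLI's actual code.

```typescript
// Minimal event shape for this sketch; real events have more fields.
interface AuditEvent {
  id: string;
  action: string;
}

// Returns the events from a batch that have not been seen before and
// records their IDs so later polls skip them.
function selectUnseen(batch: AuditEvent[], seenIds: Set<string>): AuditEvent[] {
  const fresh: AuditEvent[] = [];
  for (const item of batch) {
    if (!seenIds.has(item.id)) {
      seenIds.add(item.id);
      fresh.push(item);
    }
  }
  return fresh;
}

const seenIds = new Set<string>();
const firstPoll = selectUnseen(
  [{ id: 'evt_01', action: 'agent.token.issued' }],
  seenIds,
);
const secondPoll = selectUnseen(
  [
    { id: 'evt_01', action: 'agent.token.issued' },
    { id: 'evt_02', action: 'agent.registered' },
  ],
  seenIds,
);
console.log(firstPoll.length, secondPoll.length); // 1 1
```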
---
### `sentryagent completion`
Output shell completion scripts.
#### Bash
```bash
sentryagent completion bash
```
To enable permanently, add to `~/.bashrc` or `~/.bash_profile`:
```bash
source <(sentryagent completion bash)
```
Or write to a file:
```bash
sentryagent completion bash > ~/.bash_completion.d/sentryagent
```
#### Zsh
```bash
sentryagent completion zsh
```
To enable permanently, add to `~/.zshrc`:
```bash
source <(sentryagent completion zsh)
```
Or write to a file in your `$fpath`:
```bash
sentryagent completion zsh > ~/.zsh/completions/_sentryagent
```
---
## Shell Completion Setup
### Bash (one-time setup)
```bash
mkdir -p ~/.bash_completion.d
sentryagent completion bash > ~/.bash_completion.d/sentryagent
echo 'source ~/.bash_completion.d/sentryagent' >> ~/.bashrc
source ~/.bashrc
```
### Zsh (one-time setup)
```bash
mkdir -p ~/.zsh/completions
sentryagent completion zsh > ~/.zsh/completions/_sentryagent
echo 'fpath=(~/.zsh/completions $fpath)' >> ~/.zshrc
echo 'autoload -Uz compinit && compinit' >> ~/.zshrc
source ~/.zshrc
```
After setup, pressing **Tab** after `sentryagent` will autocomplete commands and flags.
---
## Configuration File
The config file is stored at `~/.sentryagent/config.json`:
```json
{
"apiUrl": "https://api.sentryagent.ai",
"clientId": "tenant_01ABC...",
"clientSecret": "cs_secret_value"
}
```
The directory is created with mode `0700` and the file with mode `0600` to prevent other users from reading your credentials.
---
## Environment
- Node.js >= 18.0.0 is required (uses the built-in `fetch` API)
- All HTTP requests use OAuth2 `client_credentials` tokens fetched automatically from your configuration
- Tokens are cached in memory for the duration of the CLI session (refreshed 30 seconds before expiry)
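
The 30-second refresh margin means a cached token is reused only while it has more than 30 seconds of lifetime left. A minimal sketch of that check, where `CachedToken`, `isReusable`, and `REFRESH_MARGIN_MS` are illustrative names:

```typescript
// Refresh margin: stop reusing a token once it is within 30 seconds of expiry.
const REFRESH_MARGIN_MS = 30_000;

interface CachedToken {
  accessToken: string;
  expiresAt: number; // ms since epoch
}

// True when a cached token exists and will still be valid after the margin.
function isReusable(cache: CachedToken | null, nowMs: number): boolean {
  return cache !== null && cache.expiresAt > nowMs + REFRESH_MARGIN_MS;
}

const cached: CachedToken = { accessToken: 'example-token', expiresAt: 100_000 };
console.log(isReusable(cached, 60_000)); // true: 40 seconds remain
console.log(isReusable(cached, 70_000)); // false: exactly 30 seconds remain
console.log(isReusable(null, 0));        // false: nothing cached
```

Refreshing slightly early avoids sending a token that expires while the request is in flight.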

cli/package-lock.json

@@ -0,0 +1,267 @@
{
"name": "sentryagent",
"version": "1.0.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "sentryagent",
"version": "1.0.0",
"license": "MIT",
"dependencies": {
"chalk": "^5.3.0",
"commander": "^12.1.0"
},
"bin": {
"sentryagent": "dist/index.js"
},
"devDependencies": {
"@types/node": "^20.12.7",
"ts-node": "^10.9.2",
"typescript": "^5.4.5"
},
"engines": {
"node": ">=18.0.0"
}
},
"node_modules/@cspotcode/source-map-support": {
"version": "0.8.1",
"resolved": "https://registry.npmjs.org/@cspotcode/source-map-support/-/source-map-support-0.8.1.tgz",
"integrity": "sha512-IchNf6dN4tHoMFIn/7OE8LWZ19Y6q/67Bmf6vnGREv8RSbBVb9LPJxEcnwrcwX6ixSvaiGoomAUvu4YSxXrVgw==",
"dev": true,
"license": "MIT",
"dependencies": {
"@jridgewell/trace-mapping": "0.3.9"
},
"engines": {
"node": ">=12"
}
},
"node_modules/@jridgewell/resolve-uri": {
"version": "3.1.2",
"resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz",
"integrity": "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/@jridgewell/sourcemap-codec": {
"version": "1.5.5",
"resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz",
"integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==",
"dev": true,
"license": "MIT"
},
"node_modules/@jridgewell/trace-mapping": {
"version": "0.3.9",
"resolved": "https://registry.npmjs.org/@jridgewell/trace-mapping/-/trace-mapping-0.3.9.tgz",
"integrity": "sha512-3Belt6tdc8bPgAtbcmdtNJlirVoTmEb5e2gC94PnkwEW9jI6CAHUeoG85tjWP5WquqfavoMtMwiG4P926ZKKuQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@jridgewell/resolve-uri": "^3.0.3",
"@jridgewell/sourcemap-codec": "^1.4.10"
}
},
"node_modules/@tsconfig/node10": {
"version": "1.0.12",
"resolved": "https://registry.npmjs.org/@tsconfig/node10/-/node10-1.0.12.tgz",
"integrity": "sha512-UCYBaeFvM11aU2y3YPZ//O5Rhj+xKyzy7mvcIoAjASbigy8mHMryP5cK7dgjlz2hWxh1g5pLw084E0a/wlUSFQ==",
"dev": true,
"license": "MIT"
},
"node_modules/@tsconfig/node12": {
"version": "1.0.11",
"resolved": "https://registry.npmjs.org/@tsconfig/node12/-/node12-1.0.11.tgz",
"integrity": "sha512-cqefuRsh12pWyGsIoBKJA9luFu3mRxCA+ORZvA4ktLSzIuCUtWVxGIuXigEwO5/ywWFMZ2QEGKWvkZG1zDMTag==",
"dev": true,
"license": "MIT"
},
"node_modules/@tsconfig/node14": {
"version": "1.0.3",
"resolved": "https://registry.npmjs.org/@tsconfig/node14/-/node14-1.0.3.tgz",
"integrity": "sha512-ysT8mhdixWK6Hw3i1V2AeRqZ5WfXg1G43mqoYlM2nc6388Fq5jcXyr5mRsqViLx/GJYdoL0bfXD8nmF+Zn/Iow==",
"dev": true,
"license": "MIT"
},
"node_modules/@tsconfig/node16": {
"version": "1.0.4",
"resolved": "https://registry.npmjs.org/@tsconfig/node16/-/node16-1.0.4.tgz",
"integrity": "sha512-vxhUy4J8lyeyinH7Azl1pdd43GJhZH/tP2weN8TntQblOY+A0XbT8DJk1/oCPuOOyg/Ja757rG0CgHcWC8OfMA==",
"dev": true,
"license": "MIT"
},
"node_modules/@types/node": {
"version": "20.19.37",
"resolved": "https://registry.npmjs.org/@types/node/-/node-20.19.37.tgz",
"integrity": "sha512-8kzdPJ3FsNsVIurqBs7oodNnCEVbni9yUEkaHbgptDACOPW04jimGagZ51E6+lXUwJjgnBw+hyko/lkFWCldqw==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~6.21.0"
}
},
"node_modules/acorn": {
"version": "8.16.0",
"resolved": "https://registry.npmjs.org/acorn/-/acorn-8.16.0.tgz",
"integrity": "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw==",
"dev": true,
"license": "MIT",
"bin": {
"acorn": "bin/acorn"
},
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/acorn-walk": {
"version": "8.3.5",
"resolved": "https://registry.npmjs.org/acorn-walk/-/acorn-walk-8.3.5.tgz",
"integrity": "sha512-HEHNfbars9v4pgpW6SO1KSPkfoS0xVOM/9UzkJltjlsHZmJasxg8aXkuZa7SMf8vKGIBhpUsPluQSqhJFCqebw==",
"dev": true,
"license": "MIT",
"dependencies": {
"acorn": "^8.11.0"
},
"engines": {
"node": ">=0.4.0"
}
},
"node_modules/arg": {
"version": "4.1.3",
"resolved": "https://registry.npmjs.org/arg/-/arg-4.1.3.tgz",
"integrity": "sha512-58S9QDqG0Xx27YwPSt9fJxivjYl432YCwfDMfZ+71RAqUrZef7LrKQZ3LHLOwCS4FLNBplP533Zx895SeOCHvA==",
"dev": true,
"license": "MIT"
},
"node_modules/chalk": {
"version": "5.6.2",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-5.6.2.tgz",
"integrity": "sha512-7NzBL0rN6fMUW+f7A6Io4h40qQlG+xGmtMxfbnH/K7TAtt8JQWVQK+6g0UXKMeVJoyV5EkkNsErQ8pVD3bLHbA==",
"license": "MIT",
"engines": {
"node": "^12.17.0 || ^14.13 || >=16.0.0"
},
"funding": {
"url": "https://github.com/chalk/chalk?sponsor=1"
}
},
"node_modules/commander": {
"version": "12.1.0",
"resolved": "https://registry.npmjs.org/commander/-/commander-12.1.0.tgz",
"integrity": "sha512-Vw8qHK3bZM9y/P10u3Vib8o/DdkvA2OtPtZvD871QKjy74Wj1WSKFILMPRPSdUSx5RFK1arlJzEtA4PkFgnbuA==",
"license": "MIT",
"engines": {
"node": ">=18"
}
},
"node_modules/create-require": {
"version": "1.1.1",
"resolved": "https://registry.npmjs.org/create-require/-/create-require-1.1.1.tgz",
"integrity": "sha512-dcKFX3jn0MpIaXjisoRvexIJVEKzaq7z2rZKxf+MSr9TkdmHmsU4m2lcLojrj/FHl8mk5VxMmYA+ftRkP/3oKQ==",
"dev": true,
"license": "MIT"
},
"node_modules/diff": {
"version": "4.0.4",
"resolved": "https://registry.npmjs.org/diff/-/diff-4.0.4.tgz",
"integrity": "sha512-X07nttJQkwkfKfvTPG/KSnE2OMdcUCao6+eXF3wmnIQRn2aPAHH3VxDbDOdegkd6JbPsXqShpvEOHfAT+nCNwQ==",
"dev": true,
"license": "BSD-3-Clause",
"engines": {
"node": ">=0.3.1"
}
},
"node_modules/make-error": {
"version": "1.3.6",
"resolved": "https://registry.npmjs.org/make-error/-/make-error-1.3.6.tgz",
"integrity": "sha512-s8UhlNe7vPKomQhC1qFelMokr/Sc3AgNbso3n74mVPA5LTZwkB9NlXf4XPamLxJE8h0gh73rM94xvwRT2CVInw==",
"dev": true,
"license": "ISC"
},
"node_modules/ts-node": {
"version": "10.9.2",
"resolved": "https://registry.npmjs.org/ts-node/-/ts-node-10.9.2.tgz",
"integrity": "sha512-f0FFpIdcHgn8zcPSbf1dRevwt047YMnaiJM3u2w2RewrB+fob/zePZcrOyQoLMMO7aBIddLcQIEK5dYjkLnGrQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@cspotcode/source-map-support": "^0.8.0",
"@tsconfig/node10": "^1.0.7",
"@tsconfig/node12": "^1.0.7",
"@tsconfig/node14": "^1.0.0",
"@tsconfig/node16": "^1.0.2",
"acorn": "^8.4.1",
"acorn-walk": "^8.1.1",
"arg": "^4.1.0",
"create-require": "^1.1.0",
"diff": "^4.0.1",
"make-error": "^1.1.1",
"v8-compile-cache-lib": "^3.0.1",
"yn": "3.1.1"
},
"bin": {
"ts-node": "dist/bin.js",
"ts-node-cwd": "dist/bin-cwd.js",
"ts-node-esm": "dist/bin-esm.js",
"ts-node-script": "dist/bin-script.js",
"ts-node-transpile-only": "dist/bin-transpile.js",
"ts-script": "dist/bin-script-deprecated.js"
},
"peerDependencies": {
"@swc/core": ">=1.2.50",
"@swc/wasm": ">=1.2.50",
"@types/node": "*",
"typescript": ">=2.7"
},
"peerDependenciesMeta": {
"@swc/core": {
"optional": true
},
"@swc/wasm": {
"optional": true
}
}
},
"node_modules/typescript": {
"version": "5.9.3",
"resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz",
"integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
"dev": true,
"license": "Apache-2.0",
"bin": {
"tsc": "bin/tsc",
"tsserver": "bin/tsserver"
},
"engines": {
"node": ">=14.17"
}
},
"node_modules/undici-types": {
"version": "6.21.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz",
"integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==",
"dev": true,
"license": "MIT"
},
"node_modules/v8-compile-cache-lib": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/v8-compile-cache-lib/-/v8-compile-cache-lib-3.0.1.tgz",
"integrity": "sha512-wa7YjyUGfNZngI/vtK0UHAN+lgDCxBPCylVXGp0zu59Fz5aiGtNXaq3DhIov063MorB+VfufLh3JlF2KdTK3xg==",
"dev": true,
"license": "MIT"
},
"node_modules/yn": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/yn/-/yn-3.1.1.tgz",
"integrity": "sha512-Ux4ygGWsu2c7isFWe8Yu1YluJmqVhxqK2cLXNQA5AcC3QfbGNpM7fu0Y8b/z16pXLnFxZYvWhd3fhBY9DLmC6Q==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6"
}
}
}
}

cli/package.json

@@ -0,0 +1,34 @@
{
"name": "sentryagent",
"version": "1.0.0",
"description": "SentryAgent.ai CLI — manage agents, tokens, and audit logs",
"main": "dist/index.js",
"bin": {
"sentryagent": "./dist/index.js"
},
"scripts": {
"build": "tsc",
"dev": "ts-node src/index.ts",
"clean": "rm -rf dist"
},
"dependencies": {
"chalk": "^5.3.0",
"commander": "^12.1.0"
},
"devDependencies": {
"@types/node": "^20.12.7",
"typescript": "^5.4.5",
"ts-node": "^10.9.2"
},
"engines": {
"node": ">=18.0.0"
},
"keywords": [
"sentryagent",
"agentidp",
"cli",
"agents",
"identity"
],
"license": "MIT"
}

cli/src/api.ts

@@ -0,0 +1,95 @@
import { Config } from './config';
interface TokenCache {
accessToken: string;
expiresAt: number;
}
let tokenCache: TokenCache | null = null;
interface TokenResponse {
access_token: string;
expires_in: number;
token_type: string;
}
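/**
 * Returns a client_credentials access token, reusing the cached token
 * unless it expires within the next 30 seconds.
 */
async function fetchToken(config: Config): Promise<string> {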
const now = Date.now();
if (tokenCache !== null && tokenCache.expiresAt > now + 30_000) {
return tokenCache.accessToken;
}
const body = new URLSearchParams({
grant_type: 'client_credentials',
client_id: config.clientId,
client_secret: config.clientSecret,
});
const res = await fetch(`${config.apiUrl}/oauth2/token`, {
method: 'POST',
headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
body: body.toString(),
});
if (!res.ok) {
const text = await res.text();
throw new Error(`Authentication failed (${res.status}): ${text}`);
}
const data = (await res.json()) as TokenResponse;
tokenCache = {
accessToken: data.access_token,
expiresAt: now + data.expires_in * 1000,
};
return tokenCache.accessToken;
}
export function clearTokenCache(): void {
tokenCache = null;
}
type HttpMethod = 'GET' | 'POST' | 'PUT' | 'PATCH' | 'DELETE';
interface ApiRequestOptions {
method?: HttpMethod;
body?: unknown;
params?: Record<string, string>;
}
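/**
 * Sends an authenticated JSON request to `${config.apiUrl}${endpoint}`.
 * Throws on any non-2xx status and resolves to undefined for 204 responses.
 */
export async function apiRequest<T>(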
config: Config,
endpoint: string,
options: ApiRequestOptions = {},
): Promise<T> {
const token = await fetchToken(config);
const { method = 'GET', body, params } = options;
let url = `${config.apiUrl}${endpoint}`;
if (params !== undefined && Object.keys(params).length > 0) {
const qs = new URLSearchParams(params);
url = `${url}?${qs.toString()}`;
}
const headers: Record<string, string> = {
Authorization: `Bearer ${token}`,
'Content-Type': 'application/json',
};
const fetchOptions: RequestInit = { method, headers };
if (body !== undefined) {
fetchOptions.body = JSON.stringify(body);
}
const res = await fetch(url, fetchOptions);
if (!res.ok) {
const text = await res.text();
throw new Error(`API error (${res.status}): ${text}`);
}
if (res.status === 204) {
return undefined as unknown as T;
}
return (await res.json()) as T;
}


@@ -0,0 +1,155 @@
import { Command } from 'commander';
const BASH_COMPLETION = `
# sentryagent bash completion
# Add to ~/.bashrc or ~/.bash_profile:
# source <(sentryagent completion bash)
_sentryagent_completion() {
local cur prev words cword
_init_completion || return
local commands="configure register-agent list-agents issue-token rotate-credentials tail-audit-log completion"
local global_opts="--help --version"
case "\${prev}" in
sentryagent)
COMPREPLY=( \$(compgen -W "\${commands} \${global_opts}" -- "\${cur}") )
return 0
;;
configure)
COMPREPLY=( \$(compgen -W "--help" -- "\${cur}") )
return 0
;;
register-agent)
COMPREPLY=( \$(compgen -W "--name --description --help" -- "\${cur}") )
return 0
;;
list-agents)
COMPREPLY=( \$(compgen -W "--help" -- "\${cur}") )
return 0
;;
issue-token)
COMPREPLY=( \$(compgen -W "--agent-id --help" -- "\${cur}") )
return 0
;;
rotate-credentials)
COMPREPLY=( \$(compgen -W "--agent-id --help" -- "\${cur}") )
return 0
;;
tail-audit-log)
COMPREPLY=( \$(compgen -W "--agent-id --help" -- "\${cur}") )
return 0
;;
completion)
COMPREPLY=( \$(compgen -W "bash zsh --help" -- "\${cur}") )
return 0
;;
*)
COMPREPLY=()
return 0
;;
esac
}
complete -F _sentryagent_completion sentryagent
`.trim();
const ZSH_COMPLETION = `
#compdef sentryagent
# sentryagent zsh completion
# Add to ~/.zshrc:
# source <(sentryagent completion zsh)
# Or generate a file and place it in your $fpath:
# sentryagent completion zsh > ~/.zsh/completions/_sentryagent
_sentryagent() {
local state
_arguments \\
'(-v --version)'{-v,--version}'[Show version]' \\
'(-h --help)'{-h,--help}'[Show help]' \\
'1: :->command' \\
'*: :->args'
case \$state in
command)
local commands=(
'configure:Configure CLI with API URL and credentials'
'register-agent:Register a new agent'
'list-agents:List all registered agents'
'issue-token:Issue an OAuth2 access token for an agent'
'rotate-credentials:Rotate credentials for an agent'
'tail-audit-log:Poll and stream audit log events'
'completion:Output shell completion script'
)
_describe 'command' commands
;;
args)
case \${words[2]} in
configure)
_arguments \\
'(-h --help)'{-h,--help}'[Show help]'
;;
register-agent)
_arguments \\
'--name[Agent name]:name' \\
'--description[Agent description]:description' \\
'(-h --help)'{-h,--help}'[Show help]'
;;
list-agents)
_arguments \\
'(-h --help)'{-h,--help}'[Show help]'
;;
issue-token)
_arguments \\
'--agent-id[Agent ID]:agent-id' \\
'(-h --help)'{-h,--help}'[Show help]'
;;
rotate-credentials)
_arguments \\
'--agent-id[Agent ID]:agent-id' \\
'(-h --help)'{-h,--help}'[Show help]'
;;
tail-audit-log)
_arguments \\
'--agent-id[Filter by agent ID]:agent-id' \\
'(-h --help)'{-h,--help}'[Show help]'
;;
completion)
local shells=('bash:Generate bash completion script' 'zsh:Generate zsh completion script')
_describe 'shell' shells
;;
esac
;;
esac
}
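# Run directly when autoloaded from $fpath; register with compdef when sourced.
if [ "\$funcstack[1]" = "_sentryagent" ]; then
  _sentryagent "\$@"
else
  compdef _sentryagent sentryagent
fi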
`.trim();
export function registerCompletion(program: Command): void {
const completion = program
.command('completion')
.description('Output shell completion scripts');
completion
.command('bash')
.description('Output bash completion script')
.action(() => {
console.log(BASH_COMPLETION);
});
completion
.command('zsh')
.description('Output zsh completion script')
.action(() => {
console.log(ZSH_COMPLETION);
});
completion.addHelpText(
'after',
'\nSupported shells: bash, zsh',
);
}


@@ -0,0 +1,63 @@
import * as readline from 'readline';
import { Command } from 'commander';
import chalk from 'chalk';
import { writeConfig } from '../config';
function prompt(rl: readline.Interface, question: string): Promise<string> {
return new Promise((resolve) => {
rl.question(question, (answer) => {
resolve(answer.trim());
});
});
}
export function registerConfigure(program: Command): void {
program
.command('configure')
.description('Configure the CLI with API URL and credentials')
.action(async () => {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
try {
console.log(chalk.bold('SentryAgent CLI Configuration'));
console.log(chalk.dim('─'.repeat(40)));
const apiUrl = await prompt(
rl,
chalk.cyan('API URL') + ' (e.g. https://api.sentryagent.ai): ',
);
if (apiUrl === '') {
console.error(chalk.red('API URL cannot be empty.'));
process.exit(1);
}
const clientId = await prompt(rl, chalk.cyan('Client ID') + ': ');
if (clientId === '') {
console.error(chalk.red('Client ID cannot be empty.'));
process.exit(1);
}
const clientSecret = await prompt(
rl,
chalk.cyan('Client Secret') + ': ',
);
if (clientSecret === '') {
console.error(chalk.red('Client Secret cannot be empty.'));
process.exit(1);
}
writeConfig({ apiUrl, clientId, clientSecret });
console.log();
console.log(
chalk.green('✓') +
' Configuration saved to ~/.sentryagent/config.json',
);
} finally {
rl.close();
}
});
}


@@ -0,0 +1,70 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
interface TokenResponse {
access_token: string;
expires_in: number;
token_type: string;
scope?: string;
}
export function registerIssueToken(program: Command): void {
program
.command('issue-token')
.description('Issue an OAuth2 access token for an agent')
.requiredOption('--agent-id <id>', 'Agent ID to issue a token for')
.action(async (options: { agentId: string }) => {
const config = requireConfig();
try {
const body = new URLSearchParams({
grant_type: 'client_credentials',
client_id: config.clientId,
client_secret: config.clientSecret,
agent_id: options.agentId,
});
const res = await fetch(`${config.apiUrl}/oauth2/token`, {
method: 'POST',
headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
body: body.toString(),
});
if (!res.ok) {
const text = await res.text();
throw new Error(`Token issuance failed (${res.status}): ${text}`);
}
const data = (await res.json()) as TokenResponse;
const expiresAt = new Date(
Date.now() + data.expires_in * 1000,
).toISOString();
console.log(chalk.green('✓') + ' Token issued successfully');
console.log();
console.log(chalk.bold('Access Token:'));
console.log(chalk.cyan(data.access_token));
console.log();
console.log(
chalk.bold('Token Type: ') + data.token_type,
);
console.log(
chalk.bold('Expires In: ') + `${data.expires_in}s`,
);
console.log(
chalk.bold('Expires At: ') + chalk.dim(expiresAt),
);
if (data.scope !== undefined) {
console.log(chalk.bold('Scope: ') + data.scope);
}
} catch (err) {
console.error(
chalk.red('Error:'),
err instanceof Error ? err.message : String(err),
);
process.exit(1);
}
});
}


@@ -0,0 +1,105 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';
interface Agent {
id: string;
name: string;
status: string;
createdAt: string;
description?: string;
}
interface AgentsResponse {
agents: Agent[];
total?: number;
}
function truncate(str: string, maxLen: number): string {
if (str.length <= maxLen) return str;
return str.slice(0, maxLen - 1) + '…';
}
function padEnd(str: string, len: number): string {
return str.padEnd(len, ' ');
}
export function registerListAgents(program: Command): void {
program
.command('list-agents')
.description('List all registered agents')
.action(async () => {
const config = requireConfig();
try {
const data = await apiRequest<AgentsResponse | Agent[]>(
config,
'/agents',
);
const agents: Agent[] = Array.isArray(data)
? data
: (data as AgentsResponse).agents ?? [];
if (agents.length === 0) {
console.log(chalk.yellow('No agents found.'));
return;
}
const ID_W = 26;
const NAME_W = 24;
const STATUS_W = 10;
const DATE_W = 20;
const header =
chalk.bold(padEnd('AGENT ID', ID_W)) +
' ' +
chalk.bold(padEnd('NAME', NAME_W)) +
' ' +
chalk.bold(padEnd('STATUS', STATUS_W)) +
' ' +
chalk.bold('CREATED AT');
const divider = chalk.dim(
'─'.repeat(ID_W + NAME_W + STATUS_W + DATE_W + 6),
);
console.log(header);
console.log(divider);
for (const agent of agents) {
const statusColor =
agent.status === 'active'
? chalk.green
: agent.status === 'inactive'
? chalk.yellow
: chalk.red;
const createdAt = new Date(agent.createdAt).toLocaleString();
console.log(
chalk.cyan(padEnd(truncate(agent.id, ID_W), ID_W)) +
' ' +
padEnd(truncate(agent.name, NAME_W), NAME_W) +
' ' +
statusColor(padEnd(truncate(agent.status, STATUS_W), STATUS_W)) +
' ' +
chalk.dim(truncate(createdAt, DATE_W)),
);
}
console.log(divider);
const total = Array.isArray(data)
? agents.length
: ((data as AgentsResponse).total ?? agents.length);
console.log(chalk.dim(`Total: ${total}`));
} catch (err) {
console.error(
chalk.red('Error:'),
err instanceof Error ? err.message : String(err),
);
process.exit(1);
}
});
}


@@ -0,0 +1,54 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';
interface AgentResponse {
id: string;
name: string;
description?: string;
status: string;
createdAt: string;
}
export function registerRegisterAgent(program: Command): void {
program
.command('register-agent')
.description('Register a new agent')
.requiredOption('--name <name>', 'Agent name')
.option('--description <desc>', 'Agent description')
.action(async (options: { name: string; description?: string }) => {
const config = requireConfig();
try {
const body: { name: string; description?: string } = {
name: options.name,
};
if (options.description !== undefined) {
body.description = options.description;
}
const agent = await apiRequest<AgentResponse>(config, '/agents', {
method: 'POST',
body,
});
console.log(chalk.green('✓') + ' Agent registered successfully');
console.log();
console.log(
chalk.bold('Agent ID: ') + chalk.cyan(agent.id),
);
console.log(chalk.bold('Name: ') + agent.name);
if (agent.description !== undefined) {
console.log(chalk.bold('Description:') + ' ' + agent.description);
}
console.log(chalk.bold('Status: ') + agent.status);
} catch (err) {
console.error(
chalk.red('Error:'),
err instanceof Error ? err.message : String(err),
);
process.exit(1);
}
});
}


@@ -0,0 +1,85 @@
import * as readline from 'readline';
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';
interface RotateResponse {
clientId: string;
clientSecret: string;
rotatedAt?: string;
}
function prompt(rl: readline.Interface, question: string): Promise<string> {
return new Promise((resolve) => {
rl.question(question, (answer) => {
resolve(answer.trim());
});
});
}
export function registerRotateCredentials(program: Command): void {
program
.command('rotate-credentials')
.description('Rotate credentials for an agent (invalidates current secret)')
.requiredOption('--agent-id <id>', 'Agent ID whose credentials to rotate')
.action(async (options: { agentId: string }) => {
const config = requireConfig();
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
try {
console.log(
chalk.yellow('⚠') +
' This will invalidate the current secret for agent ' +
chalk.cyan(options.agentId),
);
const answer = await prompt(
rl,
chalk.bold('This will invalidate the current secret. Continue? [y/N] '),
);
if (answer.toLowerCase() !== 'y' && answer.toLowerCase() !== 'yes') {
console.log(chalk.dim('Aborted.'));
return;
}
const data = await apiRequest<RotateResponse>(
config,
`/agents/${options.agentId}/credentials/rotate`,
{ method: 'POST' },
);
console.log();
console.log(chalk.green('✓') + ' Credentials rotated successfully');
console.log();
console.log(chalk.bold('Client ID: ') + chalk.cyan(data.clientId));
console.log(
chalk.bold('Client Secret: ') + chalk.yellow(data.clientSecret),
);
console.log();
console.log(
chalk.dim(
'Store the new client secret securely — it will not be shown again.',
),
);
if (data.rotatedAt !== undefined) {
console.log(
chalk.dim('Rotated at: ') + chalk.dim(data.rotatedAt),
);
}
} catch (err) {
console.error(
chalk.red('Error:'),
err instanceof Error ? err.message : String(err),
);
process.exit(1);
} finally {
rl.close();
}
});
}


@@ -0,0 +1,122 @@
import { Command } from 'commander';
import chalk from 'chalk';
import { requireConfig } from '../config';
import { apiRequest } from '../api';
interface AuditEvent {
id: string;
timestamp: string;
action: string;
agentId?: string;
tenantId?: string;
outcome: string;
details?: Record<string, unknown>;
}
interface AuditLogsResponse {
events: AuditEvent[];
nextCursor?: string;
}
function formatEvent(event: AuditEvent): string {
const ts = chalk.dim(new Date(event.timestamp).toLocaleString());
const outcome =
event.outcome === 'success'
? chalk.green(event.outcome)
: chalk.red(event.outcome);
const action = chalk.cyan(event.action);
const agentPart =
event.agentId !== undefined
? ' ' + chalk.dim('agent=' + event.agentId)
: '';
return `${ts} ${action} outcome=${outcome}${agentPart} id=${chalk.dim(event.id)}`;
}
export function registerTailAuditLog(program: Command): void {
program
.command('tail-audit-log')
.description(
'Poll and stream audit log events every 5 seconds (Ctrl+C to stop)',
)
.option('--agent-id <id>', 'Filter events for a specific agent ID')
.action(async (options: { agentId?: string }) => {
const config = requireConfig();
console.log(
chalk.bold('Tailing audit log') +
(options.agentId !== undefined
? chalk.dim(` (agent: ${options.agentId})`)
: '') +
chalk.dim(' — press Ctrl+C to stop'),
);
console.log(chalk.dim('─'.repeat(60)));
const seenIds = new Set<string>();
let cursor: string | undefined;
let running = true;
process.on('SIGINT', () => {
running = false;
console.log();
console.log(chalk.dim('Stopped.'));
process.exit(0);
});
while (running) {
try {
const params: Record<string, string> = {};
if (options.agentId !== undefined) {
params['agentId'] = options.agentId;
}
if (cursor !== undefined) {
params['cursor'] = cursor;
}
// Cap each poll at 50 events
params['limit'] = '50';
const data = await apiRequest<AuditLogsResponse | AuditEvent[]>(
config,
'/audit/logs',
{ params },
);
const events: AuditEvent[] = Array.isArray(data)
? data
: (data as AuditLogsResponse).events ?? [];
if (!Array.isArray(data) && (data as AuditLogsResponse).nextCursor !== undefined) {
cursor = (data as AuditLogsResponse).nextCursor;
}
for (const event of events) {
if (!seenIds.has(event.id)) {
seenIds.add(event.id);
console.log(formatEvent(event));
}
}
// Keep the seenIds set bounded to avoid unbounded memory growth
if (seenIds.size > 10_000) {
const arr = Array.from(seenIds);
const keep = arr.slice(arr.length - 5_000);
seenIds.clear();
for (const id of keep) seenIds.add(id);
}
} catch (err) {
console.error(
chalk.yellow('⚠') +
' Poll error: ' +
(err instanceof Error ? err.message : String(err)),
);
}
// Wait 5 seconds between polls. The timer must stay ref'd: between polls it is
// the only handle keeping the event loop alive, and an unref'd timer would let
// the process exit after the first poll. SIGINT exits explicitly above.
await new Promise<void>((resolve) => {
setTimeout(resolve, 5000);
});
}
});
}

cli/src/config.ts Normal file

@@ -0,0 +1,61 @@
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
export interface Config {
apiUrl: string;
clientId: string;
clientSecret: string;
}
const CONFIG_DIR = path.join(os.homedir(), '.sentryagent');
const CONFIG_FILE = path.join(CONFIG_DIR, 'config.json');
export function readConfig(): Config | null {
if (!fs.existsSync(CONFIG_FILE)) {
return null;
}
try {
const raw = fs.readFileSync(CONFIG_FILE, 'utf-8');
const parsed: unknown = JSON.parse(raw);
if (
parsed !== null &&
typeof parsed === 'object' &&
'apiUrl' in parsed &&
'clientId' in parsed &&
'clientSecret' in parsed &&
typeof (parsed as Record<string, unknown>)['apiUrl'] === 'string' &&
typeof (parsed as Record<string, unknown>)['clientId'] === 'string' &&
typeof (parsed as Record<string, unknown>)['clientSecret'] === 'string'
) {
const p = parsed as Record<string, unknown>;
return {
apiUrl: p['apiUrl'] as string,
clientId: p['clientId'] as string,
clientSecret: p['clientSecret'] as string,
};
}
return null;
} catch {
return null;
}
}
export function writeConfig(config: Config): void {
if (!fs.existsSync(CONFIG_DIR)) {
fs.mkdirSync(CONFIG_DIR, { recursive: true, mode: 0o700 });
}
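fs.writeFileSync(CONFIG_FILE, JSON.stringify(config, null, 2), {
encoding: 'utf-8',
mode: 0o600,
});
// `mode` only applies when the file is created; tighten an existing file too
fs.chmodSync(CONFIG_FILE, 0o600);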
}
export function requireConfig(): Config {
const config = readConfig();
if (config === null) {
console.error('Not configured. Run `sentryagent configure` first.');
process.exit(1);
}
return config;
}

cli/src/index.ts Normal file

@@ -0,0 +1,31 @@
#!/usr/bin/env node
import { Command } from 'commander';
import packageJson from '../package.json';
import { registerConfigure } from './commands/configure';
import { registerRegisterAgent } from './commands/register-agent';
import { registerListAgents } from './commands/list-agents';
import { registerIssueToken } from './commands/issue-token';
import { registerRotateCredentials } from './commands/rotate-credentials';
import { registerTailAuditLog } from './commands/tail-audit-log';
import { registerCompletion } from './commands/completion';
const program = new Command();
program
.name('sentryagent')
.description('SentryAgent.ai CLI — manage agents, tokens, and audit logs')
.version(packageJson.version, '-v, --version', 'Output the current version');
// Register all commands
registerConfigure(program);
registerRegisterAgent(program);
registerListAgents(program);
registerIssueToken(program);
registerRotateCredentials(program);
registerTailAuditLog(program);
registerCompletion(program);
// Parse args — commander will display help automatically on --help
program.parse(process.argv);

cli/tsconfig.json Normal file

@@ -0,0 +1,29 @@
{
"compilerOptions": {
"target": "ES2020",
"module": "commonjs",
"lib": ["ES2020"],
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"noImplicitAny": true,
"strictNullChecks": true,
"strictFunctionTypes": true,
"strictBindCallApply": true,
"strictPropertyInitialization": true,
"noImplicitThis": true,
"alwaysStrict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noImplicitReturns": true,
"noFallthroughCasesInSwitch": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}


@@ -9,6 +9,7 @@ import AgentDetail from '@/pages/AgentDetail';
import Credentials from '@/pages/Credentials';
import AuditLog from '@/pages/AuditLog';
import Health from '@/pages/Health';
import { UsagePanel } from '@/components/UsagePanel';
/** Top-level router — defines all application routes. */
export default function App(): React.JSX.Element {
@@ -23,6 +24,7 @@ export default function App(): React.JSX.Element {
<Route path="/dashboard/agents/:agentId/credentials" element={<Credentials />} />
<Route path="/dashboard/audit" element={<AuditLog />} />
<Route path="/dashboard/health" element={<Health />} />
<Route path="/dashboard/usage" element={<UsagePanel />} />
</Route>
</Route>
<Route path="/dashboard" element={<Navigate to="/dashboard/agents" replace />} />


@@ -0,0 +1,192 @@
import * as React from 'react';
import { useAuth } from '@/lib/auth';
import { TokenManager } from '@sentryagent/idp-sdk';
/** Shape of the GET /api/v1/billing/usage response. */
interface UsageResponse {
tenantId: string;
date: string;
apiCalls: number;
agentCount: number;
subscriptionStatus: string;
currentPeriodEnd: string | null;
stripeSubscriptionId: string | null;
}
type LoadState = 'idle' | 'loading' | 'success' | 'error';
interface UsageState {
loadState: LoadState;
data: UsageResponse | null;
errorMessage: string | null;
}
const initialState: UsageState = {
loadState: 'idle',
data: null,
errorMessage: null,
};
/**
* Fetches the current usage summary from the API using the stored credentials.
*
* @param baseUrl - The API base URL.
* @param clientId - The agent client ID.
* @param clientSecret - The agent client secret.
* @returns The usage response from the server.
*/
async function fetchUsage(
baseUrl: string,
clientId: string,
clientSecret: string,
): Promise<UsageResponse> {
const tokenManager = new TokenManager(
baseUrl,
clientId,
clientSecret,
'agents:read',
);
const token = await tokenManager.getToken();
const response = await fetch(`${baseUrl}/api/v1/billing/usage`, {
headers: { Authorization: `Bearer ${token}` },
});
if (!response.ok) {
throw new Error(`Failed to fetch usage data (HTTP ${response.status})`);
}
return response.json() as Promise<UsageResponse>;
}
/** Badge shown for the tenant's subscription tier. */
function SubscriptionBadge({ status }: { status: string }): React.JSX.Element {
const isPro = status !== 'free';
return (
<span
className={`inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold ${
isPro
? 'bg-brand-100 text-brand-700'
: 'bg-slate-100 text-slate-600'
}`}
>
{isPro ? 'Pro' : 'Free Tier'}
</span>
);
}
/** A single metric card with label and value. */
function MetricCard({ label, value }: { label: string; value: string | number }): React.JSX.Element {
return (
<div className="rounded-xl border border-slate-200 bg-white p-6 shadow-sm">
<p className="text-sm font-medium text-slate-500">{label}</p>
<p className="mt-1 text-2xl font-bold text-slate-900">{value}</p>
</div>
);
}
/**
* Displays the current tenant's usage summary:
* - API calls today
* - Active agent count
* - Subscription status (Free Tier / Pro)
*
* Fetches GET /api/v1/billing/usage with the current Bearer token.
* Handles loading state and error state gracefully.
*/
export function UsagePanel(): React.JSX.Element {
const { credentials } = useAuth();
const [state, setState] = React.useState<UsageState>(initialState);
const loadUsage = React.useCallback(async (): Promise<void> => {
if (!credentials) return;
setState((prev) => ({ ...prev, loadState: 'loading', errorMessage: null }));
try {
const data = await fetchUsage(
credentials.baseUrl,
credentials.clientId,
credentials.clientSecret,
);
setState({ loadState: 'success', data, errorMessage: null });
} catch (err) {
const message = err instanceof Error ? err.message : 'Unknown error occurred.';
setState({ loadState: 'error', data: null, errorMessage: message });
}
}, [credentials]);
React.useEffect(() => {
void loadUsage();
}, [loadUsage]);
const isLoading = state.loadState === 'loading' || state.loadState === 'idle';
return (
<div>
<div className="mb-6 flex items-center justify-between">
<h1 className="text-2xl font-bold text-slate-900">Usage &amp; Billing</h1>
<button
onClick={() => { void loadUsage(); }}
disabled={isLoading}
className="rounded-md border border-slate-300 px-3 py-1.5 text-sm hover:bg-slate-50 disabled:opacity-40"
>
Refresh
</button>
</div>
{/* Error state */}
{state.loadState === 'error' && (
<div className="mb-6 rounded-md bg-red-50 px-4 py-3 text-sm text-red-700" role="alert">
{state.errorMessage ?? 'Failed to load usage data.'}
</div>
)}
{/* Loading skeleton */}
{isLoading && (
<div className="grid grid-cols-1 gap-4 sm:grid-cols-3 animate-pulse">
{[1, 2, 3].map((i) => (
<div key={i} className="h-28 rounded-xl border border-slate-200 bg-slate-100" />
))}
</div>
)}
{/* Data */}
{state.loadState === 'success' && state.data !== null && (
<>
<div className="mb-4 flex items-center gap-3">
<p className="text-sm text-slate-500">
Showing usage for <strong>{state.data.date}</strong>
</p>
<SubscriptionBadge status={state.data.subscriptionStatus} />
</div>
<div className="grid grid-cols-1 gap-4 sm:grid-cols-3">
<MetricCard label="API Calls Today" value={state.data.apiCalls.toLocaleString()} />
<MetricCard label="Active Agents" value={state.data.agentCount.toLocaleString()} />
<MetricCard label="Plan" value={state.data.subscriptionStatus === 'free' ? 'Free Tier' : 'Pro'} />
</div>
{state.data.subscriptionStatus === 'free' && (
<div className="mt-6 rounded-xl border border-brand-200 bg-brand-50 p-5">
<p className="text-sm font-medium text-brand-800">
You are on the Free Tier, limited to 10 agents and 1,000 API calls/day.
</p>
<p className="mt-1 text-sm text-brand-700">
Upgrade to Pro for unlimited agents and API calls.
</p>
</div>
)}
{state.data.currentPeriodEnd !== null && (
<p className="mt-4 text-xs text-slate-400">
Current period ends:{' '}
{new Date(state.data.currentPeriodEnd).toLocaleDateString()}
</p>
)}
</>
)}
</div>
);
}


@@ -12,6 +12,7 @@ const NAV_ITEMS: NavItem[] = [
{ to: '/dashboard/agents', label: 'Agents' },
{ to: '/dashboard/audit', label: 'Audit Log' },
{ to: '/dashboard/health', label: 'Health' },
{ to: '/dashboard/usage', label: 'Usage' },
];
/**


@@ -0,0 +1,172 @@
# Audit Log Chain Verification Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC7.2 — Audit Log Integrity
**Service:** `src/services/AuditVerificationService.ts`
**Job:** `src/jobs/AuditChainVerificationJob.ts`
**Endpoint:** `GET /api/v1/audit/verify`
---
## Overview
Every audit event in the `audit_events` PostgreSQL table is linked to the previous one
via a SHA-256 hash chain. Each event stores:
- `hash` — SHA-256 of `(eventId + timestamp.toISOString() + action + outcome + agentId + organizationId + previousHash)`
- `previous_hash` — the `hash` of the immediately preceding event (ordered by `timestamp ASC, event_id ASC`)
The first event in the chain uses `previous_hash = ''` (empty string sentinel).
A PostgreSQL trigger (`trg_audit_events_immutable`) prevents UPDATE and DELETE operations
on `audit_events`, making the log tamper-evident at the database level.
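The chain rule above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `AuditVerificationService`: the `ChainedEvent` shape and the helper names `computeHash` and `findFirstBreak` are assumptions, and the sketch assumes the events are already sorted by `timestamp ASC, event_id ASC` and start at the first event of the chain.

```typescript
import { createHash } from 'crypto';

// Hypothetical in-memory shape; fields mirror the columns described above.
interface ChainedEvent {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

// SHA-256 over the concatenated fields, in the order listed in the overview.
function computeHash(e: Omit<ChainedEvent, 'hash'>): string {
  return createHash('sha256')
    .update(
      e.eventId +
        e.timestamp.toISOString() +
        e.action +
        e.outcome +
        e.agentId +
        e.organizationId +
        e.previousHash,
    )
    .digest('hex');
}

// Walks the events in chain order and returns the ID of the first event whose
// linkage or content hash does not match, or null if the chain is intact.
function findFirstBreak(events: ChainedEvent[]): string | null {
  let expectedPrevious = ''; // empty-string sentinel for the first event
  for (const e of events) {
    if (e.previousHash !== expectedPrevious || computeHash(e) !== e.hash) {
      return e.eventId;
    }
    expectedPrevious = e.hash;
  }
  return null;
}
```

Because each hash covers the previous hash, changing any stored field of one event invalidates that event's hash, and replacing the hash as well would break the link from the next event.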
---
## Running GET /audit/verify
### Full chain verification (no date range)
```bash
# Requires Bearer token with audit:read scope
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/audit/verify"
```
**Response (chain intact):**
```json
{
"verified": true,
"checkedCount": 18504,
"brokenAtEventId": null
}
```
**Response (chain break detected):**
```json
{
"verified": false,
"checkedCount": 1203,
"brokenAtEventId": "c4d5e6f7-a8b9-0123-cdef-456789012345"
}
```
### Date-ranged verification
```bash
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/audit/verify?fromDate=2026-03-01T00:00:00.000Z&toDate=2026-03-31T23:59:59.999Z"
```
### Interpreting the response
| Field | Meaning |
|---|---|
| `verified: true` | All events in the checked range maintain valid hash chain linkage |
| `verified: false` | At least one chain break detected — see `brokenAtEventId` |
| `checkedCount` | Number of events examined (0 = no events in range) |
| `brokenAtEventId` | UUID of the first event where the chain fails (`null` if verified) |
| `fromDate` / `toDate` | Echo of the date range parameters (only present if supplied) |
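The table can be turned into a small decision helper for scripts that consume the response. The sketch below is illustrative only: `VerifyResponse` is typed from the example payloads above, and `interpretVerify` and the `Verdict` labels are hypothetical names.

```typescript
// Response shape taken from the example payloads above.
interface VerifyResponse {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
  fromDate?: string;
  toDate?: string;
}

type Verdict = 'intact' | 'empty-range' | 'broken';

// Maps a verify response onto the three cases described in the table.
function interpretVerify(res: VerifyResponse): Verdict {
  if (!res.verified) return 'broken';
  // verified with zero events means the range was empty, not that anything was checked
  return res.checkedCount === 0 ? 'empty-range' : 'intact';
}
```

Treating an empty range as its own case avoids reporting "intact" for a date filter that matched no events.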
---
## AuditChainVerificationJob
The `AuditChainVerificationJob` runs automatically in the background every hour (default).
Configure the interval via `AUDIT_CHAIN_VERIFICATION_INTERVAL_MS` (milliseconds).
On each tick it calls `verifyChain()`. If verification passes:
- Sets Prometheus gauge `agentidp_audit_chain_integrity` to **1** (passing)
- Updates `ComplianceStatusStore` with `CC7.2 = passing`
If verification fails:
- Sets gauge to **0**
- Updates `ComplianceStatusStore` with `CC7.2 = failing`
- Prometheus alert `AuditChainIntegrityFailed` fires immediately (severity: critical)
- Application logs: `[AuditChainVerificationJob] Chain BROKEN at event <uuid>`
---
## What to Do When `brokenAtEventId` is Returned
### Step 1: Preserve Evidence
Immediately capture the full state of the audit log for forensic analysis:
```sql
-- Export all events around the break point
SELECT event_id, timestamp, action, outcome, agent_id, organization_id, hash, previous_hash
FROM audit_events
WHERE timestamp >= (
SELECT timestamp - INTERVAL '1 hour'
FROM audit_events WHERE event_id = '<brokenAtEventId>'
)
ORDER BY timestamp ASC, event_id ASC;
```
Save the output to a secure, immutable location (e.g. S3 with object locking).
### Step 2: Identify the Break Type
Compare the recomputed hash for the broken event with its stored hash:
```bash
# Using Node.js
node -e "
const crypto = require('crypto');
const eventId = '<event_id>';
const timestamp = '<timestamp_from_db>';
const action = '<action>';
const outcome = '<outcome>';
const agentId = '<agent_id>';
const orgId = '<organization_id>';
const prevHash = '<previous_hash_from_db>';
const expected = crypto.createHash('sha256')
.update(eventId + new Date(timestamp).toISOString() + action + outcome + agentId + orgId + prevHash)
.digest('hex');
console.log('Expected hash:', expected);
console.log('Stored hash: <hash_from_db>');
console.log('Match:', expected === '<hash_from_db>');
"
```
Possible break types:
- **Hash mismatch only** — event data was modified after insertion
- **previous_hash mismatch** — an event was inserted/deleted before this event in the chain
- **Both mismatched** — multiple modifications or an injection attack
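The three break types can be distinguished mechanically once the recomputed hash and the preceding event's stored hash are known. A minimal sketch, assuming those four values have already been read from the database or computed as shown above; `classifyBreak` and its parameter names are hypothetical:

```typescript
type BreakType = 'none' | 'hash-mismatch' | 'previous-hash-mismatch' | 'both';

interface BreakInputs {
  storedHash: string;          // `hash` column of the broken event
  recomputedHash: string;      // hash recomputed from the event's fields
  storedPreviousHash: string;  // `previous_hash` column of the broken event
  precedingEventHash: string;  // `hash` column of the event immediately before it
}

// Classifies a suspected break according to the list above.
function classifyBreak(i: BreakInputs): BreakType {
  const hashBad = i.storedHash !== i.recomputedHash;
  const linkBad = i.storedPreviousHash !== i.precedingEventHash;
  if (hashBad && linkBad) return 'both';
  if (hashBad) return 'hash-mismatch';
  if (linkBad) return 'previous-hash-mismatch';
  return 'none';
}
```

For the first event in the chain, pass an empty string as `precedingEventHash` to match the sentinel.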
### Step 3: Escalate
A chain break is a **critical security incident**. Immediately:
1. Notify the security team and CISO
2. Engage incident response procedure (`docs/compliance/incident-response.md` — Audit Chain Integrity Failure section)
3. Do NOT attempt to "fix" the hash — preserve the broken state as evidence
4. Consider temporarily suspending API access pending investigation
5. Notify affected customers per data breach notification obligations
### Step 4: Forensic Investigation
Using PostgreSQL audit logs, Vault audit logs, and application logs:
- Identify which application process or database connection modified the row
- Correlate with access logs and authentication events
- Determine the extent of the compromise (single row vs. systematic)
---
## Verification Rate Limiting
`GET /audit/verify` is rate-limited to **30 requests/minute** per `client_id`.
For continuous monitoring, use `AuditChainVerificationJob` (background job, no rate limit)
and poll `GET /compliance/controls` instead.
---
## SOC 2 Evidence Package
For auditors, provide:
1. `GET /audit/verify` response (full chain, no date filter) — save as JSON
2. Prometheus metric export: `agentidp_audit_chain_integrity` time series (30/60/90 days)
3. PostgreSQL trigger definition: `\d+ audit_events` in psql
4. `src/db/migrations/020_add_audit_chain_columns.sql` — shows immutability trigger DDL
5. `docs/openapi/compliance.yaml` — endpoint specification


@@ -0,0 +1,159 @@
# Encryption Key Rotation Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC6.1 — Encryption at Rest
**Service:** `src/services/EncryptionService.ts`
**Vault path:** Configured via `ENCRYPTION_KEY_VAULT_PATH` env var (default: `secret/data/agentidp/encryption-key`)
---
## Overview
AgentIdP uses AES-256-CBC column-level encryption for sensitive PostgreSQL columns.
The encryption key is a 64-character hex string (32 bytes) stored in HashiCorp Vault.
The `EncryptionService` fetches the key once and caches it in process memory.
Encrypted format: `base64(IV):base64(ciphertext)` where IV is 16 random bytes per encryption call.
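The encrypted format can be illustrated with Node's built-in `crypto` module. This is a sketch under the assumptions stated in this overview (AES-256-CBC, 64-character hex key, 16-byte random IV); `encryptColumn` and `decryptColumn` are hypothetical names and are not taken from `EncryptionService`.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Encrypts a value as `base64(IV):base64(ciphertext)` using a fresh 16-byte IV.
function encryptColumn(plaintext: string, hexKey: string): string {
  const key = Buffer.from(hexKey, 'hex'); // 64 hex chars -> 32 bytes
  const iv = randomBytes(16);
  const cipher = createCipheriv('aes-256-cbc', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, 'utf8'),
    cipher.final(),
  ]);
  return `${iv.toString('base64')}:${ciphertext.toString('base64')}`;
}

// Reverses encryptColumn; throws if the value is malformed or padding is invalid.
function decryptColumn(stored: string, hexKey: string): string {
  const [ivB64, dataB64] = stored.split(':');
  if (ivB64 === undefined || dataB64 === undefined) {
    throw new Error('Malformed encrypted value');
  }
  const decipher = createDecipheriv(
    'aes-256-cbc',
    Buffer.from(hexKey, 'hex'),
    Buffer.from(ivB64, 'base64'),
  );
  return Buffer.concat([
    decipher.update(Buffer.from(dataB64, 'base64')),
    decipher.final(),
  ]).toString('utf8');
}
```

Base64 output never contains `:`, so splitting on the first colon is unambiguous. Because the IV is random per call, encrypting the same plaintext twice yields different stored values.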
---
## Key Rotation Procedure
### Prerequisites
- Access to HashiCorp Vault with write permissions to the encryption key path
- Access to the production application environment (to trigger restart)
- At least one backup of the current key stored securely offline
### Step 1: Generate a New Key
Generate a cryptographically strong 32-byte (64-character hex) key:
```bash
openssl rand -hex 32
# Example output: a1b2c3d4e5f6... (64 hex chars)
```
Record the new key securely.
### Step 2: Backup the Current Key
Before overwriting, read and securely store the current key:
```bash
vault kv get -field=encryptionKey secret/agentidp/encryption-key > /secure/backup/encryption-key-$(date +%Y%m%d).txt
```
Store in a hardware security module (HSM) or offline key store.
### Step 3: Write the New Key to Vault
```bash
vault kv put secret/agentidp/encryption-key encryptionKey="<new-64-char-hex-key>"
```
Verify the write:
```bash
vault kv get secret/agentidp/encryption-key
```
Confirm the `encryptionKey` field contains exactly 64 hex characters.
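The 64-hex-character requirement is easy to check before writing a key to Vault. A small sketch; `isValidEncryptionKey` is a hypothetical helper, not part of `EncryptionService`:

```typescript
// Returns true only for exactly 64 hexadecimal characters (32 bytes).
function isValidEncryptionKey(value: string): boolean {
  return /^[0-9a-fA-F]{64}$/.test(value);
}
```

Trim the value first when reading it from a file or command output, since a trailing newline would make the check fail.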
### Step 4: Restart the Application
The `EncryptionService` caches the key in process memory. A restart forces a re-fetch from Vault:
```bash
# Kubernetes rolling restart
kubectl rollout restart deployment/agentidp
# Docker Compose
docker-compose restart agentidp
# PM2
pm2 restart agentidp
```
### Step 5: Verify Key Pick-Up
Check the application logs for:
```
[AgentIdP] EncryptionService enabled — sensitive columns encrypted at rest (SOC 2 CC6.1)
```
Call the compliance controls endpoint to confirm the control is passing:
```bash
curl -s https://api.sentryagent.ai/v1/compliance/controls | jq '.controls[] | select(.id == "CC6.1")'
```
Expected output:
```json
{ "id": "CC6.1", "name": "Encryption at Rest", "status": "passing", "lastChecked": "..." }
```
### Step 6: Re-encryption of Existing Rows
Existing rows encrypted with the old key will fail to decrypt once the application restarts with the new key.
Because `EncryptionService` caches a single key, lazy re-encryption on the next read and re-write (e.g. credential rotation,
webhook update) only works if the old key is still available to decrypt the row first.
To re-encrypt all rows immediately, use the re-encryption script:
```bash
# Run the re-encryption migration script (reads old key from backup, encrypts with new key)
# Note: This script requires both old and new keys to be available
ts-node scripts/reencrypt-columns.ts --old-key-file /secure/backup/encryption-key-<date>.txt
```
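The core of re-encryption is decrypting each stored value with the old key and encrypting it again with the new one. The sketch below shows that step for a single value, assuming the `base64(IV):base64(ciphertext)` format from the overview. It does not describe the contents of `scripts/reencrypt-columns.ts`, and all function names here are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Decrypts a `base64(IV):base64(ciphertext)` value with a 64-char hex key.
function decrypt(stored: string, hexKey: string): string {
  const [ivB64, dataB64] = stored.split(':');
  if (ivB64 === undefined || dataB64 === undefined) {
    throw new Error('Malformed encrypted value');
  }
  const decipher = createDecipheriv(
    'aes-256-cbc',
    Buffer.from(hexKey, 'hex'),
    Buffer.from(ivB64, 'base64'),
  );
  return Buffer.concat([
    decipher.update(Buffer.from(dataB64, 'base64')),
    decipher.final(),
  ]).toString('utf8');
}

// Encrypts with a fresh random 16-byte IV.
function encrypt(plaintext: string, hexKey: string): string {
  const iv = randomBytes(16);
  const cipher = createCipheriv('aes-256-cbc', Buffer.from(hexKey, 'hex'), iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return `${iv.toString('base64')}:${data.toString('base64')}`;
}

// Re-encrypts one stored value from the old key to the new key.
function reencryptValue(stored: string, oldHexKey: string, newHexKey: string): string {
  return encrypt(decrypt(stored, oldHexKey), newHexKey);
}
```

A full migration would apply this to every encrypted column inside a transaction, so that a failure part-way through does not leave rows split across two keys.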
---
## Emergency Rollback
If the new key causes issues (e.g. test failures, decryption errors), roll back:
### Step 1: Restore Old Key to Vault
```bash
vault kv put secret/agentidp/encryption-key encryptionKey="<old-64-char-hex-key-from-backup>"
```
### Step 2: Restart the Application
```bash
kubectl rollout restart deployment/agentidp
```
### Step 3: Verify Recovery
```bash
curl -s https://api.sentryagent.ai/v1/compliance/controls | jq '.controls[] | select(.id == "CC6.1")'
```
### Step 4: Investigate Root Cause
Review application logs for `AES-256-CBC decryption failed` errors and audit the cause before
reattempting rotation.
---
## Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| `Invalid encryption key ... expected a 64-character hex string` | Key in Vault is wrong length or encoding | Re-write correct key to Vault, restart |
| `AES-256-CBC decryption failed — possible key mismatch` | Key rotated but rows still encrypted with old key | Rollback to old key, then migrate properly |
| `CC6.1` status shows `unknown` | Vault unreachable, key fetch failed | Check Vault connectivity, `VAULT_ADDR`, `VAULT_TOKEN` |
---
## Audit Evidence
After rotation, record the following for SOC 2 evidence:
- Date of rotation
- Who performed the rotation (approver + executor)
- Vault audit log entry confirming the key write
- Application log confirming EncryptionService initialised with new key
- `GET /compliance/controls` response showing CC6.1 = passing


@@ -0,0 +1,229 @@
# Incident Response Runbook — SentryAgent.ai AgentIdP
**Owner:** Security Engineering
**Last updated:** 2026-03-31
**Applies to:** Production AgentIdP deployments
This runbook covers the four incident types most relevant to SOC 2 Type II compliance monitoring.
---
## 1. Auth Failure Spike
### Detection
**Prometheus alert:** `AuthFailureSpike`
```yaml
expr: rate(agentidp_http_requests_total{status_code="401"}[5m]) > 0.5
for: 2m
severity: warning
```
Triggers when the rate of HTTP 401 responses exceeds 0.5 per second sustained over 2 minutes.
### Immediate Actions
1. Acknowledge the alert in PagerDuty / alerting system
2. Check whether the spike correlates with a scheduled process (e.g. batch agent key rotation, deployment)
3. Check Prometheus dashboard for the geographic distribution of the failing requests
### Investigation Steps
1. **Identify source agents:**
```bash
# Query audit log for recent auth failures
curl -s -H "Authorization: Bearer <admin-token>" \
"https://api.sentryagent.ai/v1/audit?action=auth.failed&limit=100"
```
2. **Check for brute-force patterns:**
Look for repeated failures from the same `client_id` or IP address.
3. **Check if an agent's credentials expired:**
```bash
# Look for expired credentials
psql "$DATABASE_URL" -c "
SELECT credential_id, client_id, expires_at
FROM credentials
WHERE status = 'active' AND expires_at < NOW()
ORDER BY expires_at DESC LIMIT 20;"
```
4. **Check for key compromise signals:**
- Multiple agents failing simultaneously → possible key store issue
- Single agent with high failure rate → possible credential stuffing or misconfiguration
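Spotting repeated failures from one `client_id` or IP address amounts to grouping and counting. A minimal sketch, assuming the audit query results have been mapped into the hypothetical `AuthFailure` shape below; the real audit event fields may differ.

```typescript
// Hypothetical shape; the actual audit event fields may be named differently.
interface AuthFailure {
  clientId: string;
  ipAddress: string;
}

// Counts failures per key and returns those at or above the threshold,
// sorted with the highest count first.
function topOffenders(
  failures: AuthFailure[],
  key: keyof AuthFailure,
  threshold: number,
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const f of failures) {
    counts.set(f[key], (counts.get(f[key]) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= threshold)
    .sort((a, b) => b[1] - a[1]);
}
```

Running it once by `clientId` and once by `ipAddress` separates a single misconfigured agent from one source address trying many agents.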
### Escalation Path
- **Warning (< 2 req/s):** Engineering on-call investigates within 1 hour
- **Critical (> 2 req/s sustained):** CISO notified, potential account compromise investigation
- **If credential compromise confirmed:** Revoke affected credentials immediately via `POST /agents/:id/credentials/:credId/revoke`
---
## 2. Anomalous Token Issuance
### Detection
**Prometheus alert:** `AnomalousTokenIssuance`
```yaml
expr: rate(agentidp_tokens_issued_total[5m]) > 10
for: 5m
severity: warning
```
Triggers when token issuance rate exceeds 10 per second for 5 continuous minutes.
### Immediate Actions
1. Acknowledge the alert
2. Determine if a legitimate mass-scale operation is underway (e.g. new customer onboarding, load test)
3. Check the `scope` label breakdown on `agentidp_tokens_issued_total` to identify what scopes are being requested
### Investigation Steps
1. **Identify top issuing agents:**
```bash
# Query audit log for recent token issuances
curl -s -H "Authorization: Bearer <admin-token>" \
"https://api.sentryagent.ai/v1/audit?action=token.issued&limit=100"
```
2. **Check monthly token budget:**
Each agent is limited to 10,000 tokens/month (free tier). A single agent hitting the limit may indicate automation abuse.
3. **Check for abnormal scope combinations:**
If tokens are being issued with `admin:orgs` or `audit:read` at high volume, this warrants immediate investigation.
4. **Check for valid business reason:**
Contact the organization owner for the top-issuing agents.
### Escalation Path
- **Warning:** Engineering on-call investigates within 4 hours
- **If compromise suspected:** Revoke affected agent tokens via Redis revocation list, rotate credentials
- **If systematic abuse confirmed:** Suspend the issuing agent(s) via `PATCH /agents/:id` with `status: suspended`
---
## 3. Audit Chain Integrity Failure
### Detection
**Prometheus alert:** `AuditChainIntegrityFailed`
```yaml
expr: agentidp_audit_chain_integrity == 0
for: 0m
severity: critical
```
Fires immediately when `AuditChainVerificationJob` detects a break in the audit event hash chain.
This is a **CRITICAL** security event — possible evidence of log tampering.
### Immediate Actions
1. **Do NOT attempt to repair the broken chain** — preserve all evidence
2. Notify CISO and security team immediately
3. Page the on-call security engineer with P0 priority
4. Capture the current state:
```bash
curl -s -H "Authorization: Bearer <audit-token>" \
"https://api.sentryagent.ai/v1/audit/verify" | tee /secure/incident-$(date +%Y%m%d-%H%M).json
```
### Investigation Steps
1. **Determine the broken event:**
The `brokenAtEventId` field in the `/audit/verify` response identifies the first broken event.
2. **Forensic analysis:**
Follow the steps in `docs/compliance/audit-log-runbook.md` — "What to Do When brokenAtEventId is Returned".
3. **Check database access logs:**
Review PostgreSQL `pg_stat_activity` and connection logs for unauthorized direct DB access.
4. **Check application logs:**
Look for any errors from the immutability trigger (`trg_audit_events_immutable`).
5. **Check Vault audit logs:**
Review whether any encryption key access was abnormal.
### Escalation Path
- **Immediate:** CISO + Legal + Security Engineering
- **Within 1 hour:** Begin forensic preservation per incident response plan
- **Within 24 hours:** Determine scope of compromise and notification obligations
- **Customer notification:** Per contractual and regulatory obligations (GDPR, SOC 2 requirements)
---
## 4. Webhook Dead-Letter Accumulation
### Detection
**Prometheus alert:** `WebhookDeadLetterAccumulating`
```yaml
expr: increase(agentidp_webhook_dead_letters_total[1h]) > 10
for: 0m
severity: critical
```
Fires when more than 10 webhook deliveries reach dead-letter status within an hour.
### Immediate Actions
1. Acknowledge the alert
2. Check which `organization_id` labels are accumulating dead-letters:
```bash
# Prometheus query: top organizations by dead-letter count over the last hour
# topk(10, sum by (organization_id) (increase(agentidp_webhook_dead_letters_total[1h])))
```
3. Check if the destination endpoints are reachable:
```bash
curl -I https://<webhook-destination-url>/
```
### Investigation Steps
1. **List affected webhook subscriptions:**
```bash
# Query delivery records for dead-letter status
psql "$DATABASE_URL" -c "
SELECT s.id, s.organization_id, s.url, COUNT(d.id) AS dead_letters
FROM webhook_subscriptions s
JOIN webhook_deliveries d ON d.subscription_id = s.id
WHERE d.status = 'dead_letter'
AND d.updated_at > NOW() - INTERVAL '2 hours'
GROUP BY s.id
ORDER BY dead_letters DESC
LIMIT 20;"
```
2. **Check delivery failure reasons:**
```bash
psql "$DATABASE_URL" -c "
SELECT http_status_code, COUNT(*) as count
FROM webhook_deliveries
WHERE status = 'dead_letter'
AND updated_at > NOW() - INTERVAL '2 hours'
GROUP BY http_status_code;"
```
3. **Common causes and resolutions:**
| HTTP Status | Likely Cause | Resolution |
|---|---|---|
| 0 / null | Network unreachable / DNS failure | Check recipient endpoint availability |
| 401 / 403 | HMAC signature validation failing | Customer to verify HMAC secret |
| 404 | Endpoint URL changed | Customer to update webhook URL |
| 5xx | Recipient server error | Customer to investigate their endpoint |
| Timeout | Slow recipient endpoint | Customer to optimize endpoint response time |
4. **Notify affected customers:**
Contact the organization owner for high-volume dead-letter subscriptions.
### Escalation Path
- **Warning (10-50/hr):** Engineering notifies affected customers, investigates endpoint health
- **Critical (> 50/hr):** Engineering on-call + Platform reliability team engaged
- **If systemic delivery infrastructure failure:** Activate incident bridge, escalate to VP Engineering


@@ -0,0 +1,142 @@
# Secrets Rotation Runbook — SentryAgent.ai AgentIdP
**Control:** SOC 2 CC9.2 — Secrets Rotation
**Last updated:** 2026-03-31
---
## Overview
AgentIdP manages three categories of secrets that require periodic rotation:
1. **Agent client secrets** — Per-credential client secrets used for OAuth 2.0 token issuance
2. **OIDC signing keys** — RSA/EC keys used to sign ID tokens
3. **AES-256-CBC encryption key** — Column-level database encryption key (see `encryption-runbook.md`)
---
## 1. Agent Credential (Client Secret) Rotation
### API endpoint
```
POST /api/v1/agents/:agentId/credentials/:credentialId/rotate
```
Requires Bearer token with `agents:write` scope.
### Procedure
```bash
# 1. List active credentials for the agent
curl -s -H "Authorization: Bearer <token>" \
"https://api.sentryagent.ai/v1/agents/<agentId>/credentials?status=active"
# 2. Rotate the credential (generate new secret)
curl -s -X POST \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"expiresAt": "2027-03-31T00:00:00.000Z"}' \
"https://api.sentryagent.ai/v1/agents/<agentId>/credentials/<credentialId>/rotate"
# Response includes the new clientSecret — store it immediately; it is never shown again
```
### Key points
- The new `clientSecret` is returned **once only** — store it securely before the response is discarded
- The agent's previous secret is immediately invalidated (Vault KV v2 version overwritten)
- An audit event `credential.rotated` is logged to the immutable audit chain
- A `credential.rotated` webhook event is dispatched to all active subscriptions
### Recommended rotation schedule
| Credential type | Recommended rotation interval |
|---|---|
| Production agent credentials | 90 days |
| Staging / development credentials | 180 days |
| Service account credentials | 365 days (annual) |
| Credentials involved in a security incident | Immediately |
### Automated expiry detection
`SecretsRotationJob` runs hourly and queries credentials expiring within 7 days.
Prometheus alert `CredentialExpiryApproaching` fires immediately when any are detected.
Respond to this alert by rotating the flagged credential(s) before the expiry date.
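
The 7-day expiry check can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual `SecretsRotationJob`: the `CredentialRow` shape and the `findExpiringSoon` name are hypothetical, and it assumes already-expired credentials are excluded from the result.

```typescript
// Hypothetical row shape; the real `credentials` table columns may differ.
interface CredentialRow {
  id: string;
  agentId: string;
  status: "active" | "revoked";
  expiresAt: Date | null; // null = no expiry configured (assumption)
}

const EXPIRY_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Returns active credentials that expire after `now` but within the window.
function findExpiringSoon(rows: CredentialRow[], now: Date): CredentialRow[] {
  const nowMs = now.getTime();
  const cutoffMs = nowMs + EXPIRY_WINDOW_MS;
  return rows.filter((row) => {
    if (row.status !== "active" || row.expiresAt === null) return false;
    const expiresMs = row.expiresAt.getTime();
    return expiresMs > nowMs && expiresMs <= cutoffMs;
  });
}
```

Each credential returned by a check like this would correspond to one increment of the expiry counter for its owning agent.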
---
## 2. OIDC Signing Key Rotation
### Overview
OIDC signing keys are managed by `OIDCKeyService` (`src/services/OIDCKeyService.ts`).
Keys are stored in the `oidc_keys` PostgreSQL table. The current active key is used to
sign all new ID tokens; public keys are exposed via `GET /.well-known/jwks.json`.
### When to rotate
- Key compromise or suspected exposure
- Scheduled rotation (recommended every 90 days for production)
- Algorithm upgrade (e.g. RS256 → ES256)
### Rotation procedure
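`OIDCKeyService.ensureCurrentKey()` generates a new signing key on startup whenever no active key exists. To force a manual rotation, deactivate the current key and restart the application: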
```bash
# Force generation of a new signing key by calling the internal rotate endpoint
# (or trigger by redeploying with OIDC_FORCE_KEY_ROTATION=true)
# 1. Mark current key as inactive (if manual rotation is required)
psql "$DATABASE_URL" -c "
UPDATE oidc_keys
SET active = false
WHERE active = true;"
# 2. Restart the application — ensureCurrentKey() will generate a new key on startup
kubectl rollout restart deployment/agentidp
```
### JWKS update behavior
- Old public keys remain in `GET /.well-known/jwks.json` for **24 hours** after rotation
(grace period for in-flight tokens)
- After the grace period, old keys are removed from the JWKS endpoint
- Redis JWKS cache TTL is configured by `JWKS_CACHE_TTL_SECONDS` (default: 3600)
### Impact on existing tokens
Existing valid tokens signed with the old key **continue to work** until they expire,
as long as the old public key remains in JWKS. Once the grace period ends and the old
key is removed from JWKS, any unexpired token signed with it fails verification.
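
The 24-hour grace period can be expressed as a simple predicate over key records. This is a minimal sketch, assuming a hypothetical `deactivatedAt` timestamp on each key; the actual `oidc_keys` columns and the `OIDCKeyService` logic are not shown in this runbook.

```typescript
// Hypothetical key record; `deactivatedAt` is an assumed column name.
interface SigningKeyRecord {
  kid: string;
  active: boolean;
  deactivatedAt: Date | null;
}

const JWKS_GRACE_MS = 24 * 60 * 60 * 1000; // 24-hour grace period

// True if the key's public half should still be listed in the JWKS document.
function isPublishedInJwks(key: SigningKeyRecord, now: Date): boolean {
  if (key.active) return true; // the current signing key is always published
  if (key.deactivatedAt === null) return false; // inactive with unknown age: drop
  return now.getTime() - key.deactivatedAt.getTime() < JWKS_GRACE_MS;
}
```

With a rule like this, tokens whose lifetime is shorter than the grace period always expire before their verification key disappears.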
---
## 3. Encryption Key Rotation
See `docs/compliance/encryption-runbook.md` for the full AES-256-CBC encryption key rotation procedure.
**Summary:** Generate new 32-byte hex key → write to Vault at `ENCRYPTION_KEY_VAULT_PATH` → restart app → existing rows re-encrypted lazily on next read-write cycle.
---
## Schedule Recommendations
| Secret Type | Production Interval | Staging Interval | Trigger for Immediate Rotation |
|---|---|---|---|
| Agent client secrets | 90 days | 180 days | Credential suspected compromised |
| OIDC signing keys | 90 days | 180 days | Key file exposed, algorithm upgrade |
| AES-256-CBC encryption key | 365 days (annual) | On demand | Key exposed, Vault breach, compliance audit requirement |
| Webhook HMAC secrets | Per customer policy | N/A | Webhook endpoint compromised |
---
## Compliance Evidence
For SOC 2 CC9.2 evidence collection:
- Prometheus metric history: `agentidp_credentials_expiring_soon_total`
- Audit log entries with `action: credential.rotated` — query via `GET /audit?action=credential.rotated`
- Key rotation records from Vault audit log
- This runbook + sign-off from Security Engineering


@@ -0,0 +1,42 @@
# SOC 2 Type II Controls Matrix — SentryAgent.ai AgentIdP
This document maps the five in-scope SOC 2 Trust Services Criteria (TSC) controls to their
corresponding implementation artefacts, mechanisms, and automated verification methods.
---
## Controls Matrix
| Control ID | TSC Criterion Name | Implementation File | Mechanism | Automated Check |
|---|---|---|---|---|
| **CC6.1** | Encryption at Rest | `src/services/EncryptionService.ts` | AES-256-CBC column-level encryption on `credentials.secret_hash`, `credentials.vault_path`, `webhook_subscriptions.vault_secret_path`, `agent_did_keys.vault_key_path`. Key is stored in HashiCorp Vault KV v2 at path configured by `ENCRYPTION_KEY_VAULT_PATH`. IV is randomised per encryption call. Backward-compat: `isEncrypted()` gate allows plaintext rows to coexist during migration. | `GET /api/v1/compliance/controls` returns `CC6.1` status. Status is set to `passing` on service startup when `EncryptionService` initialises. |
| **CC6.7** | TLS Enforcement | `src/middleware/TLSEnforcementMiddleware.ts` | Express middleware registered as the **first** middleware in the app stack (before all routes and body parsers). In `NODE_ENV=production`, checks `X-Forwarded-Proto` header set by the upstream load balancer/reverse proxy. Any non-HTTPS request receives a `301 Moved Permanently` redirect to `https://`. | `GET /api/v1/compliance/controls` returns `CC6.7` status. TLS enforcement is a static configuration control; status is set to `passing` on application startup. |
| **CC7.2** | Audit Log Integrity | `src/services/AuditVerificationService.ts`, `src/repositories/AuditRepository.ts`, `src/jobs/AuditChainVerificationJob.ts` | Each audit event (`audit_events` table) stores a `hash` (SHA-256 of `eventId + timestamp + action + outcome + agentId + organizationId + previousHash`) and `previous_hash` linking it to the prior event. An immutability trigger prevents UPDATE/DELETE on `audit_events`. `AuditChainVerificationJob` re-walks the entire chain every hour. | Prometheus gauge `agentidp_audit_chain_integrity` (1 = passing, 0 = failing). Prometheus alert `AuditChainIntegrityFailed` fires when gauge = 0. `GET /api/v1/audit/verify` triggers an on-demand verification. `GET /api/v1/compliance/controls` returns `CC7.2` status. |
| **CC9.2** | Secrets Rotation | `src/jobs/SecretsRotationJob.ts` | `SecretsRotationJob` runs every hour (configurable via `SECRETS_ROTATION_CHECK_INTERVAL_MS`) and queries `credentials` for `active` credentials expiring within 7 days. For each, it increments the `agentidp_credentials_expiring_soon_total` Prometheus counter with the owning `agent_id`. Operators are expected to act on the alert within the 7-day window. | Prometheus counter `agentidp_credentials_expiring_soon_total` per `agent_id`. Prometheus alert `CredentialExpiryApproaching` fires when any increase is detected. `GET /api/v1/compliance/controls` returns `CC9.2` status. |
| **CC7.1** | Webhook Dead-Letter Monitoring | `src/workers/WebhookDeliveryWorker.ts` | `WebhookDeliveryWorker` processes webhook deliveries from a Redis queue. After exhausting all retry attempts (configurable `WEBHOOK_MAX_RETRIES`), the delivery is moved to dead-letter status and `agentidp_webhook_dead_letters_total` is incremented. | Prometheus counter `agentidp_webhook_dead_letters_total` per `organization_id`. Prometheus alert `WebhookDeadLetterAccumulating` fires when > 10 dead-letters accumulate in 1 hour. `GET /api/v1/compliance/controls` returns `CC7.1` status. |
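
The CC7.2 mechanism can be illustrated with a small hash-chain verifier. This is a sketch under stated assumptions rather than the code in `AuditVerificationService`: the field serialization (plain string concatenation in the order listed in the matrix), the hex digest encoding, and all type and function names are hypothetical. It also assumes the first event in a window is checked only against its own hash, since its predecessor may lie outside the window.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory shape of an `audit_events` row.
interface AuditEvent {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

interface ChainResult {
  verified: boolean;
  checkedCount: number;
  brokenAtEventId: string | null;
}

// SHA-256 over the concatenated fields; the exact serialization is assumed.
function computeHash(e: Omit<AuditEvent, "hash">): string {
  const payload =
    e.eventId + e.timestamp + e.action + e.outcome +
    e.agentId + e.organizationId + e.previousHash;
  return createHash("sha256").update(payload).digest("hex");
}

// Walks events in order and stops at the first break, as the API describes.
function verifyChain(events: AuditEvent[]): ChainResult {
  let expectedPrevious: string | null = null;
  let checkedCount = 0;
  for (const event of events) {
    checkedCount += 1;
    const { hash, ...rest } = event;
    const linkOk = expectedPrevious === null || event.previousHash === expectedPrevious;
    if (!linkOk || computeHash(rest) !== hash) {
      return { verified: false, checkedCount, brokenAtEventId: event.eventId };
    }
    expectedPrevious = hash;
  }
  return { verified: true, checkedCount, brokenAtEventId: null };
}
```

Because each hash covers `previousHash`, changing or deleting any earlier event invalidates every later link, which is what makes a single gauge sufficient for alerting.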
---
## Evidence Collection
For a SOC 2 Type II audit, the following evidence should be collected:
| Evidence Type | Collection Method |
|---|---|
| Encryption at rest configuration | Export Vault KV v2 policy + `_encryption_migration_log` table contents |
| TLS certificate and enforcement logs | Load balancer access logs + `X-Forwarded-Proto` middleware responses |
| Audit chain integrity report | `GET /api/v1/audit/verify` with full date range |
| Secrets rotation compliance | Prometheus metric history for `agentidp_credentials_expiring_soon_total` |
| Webhook dead-letter rate | Prometheus metric history for `agentidp_webhook_dead_letters_total` |
| Immutable audit log dump | Direct PostgreSQL export of `audit_events` table with hash verification |
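
For the audit chain integrity report, the request URL with an explicit date range might be built as sketched below. The `buildVerifyUrl` helper is hypothetical; only the `fromDate` and `toDate` query parameters and the `audit/verify` path come from the compliance API described in this change. The base URL is supplied by the caller.

```typescript
// Hypothetical helper for evidence-collection scripts.
// Builds the audit/verify URL and rejects an inverted date range up front.
function buildVerifyUrl(baseUrl: string, fromDate?: Date, toDate?: Date): string {
  if (fromDate && toDate && fromDate.getTime() > toDate.getTime()) {
    throw new RangeError("fromDate must be before or equal to toDate");
  }
  // A trailing slash keeps the last path segment of the base URL.
  const normalizedBase = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  const url = new URL("audit/verify", normalizedBase);
  if (fromDate) url.searchParams.set("fromDate", fromDate.toISOString());
  if (toDate) url.searchParams.set("toDate", toDate.toISOString());
  return url.toString();
}
```

The resulting URL would be requested with a Bearer token carrying the `audit:read` scope, and the JSON response archived as audit evidence.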
---
## References
- SOC 2 Trust Services Criteria: [AICPA TSC 2017](https://www.aicpa.org/resources/article/trust-services-criteria)
- OpenAPI spec: `docs/openapi/compliance.yaml`
- Encryption runbook: `docs/compliance/encryption-runbook.md`
- Audit log runbook: `docs/compliance/audit-log-runbook.md`
- Incident response: `docs/compliance/incident-response.md`
- Secrets rotation: `docs/compliance/secrets-rotation.md`


@@ -0,0 +1,548 @@
openapi: 3.0.3
info:
title: SentryAgent.ai — Compliance & SOC 2 Type II Service
version: 1.0.0
description: |
The Compliance Service exposes endpoints supporting SentryAgent.ai's
**SOC 2 Type II** audit readiness programme.
Two categories of control are surfaced:
**Audit chain verification** (`GET /audit/verify`) — Confirms cryptographic
integrity of the immutable audit log chain across an optional date range.
This endpoint provides auditors and compliance tooling with a single call to
assert that no audit events have been tampered with, deleted, or reordered
after initial capture.
**SOC 2 control status** (`GET /compliance/controls`) — Returns a live status
snapshot for each of the five in-scope SOC 2 Trust Services Criteria controls
monitored by the platform. Designed as a lightweight, public health-style
endpoint so that monitoring infrastructure can poll without bearer credentials.
**In-scope SOC 2 controls:**
| Control ID | Name | Description |
|------------|------|-------------|
| `CC6.1` | Encryption at Rest | Verifies database and secrets store encryption is active |
| `CC6.7` | TLS Enforcement | Confirms TLS 1.2+ is enforced on all inbound connections |
| `CC7.2` | Audit Log Integrity | Validates audit chain hash continuity |
| `CC9.2` | Secrets Rotation | Checks that all managed secrets are within rotation policy |
| `CC7.1` | Webhook Dead-Letter Monitoring | Asserts dead-letter queue depth is within threshold |
**Required scope (audit chain verify only):** `audit:read`
servers:
- url: http://localhost:3000/api/v1
description: Local development server
- url: https://api.sentryagent.ai/v1
description: Production server
tags:
- name: Audit Chain
description: Cryptographic integrity verification of the immutable audit event chain
- name: Compliance Controls
description: SOC 2 Type II control status — public health-style monitoring endpoint
components:
securitySchemes:
BearerAuth:
type: http
scheme: bearer
bearerFormat: JWT
description: |
JWT access token with `audit:read` scope, obtained via `POST /token`.
Include as: `Authorization: Bearer <token>`
schemas:
ChainVerificationResult:
type: object
description: |
Result of an audit event chain integrity verification run.
The audit log is structured as a hash-linked chain. Each event stores a
reference to the hash of the preceding event. `verified: true` means every
event in the requested window was checked and no breaks in the chain were
detected.
When `verified` is `false`, `brokenAtEventId` identifies the first event
where the chain integrity check failed, enabling targeted forensic investigation.
required:
- verified
- checkedCount
- brokenAtEventId
properties:
verified:
type: boolean
description: >
`true` if every audit event in the checked range maintains an unbroken
cryptographic hash chain; `false` if at least one chain break was detected.
example: true
checkedCount:
type: integer
description: Total number of audit events examined during this verification run.
minimum: 0
example: 2847
brokenAtEventId:
type: string
format: uuid
nullable: true
description: >
UUID of the first audit event where chain continuity failed, or `null`
when `verified` is `true`. Only the first detected break is reported;
subsequent events are not checked after a break is found.
example: null
fromDate:
type: string
format: date-time
description: >
The ISO 8601 lower bound of the date range that was verified.
Present only when a `fromDate` query parameter was supplied.
example: "2026-03-01T00:00:00.000Z"
toDate:
type: string
format: date-time
description: >
The ISO 8601 upper bound of the date range that was verified.
Present only when a `toDate` query parameter was supplied.
example: "2026-03-31T23:59:59.999Z"
ControlStatus:
type: string
description: Operational status of a SOC 2 control at the time of the last check.
enum:
- passing
- failing
- unknown
example: passing
ComplianceControl:
type: object
description: Status record for a single SOC 2 Trust Services Criteria control.
required:
- id
- name
- status
- lastChecked
properties:
id:
type: string
description: SOC 2 Trust Services Criteria control identifier.
enum:
- CC6.1
- CC6.7
- CC7.2
- CC9.2
- CC7.1
example: "CC6.1"
name:
type: string
description: Human-readable name of the control.
example: "Encryption at Rest"
status:
$ref: '#/components/schemas/ControlStatus'
lastChecked:
type: string
format: date-time
description: ISO 8601 timestamp of the most recent automated check for this control.
example: "2026-03-31T06:00:00.000Z"
ComplianceControlsResponse:
type: object
description: SOC 2 compliance control status summary for all in-scope controls.
required:
- controls
properties:
controls:
type: array
description: Status record for each of the five in-scope SOC 2 controls.
minItems: 5
maxItems: 5
items:
$ref: '#/components/schemas/ComplianceControl'
example:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
ErrorResponse:
type: object
description: Standard error response envelope used across all SentryAgent.ai APIs.
required:
- code
- message
properties:
code:
type: string
description: Machine-readable error code.
example: "UNAUTHORIZED"
message:
type: string
description: Human-readable description of the error.
example: "A valid Bearer token is required."
details:
type: object
description: Optional structured details providing additional context.
additionalProperties: true
example: {}
responses:
Unauthorized:
description: Missing or invalid Bearer token.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "UNAUTHORIZED"
message: "A valid Bearer token is required to access this resource."
Forbidden:
description: Valid token but insufficient permissions. Requires `audit:read` scope.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "INSUFFICIENT_SCOPE"
message: "The 'audit:read' scope is required to verify the audit chain."
TooManyRequests:
description: |
Rate limit exceeded. Retry after the reset time indicated in `X-RateLimit-Reset`.
headers:
X-RateLimit-Limit:
schema:
type: integer
description: Maximum requests allowed per minute.
example: 30
X-RateLimit-Remaining:
schema:
type: integer
description: Requests remaining in the current window.
example: 0
X-RateLimit-Reset:
schema:
type: integer
description: Unix timestamp when the rate limit window resets.
example: 1743155400
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "RATE_LIMIT_EXCEEDED"
message: "Too many requests. Please retry after the rate limit window resets."
InternalServerError:
description: Unexpected server error.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "INTERNAL_SERVER_ERROR"
message: "An unexpected error occurred. Please try again later."
paths:
/audit/verify:
get:
operationId: verifyAuditChain
tags:
- Audit Chain
summary: Verify audit log chain integrity
description: |
Triggers a full integrity verification pass over the immutable audit event
chain. Each event in the log contains a cryptographic hash of the previous
event; this endpoint traverses the chain and confirms no breaks exist.
**Use cases:**
- Auditor evidence collection for SOC 2 Type II assessment
- Continuous compliance monitoring (cron-driven)
- Incident response — confirm audit log has not been tampered with
**Requires:** Bearer token with `audit:read` scope.
**Rate limit:** 30 requests/minute per `client_id`. Audit chain verification
is a computationally intensive operation and is rate-limited more aggressively
than standard read endpoints. For continuous monitoring, poll no more than
once per minute.
**Date range filtering:** Supply `fromDate` and/or `toDate` to restrict
verification to a specific window. When omitted, the entire retained audit
log is verified. `fromDate` must be before or equal to `toDate` when both
are provided.
**Result interpretation:**
- `verified: true` — chain is intact across all checked events
- `verified: false` — at least one chain break detected; `brokenAtEventId`
identifies the first affected event
security:
- BearerAuth: []
parameters:
- name: fromDate
in: query
description: |
ISO 8601 date-time lower bound for the verification window (inclusive).
When omitted, verification starts from the earliest available audit event.
Must be before or equal to `toDate` when both are supplied.
required: false
schema:
type: string
format: date-time
example: "2026-03-01T00:00:00.000Z"
- name: toDate
in: query
description: |
ISO 8601 date-time upper bound for the verification window (inclusive).
When omitted, verification runs up to and including the most recent
audit event. Must be after or equal to `fromDate` when both are supplied.
required: false
schema:
type: string
format: date-time
example: "2026-03-31T23:59:59.999Z"
responses:
'200':
description: |
Audit chain verification completed. Inspect `verified` to determine
whether chain integrity is intact. A `200` is returned regardless of
whether verification passed or failed — check the response body.
headers:
X-RateLimit-Limit:
schema:
type: integer
description: Maximum requests allowed per minute for this endpoint.
example: 30
X-RateLimit-Remaining:
schema:
type: integer
description: Requests remaining in the current rate limit window.
example: 29
X-RateLimit-Reset:
schema:
type: integer
description: Unix timestamp when the rate limit window resets.
example: 1743155400
content:
application/json:
schema:
$ref: '#/components/schemas/ChainVerificationResult'
examples:
chainIntact:
summary: Verification passed — chain is intact
value:
verified: true
checkedCount: 2847
brokenAtEventId: null
fromDate: "2026-03-01T00:00:00.000Z"
toDate: "2026-03-31T23:59:59.999Z"
chainBroken:
summary: Verification failed — chain break detected
value:
verified: false
checkedCount: 1203
brokenAtEventId: "c4d5e6f7-a8b9-0123-cdef-456789012345"
fromDate: "2026-03-01T00:00:00.000Z"
toDate: "2026-03-31T23:59:59.999Z"
noDateRange:
summary: Full log verified (no date range supplied)
value:
verified: true
checkedCount: 18504
brokenAtEventId: null
'400':
description: Invalid query parameter value or date range.
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
examples:
invalidFromDate:
summary: fromDate is not a valid ISO 8601 date-time
value:
code: "VALIDATION_ERROR"
message: "Invalid query parameter value."
details:
field: "fromDate"
reason: "Must be a valid ISO 8601 date-time string (e.g. 2026-03-01T00:00:00.000Z)."
invalidToDate:
summary: toDate is not a valid ISO 8601 date-time
value:
code: "VALIDATION_ERROR"
message: "Invalid query parameter value."
details:
field: "toDate"
reason: "Must be a valid ISO 8601 date-time string (e.g. 2026-03-31T23:59:59.999Z)."
invalidDateRange:
summary: fromDate is after toDate
value:
code: "VALIDATION_ERROR"
message: "Invalid date range."
details:
reason: "fromDate must be before or equal to toDate."
'401':
$ref: '#/components/responses/Unauthorized'
'403':
$ref: '#/components/responses/Forbidden'
'429':
$ref: '#/components/responses/TooManyRequests'
'500':
$ref: '#/components/responses/InternalServerError'
/compliance/controls:
get:
operationId: getComplianceControls
tags:
- Compliance Controls
summary: Get SOC 2 control status summary
description: |
Returns a live status snapshot for each of the five in-scope SOC 2 Type II
Trust Services Criteria controls monitored by the SentryAgent.ai platform.
**No authentication required.** This endpoint is intentionally public
(analogous to a health check) so that external monitoring infrastructure,
status pages, and audit tooling can poll it without bearer credentials.
**Controls monitored:**
| Control ID | Name | What is checked |
|------------|------|-----------------|
| `CC6.1` | Encryption at Rest | Database and secrets store encryption is active and configured |
| `CC6.7` | TLS Enforcement | TLS 1.2+ is enforced on all platform inbound connections |
| `CC7.2` | Audit Log Integrity | Audit chain hash continuity (summary of the `/audit/verify` result) |
| `CC9.2` | Secrets Rotation | All managed secrets are within the rotation policy window |
| `CC7.1` | Webhook Dead-Letter Monitoring | Dead-letter queue depth is within the acceptable threshold |
**Status values:**
- `passing` — control is operating within policy
- `failing` — control has breached policy; immediate attention required
- `unknown` — automated check could not complete (e.g. dependency unavailable)
**Caching note:** Responses may be cached for up to 60 seconds by
intermediate proxies. The `lastChecked` field on each control indicates
the timestamp of the most recent automated evaluation.
**Rate limit:** 120 requests/minute per IP address.
security: []
responses:
'200':
description: SOC 2 control status summary returned successfully.
headers:
Cache-Control:
schema:
type: string
description: >
Downstream caches may serve this response for up to 60 seconds.
example: "public, max-age=60"
X-RateLimit-Limit:
schema:
type: integer
description: Maximum requests allowed per minute for this endpoint.
example: 120
X-RateLimit-Remaining:
schema:
type: integer
description: Requests remaining in the current rate limit window.
example: 119
X-RateLimit-Reset:
schema:
type: integer
description: Unix timestamp when the rate limit window resets.
example: 1743155400
content:
application/json:
schema:
$ref: '#/components/schemas/ComplianceControlsResponse'
examples:
allPassing:
summary: All controls passing
value:
controls:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
oneControlFailing:
summary: One control failing (secrets rotation overdue)
value:
controls:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "failing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
unknownControl:
summary: One control in unknown state (dependency unavailable)
value:
controls:
- id: "CC6.1"
name: "Encryption at Rest"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC6.7"
name: "TLS Enforcement"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.2"
name: "Audit Log Integrity"
status: "unknown"
lastChecked: "2026-03-31T05:00:00.000Z"
- id: "CC9.2"
name: "Secrets Rotation"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
- id: "CC7.1"
name: "Webhook Dead-Letter Monitoring"
status: "passing"
lastChecked: "2026-03-31T06:00:00.000Z"
'429':
$ref: '#/components/responses/TooManyRequests'
'500':
$ref: '#/components/responses/InternalServerError'


@@ -0,0 +1,50 @@
groups:
  - name: agentidp_alerts
    rules:
      - alert: AuthFailureSpike
        expr: rate(agentidp_http_requests_total{status_code="401"}[5m]) > 0.5
        for: 2m
        labels: { severity: warning }
        annotations:
          summary: "Auth failure spike detected"
          description: "More than 0.5 auth failures/sec over the past 2 minutes."
      - alert: RateLimitExhaustion
        expr: rate(agentidp_http_requests_total{status_code="429"}[5m]) > 0.2
        for: 2m
        labels: { severity: warning }
        annotations:
          summary: "Rate limit exhaustion spike"
          description: "Sustained rate limit rejections over the past 2 minutes."
      - alert: AnomalousTokenIssuance
        expr: rate(agentidp_tokens_issued_total[5m]) > 10
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Anomalous token issuance rate"
          description: "More than 10 tokens/sec issued over the past 5 minutes."
      - alert: WebhookDeadLetterAccumulating
        expr: increase(agentidp_webhook_dead_letters_total[1h]) > 10
        for: 0m
        labels: { severity: critical }
        annotations:
          summary: "Webhook dead-letter accumulation"
          description: "More than 10 webhook deliveries moved to dead-letter in the past hour."
      - alert: AuditChainIntegrityFailed
        expr: agentidp_audit_chain_integrity == 0
        for: 0m
        labels: { severity: critical }
        annotations:
          summary: "Audit chain integrity failure"
          description: "Audit chain verification failed — possible log tampering detected."
      - alert: CredentialExpiryApproaching
        expr: increase(agentidp_credentials_expiring_soon_total[1h]) > 0
        for: 0m
        labels: { severity: info }
        annotations:
          summary: "Credentials expiring soon"
          description: "One or more agent credentials will expire within 7 days."


@@ -2,6 +2,9 @@ global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - alerts.yml
scrape_configs:
  - job_name: 'agentidp'
    static_configs:


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-29


@@ -0,0 +1,105 @@
## Context
SentryAgent.ai has completed Phase 1 (MVP) and Phase 2 (Production-Ready), producing a fully implemented AgentIdP with 12 capabilities across ~150 source files, 4 language SDKs, Terraform infrastructure, and a React web dashboard. The codebase is mature but undocumented at the engineering level — there are bedroom developer guides (`docs/developers/`) and DevOps guides (`docs/devops/`), but no structured internal engineering knowledge base.
New hires arrive with BSc Computer Science and one year of industrial experience. They understand programming fundamentals and have worked on codebases before, but they have no context on: what SentryAgent.ai is building, why architectural decisions were made, how the codebase is structured, how to navigate the services, how to contribute per our standards, or how the OpenSpec workflow operates. Without documentation, onboarding is fragmented and relies entirely on the CTO's time.
The goal is a `docs/engineering/` directory that a new engineer can read sequentially from top to bottom and arrive ready to contribute within their first week.
## Goals / Non-Goals
**Goals:**
- Produce a complete top-down engineering knowledge base readable in sequence
- Cover all 10 capability areas identified in the proposal
- Calibrate depth for BSc + 1yr experience — assume programming competence, explain domain and architectural decisions
- Every document is self-contained with internal cross-links where needed
- All code examples are complete and runnable (no ellipses, no `// ... rest of code`)
- Development environment setup is achievable in under 30 minutes following the guide alone
- Annotated walkthroughs trace the three critical flows through every layer of code with file:line references
**Non-Goals:**
- Not a replacement for `docs/developers/` (end-user API reference) or `docs/devops/` (operator runbooks)
- Not a tutorial for learning TypeScript, React, or Terraform — assumes language competence
- Not a complete API reference — `docs/developers/api-reference.md` already covers that
- Not roadmap documentation — focuses on what is built, not what is planned
## Decisions
### D1: Location — `docs/engineering/` as a flat directory with an index
**Decision**: All engineering docs live in `docs/engineering/` as flat markdown files with a `README.md` index.
**Rationale**: Deep nested directory structures create navigation friction. Flat layout with numbered filenames (`01-overview.md`, `02-architecture.md`) ensures reading order is obvious without needing a build tool. Gitea renders markdown natively, so no documentation site tooling is required.
**Alternatives considered**:
- `docs/engineering/<subdirs>/` — rejected: adds navigation complexity with no benefit at our current document count
- Docusaurus site — rejected: adds build infrastructure overhead; plain markdown in-repo is sufficient and always in sync with code
---
### D2: Numbered file naming for enforced reading order
**Decision**: Files are named `01-overview.md` through `10-sdk-guide.md`.
**Rationale**: New engineers need a guided path, not a reference library. Numbers make the intended reading sequence unambiguous without any tooling. The `README.md` index maps numbers to sections.
---
### D3: Annotated walkthroughs use file:line references
**Decision**: Code walkthrough documents reference actual source files with line numbers (e.g., `src/controllers/agentController.ts:45`).
**Rationale**: Engineers with 1yr experience learn fastest by reading real code, not simplified pseudocode. File:line references let them jump directly to the relevant section in their editor or on Gitea.
**Trade-off**: Line numbers drift as code changes. Mitigation: walkthrough documents include a "last verified" version comment and note which commit they were verified against. The CTO adds walkthrough review to the Phase 3 change process as a maintenance item.
---
### D4: Three walkthroughs selected by criticality and complexity
**Decision**: Walkthroughs cover: (1) OAuth 2.0 token issuance, (2) agent registration, (3) credential rotation.
**Rationale**:
- Token issuance is the highest-traffic path and touches the most layers (controller → service → repository → Redis → JWT signing)
- Agent registration is the entry point for all users and demonstrates the full validation + persistence + audit pattern
- Credential rotation demonstrates the Vault integration path and shows how Phase 2 extended Phase 1 patterns
These three flows collectively exercise every architectural layer and every major design pattern in the codebase.
---
### D5: Service deep-dives use a consistent template
**Decision**: Each service deep-dive follows the structure: Purpose → Responsibility boundary → Interface → Key methods → Database schema (if applicable) → Error types → Configuration.
**Rationale**: Consistency reduces cognitive load. An engineer who has read the AgentService deep-dive knows exactly where to look for the same information in the OAuth2Service deep-dive. The template mirrors SOLID's Single Responsibility — each section answers one question.
---
### D6: Engineering workflow doc is prescriptive, not descriptive
**Decision**: The workflow guide tells engineers exactly what to do step by step, not just what the process is.
**Rationale**: Engineers with 1yr experience have worked in teams but may not have used a spec-first workflow before. A prescriptive guide ("Step 1: run `openspec new change <name>`") reduces ambiguity and enforces our standards from day one.
## Risks / Trade-offs
**[Line numbers drift as code evolves]** → Walkthroughs include a "last verified against commit X" header. The CTO assigns a quarterly walkthrough review task in each Phase change.
**[Docs can become stale if not maintained]** → Each document has a "Last updated" field in its header. The engineering workflow guide explicitly requires updating relevant engineering docs as part of any PR that changes architecture or public service interfaces.
**[Scope is large — ~15 documents, ~10,000 lines]** → Tasks are broken into discrete documents, each independently completable. No document depends on another being written first (only the index depends on all others).
## Migration Plan
1. Create `docs/engineering/` directory
2. Write all 15 documents (10 capability areas, some split across multiple files)
3. Write `docs/engineering/README.md` index with links and reading order
4. Commit all to `develop` in a single commit
5. No existing documentation is modified or removed
No rollback required — this is additive only.
## Open Questions
_(none — all decisions made above; scope fully defined in proposal)_

@@ -0,0 +1,42 @@
## Why
SentryAgent.ai is growing and hiring engineers with BSc Computer Science and one year of industrial experience. There are currently no internal engineering documents that explain how the system works from the top down — new engineers have no structured path from product vision to running code, and no reference for how to contribute correctly. This gap slows onboarding, increases mistakes, and risks divergence from our architecture and standards.
## What Changes
- New `docs/engineering/` directory added to the repository as the canonical engineering knowledge base
- Top-down documentation suite covering all layers of the system: company vision → architecture → codebase → services → workflows → operations
- Annotated code walkthroughs for the three most critical system flows (token issuance, agent registration, credential rotation)
- Development environment setup guide targeting < 30 minutes from clone to running local stack
- Engineering workflow guide covering the full OpenSpec → Architect → Developer → QA → merge cycle
- Service deep-dive documents for all 8 core services/components
- SDK integration guide covering all four language SDKs
- Testing strategy and quality gate reference
- Deployment and operations reference covering Docker, Terraform, and monitoring
## Capabilities
### New Capabilities
- `engineering-overview`: Company mission, product vision, system purpose, and how the engineering team operates — the entry point for all new hires
- `architecture-guide`: System architecture including component diagram, data flow diagrams, deployment topology, and technology stack rationale (ADRs)
- `codebase-structure`: Annotated directory map explaining every top-level directory and key file, what lives where and why
- `service-deep-dives`: Per-service documentation for AgentService, OAuth2Service, CredentialService, AuditService, VaultClient, OPA policy engine, Web Dashboard, and Prometheus/Grafana monitoring
- `code-walkthroughs`: Step-by-step annotated traces of the three critical flows: token issuance end-to-end, agent registration end-to-end, credential rotation end-to-end
- `dev-environment-setup`: Local development environment setup — prerequisites, clone, configure, Docker Compose up, smoke test — targeting < 30 minutes
- `engineering-workflow`: How to contribute — OpenSpec spec-first workflow, branching strategy, PR standards, quality gates, and the role of each virtual engineering team member
- `testing-strategy`: Test framework, test types (unit vs integration), coverage gates, how to run tests, and how to write new tests following project conventions
- `deployment-operations`: Docker build and run, Terraform multi-region deployment, environment configuration, Prometheus/Grafana monitoring, and operational runbooks
- `sdk-guide`: Integration guide for Node.js, Python, Go, and Java SDKs — installation, authentication, all major operations, error handling
### Modified Capabilities
_(none — this change adds documentation only; no existing spec-level behavior changes)_
## Impact
- **Repository**: New `docs/engineering/` directory (~15 documents, ~10,000 lines of markdown)
- **No code changes**: Documentation only — zero impact on `src/`, `tests/`, `sdk/`, or infrastructure
- **Dependencies**: None — no new packages required
- **APIs**: No API changes
- **Existing docs**: `docs/developers/` (bedroom developer guide) and `docs/devops/` (operations) remain unchanged; this is an additive engineering-internal knowledge base

@@ -0,0 +1,35 @@
## ADDED Requirements
### Requirement: System architecture document
The system SHALL include a document (`docs/engineering/02-architecture.md`) that describes the full system architecture: components, their responsibilities, how they communicate, and the deployment topology.
#### Scenario: Component diagram present
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL find an ASCII or Mermaid component diagram showing all major components (API server, PostgreSQL, Redis, Vault, OPA, Web Dashboard, Prometheus, Grafana) and their connections
#### Scenario: Request lifecycle explained
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand how an incoming HTTP request flows from client → Express router → middleware chain → controller → service → repository → database and back
#### Scenario: Data flow for authentication described
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the OAuth 2.0 Client Credentials flow: client presents credentials → token service validates → Redis checked for existing token → JWT signed and returned
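The Client Credentials flow named in this scenario can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `issueToken`, `TokenDeps`, and `CachedToken` are hypothetical names, a `Map` stands in for Redis, and the injected `verify` and `sign` functions are synchronous only to keep the sketch short.

```typescript
// Hypothetical sketch of: validate credentials, check cache, sign and return.
// A Map stands in for Redis; `verify` and `sign` are injected stand-ins.
interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

interface TokenDeps {
  verify: (clientId: string, clientSecret: string) => boolean;
  sign: (clientId: string) => string;
  cache: Map<string, CachedToken>;
  ttlMs: number;
}

function issueToken(
  clientId: string,
  clientSecret: string,
  deps: TokenDeps,
  now: number = Date.now(),
): string {
  // 1. Reject bad credentials before anything else.
  if (!deps.verify(clientId, clientSecret)) {
    throw new Error("invalid_client");
  }
  // 2. Reuse a token that has not expired, avoiding a redundant signature.
  const cached = deps.cache.get(clientId);
  if (cached !== undefined && cached.expiresAt > now) {
    return cached.token;
  }
  // 3. Otherwise sign a new token and remember it until its TTL elapses.
  const token = deps.sign(clientId);
  deps.cache.set(clientId, { token, expiresAt: now + deps.ttlMs });
  return token;
}
```

Checking the cache before signing means repeated requests from the same client within the TTL receive the same token and skip the signing step.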
#### Scenario: Deployment topology covered
- **WHEN** a new engineer reads 02-architecture.md
- **THEN** they SHALL understand the multi-region deployment model (US, EU, APAC) and how Terraform provisions it
### Requirement: Technology stack and ADR document
The system SHALL include a document (`docs/engineering/03-tech-stack.md`) that lists every technology in the stack and explains why it was chosen over alternatives.
#### Scenario: Every major technology documented with rationale
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL find an entry for each technology (Node.js 18, TypeScript 5.3, Express 4.18, PostgreSQL 14, Redis 7, HashiCorp Vault, OPA, React 18, Vite 5, Prometheus, Grafana, Terraform) with: what it does in the system, why it was chosen, and what was considered but rejected
#### Scenario: TypeScript strict mode rationale explained
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand why strict mode is mandatory (safety, correctness, no implicit any) and what the consequences of violating it are
#### Scenario: PostgreSQL vs Redis responsibility boundary clear
- **WHEN** a new engineer reads 03-tech-stack.md
- **THEN** they SHALL understand what is stored in PostgreSQL (persistent state: agents, credentials, audit logs) vs Redis (ephemeral state: active tokens, rate limit counters)

@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Annotated code walkthrough documents
The system SHALL include a document (`docs/engineering/06-walkthroughs.md`) containing three annotated end-to-end walkthroughs of the system's critical flows, with file:line references to actual source code.
#### Scenario: Token issuance walkthrough complete
- **WHEN** a new engineer reads the token issuance walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /oauth2/token → Express router → auth middleware → OAuth2Controller → OAuth2Service → CredentialRepository → Vault/bcrypt credential check → Redis token cache check → JWT signing (src/utils/jwt.ts) → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number where it occurs
#### Scenario: Agent registration walkthrough complete
- **WHEN** a new engineer reads the agent registration walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents → auth middleware → validation middleware → AgentController → AgentService.createAgent → input validation (src/utils/validators.ts) → AgentRepository.create → PostgreSQL INSERT → AuditService.logEvent → HTTP 201 response with agent object
- **AND** every step SHALL reference the actual file and line number
#### Scenario: Credential rotation walkthrough complete
- **WHEN** a new engineer reads the credential rotation walkthrough
- **THEN** they SHALL be guided step by step from: HTTP POST /agents/:id/credentials/:credId/rotate → auth middleware → CredentialController → CredentialService.rotateCredential → old credential revocation → new secret generation (src/utils/crypto.ts) → Vault write or bcrypt hash → CredentialRepository.update → token revocation for old credentials → AuditService.logEvent → HTTP 200 response
- **AND** every step SHALL reference the actual file and line number
#### Scenario: Walkthroughs include version reference
- **WHEN** a new engineer reads any walkthrough
- **THEN** the document SHALL include a header stating the commit hash it was last verified against, so engineers know if the walkthrough may have drifted from the current code
#### Scenario: Each walkthrough annotates why, not just what
- **WHEN** a new engineer reads a walkthrough step
- **THEN** each step SHALL explain not just what the code does but WHY — e.g., why Redis is checked before signing a new JWT, why constant-time comparison is used for credential verification, why audit logging happens after persistence not before
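As an illustration of the constant-time comparison mentioned above, here is a minimal sketch using Node's built-in `crypto.timingSafeEqual`. The helper name `secretsMatch` is hypothetical, and hashing both inputs first is an assumption made so the buffers always have equal length; the project's actual verification code may differ.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares two secrets without revealing, through
// response timing, how many leading bytes matched.
function secretsMatch(presented: string, expected: string): boolean {
  // Hash both inputs so the buffers have equal length; timingSafeEqual
  // throws when the byte lengths differ.
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A plain `===` on strings can return as soon as the first differing character is found, which is the timing signal this pattern is meant to remove.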

@@ -0,0 +1,24 @@
## ADDED Requirements
### Requirement: Codebase structure document
The system SHALL include a document (`docs/engineering/04-codebase-structure.md`) that provides an annotated map of every top-level directory and key file in the repository, explaining what lives where and why.
#### Scenario: Full directory tree annotated
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find an annotated directory tree covering: `src/`, `tests/`, `docs/`, `sdk/`, `sdk-python/`, `sdk-go/`, `sdk-java/`, `terraform/`, `dashboard/`, `migrations/`, `openspec/`, `scripts/`
#### Scenario: src/ subdirectory roles explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand the role of each `src/` subdirectory: `controllers/` (HTTP layer), `services/` (business logic), `repositories/` (data access), `middleware/` (cross-cutting concerns), `utils/` (shared utilities), `types/` (TypeScript interfaces), `routes/` (Express router definitions)
#### Scenario: Where to add new code explained
- **WHEN** a new engineer needs to add a new feature
- **THEN** the document SHALL tell them exactly where each type of code belongs: new endpoint → controller + route; new business logic → service; new DB query → repository; new shared utility → utils/
#### Scenario: Key files identified and explained
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL find explanations of: `src/app.ts` (Express app setup), `src/server.ts` (entry point), `src/types/index.ts` (canonical type definitions), `src/utils/errors.ts` (error hierarchy), `docker-compose.yml` (local dev stack), `tsconfig.json` (TypeScript config)
#### Scenario: DRY principle mapped to structure
- **WHEN** a new engineer reads 04-codebase-structure.md
- **THEN** they SHALL understand how the directory structure enforces DRY: one location for types, one for crypto utilities, one for JWT utilities, one for validators — and why duplication across these is a blocking PR issue

@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: Deployment and operations guide
The system SHALL include a document (`docs/engineering/10-deployment.md`) that explains how the application is built, deployed, and operated — covering Docker, Terraform, environment configuration, and monitoring.
#### Scenario: Docker build and run documented
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the multi-stage Dockerfile (builder stage compiles TypeScript, production stage runs compiled JS with node:18-alpine and non-root USER node), how to build the image, and how to run it with the required environment variables
#### Scenario: Environment variables fully documented
- **WHEN** a new engineer needs to configure the application
- **THEN** the guide SHALL provide a complete table of all environment variables: name, purpose, required/optional, example value — covering database, Redis, JWT signing key, Vault, OPA, and rate limiting config
#### Scenario: Database migrations documented
- **WHEN** a new engineer needs to run or write migrations
- **THEN** the guide SHALL explain: where migration files live (`migrations/`), the naming convention, how to run them (`npm run migrate`), and how to write a new migration following the existing pattern
#### Scenario: Terraform multi-region deployment explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL understand the Terraform structure: what modules exist, what the three regions (US, EU, APAC) deploy, how to run `terraform plan` and `terraform apply`, and what AWS/GCP resources are provisioned
#### Scenario: Prometheus metrics and Grafana explained
- **WHEN** a new engineer reads 10-deployment.md
- **THEN** they SHALL find: which endpoint exposes metrics (`/metrics`), the key metrics tracked, how to access the Grafana dashboard locally (port, login), and how to add a new metric counter or histogram to the API server
#### Scenario: Operational runbook for common tasks
- **WHEN** a new engineer is on-call or supporting operations
- **THEN** the guide SHALL include a runbook covering: how to check application health, how to rotate the JWT signing key, how to revoke all tokens for a compromised agent, and how to read audit logs for an incident

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Development environment setup guide
The system SHALL include a document (`docs/engineering/07-dev-setup.md`) that takes a new engineer from zero to a fully running local stack in under 30 minutes, with no prior knowledge of the project assumed.
#### Scenario: Prerequisites listed completely
- **WHEN** a new engineer reads 07-dev-setup.md
- **THEN** they SHALL find a complete prerequisites list: Node.js 18+, Docker Desktop, Git, a PostgreSQL client (optional), and links to install each — with no undocumented dependencies
#### Scenario: Repository clone and setup steps complete
- **WHEN** a new engineer follows the clone and setup steps
- **THEN** they SHALL be able to: clone the repo, copy `.env.example` to `.env`, run `npm install`, and have all dependencies installed with zero manual configuration
#### Scenario: Docker Compose local stack starts successfully
- **WHEN** a new engineer runs `docker-compose up -d`
- **THEN** all services (PostgreSQL, Redis, API server) SHALL start, migrations SHALL run automatically, and the guide SHALL show how to verify each service is healthy
#### Scenario: Smoke test confirms working stack
- **WHEN** a new engineer follows the smoke test section
- **THEN** they SHALL run a curl command to POST /oauth2/token with the seed credentials and receive a valid JWT — confirming the full stack is operational
#### Scenario: Common setup errors documented
- **WHEN** a new engineer encounters a setup error
- **THEN** the guide SHALL include a troubleshooting section covering the 5 most common errors: port already in use, migration failure, Node version mismatch, Docker not running, and missing .env variables
#### Scenario: Running tests locally documented
- **WHEN** a new engineer wants to run the test suite
- **THEN** the guide SHALL show: `npm test` (unit tests only, no services needed), `npm run test:integration` (requires Docker stack), and how to run a single test file
#### Scenario: Web dashboard local development documented
- **WHEN** a new engineer wants to run the web dashboard
- **THEN** the guide SHALL show how to start the Vite dev server (`npm run dev` in `dashboard/`) and which port it runs on, and confirm it connects to the local API server

@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: Company and product overview document
The system SHALL include a document (`docs/engineering/01-overview.md`) that explains SentryAgent.ai's mission, the AgentIdP product, target users, and why the product exists — providing new engineers with business and product context before they read any technical content.
#### Scenario: Mission and vision covered
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what SentryAgent.ai builds, why it exists, and what problem it solves for AI developers
#### Scenario: AGNTCY alignment explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand what AGNTCY is, why SentryAgent.ai aligns to it, and what "first-class agent identity" means
#### Scenario: Product features listed
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see a summary of all product capabilities: agent registry, OAuth 2.0 auth, credential management, audit logs, SDKs, web dashboard, policy engine, and monitoring
#### Scenario: Phase roadmap visible
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand which capabilities belong to Phase 1, Phase 2, and Phase 3
#### Scenario: Engineering team structure explained
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL understand the Virtual Engineering Team model (CTO → Architect → Developer → QA) and how Claude operates as the engineering partner
#### Scenario: Free tier limits documented
- **WHEN** a new engineer reads 01-overview.md
- **THEN** they SHALL see the free tier limits (100 agents, 10,000 token requests/month, 90-day audit retention, 100 req/min) and understand the product's positioning

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Engineering workflow and contribution guide
The system SHALL include a document (`docs/engineering/08-workflow.md`) that prescribes the exact steps an engineer MUST follow to contribute any new feature or change, from idea to merged code.
#### Scenario: OpenSpec spec-first workflow explained
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand that NO implementation begins without an approved OpenAPI spec — and the exact sequence: CEO approves → Architect writes spec → CTO reviews → Developer implements → QA signs off → CEO approves merge
#### Scenario: OpenSpec CLI commands documented
- **WHEN** a new engineer wants to start a new change
- **THEN** the guide SHALL provide the exact commands: `openspec new change <name>`, `openspec status --change <name>`, `openspec instructions <artifact> --change <name>`, and what each command does
#### Scenario: Branching strategy documented
- **WHEN** a new engineer creates a branch
- **THEN** the guide SHALL prescribe: feature branches from `develop`, naming convention `feature/<change-name>`, PR targets `develop`, `develop` → `main` requires CTO + CEO approval
#### Scenario: TypeScript and code standards enforced in workflow
- **WHEN** a new engineer writes code
- **THEN** the guide SHALL state the non-negotiable standards: strict mode, no `any`, DRY, SOLID, JSDoc on all public methods — and that PRs violating these are blocked by the CTO regardless of functionality
#### Scenario: PR checklist documented
- **WHEN** a new engineer opens a PR
- **THEN** the guide SHALL provide a PR checklist: TypeScript compiles with zero errors, ESLint passes with zero warnings, unit tests pass, coverage gate met (>80%), integration tests pass, OpenAPI spec updated if endpoint changed, engineering docs updated if architecture changed
#### Scenario: Virtual engineering team roles explained for contributors
- **WHEN** a new engineer reads 08-workflow.md
- **THEN** they SHALL understand the role separation: they contribute as the Principal Developer role, the CTO reviews all PRs, the Architect owns spec changes, and QA owns the test sign-off — and how to interact with each role in practice
#### Scenario: Commit message conventions documented
- **WHEN** a new engineer writes a commit message
- **THEN** the guide SHALL prescribe the Conventional Commits format: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` prefixes — with examples for each
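A commit header in this format could be checked mechanically. The sketch below is illustrative only: `isConventionalHeader` is a hypothetical helper, and the optional `(scope)` and `!` parts come from the Conventional Commits convention rather than from this requirement, which lists only the prefixes.

```typescript
// Prefixes listed in the requirement above.
const ALLOWED_TYPES = ["feat", "fix", "docs", "test", "chore", "refactor"];

// type, optional "(scope)", optional "!", then ": " and a non-empty subject.
const HEADER_PATTERN = new RegExp(
  `^(${ALLOWED_TYPES.join("|")})(\\([a-z0-9-]+\\))?!?: \\S.*$`,
);

// Hypothetical helper: returns true when the first line of a commit
// message follows the pattern above.
function isConventionalHeader(header: string): boolean {
  return HEADER_PATTERN.test(header);
}
```

For example, `feat(phase-4): add billing endpoints` and `fix: handle missing token` pass, while `update stuff` and `feat:no space` do not.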

@@ -0,0 +1,28 @@
## ADDED Requirements
### Requirement: SDK integration guide
The system SHALL include a document (`docs/engineering/11-sdk-guide.md`) that explains how each of the four language SDKs is structured, how to use them, and how to contribute to or extend them.
#### Scenario: SDK architecture overview present
- **WHEN** a new engineer reads 11-sdk-guide.md
- **THEN** they SHALL understand that all four SDKs (Node.js, Python, Go, Java) implement the same API surface (14 endpoints, 4 service clients, 1 TokenManager, 1 error type) with identical semantics, and why consistency across SDKs is a non-negotiable standard
#### Scenario: Node.js SDK documented
- **WHEN** a new engineer reads the Node.js SDK section
- **THEN** they SHALL find: installation (`npm install @sentryagent/idp-sdk`), the AgentIdPClient constructor, all 4 service clients (agents, credentials, tokens, audit), TokenManager auto-refresh behaviour, AgentIdPError structure, and a complete working code example for the most common flow (register agent → generate credential → issue token)
#### Scenario: Python SDK documented
- **WHEN** a new engineer reads the Python SDK section
- **THEN** they SHALL find: installation (`pip install sentryagent-idp`), both sync (AgentIdPClient) and async (AsyncAgentIdPClient) variants, TokenManager and AsyncTokenManager auto-refresh, AgentIdPError, and a complete working example for sync and async usage
#### Scenario: Go SDK documented
- **WHEN** a new engineer reads the Go SDK section
- **THEN** they SHALL find: installation (`go get github.com/sentryagent/idp-sdk-go`), AgentIdPClient construction, goroutine-safe TokenManager, context.Context usage pattern, AgentIdPError with Code/HTTPStatus/Details, and a complete working example
#### Scenario: Java SDK documented
- **WHEN** a new engineer reads the Java SDK section
- **THEN** they SHALL find: Maven/Gradle dependency snippet, AgentIdPClient construction with builder pattern, sync methods and CompletableFuture async counterparts, thread-safe TokenManager, AgentIdPException, and a complete working example
#### Scenario: SDK contribution guide included
- **WHEN** a new engineer needs to add a new endpoint to all SDKs
- **THEN** the guide SHALL provide a step-by-step checklist for adding a new method to all four SDKs consistently: where to add the method, what the signature pattern is, how to write the corresponding test, and how to verify it compiles/passes in each language

@@ -0,0 +1,40 @@
## ADDED Requirements
### Requirement: Service deep-dive documents
The system SHALL include a document (`docs/engineering/05-services.md`) providing a deep-dive reference for every core service and component, following a consistent template: Purpose → Responsibility boundary → Public interface → Key methods → Database schema (if applicable) → Error types → Configuration.
#### Scenario: AgentService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AgentService section covering: responsibility (agent CRUD only), public methods (createAgent, getAgent, listAgents, updateAgent, deleteAgent), the `agents` table schema, AgentNotFoundError and AgentAlreadyExistsError, and what AgentService does NOT do (no auth, no credentials — Single Responsibility)
#### Scenario: OAuth2Service documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OAuth2Service section covering: responsibility (token issuance and revocation only), public methods (issueToken, validateToken, revokeToken), Redis token storage schema, JWT payload structure, token TTL configuration, and the Vault credential verification path vs bcrypt path
#### Scenario: CredentialService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the CredentialService section covering: responsibility (credential lifecycle only), public methods (generateCredential, rotateCredential, revokeCredential, listCredentials), the `credentials` table schema, bcrypt vs Vault storage decision, and the `vault_path` column purpose
#### Scenario: AuditService documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the AuditService section covering: responsibility (immutable audit logging only), public methods (logEvent, queryLogs), the `audit_logs` table schema, event types enum, 90-day retention policy, and why audit records are never updated or deleted
#### Scenario: VaultClient documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the VaultClient section covering: purpose (wraps node-vault for KV v2 operations), public methods (writeSecret, readSecret, verifySecret, deleteSecret), the opt-in configuration (VAULT_ADDR env var), and the constant-time comparison in verifySecret and why it matters (timing attack prevention)
#### Scenario: OPA policy engine documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the OPA section covering: purpose (dynamic access control beyond static OAuth scopes), how policies are loaded, how authorization decisions are made, the policy file locations, and how to write and test a new policy
#### Scenario: Web Dashboard documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the Web Dashboard section covering: React 18 + Vite 5 + TypeScript structure, how it authenticates against the AgentIdP API, the main views (agent list, credential management, audit log viewer, policy editor), and how to run it locally
#### Scenario: Monitoring stack documented
- **WHEN** a new engineer reads 05-services.md
- **THEN** they SHALL find the monitoring section covering: Prometheus metrics exposed by the API server (`/metrics`), the key metrics (request count, latency histograms, active tokens, agent count), Grafana dashboard structure, and how to add a new metric to the API server
#### Scenario: Consistent template enforced
- **WHEN** a new engineer looks up any service
- **THEN** every service section SHALL follow the same template so the engineer knows exactly where to find each type of information

@@ -0,0 +1,32 @@
## ADDED Requirements
### Requirement: Testing strategy document
The system SHALL include a document (`docs/engineering/09-testing.md`) that explains the test architecture, how to run tests, coverage requirements, and how to write new tests following project conventions.
#### Scenario: Test types and their purposes explained
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL understand the distinction between: unit tests (test one service/util in isolation, mock all dependencies, no running services needed) and integration tests (test full HTTP request/response cycle with real PostgreSQL + Redis)
#### Scenario: Test framework stack documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL find the test stack listed and explained: Jest 29.7 (test runner + assertions), ts-jest (TypeScript compilation), Supertest 6.3 (HTTP integration testing), and how each is configured
#### Scenario: Coverage gates documented
- **WHEN** a new engineer reads 09-testing.md
- **THEN** they SHALL know the mandatory gates: >80% statements, >80% branches, >80% functions, >80% lines — and that PRs below these thresholds are blocked
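With Jest, gates like these are typically expressed through the `coverageThreshold` option. The fragment below is a sketch that assumes the gates are enforced this way; the repository's actual Jest configuration is not shown here.

```typescript
// Assumed values for Jest's `coverageThreshold` option, as percentages.
// Jest treats each number as a minimum, so a value of 80 admits exactly
// 80%. Matching the ">80%" wording strictly would need a higher value
// or an additional check.
const coverageThreshold = {
  global: {
    statements: 80,
    branches: 80,
    functions: 80,
    lines: 80,
  },
};
```

In a `jest.config.ts`, this object would be the value of the `coverageThreshold` key, and a run with coverage enabled fails when any metric falls below its threshold.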
#### Scenario: How to run the test suite documented
- **WHEN** a new engineer wants to run tests
- **THEN** the guide SHALL show: `npm test` (unit tests, no services), `npm run test:coverage` (unit tests + coverage report), `npm run test:integration` (requires Docker stack), and `npx jest src/services/agentService.test.ts` (single file)
#### Scenario: Unit test writing conventions shown
- **WHEN** a new engineer writes a new unit test
- **THEN** the guide SHALL show a complete example: how to mock a repository with `jest.mock()`, how to structure `describe`/`it` blocks, how to assert on thrown errors, and how to verify mock calls — using an actual test from the codebase as the example
#### Scenario: Integration test writing conventions shown
- **WHEN** a new engineer writes a new integration test
- **THEN** the guide SHALL show a complete example using Supertest: how to boot the Express app, how to seed test data, how to make authenticated requests (including getting a JWT first), and how to clean up after the test
#### Scenario: OWASP security testing reference included
- **WHEN** a new engineer writes security-relevant code
- **THEN** the guide SHALL include a reference to the OWASP Top 10 checks that are verified in QA sign-off and what each means in the context of this codebase (SQL injection, JWT attacks, credential exposure, etc.)

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-29

@@ -0,0 +1,269 @@
# Phase 3: Enterprise — Technical Design
**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Draft — pending CEO approval of proposal
---
## Architecture Overview
Phase 3 transforms AgentIdP from a single-tenant OAuth 2.0 server into a multi-tenant, W3C DID-issuing, OIDC-compliant, federated enterprise identity platform. The architecture remains monolithic Express (no microservices split) to avoid operational complexity, but clear service boundaries are enforced internally.
```
┌──────────────────────────────────────────────────────┐
│              AgentIdP Server (Express)               │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │           Middleware Stack (ordered)           │  │
│  │   TLS Enforcement → Auth → Org Context → OPA   │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────┐ │
│  │  OrgSvc  │  │  DIDSvc  │  │ OIDCSvc  │  │FedSvc │ │
│  └──────────┘  └──────────┘  └──────────┘  └───────┘ │
│  ┌──────────┐  ┌──────────┐                          │
│  │ WebhookQ │  │ SOC2Ctrl │                          │
│  └──────────┘  └──────────┘                          │
└──────────────────────────────────────────────────────┘
               │             │           │
      ┌────────▼──┐    ┌─────▼───┐    ┌──▼──────────┐
      │PostgreSQL │    │  Redis  │    │    Vault    │
      │(org rows) │    │(webhook │    │  (secrets)  │
      └───────────┘    │ queue)  │    └─────────────┘
                       └─────────┘
```
---
## Architectural Decision Records
---
### D1: Multi-Tenancy Model
**Status**: Accepted
**Decision**: Row-level tenancy — add `organization_id` (UUID, NOT NULL) to every domain table. No schema-per-tenant, no database-per-tenant.
**Rationale**: Row-level tenancy is operationally the simplest approach: a single database, a single schema, a single connection pool. All queries are augmented with an `organization_id` filter extracted from the authenticated JWT. PostgreSQL Row-Level Security (RLS) is enabled on all tenant-scoped tables as a defense-in-depth measure — even if the application filter is accidentally omitted, the database enforces isolation.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Schema-per-tenant | Strong isolation, independent migrations | Complex migration tooling, connection pool explosion at scale | Operational overhead exceeds threat model requirement |
| Database-per-tenant | Maximum isolation | Separate connection pool, backup, monitoring per tenant | Prohibitive at 100+ orgs; overkill for our threat model |
| Row-level (chosen) | Simple, fast, single migration path | RLS must be enforced consistently | Chosen — enforce via both application and RLS |
**Consequences**:
- Every domain table gets an `organization_id` column and a corresponding index
- All service methods accept `organizationId: string` as a required parameter
- JWT payload extended to include `organization_id` claim
- Existing single-tenant data migrated to a default `system` organization
- PostgreSQL RLS policies written for all tenant tables
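
The application-layer half of this decision can be sketched as a query helper that refuses to build a statement without an organization filter. This is a minimal illustration under stated assumptions, not the project's implementation: `scopedSelect`, the `TENANT_TABLES` list, and the `status` filter column are hypothetical names; only `organization_id` comes from the decision above. The RLS policies remain the second line of defense if a code path bypasses a helper like this.

```typescript
// Hypothetical helper: names below are illustrative, not taken from the codebase.
interface ScopedQuery {
  text: string;
  values: string[];
}

// Assumed list of tenant-scoped tables.
const TENANT_TABLES: readonly string[] = ["agents", "credentials", "audit_logs"];

function scopedSelect(
  table: string,
  organizationId: string,
  filters: Record<string, string> = {},
): ScopedQuery {
  if (TENANT_TABLES.indexOf(table) === -1) {
    throw new Error(`not a tenant-scoped table: ${table}`);
  }
  if (organizationId.length === 0) {
    throw new Error("organizationId is required");
  }
  // organization_id is always the first predicate, so it cannot be omitted.
  const clauses: string[] = ["organization_id = $1"];
  const values: string[] = [organizationId];
  for (const column of Object.keys(filters)) {
    // Column names are interpolated into SQL, so restrict them to a safe pattern.
    if (!/^[a-z_]+$/.test(column)) {
      throw new Error(`invalid column name: ${column}`);
    }
    values.push(filters[column] as string);
    clauses.push(`${column} = $${values.length}`);
  }
  return { text: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, values };
}

// The organization ID would come from the authenticated JWT in a real request.
const query = scopedSelect("agents", "org_acme", { status: "active" });
console.log(query.text, query.values);
```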
---
### D2: DID Method Selection
**Status**: Accepted
**Decision**: `did:web` — DID Documents served over HTTPS at well-known and per-agent URLs.
**Rationale**: `did:web` requires no blockchain, no ledger, and no external infrastructure beyond the HTTPS server already running. It is W3C DID Core 1.0 compliant, supported by all major DID resolvers, and is the preferred method for enterprise deployments where an organization controls its own domain. It aligns directly with the `did:web` identifier scheme used in AGNTCY agent card specifications.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `did:web` (chosen) | No blockchain, HTTPS-based, enterprise-friendly | DID tied to domain; moving the domain invalidates DIDs | Accepted tradeoff — enterprise deployments have stable domains |
| `did:key` | Self-contained, no infrastructure | Not anchored — anyone can generate any `did:key`; no discovery | No trust anchor; not suitable for enterprise identity |
| `did:ethr` | Ethereum-anchored, decentralized | Blockchain dependency, gas costs, not enterprise-standard | Blockchain dependency is a non-starter for regulated enterprises |
**Consequences**:
- DID for the AgentIdP instance: `did:web:<hostname>`
- DID for an agent: `did:web:<hostname>:agents:<agentId>`
- DID Documents served at `/.well-known/did.json` and `/agents/:id/did`
- Domain change requires DID migration — document this in ops runbook
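
The identifier layout above can be sketched with a few string helpers. This is an illustrative sketch: the function names are hypothetical, and the URL mapping follows the `/agents/:id/did` route named in this ADR rather than any resolver library. The only rule borrowed from the `did:web` method itself is that a port's colon is percent-encoded in the identifier.

```typescript
// Hypothetical helpers for the identifier layout described above.
function instanceDid(hostname: string): string {
  // did:web percent-encodes the colon before a port (host:3000 -> host%3A3000).
  return `did:web:${encodeURIComponent(hostname)}`;
}

function agentDid(hostname: string, agentId: string): string {
  return `${instanceDid(hostname)}:agents:${agentId}`;
}

interface ParsedAgentDid {
  hostname: string;
  agentId: string;
}

function parseAgentDid(did: string): ParsedAgentDid | null {
  const parts = did.split(":");
  if (parts.length !== 5) return null;
  const [scheme, method, host, segment, agentId] = parts;
  if (scheme !== "did" || method !== "web" || segment !== "agents" || !host || !agentId) {
    return null;
  }
  return { hostname: decodeURIComponent(host), agentId };
}

// Maps an agent DID to the per-agent route named in this ADR.
function agentDidDocumentUrl(did: string): string | null {
  const parsed = parseAgentDid(did);
  return parsed ? `https://${parsed.hostname}/agents/${parsed.agentId}/did` : null;
}

const did = agentDid("agentidp.contoso.com", "agt_contoso_abc123");
console.log(did, agentDidDocumentUrl(did));
```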
---
### D3: OIDC Library Selection
**Status**: Accepted
**Decision**: `oidc-provider` npm package — a certified, RFC-compliant OIDC server library.
**Rationale**: `oidc-provider` is the most widely deployed Node.js OIDC library, passing the OpenID Foundation's official conformance test suite. Building OIDC from scratch on top of our existing JWT infrastructure would require implementing Discovery, JWKS rotation, ID token construction, and claim aggregation correctly against multiple RFCs. The certified library eliminates that risk and reduces implementation surface area. It integrates cleanly with Express as a mounted middleware.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| `oidc-provider` (chosen) | Certified, RFC-complete, actively maintained | Adds a significant dependency | Risk of non-compliance from custom implementation outweighs dependency cost |
| Custom JWT extension | Full control, no new dependency | High risk of spec deviation; ID token, Discovery, JWKS are complex | RFC compliance cannot be self-certified |
| `keycloak` sidecar | Battle-tested, full-featured | Heavyweight Java service; architectural mismatch | Not Node.js; adds operational complexity |
**Consequences**:
- `oidc-provider` is mounted at `/oidc` in Express
- OIDC Discovery served at `/.well-known/openid-configuration` (proxied from oidc-provider)
- JWKS served at `/.well-known/jwks.json`
- Adapter written to store OIDC sessions in Redis (oidc-provider's adapter interface)
- Existing `POST /oauth2/token` route extended, not replaced — maintains backward compatibility
---
### D4: Federation Protocol
**Status**: Accepted
**Decision**: Signed JWT assertions — remote AgentIdP instances present a signed JWT; the receiving instance verifies the signature against the registered JWKS of the issuing instance.
**Rationale**: JWT assertion federation reuses the existing JWT infrastructure (`jsonwebtoken`, JWKS endpoint from OIDC workstream). No new protocol is introduced. The trust model is explicit: operators register partner instances with their JWKS URL. This aligns with RFC 7523 (JSON Web Token (JWT) Profile for OAuth 2.0 Client Authentication and Authorization Grants) and the AGNTCY inter-agent trust model.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Signed JWT assertions (chosen) | Uses existing JWT infra, explicit trust registry, RFC-aligned | JWKS URL must be reachable at verification time | Acceptable operational constraint; JWKS can be cached |
| mTLS | Strong cryptographic identity | Certificate management overhead, PKI required per partner | Cert management complexity not justified when JWT assertions suffice |
| AGNTCY-specific protocol | Native alignment | Spec still evolving; risk of churn | Build on stable JWT base; adapt to AGNTCY extensions as spec matures |
**Consequences**:
- New `federation_partners` table: `id`, `name`, `jwks_url`, `issuer`, `trusted_since`, `organization_id`
- JWKS of partner instances cached in Redis with TTL
- `POST /federation/verify` accepts a bearer token from a remote instance and returns verification result
- Federation tokens are not accepted for agent management endpoints — only for identity assertion
---
### D5: Webhook Delivery Architecture
**Status**: Accepted
**Decision**: Async delivery via Redis-backed `bull` queue with exponential backoff retry (max 10 attempts over 24 hours).
**Rationale**: Synchronous webhook delivery from within a request handler would add latency and create tight coupling between event generation and delivery outcome. The Redis queue (`bull`) decouples delivery: events are enqueued immediately, a background worker delivers them. `bull` provides built-in retry, delay, and failure tracking without introducing a new infrastructure component (Redis is already present). HMAC-SHA256 signing on every delivery allows recipients to verify authenticity.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Redis queue via `bull` (chosen) | Reuses existing Redis, retry built-in, low operational overhead | Delivery tied to Redis availability | Acceptable — Redis is already a required dependency |
| Synchronous in-request delivery | Simplest implementation | Adds latency to event-generating requests; failure blocks response | Unacceptable latency and coupling |
| Dedicated message broker (RabbitMQ) | Robust, durable | New infrastructure dependency | Operational overhead; Redis already present |
| Kafka (primary) | High-throughput, durable | Overkill for webhook delivery; complex operations | Optional adapter only; not primary delivery mechanism |
**Consequences**:
- New `webhook_subscriptions` and `webhook_deliveries` tables
- `bull` worker process runs in same Node.js instance (separate worker thread via `bull`)
- Retry schedule: 1m, 5m, 15m, 1h, 4h, 12h, 24h (exponential backoff)
- Failed delivery after 10 attempts moves to dead-letter; operator alerted
- Optional Kafka adapter: if `KAFKA_BROKERS` env var is set, events are also produced to Kafka
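
The HMAC-SHA256 signing mentioned in the rationale can be sketched with Node's built-in `crypto` module. This is a minimal sketch, not the delivery worker itself: `signPayload` and `verifySignature` are hypothetical names, the hex encoding of the signature is an assumption, and how the signature is transported (for example, which header carries it) is not specified by the source. The comparison uses `timingSafeEqual` so that a recipient does not leak how many leading bytes matched.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: sign the exact bytes of the request body with the subscription secret.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Recipient side: recompute the signature and compare in constant time.
function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Example event; the payload shape is illustrative only.
const body = JSON.stringify({ event: "agent.created", agentId: "agt_123" });
const signature = signPayload("subscription-secret", body);
console.log(verifySignature("subscription-secret", body, signature));
```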
---
### D6: SOC 2 Scope
**Status**: Accepted
**Decision**: Target SOC 2 Type II (operational, not just design). All controls implemented in code. Audit period: 6 months post-Phase 3 launch.
**Rationale**: SOC 2 Type I certifies that controls are designed correctly. SOC 2 Type II certifies that they operate continuously over a period of time. Enterprise customers in regulated industries (finance, healthcare, government) require Type II. Implementing the controls now, with the 6-month operational window beginning at Phase 3 launch, puts us on the fastest possible path to Type II certification.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Type II from launch (chosen) | Satisfies enterprise requirements | Requires 6-month operation window | Accepted — the controls are implemented in Phase 3; audit window starts after launch |
| Type I only | Faster to certify | Not accepted by most enterprise procurement | Insufficient for target customers |
| ISO 27001 instead | International standard | Larger scope, longer implementation | SOC 2 is standard for US market; add ISO 27001 in Phase 4 |
**Consequences**:
- Encryption at rest: `pgcrypto` extension for column-level encryption on `credentials.secret_hash` and `credentials.vault_path`
- TLS enforcement: Express middleware rejects plain HTTP (non-HTTPS) requests in production
- Secrets rotation: cron-based job that triggers credential rotation reminders and Vault lease renewals
- Security alerting: Prometheus alerting rules for auth failure spikes, rate limit exhaustion, anomalous token issuance
- Audit log immutability: Merkle hash chain (each row's hash includes the previous row's hash)
---
### D7: Audit Log Immutability — Merkle Hash Chain
**Status**: Accepted
**Decision**: Each `audit_logs` row carries a `hash` field: `SHA-256(eventId + timestamp + action + outcome + agentId + previousHash)`. The chain starts with a genesis hash. Verification is a sequential pass over all rows in insertion order.
**Rationale**: Append-only logs in PostgreSQL can be altered by a DBA with sufficient access. A Merkle-style hash chain makes tampering detectable without requiring a blockchain. Any modification to a historical row breaks the chain from that point forward. Verification is a simple sequential computation that can be run on demand or as a scheduled integrity check.
**Alternatives Considered**:
| Option | Pros | Cons | Rejected because |
|--------|------|------|-----------------|
| Merkle hash chain in PostgreSQL (chosen) | No new infra, tamper-evident, verifiable | DBA can re-compute hashes after tampering if they control the algorithm | Acceptable — threat model is accidental/low-sophistication modification; cryptographic chain deters opportunistic tampering |
| Blockchain anchor | Cryptographically immutable | Blockchain dependency, cost, latency | Excessive for current threat model |
| Write-once S3/GCS export | External immutability | Delayed; operational complexity | Added complexity; hash chain provides continuous coverage |
**Consequences**:
- New `hash` (VARCHAR 64) and `previous_hash` (VARCHAR 64) columns on `audit_logs`
- `AuditService.create()` computes hash before insert — adds ~1ms latency per audit event
- New `GET /audit/verify` endpoint: returns chain integrity status (admin only)
- A PostgreSQL trigger on `audit_logs` rejects `UPDATE` and `DELETE`, making the table insert-only
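
The chain construction and the sequential verification pass can be sketched as follows. The field order in `computeHash` follows the formula in the decision; everything else is an assumption for illustration: the all-zero genesis value, the in-memory array standing in for `audit_logs`, and the function names. Note that plain concatenation without a delimiter is what the decision specifies; the sketch keeps it as written.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
}

interface AuditRow extends AuditEvent {
  previousHash: string;
  hash: string;
}

// Assumption: the source only says the chain starts with "a genesis hash".
const GENESIS_HASH = "0".repeat(64);

function computeHash(e: AuditEvent, previousHash: string): string {
  const input = e.eventId + e.timestamp + e.action + e.outcome + e.agentId + previousHash;
  return createHash("sha256").update(input).digest("hex");
}

function appendEvent(chain: AuditRow[], e: AuditEvent): AuditRow[] {
  const last = chain[chain.length - 1];
  const previousHash = last ? last.hash : GENESIS_HASH;
  return chain.concat([{ ...e, previousHash, hash: computeHash(e, previousHash) }]);
}

// Returns the index of the first row that breaks the chain, or -1 if intact.
function verifyChain(chain: AuditRow[]): number {
  let previousHash = GENESIS_HASH;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (!row) continue;
    if (row.previousHash !== previousHash || row.hash !== computeHash(row, previousHash)) {
      return i;
    }
    previousHash = row.hash;
  }
  return -1;
}

let chain: AuditRow[] = [];
chain = appendEvent(chain, { eventId: "e1", timestamp: "2026-03-29T12:00:00Z", action: "token.issue", outcome: "success", agentId: "agt_1" });
chain = appendEvent(chain, { eventId: "e2", timestamp: "2026-03-29T12:01:00Z", action: "agent.suspend", outcome: "success", agentId: "agt_1" });
console.log(verifyChain(chain)); // intact chain
```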
---
### D8: Organization Context in JWT
**Status**: Accepted
**Decision**: Add `organization_id` claim to JWT access tokens issued by `POST /oauth2/token`. All downstream middleware extracts `organization_id` from the token — no separate lookup required.
**Rationale**: Including `organization_id` in the JWT keeps the middleware stack stateless. The alternative — looking up the organization from the database on every request — adds latency and a database round-trip to every authenticated call. The JWT is already signed; adding a claim costs nothing cryptographically.
**Consequences**:
- `ITokenPayload` interface extended: `organization_id: string`
- All service methods receive `organizationId` from `req.user.organization_id`
- Token introspection response includes `organization_id`
- Agents registered before multi-tenancy belong to the default `system` organization
---
## Component Interaction Map (Phase 3)
```
┌──────────────────────┐
│ Web Dashboard │
│ (+ Org Mgmt pages) │
└──────────┬───────────┘
│ HTTPS
┌───────────────────────▼─────────────────────────────┐
│ AgentIdP Server │
│ │
│ TLS MW → Auth MW → OrgContext MW → OPA MW │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────────────┐ │
│ │ OrgService│ │DIDService │ │ OIDCProvider │ │
│ └───────────┘ └───────────┘ │ (oidc-provider) │ │
│ ┌───────────┐ ┌───────────┐ └───────────────────┘ │
│ │ FedService│ │WebhookSvc │ │
│ └───────────┘ └───────────┘ │
│ ┌─────────────────────────┐ │
│ │ SOC2Controls (cross-cut)│ │
│ └─────────────────────────┘ │
└──────────┬──────────────┬──────────────┬─────────────┘
│ │ │
┌────────▼──┐ ┌───────▼──┐ ┌──────▼──────┐
│PostgreSQL │ │ Redis │ │ Vault │
│ + RLS │ │ +bull Q │ │ (secrets) │
└───────────┘ └──────────┘ └─────────────┘
┌────────▼──────┐
│ Prometheus │
│ + Alerting │
└────────┬──────┘
┌────────▼──────┐
│ Grafana │
└───────────────┘
```


@@ -0,0 +1,165 @@
# Phase 3: Enterprise — Change Proposal
**Date**: 2026-03-29
**Author**: Virtual Architect
**Status**: Proposed — awaiting CEO approval
---
## Summary
Phase 1 delivered a complete, working AgentIdP MVP. Phase 2 made it production-ready: Vault-backed secrets, multi-language SDKs, OPA policy engine, React dashboard, Prometheus/Grafana observability, and multi-region Terraform deployment. Phase 3 makes AgentIdP enterprise-grade: the platform moves from a single-tenant developer tool to a multi-tenant enterprise identity platform with W3C DID support, OIDC compliance, AGNTCY federation, real-time event streaming, and SOC 2 Type II controls.
---
## Problem Statement
Phase 1 and Phase 2 are functional and production-ready but have the following enterprise gaps:
| Gap | Risk |
|-----|------|
| Single-tenant architecture | Cannot serve enterprise customers with isolated data requirements |
| No W3C DID support | Not fully AGNTCY-compliant; agents lack interoperable decentralized identifiers |
| OAuth 2.0 only, no OIDC | Cannot integrate with standard enterprise identity ecosystems (SSO, SCIM) |
| No cross-instance federation | Multi-organization agent identity cannot be verified across AgentIdP deployments |
| No webhook/event streaming | Operators cannot react to agent lifecycle events in real time |
| No SOC 2 controls | Cannot pass enterprise security reviews; blocks revenue from regulated industries |
---
## Proposed Changes
### 1. Multi-Tenancy
Introduce an Organization model so a single AgentIdP instance can serve multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit log, and rate limits. A new Admin API provides organization lifecycle management. All existing agent, credential, and audit endpoints become organization-scoped.
### 2. W3C Decentralized Identifiers (DIDs)
Issue a W3C `did:web` identifier for every registered agent. Serve DID Documents at `/.well-known/did.json` (instance root) and `/agents/:id/did` (per-agent). Expose a DID resolution endpoint. Produce AGNTCY-format agent cards from DID Documents.
### 3. AGNTCY Federation
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted federation partners. Tokens issued by a trusted remote AgentIdP instance can be verified locally, enabling multi-organization and cross-enterprise agent identity interoperability aligned with AGNTCY standards.
### 4. OpenID Connect (OIDC)
Add a full OIDC layer on top of the existing OAuth 2.0 implementation using the `oidc-provider` certified library. Exposes OIDC Discovery, JWKS, ID tokens with agent claims, and an `/agent-info` endpoint (the agent-identity equivalent of the OIDC `/userinfo` endpoint).
### 5. Webhooks and Event Streaming
Real-time event notifications for all agent lifecycle events: agent created, suspended, revoked, credential rotated, token issued. Operators create webhook subscriptions with HMAC-SHA256 signing. Delivery is async via a Redis-backed queue with exponential backoff retry. An optional Kafka/NATS adapter is available for high-throughput environments.
### 6. SOC 2 Type II Preparation
Implement the technical controls required for SOC 2 Type II audit: encryption at rest via PostgreSQL column-level encryption for secrets, TLS enforcement on all inbound connections, automated secrets rotation, security event alerting via Prometheus alerting rules, and audit log immutability proof using a Merkle hash chain appended to each `audit_logs` row.
---
## Out of Scope for Phase 3
- Rust/C++ SDKs (Phase 4)
- Azure Terraform module (Phase 4)
- SCIM provisioning (Phase 4)
- End-user (human operator) identity management (out of product scope — AgentIdP is agent-first)
---
## Capabilities Table
### New Capabilities
| Workstream | Capability | Type |
|-----------|-----------|------|
| Multi-Tenancy | Organization model with isolated agent namespaces | New |
| Multi-Tenancy | Admin API: create, list, update, delete organizations | New |
| Multi-Tenancy | Per-organization rate limits and audit logs | New |
| Multi-Tenancy | Organization member management | New |
| W3C DIDs | `did:web` identifier on every registered agent | New |
| W3C DIDs | DID Document endpoint per agent | New |
| W3C DIDs | Instance-level root DID Document | New |
| W3C DIDs | DID resolution endpoint | New |
| W3C DIDs | AGNTCY-format agent card from DID Document | New |
| OIDC | OIDC Discovery endpoint (`/.well-known/openid-configuration`) | New |
| OIDC | JWKS endpoint (`/.well-known/jwks.json`) | New |
| OIDC | ID token with agent claims in token response | Modified |
| OIDC | `/agent-info` endpoint (agent claims) | New |
| Federation | Trust registry: register and list federation partners | New |
| Federation | Cross-instance token verification endpoint | New |
| Federation | Signed JWT assertion inter-IdP protocol | New |
| Webhooks | Webhook subscription management (CRUD) | New |
| Webhooks | HMAC-SHA256 signed delivery with retry | New |
| Webhooks | Delivery history log | New |
| Webhooks | Kafka/NATS adapter (optional) | New |
| SOC 2 | PostgreSQL column-level encryption for secrets at rest | New |
| SOC 2 | TLS enforcement middleware (reject non-TLS) | New |
| SOC 2 | Automated secrets rotation schedule | New |
| SOC 2 | Security event alerting (Prometheus alerting rules) | New |
| SOC 2 | Merkle hash chain on `audit_logs` for immutability proof | New |
| SOC 2 | Compliance documentation (controls matrix, runbook) | New |
### Modified Capabilities
| Workstream | Capability | Change |
|-----------|-----------|--------|
| Multi-Tenancy | `POST /agents` | Now scoped to `organizationId` |
| Multi-Tenancy | `GET /agents` | Filters restricted to caller's organization |
| Multi-Tenancy | `GET /audit` | Restricted to caller's organization by default |
| Multi-Tenancy | Rate limiting | Per-organization limits in addition to global |
| OIDC | `POST /oauth2/token` | Returns `id_token` in addition to `access_token` |
| SOC 2 | Audit log write path | Computes and appends Merkle hash on insert |
---
## Repository Impact
| Area | Impact |
|------|--------|
| `src/` | New services: OrgService, DIDService, OIDCService, FederationService, WebhookService, SOC2Controls |
| `src/db/migrations/` | 8–10 new migration files |
| `src/types/index.ts` | ~80 new interfaces/types |
| `src/middleware/` | New TLS enforcement middleware, updated auth middleware for org context |
| `src/routes/` | 6 new route files |
| `/.well-known/` | 3 new well-known endpoints |
| `policies/` | Updated Rego policies for org-scoped permissions |
| `dashboard/` | New Organization management pages |
| `monitoring/` | New alerting rules for SOC 2 security events |
| `docs/` | Compliance documentation, federation setup guide, webhook integration guide |
---
## New Dependencies
| Workstream | Package | Purpose | CEO Approval Required |
|-----------|---------|---------|----------------------|
| Multi-Tenancy | None | Row-level tenancy uses the existing PostgreSQL | No |
| W3C DIDs | `did-resolver` | W3C DID resolution | Yes |
| W3C DIDs | `web-did-resolver` | DID:WEB method resolver | Yes |
| OIDC | `oidc-provider` | Certified OIDC server library | Yes |
| Federation | None | Signed JWT assertions use the existing `jsonwebtoken` package | No |
| Webhooks | `bull` (Redis-backed queue) | Async webhook delivery queue | Yes |
| Webhooks | `kafkajs` (optional, Kafka adapter) | Kafka event streaming | Yes |
| SOC 2 | `node-forge` | Column-level encryption primitives | Yes |
---
## Delivery Sequence
Multi-tenancy is a prerequisite for all enterprise customer work — it must land first. DID support and OIDC are independent and can proceed in parallel. Federation depends on DIDs being in place. Webhooks are standalone. SOC 2 controls cut across the entire codebase and are implemented last to ensure all features they protect are already present.
```
1. Multi-Tenancy (prerequisite — all enterprise features assume org context)
2. W3C DIDs (parallel)
OIDC (parallel)
3. Federation (depends on DIDs)
4. Webhooks (standalone)
5. SOC 2 (cuts across all workstreams — implemented after all features are stable)
```
---
## Success Criteria
- All new dependencies CEO-approved before implementation begins
- All new API endpoints have OpenAPI 3.0 specs before implementation
- Multi-tenancy isolation verified: no cross-organization data leakage
- DID Documents are W3C DID Core 1.0 compliant and resolve correctly
- OIDC Discovery passes `oidc-provider` conformance test suite
- Federation token verification rejects tampered assertions
- Webhook delivery achieves >99.9% success rate with retry logic
- SOC 2 controls pass independent technical review
- TypeScript strict mode + zero `any` maintained throughout
- >80% test coverage on all new services


@@ -0,0 +1,370 @@
# AGNTCY Federation — Specification
**Workstream**: 4 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted remote AgentIdP instances as federation partners. When an agent presents a token issued by a trusted partner instance, the local AgentIdP can verify it by fetching and caching the partner's JWKS. This enables multi-organization agent identity interoperability aligned with AGNTCY standards.
Federation is opt-in per organization. Only tokens from explicitly registered, trusted partners are accepted.
---
## API Endpoints
### POST /federation/trust
Register a new federation trust partner. Requires `admin:orgs` scope.
```yaml
POST /federation/trust
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [name, issuer, jwksUri]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Human-readable name for this federation partner
        example: "Contoso AgentIdP"
      issuer:
        type: string
        format: uri
        description: OIDC issuer URL of the partner instance (must match iss claim in tokens)
        example: "https://agentidp.contoso.com"
      jwksUri:
        type: string
        format: uri
        description: URL of the partner's JWKS endpoint
        example: "https://agentidp.contoso.com/.well-known/jwks.json"
      allowedOrganizations:
        type: array
        items:
          type: string
        description: Optional list of organization IDs in the partner instance whose tokens are accepted. Empty means all partner orgs are trusted.
        example: ["org_contoso_engineering"]
      expiresAt:
        type: string
        format: date-time
        description: Optional expiry for this trust relationship. If omitted, trust does not expire automatically.
Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/FederationPartner'
    example:
      partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
      name: "Contoso AgentIdP"
      issuer: "https://agentidp.contoso.com"
      jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
      status: "active"
      allowedOrganizations: []
      trustedSince: "2026-03-29T12:00:00Z"
      expiresAt: null
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      duplicate_issuer:
        code: "DUPLICATE_ISSUER"
        message: "A trust relationship with this issuer already exists"
      unreachable_jwks:
        code: "JWKS_UNREACHABLE"
        message: "Could not fetch JWKS from the provided jwksUri"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /federation/partners
List all registered federation partners for the caller's organization. Requires `admin:orgs` scope.
```yaml
GET /federation/partners
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
  status:
    type: string
    enum: [active, suspended, expired]
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/FederationPartner'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
          name: "Contoso AgentIdP"
          issuer: "https://agentidp.contoso.com"
          jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
          status: "active"
          trustedSince: "2026-03-29T12:00:00Z"
          expiresAt: null
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /federation/partners/:partnerId
Remove a federation trust relationship. Requires `admin:orgs` scope.
```yaml
DELETE /federation/partners/{partnerId}
Authorization: Bearer <token with admin:orgs scope>
Path Parameters:
  partnerId:
    type: string
Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### POST /federation/verify
Verify a token issued by a federated partner AgentIdP instance. The caller presents the token; this endpoint resolves the issuer, fetches (or cache-hits) the partner's JWKS, and verifies the signature and claims.
```yaml
POST /federation/verify
Authorization: Bearer <local access_token with agents:read scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [token]
    properties:
      token:
        type: string
        description: The JWT token issued by the remote AgentIdP instance to verify
      expectedIssuer:
        type: string
        format: uri
        description: Optional. If provided, verification fails if token issuer does not match
      expectedOrganizationId:
        type: string
        description: Optional. If provided, verification fails if token organization_id does not match
Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
        claims:
          type: object
          description: Decoded JWT claims from the verified token
          properties:
            sub:
              type: string
            iss:
              type: string
            iat:
              type: integer
            exp:
              type: integer
            agent_id:
              type: string
            agent_type:
              type: string
            organization_id:
              type: string
            capabilities:
              type: array
              items:
                type: string
            did:
              type: string
        partner:
          type: object
          description: The federation partner record that vouches for this token
          properties:
            partnerId:
              type: string
            name:
              type: string
            issuer:
              type: string
    example:
      valid: true
      claims:
        sub: "agt_contoso_abc123"
        iss: "https://agentidp.contoso.com"
        iat: 1743249600
        exp: 1743253200
        agent_id: "agt_contoso_abc123"
        agent_type: "classifier"
        organization_id: "org_contoso_engineering"
        capabilities: ["text-classification"]
        did: "did:web:agentidp.contoso.com:agents:agt_contoso_abc123"
      partner:
        partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
        name: "Contoso AgentIdP"
        issuer: "https://agentidp.contoso.com"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized (local token invalid):
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  422 Unprocessable Entity (token invalid or untrusted issuer):
    schema:
      type: object
      properties:
        valid:
          type: boolean
          example: false
        reason:
          type: string
          enum:
            - TOKEN_EXPIRED
            - INVALID_SIGNATURE
            - UNTRUSTED_ISSUER
            - JWKS_FETCH_FAILED
            - ORGANIZATION_NOT_ALLOWED
        message:
          type: string
    example:
      valid: false
      reason: "UNTRUSTED_ISSUER"
      message: "No trust relationship registered for issuer https://unknown.example.com"
```
---
## Database Schema Changes
### New Table: federation_partners
```sql
CREATE TABLE federation_partners (
  partner_id VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  name VARCHAR(100) NOT NULL,
  issuer VARCHAR(255) NOT NULL,
  jwks_uri VARCHAR(255) NOT NULL,
  allowed_organizations JSONB NOT NULL DEFAULT '[]',
  status VARCHAR(20) NOT NULL DEFAULT 'active',
  trusted_since TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ,
  last_jwks_fetch TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT federation_partners_status_check CHECK (status IN ('active', 'suspended', 'expired')),
  UNIQUE (organization_id, issuer)
);
CREATE INDEX idx_federation_partners_org_id ON federation_partners(organization_id);
CREATE INDEX idx_federation_partners_issuer ON federation_partners(issuer);
CREATE INDEX idx_federation_partners_status ON federation_partners(status);
```
### Redis: JWKS Cache
Partner JWKS documents are cached in Redis with a TTL:
```
Key: federation:jwks:<issuer_url_sha256>
Value: JSON string of the JWKS document
TTL: 1 hour (configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS)
```
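
A cache key in this shape can be derived with Node's built-in `crypto` module. The sketch below is an assumption-laden illustration: it hashes the issuer URL exactly as registered and hex-encodes the digest (the source does not state the encoding or any URL normalization), and `jwksCacheKey` and `jwksCacheTtlSeconds` are hypothetical names. The TTL helper falls back to the documented default of one hour when the variable is unset or not a positive integer.

```typescript
import { createHash } from "node:crypto";

// Builds the Redis key for a partner's cached JWKS document.
function jwksCacheKey(issuer: string): string {
  const digest = createHash("sha256").update(issuer).digest("hex");
  return `federation:jwks:${digest}`;
}

// Reads FEDERATION_JWKS_CACHE_TTL_SECONDS, defaulting to 3600 seconds.
function jwksCacheTtlSeconds(
  env: Record<string, string | undefined> = process.env,
): number {
  const raw = env.FEDERATION_JWKS_CACHE_TTL_SECONDS;
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3600;
}

console.log(jwksCacheKey("https://agentidp.contoso.com"), jwksCacheTtlSeconds());
```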
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `FEDERATION_ENABLED` | Enable federation endpoints | `true` |
| `FEDERATION_JWKS_CACHE_TTL_SECONDS` | Redis TTL for cached partner JWKS | `3600` |
| `FEDERATION_JWKS_FETCH_TIMEOUT_MS` | HTTP timeout for fetching partner JWKS | `5000` |
| `FEDERATION_MAX_PARTNERS_PER_ORG` | Max federation partners per organization | `50` |
---
## Dependencies
No new npm packages. Federation uses `jsonwebtoken` (already present) for JWT verification and the existing HTTP client for JWKS fetches.
---
## Security Considerations
- Only tokens from explicitly registered, active federation partners are accepted in `POST /federation/verify`
- JWKS are cached to prevent JWKS endpoint hammering; cache is invalidated when a partner is updated
- Token signature verification uses the partner's JWKS; `alg: none` is always rejected
- `allowedOrganizations` field enables fine-grained trust: a partner can be trusted but only for tokens from specific organizations within that partner
- Expired federation partners (`expiresAt` in the past) are automatically treated as status `expired` — their tokens are rejected
- `POST /federation/verify` does not grant any local permissions — it is a verification-only endpoint. Callers must make their own access control decisions based on the returned claims.
- Clock skew tolerance: `exp` claim verification allows 30 seconds of clock skew (standard JWT practice)
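
The checks listed above can be sketched end to end with Node's built-in `crypto` module. This is a hedged illustration, not the FederationService: it uses `node:crypto` directly instead of `jsonwebtoken` so the example is self-contained, assumes RS256 tokens with a `kid` header, takes the partner's JWKS as already fetched (so `JWKS_FETCH_FAILED` is not modeled), and maps an expired or suspended trust relationship to `UNTRUSTED_ISSUER`, which is an assumption because the source does not name the reason code for that case. All function and type names are hypothetical.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify } from "node:crypto";
import type { JsonWebKey } from "node:crypto";

interface Jwk extends JsonWebKey { kid?: string; }
interface Partner {
  issuer: string;
  status: "active" | "suspended" | "expired";
  expiresAt: Date | null;
  allowedOrganizations: string[];
  jwks: Jwk[]; // assumed to be already fetched or read from the cache
}
type Reason = "TOKEN_EXPIRED" | "INVALID_SIGNATURE" | "UNTRUSTED_ISSUER" | "ORGANIZATION_NOT_ALLOWED";
type VerifyResult = { valid: true; claims: Record<string, unknown> } | { valid: false; reason: Reason };

const CLOCK_SKEW_SECONDS = 30;
const fail = (reason: Reason): VerifyResult => ({ valid: false, reason });

function decodePart(part: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
    return typeof value === "object" && value !== null ? (value as Record<string, unknown>) : null;
  } catch {
    return null;
  }
}

function verifyFederatedToken(token: string, partners: Partner[], now: Date): VerifyResult {
  const [rawHeader, rawClaims, rawSignature] = token.split(".");
  if (!rawHeader || !rawClaims || !rawSignature) return fail("INVALID_SIGNATURE");
  const header = decodePart(rawHeader);
  const claims = decodePart(rawClaims);
  if (!header || !claims) return fail("INVALID_SIGNATURE");
  // Pinning the algorithm rejects `alg: none` and any unexpected algorithm.
  if (header.alg !== "RS256") return fail("INVALID_SIGNATURE");

  // Only explicitly registered, active, unexpired partners are trusted.
  const partner = partners.find((p) => p.issuer === claims.iss);
  if (!partner || partner.status !== "active") return fail("UNTRUSTED_ISSUER");
  if (partner.expiresAt !== null && partner.expiresAt.getTime() <= now.getTime()) {
    return fail("UNTRUSTED_ISSUER");
  }

  const jwk = partner.jwks.find((k) => k.kid === header.kid);
  if (!jwk) return fail("INVALID_SIGNATURE");
  const signatureOk = verify(
    "RSA-SHA256",
    Buffer.from(`${rawHeader}.${rawClaims}`),
    createPublicKey({ key: jwk, format: "jwk" }),
    Buffer.from(rawSignature, "base64url"),
  );
  if (!signatureOk) return fail("INVALID_SIGNATURE");

  // Expiry is checked with the 30-second skew allowance.
  const nowSeconds = Math.floor(now.getTime() / 1000);
  if (typeof claims.exp !== "number" || claims.exp + CLOCK_SKEW_SECONDS < nowSeconds) {
    return fail("TOKEN_EXPIRED");
  }
  // An empty allow-list means every organization of the partner is accepted.
  const allowed = partner.allowedOrganizations;
  if (allowed.length > 0 && allowed.indexOf(String(claims.organization_id)) === -1) {
    return fail("ORGANIZATION_NOT_ALLOWED");
  }
  return { valid: true, claims };
}

// Demo: a throwaway key pair stands in for the partner instance.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const demoJwk: Jwk = { ...publicKey.export({ format: "jwk" }), kid: "demo-key" };
const encode = (value: object): string => Buffer.from(JSON.stringify(value)).toString("base64url");

function issueDemoToken(claims: Record<string, unknown>, alg: string = "RS256"): string {
  const signingInput = `${encode({ alg, kid: "demo-key" })}.${encode(claims)}`;
  const signature = sign("RSA-SHA256", Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

const demoNow = new Date("2026-03-29T12:00:00Z");
const demoNowSeconds = Math.floor(demoNow.getTime() / 1000);
const demoPartners: Partner[] = [{
  issuer: "https://agentidp.contoso.com",
  status: "active",
  expiresAt: null,
  allowedOrganizations: [],
  jwks: [demoJwk],
}];
const demoToken = issueDemoToken({
  iss: "https://agentidp.contoso.com",
  exp: demoNowSeconds + 3600,
  organization_id: "org_contoso_engineering",
});
console.log(verifyFederatedToken(demoToken, demoPartners, demoNow).valid);
```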
---
## Acceptance Criteria
- [ ] `POST /federation/trust` registers a partner and fetches JWKS; returns 400 if JWKS unreachable
- [ ] `POST /federation/verify` returns `valid: true` for a correctly signed token from a trusted partner
- [ ] `POST /federation/verify` returns `valid: false` with `reason: UNTRUSTED_ISSUER` for unknown issuers
- [ ] `POST /federation/verify` returns `valid: false` with `reason: TOKEN_EXPIRED` for expired tokens
- [ ] Expired trust relationships (past `expiresAt`) are rejected automatically
- [ ] JWKS cache hit is used on second verification request for same issuer (Redis key present)
- [ ] TypeScript strict, zero `any`, >80% test coverage on FederationService


@@ -0,0 +1,444 @@
# Multi-Tenancy — Specification
**Workstream**: 1 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Introduce an Organization model so a single AgentIdP instance serves multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit events, and rate limits. Row-level tenancy in PostgreSQL is enforced by both application-layer `organization_id` filtering and PostgreSQL Row-Level Security (RLS) policies.
All existing endpoints that operate on agents, credentials, or audit events are augmented to be organization-scoped. A new Admin API provides organization lifecycle management. Organization membership controls which agents a caller can manage.
---
## API Endpoints
### POST /organizations
Create a new organization. Requires system-admin scope (`admin:orgs`).
```yaml
POST /organizations
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [name, slug]
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
        description: Display name of the organization
        example: "Acme AI Platform"
      slug:
        type: string
        minLength: 2
        maxLength: 50
        pattern: "^[a-z0-9-]+$"
        description: URL-safe unique identifier
        example: "acme-ai"
      planTier:
        type: string
        enum: [free, pro, enterprise]
        default: free
      maxAgents:
        type: integer
        minimum: 1
        default: 100
      maxTokensPerMonth:
        type: integer
        minimum: 1
        default: 10000
Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/Organization'
    example:
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      name: "Acme AI Platform"
      slug: "acme-ai"
      planTier: "free"
      maxAgents: 100
      maxTokensPerMonth: 10000
      status: "active"
      createdAt: "2026-03-29T12:00:00Z"
      updatedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "VALIDATION_ERROR"
      message: "slug must be unique"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "INSUFFICIENT_SCOPE"
      message: "admin:orgs scope required"
```
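The `403` response above implies a scope check that runs before the handler. A minimal sketch of that check follows, assuming the access token carries its scopes as a space-separated string (as the token responses elsewhere in this spec suggest). The function and type names are hypothetical and do not come from the codebase.

```typescript
// Hypothetical helper; the spec does not name the middleware or its types.
interface ErrorResponse {
  code: string;
  message: string;
}

type ScopeCheck = { ok: true } | { ok: false; status: 403; body: ErrorResponse };

// `scopeClaim` is assumed to be the space-separated scope string from the access token.
export function requireScope(scopeClaim: string, required: string): ScopeCheck {
  const granted = new Set(scopeClaim.split(/\s+/).filter((s) => s.length > 0));
  if (granted.has(required)) {
    return { ok: true };
  }
  return {
    ok: false,
    status: 403,
    body: { code: "INSUFFICIENT_SCOPE", message: `${required} scope required` },
  };
}
```

A route guard could call `requireScope(token.scope, "admin:orgs")` and send `body` with `status` when `ok` is false.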
---
### GET /organizations
List all organizations. Requires `admin:orgs` scope.
```yaml
GET /organizations
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
  status:
    type: string
    enum: [active, suspended, deleted]
  page:
    type: integer
    minimum: 1
    default: 1
  limit:
    type: integer
    minimum: 1
    maximum: 100
    default: 20
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/Organization'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
          name: "Acme AI Platform"
          slug: "acme-ai"
          planTier: "free"
          status: "active"
          createdAt: "2026-03-29T12:00:00Z"
          updatedAt: "2026-03-29T12:00:00Z"
      total: 1
      page: 1
      limit: 20
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /organizations/:orgId
Get a single organization. Requires `admin:orgs` scope or membership in the organization.
```yaml
GET /organizations/{orgId}
Authorization: Bearer <token>
Path Parameters:
  orgId:
    type: string
    description: Organization ID (org_... prefix)
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_NOT_FOUND"
      message: "Organization not found"
```
---
### PATCH /organizations/:orgId
Partially update an organization. Requires `admin:orgs` scope.
```yaml
PATCH /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    properties:
      name:
        type: string
        minLength: 2
        maxLength: 100
      planTier:
        type: string
        enum: [free, pro, enterprise]
      maxAgents:
        type: integer
        minimum: 1
      maxTokensPerMonth:
        type: integer
        minimum: 1
      status:
        type: string
        enum: [active, suspended]
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /organizations/:orgId
Soft-delete an organization (sets status to `deleted`). Requires `admin:orgs` scope. Hard deletion is not supported — data is retained for compliance.
```yaml
DELETE /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_HAS_ACTIVE_AGENTS"
      message: "Organization has active agents; decommission all agents before deleting"
```
---
### POST /organizations/:orgId/members
Add an already-registered agent as a member of an organization. Requires `admin:orgs` scope.
```yaml
POST /organizations/{orgId}/members
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [agentId, role]
    properties:
      agentId:
        type: string
        description: ID of an already-registered agent to add as a member
      role:
        type: string
        enum: [member, admin]
        description: Role within the organization
Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/OrgMember'
    example:
      memberId: "mem_01HXK7Z9P3FKWABCDEF99999"
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
      role: "member"
      joinedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ALREADY_MEMBER"
      message: "Agent is already a member of this organization"
```
---
### Modified: All /agents, /audit endpoints
All existing agent, credential, and audit endpoints now operate within the caller's organization context, taken from the `organization_id` claim in the JWT. URLs do not change, so the scoping is transparent to callers already using the API.
---
## Database Schema Changes
### New Table: organizations
```sql
CREATE TABLE organizations (
  organization_id VARCHAR(40) PRIMARY KEY, -- org_... prefixed ULID
  name VARCHAR(100) NOT NULL,
  slug VARCHAR(50) NOT NULL UNIQUE,
  plan_tier VARCHAR(20) NOT NULL DEFAULT 'free',
  max_agents INTEGER NOT NULL DEFAULT 100,
  max_tokens_per_month INTEGER NOT NULL DEFAULT 10000,
  status VARCHAR(20) NOT NULL DEFAULT 'active',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT organizations_status_check CHECK (status IN ('active', 'suspended', 'deleted')),
  CONSTRAINT organizations_plan_check CHECK (plan_tier IN ('free', 'pro', 'enterprise'))
);
CREATE INDEX idx_organizations_slug ON organizations(slug);
CREATE INDEX idx_organizations_status ON organizations(status);
```
### New Table: organization_members
```sql
CREATE TABLE organization_members (
  member_id VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  agent_id VARCHAR(40) NOT NULL REFERENCES agents(agent_id),
  role VARCHAR(20) NOT NULL DEFAULT 'member',
  joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT organization_members_role_check CHECK (role IN ('member', 'admin')),
  UNIQUE (organization_id, agent_id)
);
CREATE INDEX idx_org_members_org_id ON organization_members(organization_id);
CREATE INDEX idx_org_members_agent_id ON organization_members(agent_id);
```
### Modified: agents table
```sql
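-- The 'org_system' seed row must exist before this runs: existing rows take the
-- default value, and the foreign key is checked against organizations.
ALTER TABLE agents
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_agents_organization_id ON agents(organization_id);
-- RLS
-- Superusers and roles with BYPASSRLS always bypass RLS; the table owner also
-- bypasses it unless FORCE ROW LEVEL SECURITY is set on the table.
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
CREATE POLICY agents_org_isolation ON agents
  USING (organization_id = current_setting('app.organization_id', true));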
```
### Modified: credentials table
```sql
ALTER TABLE credentials
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_credentials_organization_id ON credentials(organization_id);
ALTER TABLE credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY credentials_org_isolation ON credentials
  USING (organization_id = current_setting('app.organization_id', true));
```
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
  ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_audit_logs_organization_id ON audit_logs(organization_id);
ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_logs_org_isolation ON audit_logs
  USING (organization_id = current_setting('app.organization_id', true));
```
### Seed: Default system organization
```sql
INSERT INTO organizations (organization_id, name, slug, plan_tier, max_agents, max_tokens_per_month, status)
VALUES ('org_system', 'System', 'system', 'enterprise', 999999, 999999999, 'active');
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MULTI_TENANCY_ENABLED` | Enable organization enforcement (set false for single-tenant mode) | `true` |
| `DEFAULT_ORG_ID` | Organization ID to assign pre-tenancy data during migration | `org_system` |
| `MAX_ORGS_PER_INSTANCE` | Hard cap on number of organizations per instance | `1000` |
---
## Dependencies
No new npm packages. Row-level tenancy uses existing PostgreSQL client (`pg`) and query patterns.
---
## Security Considerations
- PostgreSQL RLS is enabled as defense-in-depth: if a query accidentally omits the `organization_id` filter at the application layer, the database still restricts it to the caller's organization
- `SET LOCAL app.organization_id` is called at the start of every database transaction
- The `admin:orgs` scope is a new privileged scope — only system-level agent credentials carry it
- Organization slugs are public-facing but organization IDs are internal — never expose organization IDs in public URLs where avoidable
- `DELETE /organizations/:orgId` is soft-delete only; hard deletion requires a separate admin runbook to prevent accidental data loss
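
The per-transaction organization setting described in this list can be sketched as a small wrapper. This is an illustration under assumptions, not the project's implementation: `Queryable` is a hypothetical stand-in for a `pg` client, and `withOrganization` is an invented name. Because `SET LOCAL` does not accept bind parameters, the sketch uses `set_config(name, value, true)`, which has the same transaction-local effect.

```typescript
// Hypothetical minimal client shape; a `pg` PoolClient has a compatible `query` method.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose RLS context is pinned to one organization.
export async function withOrganization<T>(
  client: Queryable,
  organizationId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument `true` makes the setting local to this transaction,
    // equivalent to SET LOCAL but safe to parameterize.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [organizationId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, it is discarded on `COMMIT` or `ROLLBACK`, so a pooled connection does not carry one organization's context into the next request.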
---
## Acceptance Criteria
- [ ] Single AgentIdP instance can serve 2+ organizations with zero cross-organization data leakage
- [ ] All agent/credential/audit operations are scoped to caller's organization_id from JWT
- [ ] PostgreSQL RLS policies verified: direct DB query without app.organization_id setting returns 0 rows
- [ ] Organization CRUD endpoints return correct 403 when caller lacks admin:orgs scope
- [ ] Pre-existing agents assigned to default system organization without data loss
- [ ] TypeScript strict, zero `any`, >80% test coverage on OrgService

@@ -0,0 +1,366 @@
# OpenID Connect (OIDC) — Specification
**Workstream**: 3 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Add a full OIDC 1.0 layer on top of the existing OAuth 2.0 `client_credentials` implementation using the certified `oidc-provider` npm library. The OIDC layer exposes Discovery and JWKS endpoints, extends the token endpoint to return ID tokens with agent claims, and provides an `/agent-info` endpoint (the agent-identity equivalent of OIDC's `/userinfo`).
The existing `POST /oauth2/token` endpoint is extended, not replaced. Callers that do not request the `openid` scope continue to receive standard OAuth 2.0 responses unchanged.
---
## API Endpoints
### GET /.well-known/openid-configuration
OIDC Discovery document. No authentication required. This is the standard discovery endpoint defined by OpenID Connect Discovery 1.0; RFC 8414 defines the closely related OAuth 2.0 Authorization Server Metadata format.
```yaml
GET /.well-known/openid-configuration
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: OIDC Discovery document per OpenID Connect Discovery 1.0
    example:
      issuer: "https://idp.sentryagent.ai"
      authorization_endpoint: "https://idp.sentryagent.ai/oauth2/authorize"
      token_endpoint: "https://idp.sentryagent.ai/oauth2/token"
      jwks_uri: "https://idp.sentryagent.ai/.well-known/jwks.json"
      userinfo_endpoint: "https://idp.sentryagent.ai/agent-info"
      introspection_endpoint: "https://idp.sentryagent.ai/oauth2/introspect"
      revocation_endpoint: "https://idp.sentryagent.ai/oauth2/revoke"
      response_types_supported:
        - "token"
      grant_types_supported:
        - "client_credentials"
      subject_types_supported:
        - "public"
      id_token_signing_alg_values_supported:
        - "RS256"
        - "ES256"
      scopes_supported:
        - "openid"
        - "agents:read"
        - "agents:write"
        - "tokens:read"
        - "audit:read"
      claims_supported:
        - "sub"
        - "iss"
        - "iat"
        - "exp"
        - "agent_id"
        - "agent_type"
        - "organization_id"
        - "capabilities"
        - "deployment_env"
        - "owner"
      token_endpoint_auth_methods_supported:
        - "client_secret_post"
        - "client_secret_basic"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /.well-known/jwks.json
JSON Web Key Set. Contains the public keys used to sign ID tokens and access tokens. No authentication required. Clients use this endpoint to verify token signatures.
```yaml
GET /.well-known/jwks.json
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    Cache-Control: public, max-age=3600
    schema:
      type: object
      required: [keys]
      properties:
        keys:
          type: array
          items:
            type: object
            description: JSON Web Key (RFC 7517)
            properties:
              kty:
                type: string
                example: "RSA"
              use:
                type: string
                example: "sig"
              kid:
                type: string
                description: Key ID — matches `kid` header in issued JWTs
              alg:
                type: string
                example: "RS256"
              n:
                type: string
                description: RSA modulus (base64url)
              e:
                type: string
                description: RSA exponent (base64url)
    example:
      keys:
        - kty: "RSA"
          use: "sig"
          kid: "key-2026-03-29-01"
          alg: "RS256"
          n: "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAt..."
          e: "AQAB"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
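To illustrate how a client might use this endpoint, the sketch below verifies an RS256 signature against a JWKS using only Node's built-in `crypto` module. It is a simplified illustration, not the project's code: `verifyRs256`, `PublishedJwk`, and `demoRoundTrip` are hypothetical names, the demo signs with a throwaway key pair, and a production verifier would also validate `iss`, `aud`, and `exp`.

```typescript
import {
  createPublicKey,
  createSign,
  createVerify,
  generateKeyPairSync,
  type JsonWebKey,
} from "node:crypto";

// One JWKS entry; `kid` is matched against the JWT header.
interface PublishedJwk extends JsonWebKey {
  kid: string;
}

const b64url = (data: string): string => Buffer.from(data, "utf8").toString("base64url");

// Returns the payload if the signature checks out; throws otherwise.
export function verifyRs256(token: string, jwks: { keys: PublishedJwk[] }): Record<string, unknown> {
  const [headerB64, payloadB64, signatureB64] = token.split(".");
  if (!headerB64 || !payloadB64 || !signatureB64) throw new Error("malformed JWT");
  const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8")) as {
    alg?: string;
    kid?: string;
  };
  // Anything other than RS256 is refused here, which also rejects "alg": "none".
  if (header.alg !== "RS256") throw new Error(`unsupported alg: ${String(header.alg)}`);
  const jwk = jwks.keys.find((k) => k.kid === header.kid);
  if (!jwk) throw new Error("no matching kid in JWKS");
  const verifier = createVerify("RSA-SHA256");
  verifier.update(`${headerB64}.${payloadB64}`);
  const publicKey = createPublicKey({ key: jwk, format: "jwk" });
  if (!verifier.verify(publicKey, Buffer.from(signatureB64, "base64url"))) {
    throw new Error("signature verification failed");
  }
  return JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8")) as Record<string, unknown>;
}

// Demo only: sign a token with a fresh key pair, publish the public JWK, verify it.
export function demoRoundTrip(): Record<string, unknown> {
  const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
  const kid = "key-2026-03-29-01";
  const jwks = { keys: [{ ...publicKey.export({ format: "jwk" }), kid, use: "sig", alg: "RS256" }] };
  const signingInput = `${b64url(JSON.stringify({ alg: "RS256", kid }))}.${b64url(JSON.stringify({ sub: "agt_demo" }))}`;
  const signature = createSign("RSA-SHA256").update(signingInput).sign(privateKey).toString("base64url");
  return verifyRs256(`${signingInput}.${signature}`, jwks);
}
```

In practice most clients would delegate this to a maintained JWT library; the sketch only shows which JWKS fields take part in verification.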
---
### POST /oauth2/token (extended)
The existing token endpoint is extended to return an `id_token` when the `openid` scope is requested. All existing behavior is preserved when `openid` is not in the scope list.
```yaml
POST /oauth2/token
Content-Type: application/x-www-form-urlencoded
Request Body:
  schema:
    type: object
    required: [grant_type, client_id, client_secret]
    properties:
      grant_type:
        type: string
        enum: [client_credentials]
      client_id:
        type: string
      client_secret:
        type: string
      scope:
        type: string
        description: Space-separated scopes. Include "openid" to receive an id_token.
        example: "openid agents:read"
Responses:
  200 OK (with openid scope):
    schema:
      type: object
      properties:
        access_token:
          type: string
        token_type:
          type: string
          example: "Bearer"
        expires_in:
          type: integer
        scope:
          type: string
        id_token:
          type: string
          description: Signed JWT ID token containing agent identity claims. Only present when openid scope was requested.
    example:
      access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
      token_type: "Bearer"
      expires_in: 3600
      scope: "openid agents:read"
      id_token: "eyJhbGciOiJSUzI1NiIsImtpZCI6ImtleS0yMDI2LTAzLTI5LTAxIn0..."
  200 OK (without openid scope):
    schema:
      type: object
      properties:
        access_token:
          type: string
        token_type:
          type: string
        expires_in:
          type: integer
        scope:
          type: string
    example:
      access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
      token_type: "Bearer"
      expires_in: 3600
      scope: "agents:read"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/OAuthErrorResponse'
    example:
      error: "invalid_client"
      error_description: "Invalid client credentials"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/OAuthErrorResponse'
```
#### ID Token Claims
When `openid` scope is requested, the ID token (a signed JWT) contains the following claims:
```json
{
  "iss": "https://idp.sentryagent.ai",
  "sub": "agt_01HXK7Z9P3FKWABCDEF67890",
  "aud": "agt_01HXK7Z9P3FKWABCDEF67890",
  "iat": 1774785600,
  "exp": 1774789200,
  "agent_id": "agt_01HXK7Z9P3FKWABCDEF67890",
  "agent_type": "orchestrator",
  "organization_id": "org_01HXK7Z9P3FKWABCDEF12345",
  "capabilities": ["task-planning", "tool-use"],
  "deployment_env": "production",
  "owner": "acme-ai",
  "did": "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
}
```
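The claim set above could be assembled roughly as follows. This is a sketch under assumptions: the `AgentRecord` shape and both function names are hypothetical, and the real service is expected to delegate token construction and signing to `oidc-provider`.

```typescript
// Hypothetical view of a registered agent; field names are assumptions.
interface AgentRecord {
  agentId: string;
  agentType: string;
  organizationId: string;
  capabilities: string[];
  deploymentEnv: string;
  owner: string;
  did?: string;
}

interface IdTokenClaims {
  iss: string;
  sub: string;
  aud: string;
  iat: number;
  exp: number;
  agent_id: string;
  agent_type: string;
  organization_id: string;
  capabilities: string[];
  deployment_env: string;
  owner: string;
  did?: string;
}

// An id_token is issued only when "openid" appears in the space-separated scope string.
export function requestsIdToken(scope: string): boolean {
  return scope.split(/\s+/).includes("openid");
}

// `ttlSeconds` defaults to the documented OIDC_ID_TOKEN_TTL_SECONDS default of 3600.
export function buildIdTokenClaims(
  agent: AgentRecord,
  issuer: string,
  nowSeconds: number,
  ttlSeconds = 3600,
): IdTokenClaims {
  return {
    iss: issuer,
    sub: agent.agentId,
    aud: agent.agentId, // with client_credentials the audience is the requesting agent itself
    iat: nowSeconds,
    exp: nowSeconds + ttlSeconds,
    agent_id: agent.agentId,
    agent_type: agent.agentType,
    organization_id: agent.organizationId,
    capabilities: agent.capabilities,
    deployment_env: agent.deploymentEnv,
    owner: agent.owner,
    ...(agent.did ? { did: agent.did } : {}),
  };
}
```

Leaving `did` out when the agent has none keeps the token valid for deployments where the DID workstream is inactive.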
---
### GET /agent-info
Returns claims about the authenticated agent identity. This is the agent-first equivalent of the OIDC `/userinfo` endpoint. Authentication required with any valid access token.
```yaml
GET /agent-info
Authorization: Bearer <access_token>
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: Agent identity claims (subset of registered agent data)
      properties:
        sub:
          type: string
          description: Subject — agentId
        agent_id:
          type: string
        agent_type:
          type: string
        organization_id:
          type: string
        capabilities:
          type: array
          items:
            type: string
        deployment_env:
          type: string
        owner:
          type: string
        version:
          type: string
        status:
          type: string
        did:
          type: string
          description: W3C DID for this agent (if DID workstream is active)
        created_at:
          type: string
          format: date-time
    example:
      sub: "agt_01HXK7Z9P3FKWABCDEF67890"
      agent_id: "agt_01HXK7Z9P3FKWABCDEF67890"
      agent_type: "orchestrator"
      organization_id: "org_01HXK7Z9P3FKWABCDEF12345"
      capabilities: ["task-planning", "tool-use"]
      deployment_env: "production"
      owner: "acme-ai"
      version: "1.2.0"
      status: "active"
      did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      created_at: "2026-03-29T12:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "UNAUTHORIZED"
      message: "Invalid or expired access token"
```
---
## Database Schema Changes
### New Table: oidc_keys
Tracks the RSA/EC signing keys used for ID tokens. Private keys are stored in Vault; only the public key JWK is kept in PostgreSQL, where it backs the JWKS endpoint.
```sql
CREATE TABLE oidc_keys (
  key_id VARCHAR(40) PRIMARY KEY,
  kid VARCHAR(100) NOT NULL UNIQUE, -- Key ID in JWKS
  algorithm VARCHAR(10) NOT NULL,
  use_purpose VARCHAR(10) NOT NULL DEFAULT 'sig',
  public_key_jwk JSONB NOT NULL,
  vault_key_path VARCHAR(255) NOT NULL,
  is_current BOOLEAN NOT NULL DEFAULT TRUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  retired_at TIMESTAMPTZ,
  CONSTRAINT oidc_keys_alg_check CHECK (algorithm IN ('RS256', 'ES256')),
  CONSTRAINT oidc_keys_use_check CHECK (use_purpose IN ('sig', 'enc'))
);
CREATE INDEX idx_oidc_keys_is_current ON oidc_keys(is_current) WHERE is_current = TRUE;
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `OIDC_ISSUER` | OIDC issuer URL (must match token `iss` claim) | `https://${HOST}` |
| `OIDC_ID_TOKEN_TTL_SECONDS` | ID token lifetime | `3600` |
| `OIDC_SIGNING_ALG` | ID token signing algorithm | `RS256` |
| `OIDC_JWKS_CACHE_TTL_SECONDS` | JWKS response cache TTL | `3600` |
| `OIDC_KEY_ROTATION_DAYS` | Days between signing key rotations | `90` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `oidc-provider` | `^8.4.6` | Certified OIDC server library (OpenID Foundation conformant) |
---
## Security Considerations
- ID token signing keys are stored in Vault; public keys only are served via JWKS
- JWKS endpoint is cached in Redis (`OIDC_JWKS_CACHE_TTL_SECONDS`) to prevent key-hammering
- Key rotation: when a new signing key is created, the old key remains in JWKS until all tokens signed with it have expired
- The `openid` scope is only issued to callers explicitly requesting it — not included by default
- `GET /agent-info` returns the ID token's identity claims plus non-sensitive registry metadata (`version`, `status`, `created_at`); it exposes no secrets
- ID tokens for agent credentials must not contain client secrets or internal system paths
- `alg: none` is explicitly rejected — all ID tokens must be signed
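
The key-rotation rule in this list (a retired key stays in the JWKS until every token it signed has expired) can be expressed as a filter over `oidc_keys` rows. The sketch below is one possible reading, assuming a key stops signing at `retired_at` and that no token outlives `maxTokenTtlSeconds`; the type and function names are hypothetical.

```typescript
// Hypothetical projection of an oidc_keys row.
interface SigningKey {
  kid: string;
  isCurrent: boolean;
  retiredAt: Date | null;
}

// Keeps current keys, plus retired keys whose last possible token has not yet expired.
export function keysToPublish(keys: SigningKey[], now: Date, maxTokenTtlSeconds: number): SigningKey[] {
  return keys.filter((key) => {
    if (key.isCurrent || key.retiredAt === null) {
      return true;
    }
    // The latest token this key could have signed was issued at retiredAt.
    const lastExpiryMs = key.retiredAt.getTime() + maxTokenTtlSeconds * 1000;
    return lastExpiryMs > now.getTime();
  });
}
```

With the default one-hour token lifetime, a key retired at 12:00 would drop out of the JWKS after 13:00 under these assumptions.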
---
## Acceptance Criteria
- [ ] `/.well-known/openid-configuration` passes OIDC Discovery conformance validation
- [ ] `/.well-known/jwks.json` returns valid JWKS with current signing public key
- [ ] ID token returned when `openid` scope is in token request; not returned otherwise
- [ ] ID token is verifiable against JWKS endpoint using standard JWT libraries
- [ ] ID token claims match agent record (agent_type, capabilities, organization_id, did)
- [ ] `/agent-info` returns correct claims for authenticated agent
- [ ] Key rotation: old JWKS key is kept until all signed tokens expire
- [ ] TypeScript strict, zero `any`, >80% test coverage on OIDCService

@@ -0,0 +1,335 @@
# SOC 2 Type II Preparation — Specification
**Workstream**: 6 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Implement the technical controls required for SOC 2 Type II audit readiness. SOC 2 Type II certifies that security controls operate continuously over a defined period — not just that they exist. Controls are implemented in code, not just documented.
This workstream cuts across all other Phase 3 workstreams. It delivers: encryption at rest for sensitive columns, TLS enforcement middleware, automated secrets rotation, security event alerting, and audit log immutability via a Merkle hash chain. A compliance documentation package (controls matrix and runbook) is produced for auditors.
---
## Technical Controls
### Control C1: Encryption at Rest (Column-Level Encryption)
Sensitive columns in PostgreSQL are encrypted using `pgcrypto` symmetric encryption. The encryption key is stored in Vault and fetched at application startup, never written to disk.
**Columns encrypted**:
- `credentials.secret_hash` — encrypted with AES-256-CBC
- `credentials.vault_path` — encrypted with AES-256-CBC
- `webhook_subscriptions.vault_secret_path` — encrypted with AES-256-CBC
- `agent_did_keys.vault_key_path` — encrypted with AES-256-CBC
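**Implementation**: An `EncryptionService` wraps `pgcrypto` `pgp_sym_encrypt` / `pgp_sym_decrypt`. The key is a 256-bit symmetric key stored at `secret/agentidp/encryption/column-key` in Vault. All INSERT/SELECT operations for encrypted columns go through `EncryptionService`. Note that `pgp_sym_encrypt` produces OpenPGP messages (CFB mode, not CBC) and defaults to AES-128, so AES-256 requires passing `cipher-algo=aes256` in its options argument.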
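
As a rough illustration of the wrapper described above, the helpers below build parameterized SQL fragments around the two `pgcrypto` functions. The helper names and the `SqlFragment` type are hypothetical. `pgp_sym_encrypt` returns `bytea`; how that value is stored in the existing text columns is left open here.

```typescript
// Hypothetical fragment type: SQL text with positional parameters and their values.
interface SqlFragment {
  text: string;
  values: string[];
}

// Expression that encrypts a value, explicitly selecting AES-256 via pgcrypto's options string.
export function encryptExpr(plaintext: string, key: string, firstParam = 1): SqlFragment {
  return {
    text: `pgp_sym_encrypt($${firstParam}, $${firstParam + 1}, 'cipher-algo=aes256')`,
    values: [plaintext, key],
  };
}

// Expression that decrypts a column. The column name is validated because it is
// interpolated into the SQL text rather than bound as a parameter.
export function decryptExpr(column: string, key: string, keyParam = 1): SqlFragment {
  if (!/^[a-z_][a-z0-9_]*$/.test(column)) {
    throw new Error(`invalid column name: ${column}`);
  }
  return { text: `pgp_sym_decrypt(${column}, $${keyParam})`, values: [key] };
}
```

The key travels as a bound parameter rather than inside the SQL string, which keeps it out of any statement text the application logs; server-side parameter logging would need separate attention.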
---
### Control C2: TLS Enforcement
In production, inbound requests that did not arrive over TLS are not served; they are redirected to HTTPS. This is enforced at two levels:
1. Express middleware: `TLSEnforcementMiddleware` — if `X-Forwarded-Proto` is not `https` and `NODE_ENV=production`, respond `301 Moved Permanently` to HTTPS.
2. Terraform: Load balancers (Phase 2 Terraform modules) already enforce TLS; TLS enforcement middleware provides defense-in-depth.
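
The middleware behavior in step 1 can be sketched without depending on Express types. The request and response shapes below are minimal assumptions modeled on Express (`req.headers`, `req.hostname`, `req.originalUrl`, `res.redirect(status, url)`), and `createTlsEnforcement` is a hypothetical factory name.

```typescript
// Minimal request/response shapes, assumed compatible with Express.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  hostname: string;
  originalUrl: string;
}

interface ResponseLike {
  redirect(status: number, url: string): void;
}

export function createTlsEnforcement(nodeEnv: string, enabled = true) {
  return (req: RequestLike, res: ResponseLike, next: () => void): void => {
    const raw = req.headers["x-forwarded-proto"];
    // Behind several proxies the header may be a comma-separated list; the first entry
    // is the protocol the original client used.
    const proto = (Array.isArray(raw) ? raw[0] : raw)?.split(",")[0]?.trim();
    if (!enabled || nodeEnv !== "production" || proto === "https") {
      next();
      return;
    }
    res.redirect(301, `https://${req.hostname}${req.originalUrl}`);
  };
}
```

Trusting `X-Forwarded-Proto` is only safe when the service is reachable solely through the load balancer, which is the deployment this control assumes.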
---
### Control C3: Automated Secrets Rotation
A scheduled job (`SecretsRotationJob`) runs on a configurable cron schedule. It:
1. Identifies credentials whose `expires_at` is within `ROTATION_WARNING_DAYS` days
2. Emits a Prometheus metric `agentidp_credentials_expiring_soon_total` (labelled by `org_id`, `days_remaining`)
3. Renews Vault leases for all active credentials
4. Sends a webhook event `credential.expiring_soon` to subscribers who have opted in
This does not automatically rotate credentials without operator action — it alerts and prepares. Forced rotation requires an operator call to the existing `POST /agents/:id/credentials/:credId/rotate` endpoint.
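
Step 1 of the job, together with the `days_remaining` label from step 2, can be sketched as a pure function. The row shape and function name are assumptions for illustration; the real job would read these rows from the `credentials` table.

```typescript
// Hypothetical projection of a credentials row.
interface CredentialRow {
  credentialId: string;
  organizationId: string;
  expiresAt: Date;
}

interface ExpiringCredential {
  credentialId: string;
  organizationId: string;
  daysRemaining: number;
}

const MS_PER_DAY = 86_400_000;

// Returns credentials that are still valid but expire within `warningDays` days.
export function findExpiringCredentials(
  credentials: CredentialRow[],
  now: Date,
  warningDays: number,
): ExpiringCredential[] {
  const result: ExpiringCredential[] = [];
  for (const credential of credentials) {
    const remainingMs = credential.expiresAt.getTime() - now.getTime();
    if (remainingMs > 0 && remainingMs <= warningDays * MS_PER_DAY) {
      result.push({
        credentialId: credential.credentialId,
        organizationId: credential.organizationId,
        // Partial days are rounded up so a credential expiring in 6.5 days reports 7.
        daysRemaining: Math.ceil(remainingMs / MS_PER_DAY),
      });
    }
  }
  return result;
}
```

Already-expired credentials are excluded on purpose, since the alert is about upcoming expiry rather than past failures.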
---
### Control C4: Audit Log Immutability (Merkle Hash Chain)
Every `audit_logs` row carries two new columns:
- `hash`: SHA-256 of `(eventId || timestamp.toISOString() || action || outcome || agentId || organizationId || previousHash)`
- `previous_hash`: hash of the immediately preceding `audit_logs` row (by `created_at` order), or the genesis string `"GENESIS"` for the first row
A PostgreSQL trigger prevents `UPDATE` and `DELETE` on `audit_logs`.
A new admin endpoint `GET /audit/verify` runs a sequential chain verification pass and returns the integrity status.
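
The chain construction and the sequential verification pass can be sketched as below. The field order follows the `hash` definition given for this control and uses plain concatenation as written there; the `AuditRow` shape and function names are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory view of an audit_logs row.
interface AuditRow {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

export const GENESIS = "GENESIS";

// SHA-256 over the concatenated fields, in the order listed in the control description.
export function computeHash(row: Omit<AuditRow, "hash">): string {
  const input = [
    row.eventId,
    row.timestamp.toISOString(),
    row.action,
    row.outcome,
    row.agentId,
    row.organizationId,
    row.previousHash,
  ].join("");
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Links a new event to the tail of the chain and fills in both hash columns.
export function appendRow(chain: AuditRow[], event: Omit<AuditRow, "hash" | "previousHash">): AuditRow {
  const tail = chain[chain.length - 1];
  const withPrevious = { ...event, previousHash: tail ? tail.hash : GENESIS };
  const row: AuditRow = { ...withPrevious, hash: computeHash(withPrevious) };
  chain.push(row);
  return row;
}

// Walks rows in created_at order; reports the first row whose link or hash does not match.
export function verifyChain(rows: AuditRow[]): {
  valid: boolean;
  rowsVerified: number;
  brokenAtEventId: string | null;
} {
  let expectedPrevious = GENESIS;
  let rowsVerified = 0;
  for (const row of rows) {
    if (row.previousHash !== expectedPrevious || row.hash !== computeHash(row)) {
      return { valid: false, rowsVerified, brokenAtEventId: row.eventId };
    }
    expectedPrevious = row.hash;
    rowsVerified += 1;
  }
  return { valid: true, rowsVerified, brokenAtEventId: null };
}
```

This sketch always starts from `GENESIS`; verifying a `fromDate` sub-range would instead seed `expectedPrevious` with the hash of the row just before the range.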
---
### Control C5: Security Event Alerting
Prometheus alerting rules are written for the following security events:
| Alert | Condition | Severity |
|-------|-----------|---------|
| `AuthFailureSpike` | >50 `auth.failed` events in 5 minutes | Warning |
| `RateLimitExhaustion` | >80% of org rate limit consumed in 1 minute | Warning |
| `AnomalousTokenIssuance` | Token issuance rate 3x 7-day average | Warning |
| `WebhookDeadLetterAccumulating` | `agentidp_webhook_dead_letters_total` increases by >10 in 1 hour | Warning |
| `AuditChainIntegrityFailed` | `agentidp_audit_chain_integrity` metric is 0 | Critical |
| `CredentialExpiryApproaching` | `agentidp_credentials_expiring_soon_total{days_remaining="7"}` > 0 | Info |
---
## API Endpoints
### GET /audit/verify
Verify the Merkle hash chain integrity of the audit log. Requires `admin:orgs` scope. This is a potentially expensive operation on large audit logs — it is rate-limited to once per 5 minutes per organization.
```yaml
GET /audit/verify
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
  fromDate:
    type: string
    format: date-time
    description: Start of verification range. If omitted, verifies from genesis.
  toDate:
    type: string
    format: date-time
    description: End of verification range. If omitted, verifies to the latest row.
Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
          description: True if the chain is intact across the entire range
        rowsVerified:
          type: integer
          description: Number of audit rows verified
        firstEventId:
          type: string
        lastEventId:
          type: string
        firstTimestamp:
          type: string
          format: date-time
        lastTimestamp:
          type: string
          format: date-time
        verifiedAt:
          type: string
          format: date-time
        brokenAtEventId:
          type: string
          nullable: true
          description: Present only if valid=false — the first eventId where the chain breaks
    example:
      valid: true
      rowsVerified: 15420
      firstEventId: "evt_genesis_00001"
      lastEventId: "evt_01HXK7Z9P3FKWABCDEFZZZZZ"
      firstTimestamp: "2026-01-01T00:00:00Z"
      lastTimestamp: "2026-03-29T12:00:00Z"
      verifiedAt: "2026-03-29T14:00:00Z"
      brokenAtEventId: null
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  429 Too Many Requests:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "RATE_LIMITED"
      message: "Audit verification can be run at most once per 5 minutes"
```
---
### GET /compliance/controls
Returns the current status of all SOC 2 technical controls. Requires `admin:orgs` scope. Used by auditors and compliance dashboards.
```yaml
GET /compliance/controls
Authorization: Bearer <token with admin:orgs scope>
Responses:
  200 OK:
    schema:
      type: object
      properties:
        generatedAt:
          type: string
          format: date-time
        controls:
          type: array
          items:
            type: object
            properties:
              controlId:
                type: string
              name:
                type: string
              status:
                type: string
                enum: [pass, fail, warning, not_applicable]
              description:
                type: string
              lastChecked:
                type: string
                format: date-time
    example:
      generatedAt: "2026-03-29T14:00:00Z"
      controls:
        - controlId: "C1"
          name: "Encryption at Rest"
          status: "pass"
          description: "Column-level encryption active for all sensitive columns"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C2"
          name: "TLS Enforcement"
          status: "pass"
          description: "All non-TLS requests redirected to HTTPS in production"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C3"
          name: "Secrets Rotation"
          status: "warning"
          description: "3 credentials expiring within 7 days"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C4"
          name: "Audit Log Immutability"
          status: "pass"
          description: "Merkle chain intact — last verified 2026-03-29T13:55:00Z"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C5"
          name: "Security Event Alerting"
          status: "pass"
          description: "All 6 alerting rules active in Prometheus"
          lastChecked: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
## Database Schema Changes
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
  ADD COLUMN hash VARCHAR(64), -- SHA-256 hex string of chain node
  ADD COLUMN previous_hash VARCHAR(64); -- Hash of preceding row, or "GENESIS"
-- Back-fill genesis hash for existing rows (one-time migration)
-- Migration script computes chain in order of created_at
-- Prevent updates and deletes (immutability trigger)
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
  RAISE EXCEPTION 'audit_logs rows are immutable — modification is not permitted';
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER audit_logs_immutability
  BEFORE UPDATE OR DELETE ON audit_logs
  FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();
```
### Modified: credentials table
```sql
-- Columns remain same type; application now stores encrypted values
-- No DDL change — encryption is transparent at application layer
-- Add comment for documentation
COMMENT ON COLUMN credentials.secret_hash IS 'AES-256-CBC encrypted via EncryptionService (pgcrypto). Not a plain bcrypt hash.';
COMMENT ON COLUMN credentials.vault_path IS 'AES-256-CBC encrypted via EncryptionService.';
```
### New Table: compliance_check_log
```sql
CREATE TABLE compliance_check_log (
  check_id VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  control_id VARCHAR(10) NOT NULL,
  status VARCHAR(20) NOT NULL,
  details JSONB NOT NULL DEFAULT '{}',
  checked_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_compliance_check_org ON compliance_check_log(organization_id, checked_at DESC);
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `SOC2_CONTROLS_ENABLED` | Enable SOC 2 controls enforcement | `true` |
| `TLS_ENFORCEMENT_ENABLED` | Enforce HTTPS in production | `true` in production, `false` in development |
| `COLUMN_ENCRYPTION_KEY_PATH` | Vault path for AES-256 column encryption key | `secret/agentidp/encryption/column-key` |
| `ROTATION_WARNING_DAYS` | Days before expiry to emit rotation warning | `30` |
| `SECRETS_ROTATION_CRON` | Cron schedule for rotation check job | `0 3 * * *` (daily at 3 AM UTC) |
| `AUDIT_CHAIN_VERIFY_CRON` | Cron schedule for automated chain verification | `0 2 * * *` (daily at 2 AM UTC) |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `node-forge` | `^1.3.1` | AES-256-CBC column-level encryption primitives |
Note: `pgcrypto` PostgreSQL extension must be enabled: `CREATE EXTENSION IF NOT EXISTS pgcrypto;`
---
## Compliance Documentation
The following documents are produced as part of this workstream:
| Document | Path | Description |
|----------|------|-------------|
| Controls Matrix | `docs/compliance/soc2-controls-matrix.md` | Maps SOC 2 Trust Services Criteria to implemented controls |
| Encryption Runbook | `docs/compliance/encryption-runbook.md` | Key rotation procedure, Vault key path map |
| Audit Log Runbook | `docs/compliance/audit-log-runbook.md` | How to run chain verification, interpret results |
| Incident Response | `docs/compliance/incident-response.md` | Security event response procedures |
| Secrets Rotation Guide | `docs/compliance/secrets-rotation.md` | Operator guide for credential and key rotation |
---
## Security Considerations
- Column encryption key is fetched from Vault at startup and held in process memory — never written to disk or logged
- Key rotation: a migration re-encrypts all sensitive columns under the new encryption key; the old key is retained in Vault history
- The immutability trigger on `audit_logs` prevents application-layer modification; a `SUPERUSER` can still bypass triggers — document this in the controls matrix as a residual risk requiring compensating controls (e.g., read-only replica verification)
- `GET /audit/verify` is rate-limited to prevent denial-of-service via repeated expensive sequential scans
- `GET /compliance/controls` never returns raw secrets or key material — only control status
---
## Acceptance Criteria
- [ ] `pgcrypto` extension enabled; sensitive columns are encrypted at rest (verified: plaintext not visible in direct DB query)
- [ ] TLS enforcement middleware redirects HTTP to HTTPS in production; passthrough in development
- [ ] `SecretsRotationJob` runs on schedule; emits Prometheus metric for expiring credentials
- [ ] Audit log immutability trigger prevents UPDATE/DELETE on `audit_logs` table
- [ ] `GET /audit/verify` returns `valid: true` for an unmodified chain
- [ ] `GET /audit/verify` returns `valid: false` with `brokenAtEventId` after a row is manually tampered with (test scenario)
- [ ] All 6 Prometheus alerting rules are present in `monitoring/prometheus/alerts.yml`
- [ ] `GET /compliance/controls` returns correct status for all 5 controls
- [ ] Compliance documentation written and reviewed
- [ ] TypeScript strict, zero `any`, >80% test coverage on SOC2 control implementations

@@ -0,0 +1,353 @@
# W3C Decentralized Identifiers (DIDs) — Specification
**Workstream**: 2 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Issue a W3C `did:web` identifier for every registered agent and serve DID Documents over HTTPS. The AgentIdP instance itself has a root DID Document at `/.well-known/did.json`. Each agent has an individual DID Document at `/agents/:id/did`. A DID resolution endpoint wraps the standard resolution workflow. Agent cards in AGNTCY format are derivable from DID Documents.
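The instance DID `did:web:<host>` resolves to `https://<host>/.well-known/did.json`, and each per-agent DID `did:web:<host>:agents:<agentId>` is served at `https://<host>/agents/<agentId>/did`. Note that the `did:web` method maps a path-based DID to `<path>/did.json`, so a generic resolver would request `https://<host>/agents/<agentId>/did.json` for a per-agent DID. All DID Documents are W3C DID Core 1.0 compliant.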
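As a rough illustration of the identifier scheme, the sketch below builds a per-agent DID from a domain and maps a DID back to the route this specification serves it from. The function names are hypothetical, and the URL mapping follows the routes in this document rather than the generic `did:web` path rule.

```typescript
// Build the instance DID for a domain (a port colon is percent-encoded in did:web).
function instanceDid(domain: string): string {
  return `did:web:${domain.replace(":", "%3A")}`;
}

// Build the per-agent DID used in this spec: did:web:<host>:agents:<agentId>.
function agentDid(domain: string, agentId: string): string {
  return `${instanceDid(domain)}:agents:${agentId}`;
}

// Map a DID to the URL where this spec serves its DID Document.
function didDocumentUrl(did: string): string {
  const [scheme, method, encodedHost, ...path] = did.split(":");
  if (scheme !== "did" || method !== "web" || !encodedHost) {
    throw new Error(`Not a did:web identifier: ${did}`);
  }
  const host = decodeURIComponent(encodedHost);
  if (path.length === 0) {
    return `https://${host}/.well-known/did.json`;
  }
  const [segment, agentId] = path;
  if (path.length === 2 && segment === "agents" && agentId) {
    return `https://${host}/agents/${agentId}/did`;
  }
  throw new Error(`Unsupported did:web path: ${did}`);
}
```

For example, `didDocumentUrl(agentDid("idp.sentryagent.ai", "agt_01HXK7Z9P3FKWABCDEF67890"))` yields the per-agent route under `/agents/`.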
---
## API Endpoints
### GET /.well-known/did.json
Root DID Document for the AgentIdP instance. No authentication required — this is a public discovery endpoint.
```yaml
GET /.well-known/did.json
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant DID Document
      required: [id, "@context", verificationMethod, authentication]
      properties:
        "@context":
          type: array
          items:
            type: string
          example:
            - "https://www.w3.org/ns/did/v1"
            - "https://w3id.org/security/suites/jws-2020/v1"
        id:
          type: string
          description: DID for this AgentIdP instance
          example: "did:web:idp.sentryagent.ai"
        controller:
          type: string
          example: "did:web:idp.sentryagent.ai"
        verificationMethod:
          type: array
          items:
            $ref: '#/components/schemas/VerificationMethod'
        authentication:
          type: array
          items:
            type: string
          description: References to verification methods for authentication
        assertionMethod:
          type: array
          items:
            type: string
        service:
          type: array
          items:
            $ref: '#/components/schemas/DIDService'
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
      id: "did:web:idp.sentryagent.ai"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU"
            y: "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
      authentication:
        - "did:web:idp.sentryagent.ai#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai#agent-registry"
          type: "AgentIdentityProvider"
          serviceEndpoint: "https://idp.sentryagent.ai"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /agents/:id/did
Per-agent DID Document. No authentication required — DID Documents are public.
```yaml
GET /agents/{agentId}/did
No authentication required
Path Parameters:
  agentId:
    type: string
    description: Agent ID
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant per-agent DID Document
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
        - "https://w3id.org/agntcy/v1"
      id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "abc123"
            y: "def456"
      authentication:
        - "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#agent-card"
          type: "AgentCard"
          serviceEndpoint: "https://idp.sentryagent.ai/agents/agt_01HXK7Z9P3FKWABCDEF67890/did/card"
      agntcy:
        agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
        agentType: "orchestrator"
        capabilities:
          - "task-planning"
          - "tool-use"
        deploymentEnv: "production"
        owner: "acme-ai"
        version: "1.2.0"
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "AGENT_NOT_FOUND"
      message: "Agent not found"
  410 Gone:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "AGENT_DECOMMISSIONED"
      message: "Agent has been decommissioned — DID Document is no longer active"
```
---
### GET /agents/:id/did/resolve
DID resolution endpoint: resolves the agent's `did:web` DID and returns the result in W3C DID Resolution format. This lets external systems use AgentIdP as a resolver for agent DIDs. Authentication required (`agents:read` scope).
```yaml
GET /agents/{agentId}/did/resolve
Authorization: Bearer <token with agents:read scope>
Path Parameters:
  agentId:
    type: string
Responses:
  200 OK:
    Content-Type: application/ld+json;profile="https://w3id.org/did-resolution"
    schema:
      type: object
      required: [didDocument, didDocumentMetadata, didResolutionMetadata]
      properties:
        didDocument:
          type: object
          description: The resolved DID Document
        didDocumentMetadata:
          type: object
          properties:
            created:
              type: string
              format: date-time
            updated:
              type: string
              format: date-time
            deactivated:
              type: boolean
        didResolutionMetadata:
          type: object
          properties:
            contentType:
              type: string
              example: "application/did+ld+json"
            retrieved:
              type: string
              format: date-time
    example:
      didDocument:
        "@context": ["https://www.w3.org/ns/did/v1"]
        id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      didDocumentMetadata:
        created: "2026-03-29T12:00:00Z"
        updated: "2026-03-29T12:00:00Z"
        deactivated: false
      didResolutionMetadata:
        contentType: "application/did+ld+json"
        retrieved: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
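A minimal sketch of assembling the resolution result might look like the following. The `AgentTimestamps` shape and the `buildResolutionResult` signature are assumptions for illustration; only the three top-level keys and the metadata field names come from the schema above.

```typescript
// Minimal DID Document shape; additional members are allowed.
interface DidDocument {
  "@context": string[];
  id: string;
  [key: string]: unknown;
}

// Hypothetical slice of the agent record needed for metadata.
interface AgentTimestamps {
  createdAt: Date;
  updatedAt: Date;
  status: "active" | "suspended" | "decommissioned";
}

interface DidResolutionResult {
  didDocument: DidDocument;
  didDocumentMetadata: { created: string; updated: string; deactivated: boolean };
  didResolutionMetadata: { contentType: string; retrieved: string };
}

// Wrap a DID Document with document and resolution metadata.
function buildResolutionResult(
  didDocument: DidDocument,
  agent: AgentTimestamps,
  now: Date = new Date(),
): DidResolutionResult {
  return {
    didDocument,
    didDocumentMetadata: {
      created: agent.createdAt.toISOString(),
      updated: agent.updatedAt.toISOString(),
      // Decommissioned agents are reported as deactivated.
      deactivated: agent.status === "decommissioned",
    },
    didResolutionMetadata: {
      contentType: "application/did+ld+json",
      retrieved: now.toISOString(),
    },
  };
}
```

Passing `now` as a parameter keeps the function deterministic in unit tests.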
---
### GET /agents/:id/did/card
AGNTCY-format agent card derived from DID Document. Returns a JSON object representing the agent's identity and capabilities in the AGNTCY agent card format. No authentication required.
```yaml
GET /agents/{agentId}/did/card
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: AGNTCY-format agent card
      properties:
        did:
          type: string
        name:
          type: string
        agentType:
          type: string
        capabilities:
          type: array
          items:
            type: string
        owner:
          type: string
        version:
          type: string
        deploymentEnv:
          type: string
        identityProvider:
          type: string
          description: DID of the issuing AgentIdP instance
        issuedAt:
          type: string
          format: date-time
    example:
      did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      name: "acme-orchestrator"
      agentType: "orchestrator"
      capabilities: ["task-planning", "tool-use"]
      owner: "acme-ai"
      version: "1.2.0"
      deploymentEnv: "production"
      identityProvider: "did:web:idp.sentryagent.ai"
      issuedAt: "2026-03-29T12:00:00Z"
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
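The derivation of a card from a DID Document could be sketched as below. It assumes the document carries the `agntcy` block shown in the per-agent example; `name` and `issuedAt` do not appear in that example, so they are passed in separately here as an assumption. All type and function names are hypothetical.

```typescript
// Slice of the per-agent DID Document that the card draws from.
interface AgentDidDocument {
  id: string;
  controller: string;
  agntcy: {
    agentId: string;
    agentType: string;
    capabilities: string[];
    deploymentEnv: string;
    owner: string;
    version: string;
  };
}

interface AgentCard {
  did: string;
  name: string;
  agentType: string;
  capabilities: string[];
  owner: string;
  version: string;
  deploymentEnv: string;
  identityProvider: string;
  issuedAt: string;
}

// Derive the card; the controller DID doubles as the identity provider.
function buildAgentCard(doc: AgentDidDocument, name: string, issuedAt: Date): AgentCard {
  return {
    did: doc.id,
    name,
    agentType: doc.agntcy.agentType,
    capabilities: [...doc.agntcy.capabilities], // copy so the card is independent
    owner: doc.agntcy.owner,
    version: doc.agntcy.version,
    deploymentEnv: doc.agntcy.deploymentEnv,
    identityProvider: doc.controller,
    issuedAt: issuedAt.toISOString(),
  };
}
```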
---
## Database Schema Changes
### New Table: agent_did_keys
Stores the public half of the key pair used to sign each agent's DID Document, plus the Vault path of the private half. The private key lives only in Vault; PostgreSQL holds only the public key JWK.
```sql
CREATE TABLE agent_did_keys (
    key_id          VARCHAR(40)  PRIMARY KEY,
    agent_id        VARCHAR(40)  NOT NULL UNIQUE REFERENCES agents(agent_id),
    organization_id VARCHAR(40)  NOT NULL REFERENCES organizations(organization_id),
    public_key_jwk  JSONB        NOT NULL,
    vault_key_path  VARCHAR(255) NOT NULL, -- Vault path where private key is stored
    key_type        VARCHAR(20)  NOT NULL DEFAULT 'EC',
    curve           VARCHAR(10)  NOT NULL DEFAULT 'P-256',
    created_at      TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    rotated_at      TIMESTAMPTZ,
    CONSTRAINT agent_did_keys_key_type_check CHECK (key_type IN ('EC', 'RSA'))
);
CREATE INDEX idx_agent_did_keys_agent_id ON agent_did_keys(agent_id);
CREATE INDEX idx_agent_did_keys_org_id ON agent_did_keys(organization_id);
```
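To make the public/private split concrete, here is a minimal sketch of generating an EC P-256 key pair with Node's built-in `crypto` module and separating the public JWK (destined for `public_key_jwk`) from the private key (destined for Vault). The `generateDidKey` function and its return shape are assumptions, and the Vault write itself is left out.

```typescript
import { generateKeyPairSync } from "node:crypto";

interface EcPublicJwk {
  kty: "EC";
  crv: "P-256";
  x: string;
  y: string;
}

interface VerificationMethod {
  id: string;
  type: "JsonWebKey2020";
  controller: string;
  publicKeyJwk: EcPublicJwk;
}

// Generate a key pair for a DID and return the public and private parts separately.
function generateDidKey(did: string): {
  verificationMethod: VerificationMethod;
  privateKeyPem: string;
} {
  // "prime256v1" is the OpenSSL name for the P-256 curve.
  const { publicKey, privateKey } = generateKeyPairSync("ec", {
    namedCurve: "prime256v1",
  });
  const jwk = publicKey.export({ format: "jwk" });
  if (jwk.kty !== "EC" || jwk.crv !== "P-256" || !jwk.x || !jwk.y) {
    throw new Error("Unexpected public key export");
  }
  return {
    verificationMethod: {
      id: `${did}#key-1`,
      type: "JsonWebKey2020",
      controller: did,
      // Copy only public members so no private material can leak into the row.
      publicKeyJwk: { kty: "EC", crv: "P-256", x: jwk.x, y: jwk.y },
    },
    // This value would go to Vault only, never to PostgreSQL or a response.
    privateKeyPem: privateKey.export({ type: "pkcs8", format: "pem" }).toString(),
  };
}
```

Building the stored JWK field by field, rather than spreading the exported object, is a deliberate choice: it guarantees the private `d` member cannot end up in the database even if a private key object were passed by mistake.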
### New Column: agents.did
```sql
ALTER TABLE agents
    ADD COLUMN did VARCHAR(255),
    ADD COLUMN did_created_at TIMESTAMPTZ;
-- Populated automatically on agent creation
-- Example value: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `DID_WEB_DOMAIN` | Domain name for `did:web` construction | None; falls back to `HOST` if not set |
| `DID_KEY_TYPE` | Cryptographic key type for DID keys | `EC` |
| `DID_KEY_CURVE` | Elliptic curve for EC keys | `P-256` |
| `DID_DOCUMENT_CACHE_TTL_SECONDS` | How long to cache DID Documents in Redis | `300` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `did-resolver` | `^4.1.0` | W3C DID resolution interface |
| `web-did-resolver` | `^2.0.27` | DID:WEB method resolver |
---
## Security Considerations
- DID Document and agent card endpoints are public: no authentication is required and no sensitive data is exposed; the resolution endpoint (`/did/resolve`) requires the `agents:read` scope
- Private keys for DID signing are stored in Vault; never written to PostgreSQL
- DID Document cache in Redis has a TTL — stale documents are evicted automatically
- Decommissioned agents return HTTP 410 Gone with `deactivated: true` in DID Document metadata
- DID rotation: when a credential is rotated, the DID Document key can optionally be rotated; the old key is retained in history
- `GET /agents/:id/did/card` exposes only data already present in the agent registration — no new sensitive fields
---
## Acceptance Criteria
- [ ] Every new agent registration automatically generates a `did:web` DID and key pair
- [ ] Root DID Document at `/.well-known/did.json` is W3C DID Core 1.0 compliant (validated by `did-resolver`)
- [ ] Per-agent DID Document returns correct `did:web` identifier and public key JWK
- [ ] DID resolution endpoint returns W3C DID Resolution format
- [ ] Decommissioned agent DID Document returns 410 Gone with `deactivated: true`
- [ ] Agent card at `/agents/:id/did/card` matches AGNTCY agent card format
- [ ] Private keys never appear in any API response or log
- [ ] TypeScript strict, zero `any`, >80% test coverage on DIDService


@@ -0,0 +1,476 @@
# Webhooks and Event Streaming — Specification
**Workstream**: 5 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Real-time event notifications for agent lifecycle events via HTTP webhooks. Operators create webhook subscriptions specifying a target URL, the events they want to receive, and a secret for HMAC-SHA256 signature verification. Delivery is asynchronous via a Redis-backed `bull` queue with retries on an escalating backoff schedule (max 10 attempts). All deliveries are logged for observability.
Supported events: `agent.created`, `agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `token.issued`, `token.revoked`.
An optional Kafka/NATS adapter enables high-throughput event streaming alongside webhook delivery.
---
## API Endpoints
### POST /webhooks
Create a new webhook subscription. Requires `agents:write` scope.
```yaml
POST /webhooks
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [url, events, secret]
    properties:
      url:
        type: string
        format: uri
        description: HTTPS endpoint to deliver events to
        example: "https://app.example.com/hooks/agentidp"
      events:
        type: array
        items:
          type: string
          enum:
            - agent.created
            - agent.updated
            - agent.suspended
            - agent.reactivated
            - agent.decommissioned
            - credential.generated
            - credential.rotated
            - credential.revoked
            - token.issued
            - token.revoked
            - "*"
        minItems: 1
        description: List of event types to subscribe to. Use ["*"] to subscribe to all events.
        example: ["agent.created", "credential.rotated"]
      secret:
        type: string
        minLength: 16
        description: Secret used to compute the HMAC-SHA256 signature. Store it securely; it is not returned in API responses.
        example: "whsec_super_secret_value_here"
      description:
        type: string
        maxLength: 255
        description: Optional human-readable description for this subscription
      active:
        type: boolean
        default: true
Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
    example:
      subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      url: "https://app.example.com/hooks/agentidp"
      events: ["agent.created", "credential.rotated"]
      description: "Production event sink"
      active: true
      createdAt: "2026-03-29T12:00:00Z"
      updatedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      invalid_url:
        code: "VALIDATION_ERROR"
        message: "url must be a valid HTTPS URI"
      invalid_event:
        code: "VALIDATION_ERROR"
        message: "Unknown event type: agent.unknown"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
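The request constraints above (HTTPS-only URL, known event types, minimum secret length) could be checked along these lines. This is a sketch under assumptions: `validateCreateWebhook` and its return convention (a list of messages, empty when valid) are hypothetical, and only the two error messages shown in the 400 examples come from the specification.

```typescript
const EVENT_TYPES = [
  "agent.created", "agent.updated", "agent.suspended", "agent.reactivated",
  "agent.decommissioned", "credential.generated", "credential.rotated",
  "credential.revoked", "token.issued", "token.revoked",
] as const;

interface CreateWebhookRequest {
  url: string;
  events: string[];
  secret: string;
  description?: string;
  active?: boolean;
}

// Return a list of validation messages; an empty list means the body is valid.
function validateCreateWebhook(body: CreateWebhookRequest): string[] {
  const errors: string[] = [];

  let parsed: URL | undefined;
  try {
    parsed = new URL(body.url);
  } catch {
    parsed = undefined;
  }
  if (!parsed || parsed.protocol !== "https:") {
    errors.push("url must be a valid HTTPS URI");
  }

  if (body.events.length === 0) {
    errors.push("events must contain at least one entry");
  }
  for (const event of body.events) {
    // "*" subscribes to every event type.
    if (event !== "*" && !(EVENT_TYPES as readonly string[]).includes(event)) {
      errors.push(`Unknown event type: ${event}`);
    }
  }

  if (body.secret.length < 16) {
    errors.push("secret must be at least 16 characters");
  }
  if (body.description !== undefined && body.description.length > 255) {
    errors.push("description must be at most 255 characters");
  }
  return errors;
}
```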
---
### GET /webhooks
List webhook subscriptions for the caller's organization. Requires `agents:read` scope.
```yaml
GET /webhooks
Authorization: Bearer <token with agents:read scope>
Query Parameters:
  active:
    type: boolean
    description: Filter by active/inactive subscriptions
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/WebhookSubscription'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id
Get a single webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:read scope>
Path Parameters:
  subscriptionId:
    type: string
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "WEBHOOK_NOT_FOUND"
      message: "Webhook subscription not found"
```
---
### PATCH /webhooks/:id
Update a webhook subscription (e.g., pause/resume, change events). Requires `agents:write` scope.
```yaml
PATCH /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    properties:
      url:
        type: string
        format: uri
      events:
        type: array
        items:
          type: string
      description:
        type: string
        maxLength: 255
      active:
        type: boolean
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /webhooks/:id
Delete a webhook subscription. Requires `agents:write` scope.
```yaml
DELETE /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id/deliveries
List delivery attempts for a specific webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}/deliveries
Authorization: Bearer <token with agents:read scope>
Query Parameters:
  status:
    type: string
    enum: [pending, success, failed, dead_letter]
  eventType:
    type: string
    description: Filter by event type
  fromDate:
    type: string
    format: date-time
  toDate:
    type: string
    format: date-time
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 50
    maximum: 200
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/WebhookDelivery'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - deliveryId: "del_01HXK7Z9P3FKWABCDEF77777"
          subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
          eventType: "agent.created"
          eventId: "evt_01HXK7Z9P3FKWABCDEF99999"
          status: "success"
          httpStatusCode: 200
          attemptCount: 1
          nextRetryAt: null
          deliveredAt: "2026-03-29T12:00:05Z"
          createdAt: "2026-03-29T12:00:00Z"
      total: 1
      page: 1
      limit: 50
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
## Webhook Payload Format
Every webhook delivery uses this envelope format:
```json
{
  "id": "evt_01HXK7Z9P3FKWABCDEF99999",
  "type": "agent.created",
  "organizationId": "org_01HXK7Z9P3FKWABCDEF12345",
  "timestamp": "2026-03-29T12:00:00Z",
  "data": {
    "agentId": "agt_01HXK7Z9P3FKWABCDEF67890",
    "agentType": "orchestrator",
    "status": "active",
    "owner": "acme-ai",
    "version": "1.0.0",
    "deploymentEnv": "production"
  }
}
```
### HMAC-SHA256 Signature
Every delivery includes the following HTTP headers:
```
X-AgentIdP-Event: agent.created
X-AgentIdP-Delivery-Id: del_01HXK7Z9P3FKWABCDEF77777
X-AgentIdP-Timestamp: 1774785600
X-AgentIdP-Signature-256: sha256=<HMAC-SHA256 of timestamp.payload using subscription secret>
```
Signature computation:
```
signed_content = timestamp + "." + JSON.stringify(payload)
signature = HMAC-SHA256(secret, signed_content)
header_value = "sha256=" + hex(signature)
```
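The computation above, together with the receiver-side check, might be sketched as follows using Node's built-in `crypto` module. The function names are hypothetical. The sketch assumes the receiver verifies against the raw request body exactly as received (re-serializing parsed JSON can change the bytes and break the signature) and applies the 5-minute timestamp tolerance recommended under Security Considerations.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: compute the X-AgentIdP-Signature-256 header value.
function signDelivery(secret: string, timestamp: number, rawBody: string): string {
  const signedContent = `${timestamp}.${rawBody}`;
  const digest = createHmac("sha256", secret).update(signedContent).digest("hex");
  return `sha256=${digest}`;
}

// Receiver side: check timestamp freshness, then compare signatures in constant time.
function verifyDelivery(
  secret: string,
  timestampHeader: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300,
): boolean {
  const timestamp = Number(timestampHeader);
  if (!Number.isInteger(timestamp)) return false;
  // Reject stale or far-future timestamps to limit replay.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = Buffer.from(signDelivery(secret, timestamp, rawBody));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A receiver would call `verifyDelivery` with the subscription secret, the `X-AgentIdP-Timestamp` and `X-AgentIdP-Signature-256` header values, and the unparsed body before acting on the event.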
---
## Database Schema Changes
### New Table: webhook_subscriptions
```sql
CREATE TABLE webhook_subscriptions (
    subscription_id   VARCHAR(40)   PRIMARY KEY,
    organization_id   VARCHAR(40)   NOT NULL REFERENCES organizations(organization_id),
    url               VARCHAR(2048) NOT NULL,
    events            JSONB         NOT NULL DEFAULT '[]',
    secret_hash       VARCHAR(255)  NOT NULL, -- bcrypt hash of secret; plain text stored in Vault
    vault_secret_path VARCHAR(255)  NOT NULL,
    description       VARCHAR(255),
    active            BOOLEAN       NOT NULL DEFAULT TRUE,
    failure_count     INTEGER       NOT NULL DEFAULT 0,
    last_delivery_at  TIMESTAMPTZ,
    created_at        TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    updated_at        TIMESTAMPTZ   NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_webhook_subs_org_id ON webhook_subscriptions(organization_id);
CREATE INDEX idx_webhook_subs_active ON webhook_subscriptions(active) WHERE active = TRUE;
```
### New Table: webhook_deliveries
```sql
CREATE TABLE webhook_deliveries (
    delivery_id      VARCHAR(40)  PRIMARY KEY,
    subscription_id  VARCHAR(40)  NOT NULL REFERENCES webhook_subscriptions(subscription_id),
    organization_id  VARCHAR(40)  NOT NULL REFERENCES organizations(organization_id),
    event_id         VARCHAR(40)  NOT NULL,
    event_type       VARCHAR(100) NOT NULL,
    payload          JSONB        NOT NULL,
    status           VARCHAR(20)  NOT NULL DEFAULT 'pending',
    http_status_code SMALLINT,
    response_body    TEXT,
    attempt_count    SMALLINT     NOT NULL DEFAULT 0,
    next_retry_at    TIMESTAMPTZ,
    delivered_at     TIMESTAMPTZ,
    created_at       TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    CONSTRAINT webhook_deliveries_status_check CHECK (status IN ('pending', 'success', 'failed', 'dead_letter'))
);
CREATE INDEX idx_webhook_deliveries_sub_id ON webhook_deliveries(subscription_id);
CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status);
CREATE INDEX idx_webhook_deliveries_org_id ON webhook_deliveries(organization_id);
CREATE INDEX idx_webhook_deliveries_created ON webhook_deliveries(created_at);
```
---
## Retry Schedule
```
Attempt 1: immediate
Attempt 2: 1 minute after failure
Attempt 3: 5 minutes after failure
Attempt 4: 15 minutes after failure
Attempt 5: 1 hour after failure
Attempt 6: 4 hours after failure
Attempt 7: 12 hours after failure
Attempt 8: 24 hours after failure
Attempt 9: 48 hours after failure
Attempt 10: 72 hours after failure
After attempt 10: status = dead_letter; operator alerted via Prometheus metric
```
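One way to encode this schedule is a lookup table keyed by the number of failed attempts so far. The sketch below is illustrative only: `nextStep` and `RetryDecision` are hypothetical names, and it assumes each delay is measured from the most recent failure.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Delay before attempts 2 through 10, in order.
const RETRY_DELAYS_MS: readonly number[] = [
  1 * MINUTE_MS,
  5 * MINUTE_MS,
  15 * MINUTE_MS,
  1 * HOUR_MS,
  4 * HOUR_MS,
  12 * HOUR_MS,
  24 * HOUR_MS,
  48 * HOUR_MS,
  72 * HOUR_MS,
];

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead_letter" };

// Decide what to do after the given number of failed attempts (1-based).
function nextStep(failedAttempts: number): RetryDecision {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 1) {
    throw new RangeError("failedAttempts must be a positive integer");
  }
  const delayMs = RETRY_DELAYS_MS[failedAttempts - 1];
  // No entry left means all 10 attempts are used up.
  if (delayMs === undefined) {
    return { action: "dead_letter" };
  }
  return { action: "retry", delayMs };
}
```

Keeping the schedule as data rather than a formula matches the table above, which is not a strict exponential progression.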
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `WEBHOOKS_ENABLED` | Enable webhook functionality | `true` |
| `WEBHOOK_DELIVERY_TIMEOUT_MS` | HTTP delivery request timeout | `10000` |
| `WEBHOOK_MAX_RETRIES` | Maximum delivery attempts before dead-letter | `10` |
| `WEBHOOK_WORKER_CONCURRENCY` | Number of concurrent delivery workers | `5` |
| `KAFKA_BROKERS` | Comma-separated Kafka broker list (optional; activates Kafka adapter) | `""` |
| `KAFKA_TOPIC_PREFIX` | Prefix for Kafka topic names | `agentidp` |
| `NATS_URL` | NATS server URL (optional; activates NATS adapter) | `""` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `bull` | `^4.16.3` | Redis-backed async job queue for webhook delivery |
| `kafkajs` | `^2.2.4` | Kafka producer adapter (optional) |
---
## Security Considerations
- Webhook secrets are stored in Vault; PostgreSQL holds only a bcrypt hash of each secret, used for comparison and never for signing
- All deliveries must be to HTTPS endpoints — HTTP endpoints are rejected at subscription creation
- Private and internal IP ranges (RFC 1918, loopback, link-local) are blocked at delivery time to prevent SSRF
- HMAC signature allows the receiving server to verify the delivery is authentic
- Replay attacks are mitigated by including a timestamp in the signed content; receivers should reject deliveries with timestamps older than 5 minutes
- Dead-letter events generate a Prometheus metric `agentidp_webhook_dead_letters_total` for alerting
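The SSRF range check can be illustrated with a small IPv4-only sketch. `isBlockedAddress` is a hypothetical helper, not the `WebhookDeliveryWorker` code. It assumes the hostname has already been resolved to an IP literal, and it fails closed for anything that is not a plain IPv4 address, so IPv6 handling is out of scope here.

```typescript
import { isIPv4 } from "node:net";

// Return true when a resolved address must not receive webhook deliveries.
function isBlockedAddress(address: string): boolean {
  // Fail closed: this sketch only reasons about IPv4 literals.
  if (!isIPv4(address)) return true;

  const octets = address.split(".").map(Number);
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;

  if (a === 10) return true;                         // 10.0.0.0/8 (RFC 1918)
  if (a === 172 && b >= 16 && b <= 31) return true;  // 172.16.0.0/12 (RFC 1918)
  if (a === 192 && b === 168) return true;           // 192.168.0.0/16 (RFC 1918)
  if (a === 127) return true;                        // 127.0.0.0/8 loopback
  if (a === 169 && b === 254) return true;           // 169.254.0.0/16 link-local
  return false;
}
```

Checking the resolved address at delivery time, rather than the hostname at subscription time, matters because DNS records can change after the subscription is created.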
---
## Acceptance Criteria
- [ ] `POST /webhooks` creates a subscription; secret stored in Vault, not returned after creation
- [ ] Webhook delivery occurs within 30 seconds of event generation for healthy subscribers
- [ ] Delivery includes correct `X-AgentIdP-Signature-256` header verifiable with provided secret
- [ ] Failed delivery is retried per schedule; status updates in `webhook_deliveries` table
- [ ] After max retries, status is `dead_letter` and metric is incremented
- [ ] Delivery to HTTP (non-HTTPS) URL is rejected at subscription creation
- [ ] Delivery to private IP range is rejected (SSRF protection)
- [ ] `GET /webhooks/:id/deliveries` returns accurate delivery history
- [ ] TypeScript strict, zero `any`, >80% test coverage on WebhookService


@@ -0,0 +1,142 @@
# Phase 3: Enterprise — Tasks
**Status**: COMPLETE — All 6 workstreams done ✅
## CEO Approval Gates (required before implementation)
- [x] A0.1 Approve dependency: `did-resolver` + `web-did-resolver` (W3C DID support)
- [x] A0.2 Approve dependency: `oidc-provider` (certified OIDC server library)
- [x] A0.3 Approve dependency: `bull` (Redis-backed webhook delivery queue)
- [x] A0.4 Approve dependency: `kafkajs` (optional Kafka adapter for webhooks)
- [x] A0.5 Approve dependency: `node-forge` (column-level encryption for SOC 2)
---
## Workstream 1: Multi-Tenancy
- [x] 1.1 Write `src/db/migrations/006_create_organizations_table.sql` — organizations table with slug, plan_tier, max_agents, max_tokens_per_month, status
- [x] 1.2 Write `src/db/migrations/007_create_organization_members_table.sql` — organization_members with agent_id FK and role
- [x] 1.3 Write `src/db/migrations/008_add_organization_id_to_agents.sql` — add organization_id column + index + RLS policy on agents
- [x] 1.4 Write `src/db/migrations/009_add_organization_id_to_credentials.sql` — add organization_id column + index + RLS policy on credentials
- [x] 1.5 Write `src/db/migrations/010_add_organization_id_to_audit_logs.sql` — add organization_id column + index + RLS policy on audit_logs
- [x] 1.6 Write `src/db/migrations/011_seed_system_organization.sql` — insert default system org and backfill existing rows
- [x] 1.7 Write `src/types/organization.ts` — IOrganization, ICreateOrgRequest, IUpdateOrgRequest, IOrgMember, IPaginatedOrgsResponse, OrgStatus, PlanTier interfaces
- [x] 1.8 Write `src/services/OrgService.ts` — createOrg, listOrgs, getOrg, updateOrg, deleteOrg, addMember; all methods accept organizationId context
- [x] 1.9 Write `src/controllers/OrgController.ts` — request parsing and validation for all 6 org endpoints
- [x] 1.10 Write `src/routes/organizations.ts` — mount all 6 org endpoints with admin:orgs scope guard
- [x] 1.11 Write `src/middleware/orgContext.ts` — OrgContextMiddleware: extracts organization_id from JWT and calls SET app.organization_id before each DB query
- [x] 1.12 Update `src/middleware/auth.ts` — extend ITokenPayload with organization_id claim; backfill from DEFAULT_ORG_ID for backward compat
- [x] 1.13 Update `src/services/AgentService.ts` — organizationId propagated via RLS session variable (orgContext middleware)
- [x] 1.14 Update `src/services/CredentialService.ts` — organizationId propagated via RLS session variable
- [x] 1.15 Update `src/services/AuditService.ts` — organizationId propagated via RLS session variable
- [x] 1.16 Update `src/services/OAuth2Service.ts` — include organization_id claim in issued JWT payload
- [x] 1.17 Update `src/types/index.ts` — extend ITokenPayload with organization_id field, admin:orgs scope, org audit actions
- [x] 1.18 Update OPA policy `policies/authz.rego` + `policies/data/scopes.json` — 6 new org endpoint → admin:orgs mappings
- [x] 1.19 Write unit tests for OrgService (CRUD, member management, org isolation)
- [x] 1.20 Write integration tests — all 6 /organizations endpoints, cross-org isolation via RLS
- [x] 1.21 QA sign-off: 373 tests passing, 80.64% branch coverage, zero `any`, TypeScript clean
---
## Workstream 2: W3C DIDs
- [x] 2.1 Write `src/db/migrations/012_create_agent_did_keys_table.sql` — agent_did_keys table with public_key_jwk JSONB and vault_key_path
- [x] 2.2 Write `src/db/migrations/013_add_did_columns_to_agents.sql` — add did and did_created_at columns to agents
- [x] 2.3 Write `src/types/did.ts` — IDIDDocument, IVerificationMethod, IDIDService, IDIDResolutionResult, IAgentCard interfaces
- [x] 2.4 Write `src/services/DIDService.ts` — generateDIDForAgent (EC P-256 key pair, Vault storage, public key in DB), buildInstanceDIDDocument, buildAgentDIDDocument, buildAgentCard, buildResolutionResult
- [x] 2.5 Update `src/services/AgentService.ts` — call DIDService.generateDIDForAgent on every new agent registration
- [x] 2.6 Write `src/controllers/DIDController.ts` — handlers for root DID Document, per-agent DID Document (410 for decommissioned), resolution endpoint, agent card
- [x] 2.7 Write `src/routes/did.ts` — createDIDRouter for `/agents/:id/did`, `/did/resolve`, `/did/card`; `/.well-known/did.json` registered in app.ts
- [x] 2.8 Implement Redis caching in DIDService — cache DID Documents with TTL from DID_DOCUMENT_CACHE_TTL_SECONDS (default 300s)
- [x] 2.9 Handle decommissioned agents — deactivated: true in metadata; HTTP 410 Gone from DIDController
- [x] 2.10 Write unit tests for DIDService — 39 tests, 98.93% coverage; private key security asserted
- [x] 2.11 Write integration tests — all 4 DID endpoints; 22 tests
- [x] 2.12 QA sign-off: 429 tests passing, 98.93% DIDService coverage, private key never in response, zero `any`
---
## Workstream 3: OpenID Connect (OIDC)
- [x] 3.1 Write `src/db/migrations/014_create_oidc_keys_table.sql` — oidc_keys table with kid, public_key_jwk, vault_key_path, is_current
- [x] 3.2 Write `src/services/OIDCKeyService.ts` — generateSigningKeyPair (RSA-2048 or EC P-256), storeKeyInVault, getPublicJWKS, getCurrentKeyId, rotateKey
- [x] 3.3 Write `src/services/IDTokenService.ts` — buildIDTokenClaims (agent claims), signIDToken using current Vault-stored key, verifyIDToken
- [x] 3.4 Write `src/types/oidc.ts` — IIDTokenClaims, IJWKSResponse, IOIDCDiscoveryDocument, IAgentInfoResponse interfaces
- [x] 3.5 Write `src/controllers/OIDCController.ts` — handlers for discovery, JWKS, agent-info
- [x] 3.6 Write `src/routes/oidc.ts` — mount `/.well-known/openid-configuration`, `/.well-known/jwks.json`, `/agent-info`
- [x] 3.7 Update `src/services/OAuth2Service.ts` — when `openid` scope is present in request, generate and append `id_token` to token response
- [x] 3.8 Implement JWKS caching — cache JWKS in Redis with TTL; invalidate on key rotation
- [x] 3.9 Implement key rotation logic — on rotation, old key remains in JWKS until all tokens signed with it have expired
- [x] 3.10 Write unit tests for OIDCKeyService and IDTokenService — key generation, token signing, JWKS format
- [x] 3.11 Write integration tests — POST /oauth2/token with `openid` scope returns id_token; validate id_token against JWKS; GET /agent-info returns correct claims
- [x] 3.12 QA sign-off: OIDC discovery document passes conformance checks, id_token verifiable, `alg: none` rejected, zero `any`, >80% coverage
---
## Workstream 4: AGNTCY Federation
- [x] 4.1 Write `src/db/migrations/015_create_federation_partners_table.sql` — federation_partners table with issuer, jwks_uri, allowed_organizations JSONB, status, expires_at
- [x] 4.2 Write `src/types/federation.ts` — IFederationPartner, ICreatePartnerRequest, IVerifyFederatedTokenRequest, IFederationVerifyResult interfaces
- [x] 4.3 Write `src/services/FederationService.ts` — registerPartner (validates by fetching JWKS), listPartners, deletePartner, verifyFederatedToken (fetch-or-cache JWKS, verify signature, validate claims)
- [x] 4.4 Implement JWKS caching in FederationService — store partner JWKS in Redis with TTL configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS
- [x] 4.5 Write `src/controllers/FederationController.ts` — handlers for POST /federation/trust, GET /federation/partners, DELETE /federation/partners/:id, POST /federation/verify
- [x] 4.6 Write `src/routes/federation.ts` — mount all 4 federation endpoints
- [x] 4.7 Implement partner expiry check — partners past `expires_at` are treated as status `expired`; their tokens rejected
- [x] 4.8 Implement `allowedOrganizations` filter — reject tokens whose `organization_id` is not in the allow list (if list is non-empty)
- [x] 4.9 Write unit tests for FederationService — trust registration, token verification (valid/expired/untrusted/tampered), JWKS cache behavior
- [x] 4.10 Write integration tests — end-to-end: register partner, verify a valid token from that partner, verify rejection for unknown issuer
- [x] 4.11 QA sign-off: tampered token rejected, expired partner rejected, JWKS cache verified, zero `any`, >80% coverage
---
## Workstream 5: Webhooks and Event Streaming
- [x] 5.1 Write `src/db/migrations/016_create_webhook_subscriptions_table.sql` — webhook_subscriptions with url, events JSONB, secret_hash, vault_secret_path, active, failure_count
- [x] 5.2 Write `src/db/migrations/017_create_webhook_deliveries_table.sql` — webhook_deliveries with status, http_status_code, attempt_count, next_retry_at
- [x] 5.3 Write `src/types/webhook.ts` — IWebhookSubscription, ICreateWebhookRequest, IWebhookDelivery, IWebhookPayload, WebhookEventType interfaces
- [x] 5.4 Write `src/services/WebhookService.ts` — createSubscription (store secret in Vault), listSubscriptions, getSubscription, updateSubscription, deleteSubscription, listDeliveries
- [x] 5.5 Write `src/workers/WebhookDeliveryWorker.ts` — bull queue worker: fetch subscription, compute HMAC-SHA256 signature, POST to URL with headers, update delivery status, schedule retry on failure
- [x] 5.6 Write `src/services/EventPublisher.ts` — buildEventPayload, publishEvent (enqueues to bull queue; also produces to Kafka if KAFKA_BROKERS is set)
- [x] 5.7 Update `src/services/AgentService.ts` — call EventPublisher.publishEvent for: agent.created, agent.updated, agent.suspended, agent.reactivated, agent.decommissioned
- [x] 5.8 Update `src/services/CredentialService.ts` — call EventPublisher.publishEvent for: credential.generated, credential.rotated, credential.revoked
- [x] 5.9 Update `src/services/OAuth2Service.ts` — call EventPublisher.publishEvent for: token.issued, token.revoked
- [x] 5.10 Write `src/controllers/WebhookController.ts` — handlers for all 6 webhook endpoints
- [x] 5.11 Write `src/routes/webhooks.ts` — mount all 6 webhook endpoints with correct scope guards
- [x] 5.12 Implement SSRF protection in WebhookDeliveryWorker — reject delivery to RFC 1918 addresses, loopback, and link-local ranges
- [x] 5.13 Implement dead-letter handling — after max retries, set status to dead_letter and increment `agentidp_webhook_dead_letters_total` Prometheus metric
- [x] 5.14 Write `src/adapters/KafkaAdapter.ts` — optional Kafka producer; activated only when KAFKA_BROKERS env var is set
- [x] 5.15 Write unit tests for WebhookService, WebhookDeliveryWorker, EventPublisher — HMAC computation, retry schedule, dead-letter logic
- [x] 5.16 Write integration tests — create subscription, trigger an event, verify delivery; verify SSRF rejection; verify retry on 5xx response
- [x] 5.17 QA sign-off: HMAC verifiable, SSRF protection active, retry schedule correct, dead-letter metric fires, zero `any`, >80% coverage
---
## Workstream 6: SOC 2 Type II Preparation
- [x] 6.1 Enable `pgcrypto` PostgreSQL extension in `src/db/migrations/018_enable_pgcrypto.sql`
- [x] 6.2 Write `src/services/EncryptionService.ts` — AES-256-CBC encrypt/decrypt using key from Vault; methods: encryptColumn, decryptColumn, isEncrypted
- [x] 6.3 Write `src/db/migrations/019_encrypt_sensitive_columns.sql` — re-encrypt existing credentials.secret_hash and credentials.vault_path values using EncryptionService (migration script)
- [x] 6.4 Update `src/services/CredentialService.ts` — all reads/writes of secret_hash and vault_path go through EncryptionService
- [x] 6.5 Update `src/services/WebhookService.ts` — vault_secret_path column encrypted via EncryptionService
- [x] 6.6 Update `src/services/DIDService.ts` — vault_key_path in agent_did_keys encrypted via EncryptionService
- [x] 6.7 Write `src/middleware/TLSEnforcementMiddleware.ts` — redirect HTTP to HTTPS in production using X-Forwarded-Proto header; passthrough in development
- [x] 6.8 Register TLSEnforcementMiddleware in `src/app.ts` — first in middleware stack
- [x] 6.9 Write `src/db/migrations/020_add_audit_chain_columns.sql` — add hash and previous_hash columns to audit_logs; add immutability trigger; backfill chain for existing rows
- [x] 6.10 Update `src/services/AuditService.ts`: compute a chained hash on every insert (a linear hash chain, not a Merkle tree): hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
- [x] 6.11 Write `src/services/AuditVerificationService.ts` — verifyChain(fromDate?, toDate?): reads rows in order, recomputes hashes, returns IChainVerificationResult
- [x] 6.12 Write `src/jobs/SecretsRotationJob.ts` — cron job: identify expiring credentials, emit `agentidp_credentials_expiring_soon_total` metric, renew Vault leases
- [x] 6.13 Write `src/jobs/AuditChainVerificationJob.ts` — cron job: runs verifyChain on a schedule, sets `agentidp_audit_chain_integrity` Prometheus gauge to 1 (pass) or 0 (fail)
- [x] 6.14 Write `src/controllers/ComplianceController.ts` — handlers for GET /audit/verify and GET /compliance/controls
- [x] 6.15 Write `src/routes/compliance.ts` — mount /audit/verify (rate-limited) and /compliance/controls
- [x] 6.16 Write `monitoring/prometheus/alerts.yml` — all 6 alerting rules: AuthFailureSpike, RateLimitExhaustion, AnomalousTokenIssuance, WebhookDeadLetterAccumulating, AuditChainIntegrityFailed, CredentialExpiryApproaching
- [x] 6.17 Update `monitoring/prometheus/prometheus.yml` — add alerting rules file reference
- [x] 6.18 Write compliance documentation package: `docs/compliance/soc2-controls-matrix.md` (Trust Services Criteria → controls map), `docs/compliance/encryption-runbook.md` (key rotation procedure), `docs/compliance/audit-log-runbook.md` (chain verification guide)
- [x] 6.19 Write operational runbooks: `docs/compliance/incident-response.md` (security event procedures), `docs/compliance/secrets-rotation.md` (credential and signing key rotation guide)
- [x] 6.20 Write unit tests for EncryptionService (encrypt/decrypt round-trip, Vault key fetch) and AuditVerificationService (intact chain, tampered chain with correct brokenAtEventId)
- [x] 6.21 Write integration tests — TLS enforcement verified, encrypted columns not plaintext-readable in direct DB query, chain verification returns correct results
- [x] 6.22 QA sign-off: all 5 controls pass GET /compliance/controls, all 6 Prometheus alerts valid, zero `any`, >80% coverage
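Tasks 6.10 and 6.11 above describe a hash chain over audit rows and its verification. A minimal sketch follows, using the field order given in 6.10. The empty-string genesis value for `previousHash`, the plain string concatenation without separators, and the result shape (beyond `brokenAtEventId`, which task 6.20 mentions) are assumptions.

```typescript
import { createHash } from "node:crypto";

interface AuditRow {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  previousHash: string;
  hash: string;
}

type AuditFields = Omit<AuditRow, "hash">;

// hash = SHA-256(eventId + timestamp + action + outcome + agentId + organizationId + previousHash)
export function computeHash(row: AuditFields): string {
  const input =
    row.eventId + row.timestamp + row.action + row.outcome +
    row.agentId + row.organizationId + row.previousHash;
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Appends a row whose previousHash links to the current tail of the chain.
export function appendRow(chain: AuditRow[], fields: Omit<AuditFields, "previousHash">): AuditRow {
  const previousHash = chain[chain.length - 1]?.hash ?? ""; // assumed genesis value
  const withLink: AuditFields = { ...fields, previousHash };
  const row: AuditRow = { ...withLink, hash: computeHash(withLink) };
  chain.push(row);
  return row;
}

// Walks rows in insertion order and reports the first row that fails to verify.
export function verifyChainRows(rows: AuditRow[]): { valid: boolean; brokenAtEventId?: string } {
  let previousHash = "";
  for (const row of rows) {
    if (row.previousHash !== previousHash || row.hash !== computeHash(row)) {
      return { valid: false, brokenAtEventId: row.eventId };
    }
    previousHash = row.hash;
  }
  return { valid: true };
}
```

Because each hash covers the previous one, editing any stored field invalidates that row and, once its hash is recomputed, every row after it.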
---
## Phase 3 Complete Criteria
All 6 workstreams done. All tasks checked. All QA gates passed. CEO reviewed. SOC 2 audit window begins.


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-04-02


@@ -0,0 +1,92 @@
## Context
SentryAgent.ai has completed three phases of development: Phase 1 (MVP — core agent registry, OAuth 2.0, audit log), Phase 2 (Production-Ready — Vault, 4 SDKs, OPA, React dashboard, Prometheus, Terraform), and Phase 3 (Enterprise — multi-tenancy, W3C DIDs, OIDC, AGNTCY federation, webhooks, SOC 2). The product is technically complete and enterprise-grade.
Phase 4's constraint is that the codebase is a single Express + TypeScript monorepo (`src/`) with a co-located React dashboard (`dashboard/`). The new developer portal and CLI are independent packages that must not couple into the existing API codebase beyond HTTP calls to the public API.
Known technical debt to resolve before launch: the `GET /audit/verify` rate limiter is process-local (`express-rate-limit` in-memory store), which breaks under horizontal scaling. This must be fixed before public launch.
## Goals / Non-Goals
**Goals:**
- Eliminate the in-memory rate limiter gap — all rate limiting is Redis-backed and horizontally safe
- Give developers a public portal to discover, learn, and onboard onto SentryAgent.ai
- Ship a CLI that lets developers manage agents from their terminal without writing code
- Create a public agent marketplace powered by existing agent registry + DID infrastructure
- Enable CI/CD-native agent identity via GitHub Actions OIDC federation
- Lay the monetization foundation — usage metering, Stripe billing, free/paid tier enforcement
**Non-Goals:**
- Multi-cloud or self-hosted billing (Stripe only)
- Full SaaS admin panel (beyond existing React dashboard additions)
- Mobile apps
- WebSocket-based real-time CLI tail (polling is acceptable for MVP)
- Marketplace payments or agent listings with pricing (discovery only, no transactions)
## Decisions
### ADR-1: ioredis replaces express-rate-limit in-memory store
**Decision:** Switch from `express-rate-limit` (default memory store) to a Redis-backed sliding window using `ioredis` + `rate-limiter-flexible`.
**Rationale:** The in-memory store is process-local — horizontal scaling (multiple Express instances behind a load balancer) produces independent rate limit windows per process, making limits meaningless. `ioredis` is already the preferred Redis client (faster, promises-native, cluster-aware). `rate-limiter-flexible` is battle-tested and supports sliding window, fixed window, and token bucket algorithms in Redis.
**Alternatives considered:** `redis` (official client) — less ergonomic, no cluster support out of box. `express-rate-limit` with `rate-limit-redis` store — additional dependency on top of ioredis, less control.
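The failure mode behind ADR-1 can be shown without Redis. In the sketch below, `MapStore` stands in for the backing store: giving each limiter its own store models the process-local default, while sharing one store models the Redis-backed design. All names here are hypothetical, and a real implementation on `ioredis` would also need atomic read-modify-write operations, which this single-threaded sketch does not address.

```typescript
// Storage abstraction: in the ADR's design this role is played by Redis.
interface WindowStore {
  load(key: string): number[];
  save(key: string, timestamps: number[]): void;
}

class MapStore implements WindowStore {
  private readonly data = new Map<string, number[]>();
  load(key: string): number[] {
    return this.data.get(key) ?? [];
  }
  save(key: string, timestamps: number[]): void {
    this.data.set(key, timestamps);
  }
}

// Sliding-window log: keep request timestamps and count those inside the window.
class SlidingWindowLimiter {
  private readonly store: WindowStore;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(store: WindowStore, limit: number, windowMs: number) {
    this.store = store;
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const recent = this.store.load(key).filter((t) => t > nowMs - this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.store.save(key, recent);
    return allowed;
  }
}

// Sends `requests` calls round-robin across the instances, like a load balancer.
function countAllowed(instances: SlidingWindowLimiter[], requests: number): number {
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const instance = instances[i % instances.length];
    if (instance !== undefined && instance.allow("tenant-1", i)) allowed++;
  }
  return allowed;
}

// Limit of 5 per minute, two instances, 10 requests.
export function compareStores(): { perProcess: number; shared: number } {
  const perProcess = [
    new SlidingWindowLimiter(new MapStore(), 5, 60_000),
    new SlidingWindowLimiter(new MapStore(), 5, 60_000),
  ];
  const store = new MapStore();
  const shared = [
    new SlidingWindowLimiter(store, 5, 60_000),
    new SlidingWindowLimiter(store, 5, 60_000),
  ];
  return { perProcess: countAllowed(perProcess, 10), shared: countAllowed(shared, 10) };
}
```

With separate stores every request passes, because each instance only sees half the traffic; with a shared store the configured limit holds.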
### ADR-2: Developer portal is a separate Next.js 14 app in `portal/`
**Decision:** The developer portal lives at `portal/` — a standalone Next.js 14 application — not inside the existing `dashboard/` React app.
**Rationale:** The portal is a public-facing marketing/onboarding site (unauthenticated), not an internal management dashboard (authenticated). Mixing public and authenticated surfaces in one bundle increases attack surface and deployment complexity. `portal/` can be deployed independently (Vercel, Cloudflare Pages) while the dashboard remains behind the API.
**Alternatives considered:** Single React app with public/private routing — increases bundle size and complicates auth guards. Embedding portal in existing Express static serving — prevents CDN-edge deployment.
### ADR-3: CLI is a standalone npm package in `cli/`
**Decision:** The `sentryagent` CLI lives at `cli/` with its own `package.json` and is published separately to npm as `sentryagent`.
**Rationale:** CLI users install globally (`npm i -g sentryagent`). Bundling into the API monorepo would force users to install all API dependencies. Separate package = minimal install surface + independent versioning + dedicated README on npm.
**Alternatives considered:** Monorepo workspace — possible but adds tooling complexity for a single-package CLI.
### ADR-4: Agent Marketplace is implemented as new routes in the existing Express API
**Decision:** Marketplace endpoints (`GET /marketplace/agents`, `GET /marketplace/agents/:id`) are added to the existing Express API, not a separate service.
**Rationale:** Marketplace data is derived from the existing `agents` table + DID infrastructure — it is a read-only projection of existing data with public access controls. No new persistence layer needed. Adding routes to Express is the simplest, lowest-risk approach.
**Alternatives considered:** Separate microservice — unnecessary complexity for read-only projections of existing data.
### ADR-5: GitHub Actions use OIDC token exchange (not stored secrets)
**Decision:** `sentryagent/register-agent` and `sentryagent/issue-token` Actions use GitHub's OIDC provider to exchange a GitHub-issued JWT for a SentryAgent.ai agent token — no API keys stored in GitHub Secrets.
**Rationale:** Storing long-lived API keys in GitHub Secrets creates a credential leak risk (secrets can be logged, forked into other repos, etc.). OIDC token exchange is keyless — credentials are ephemeral and scoped to the workflow run. The existing OIDC Provider (Phase 3 WS3) already supports external OIDC federation.
**Alternatives considered:** API key in GitHub Secrets — simpler but credential leak risk. GitHub App installation tokens — more complex, not needed when OIDC already exists.
### ADR-6: Billing uses Stripe with webhook-driven state synchronization
**Decision:** Stripe Checkout + Stripe Webhooks drive subscription state. SentryAgent.ai does not poll Stripe — it receives webhook events (`customer.subscription.created`, `invoice.payment_succeeded`, `customer.subscription.deleted`) to update a `tenant_subscriptions` table.
**Rationale:** Polling Stripe for subscription status introduces latency and API rate limit risk. Webhook-driven state is the Stripe-recommended pattern. Tenant subscription state is stored locally to avoid Stripe API calls on every request.
**Alternatives considered:** Paddle — less developer familiarity, smaller ecosystem. Lemon Squeezy — less mature. Manual invoicing — not scalable.
### ADR-7: Usage metering uses in-memory counters flushed to PostgreSQL
**Decision:** Per-request middleware increments in-memory counters per tenant per metric type (api_calls, token_issuances). A 60-second flush interval writes aggregated counts to a `usage_events` table in PostgreSQL. Free tier limits are checked at request time against a cached summary.
**Rationale:** Synchronous database writes on every API request would add latency and DB load. Async aggregation + periodic flush gives near-real-time metering with minimal overhead. Redis could buffer these, but PostgreSQL is sufficient for MVP flush intervals.
**Alternatives considered:** Stripe Metered Billing API (report per-unit usage to Stripe) — locked to Stripe, adds latency on usage reporting, complex to roll back. ClickHouse/TimescaleDB — overkill for MVP scale.
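The aggregate-then-flush pattern in ADR-7 can be sketched as a small buffer class. The names (`UsageBuffer`, `UsageRow`), the UTC day bucketing, and the row shape are assumptions. The database write itself is left out: the rows returned by `drain()` are what a periodic flush would write to `usage_events`.

```typescript
type Metric = "api_calls" | "token_issuances";

interface UsageRow {
  tenantId: string;
  metric: Metric;
  day: string; // YYYY-MM-DD in UTC (assumed bucketing)
  count: number;
}

export class UsageBuffer {
  private counters = new Map<string, UsageRow>();

  // Called from per-request middleware: no I/O, only a map update.
  increment(tenantId: string, metric: Metric, now: Date = new Date()): void {
    const day = now.toISOString().slice(0, 10);
    const key = `${tenantId}|${metric}|${day}`;
    const existing = this.counters.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      this.counters.set(key, { tenantId, metric, day, count: 1 });
    }
  }

  // Called on the flush interval: returns the aggregated rows and resets the
  // counters, so each flush carries only the delta since the previous one.
  drain(): UsageRow[] {
    const rows = Array.from(this.counters.values());
    this.counters = new Map();
    return rows;
  }
}
```

Since each flush carries a delta, the write needs to add to any existing row for the same tenant, metric, and day rather than overwrite it. The trade-off of buffering in memory is that counts accumulated since the last flush are lost if the process exits before the next one.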
## Risks / Trade-offs
- **[Risk] Portal deployment is separate from API** → Mitigation: Document CORS configuration clearly. Portal calls the public API via `NEXT_PUBLIC_API_URL` env var. Deployments are independent.
- **[Risk] CLI polling for audit tail adds API load** → Mitigation: Polling interval defaults to 5s with exponential backoff. Document this limitation. Real-time tail via WebSockets is a Phase 5 enhancement.
- **[Risk] Stripe webhook signature verification must be enforced** → Mitigation: All webhook handlers verify `stripe-signature` header using `stripe.webhooks.constructEvent()` before processing. Reject without verification.
- **[Risk] GitHub Actions OIDC requires trust policy configuration per repo** → Mitigation: Document trust policy setup clearly in Action README. Provide a quickstart template for `/.github/workflows/sentryagent-setup.yml`.
- **[Risk] Free tier limit checks add latency on every request** → Mitigation: Limit summaries are cached in Redis with a 60s TTL. Stale cache means brief over-limit grace window — acceptable for MVP.
- **[Risk] ioredis migration may break existing Redis usage** → Mitigation: Existing Redis usage (Bull queue, session) already uses `ioredis` under the hood (Bull requires it). Migration is additive — replace rate-limiter middleware only, no existing code removed.
## Migration Plan
1. **WS1 first** (before any public traffic): deploy ioredis rate limiter, connection pool tuning, and detailed health endpoint. Run k6 load tests. Only proceed to WS2+ after load tests pass.
2. **WS2 + WS3 in parallel**: portal and CLI are independent. Portal deployed to CDN/Vercel. CLI published to npm.
3. **WS4**: Marketplace routes added to Express API behind feature flag (`MARKETPLACE_ENABLED=true`). Enable after WS1 hardening is confirmed stable.
4. **WS5**: GitHub Actions published to GitHub Actions Marketplace after OIDC trust policy documentation is complete.
5. **WS6 last**: Billing affects all tenants. Stripe webhooks registered in Stripe dashboard. `tenant_subscriptions` table migration applied. Free tier limits initially set generously; tightened after monitoring confirms limit logic is correct.
**Rollback strategy per workstream:**
- WS1: Rate limiter is middleware — revert to in-memory store by toggling env var (`REDIS_RATE_LIMIT_ENABLED=false`)
- WS2: Portal is separate deployment — roll back independently
- WS3: npm package — unpublish or yank specific version
- WS4: Feature flag `MARKETPLACE_ENABLED=false`
- WS5: GitHub Actions are versioned — pin to prior release tag
- WS6: Feature flag `BILLING_ENABLED=false` — disables enforcement, metering continues
## Open Questions
- **Portal domain**: Will `portal/` be served from `sentryagent.ai` (marketing site) or `app.sentryagent.ai` (portal subdomain)? Affects CORS and Next.js `basePath` config. Recommend: `sentryagent.ai` for portal, `app.sentryagent.ai` for dashboard.
- **Free tier limits**: Are 10 agents and 1,000 API calls/day the final limits, or placeholders? If placeholder, billing enforcement should be gated behind `BILLING_ENABLED` flag until limits are confirmed.
- **Marketplace moderation**: Will agent marketplace listings be auto-published on registration, or require manual approval? Recommend: auto-publish for MVP, flag-based moderation later.


@@ -0,0 +1,50 @@
## Why
Phases 1-3 delivered a complete, enterprise-grade AgentIdP: authenticated, federated, multi-tenanted, and SOC 2 prepared. The product now needs to reach developers, so Phase 4 shifts from building infrastructure to growing the ecosystem by making SentryAgent.ai frictionless to discover, adopt, and operate at scale in production.
## What Changes
- **Production Hardening**: Replace in-memory rate limiter with Redis-backed distributed limiter; tune database connection pooling; add detailed health endpoint; introduce k6 load test suite
- **Public Developer Portal**: Next.js 14 public website with interactive API explorer (Swagger UI), guided agent registration wizard, free tier docs, and SDK links
- **CLI Tool** (`sentryagent`): npm-installable CLI for register-agent, list-agents, issue-token, rotate-credentials, and tail-audit-log with `~/.sentryagent/config.json` and shell completion
- **Agent Marketplace**: Searchable public registry of AGNTCY-compliant agents with DID documents, capabilities, and publisher profiles — powered by existing agent registry and DID infrastructure
- **GitHub Actions Integration**: `sentryagent/register-agent` and `sentryagent/issue-token` Actions using OIDC federation with GitHub's OIDC provider — published to the GitHub Actions Marketplace
- **Billing & Usage Metering**: Stripe integration for paid tier; per-tenant usage tracking (API calls, active agents, token issuances); free tier limits enforced; usage dashboard in existing web dashboard
## Capabilities
### New Capabilities
- `production-hardening`: Redis-backed rate limiting, connection pooling, detailed health endpoint, and k6 load test suite
- `developer-portal`: Next.js 14 public website with Swagger UI API explorer, onboarding wizard, and SDK links
- `cli-tool`: `sentryagent` npm CLI with full agent lifecycle commands and shell completion
- `agent-marketplace`: Searchable public registry of published AGNTCY-compliant agents with DID documents
- `github-actions`: `register-agent` and `issue-token` GitHub Actions using OIDC federation
- `billing-metering`: Stripe-backed paid tier, per-tenant usage tracking, free tier enforcement, and usage dashboard
### Modified Capabilities
- `web-dashboard`: Usage metering panel added to existing dashboard (new billing/usage tab)
- `monitoring`: New Prometheus metrics for rate limiter hits, connection pool saturation, and per-tenant API call counters
## Impact
**Code affected:**
- `src/middleware/rateLimiter.ts` — replace express-rate-limit (in-memory) with ioredis-backed limiter
- `src/infrastructure/database.ts` — pg connection pool tuning
- `src/routes/health.ts` — add `/health/detailed` endpoint
- `src/services/UsageService.ts` — new service for per-tenant metering
- `src/controllers/BillingController.ts` — new controller for Stripe webhooks and subscription management
- `portal/` — new Next.js 14 application (separate directory)
- `cli/` — new CLI package (separate directory)
- `marketplace/` — new marketplace routes added to existing Express API
- `.github/actions/` — two new GitHub Actions
**New dependencies (CEO approved):**
- `ioredis` — Redis-backed rate limiting (WS1)
- `next` + `tailwindcss` — Developer portal (WS2)
- `swagger-ui-react` — Interactive API explorer (WS2)
- `commander` + `chalk` — CLI framework (WS3)
- `stripe` — Billing (WS6)
**Delivery sequence:** WS1 → WS2 + WS3 (parallel) → WS4 → WS5 → WS6


@@ -0,0 +1,45 @@
## ADDED Requirements
### Requirement: Marketplace listing endpoint returns public agent registry
The system SHALL expose `GET /marketplace/agents` returning a paginated list of publicly visible agents. Each listing SHALL include: `agentId`, `name`, `description`, `capabilities` (array of strings), `publisherName`, `did`, `createdAt`. The endpoint SHALL be unauthenticated (public access). Agents are included in the marketplace when their `isPublic` flag is `true`.
#### Scenario: Unauthenticated user lists marketplace agents
- **WHEN** an unauthenticated client calls `GET /marketplace/agents`
- **THEN** the response is HTTP 200 with a paginated list of public agents
#### Scenario: Pagination works correctly
- **WHEN** a client calls `GET /marketplace/agents?page=2&limit=20`
- **THEN** the response returns the correct page of results with `totalCount`, `page`, and `totalPages` in the response envelope
### Requirement: Marketplace search filters agents by capability, publisher, or name
The system SHALL support `GET /marketplace/agents?q=<search>` performing a case-insensitive search across agent name, description, and capabilities. The system SHALL also support `GET /marketplace/agents?capability=<cap>` and `GET /marketplace/agents?publisher=<name>` for structured filtering.
#### Scenario: Full-text search returns relevant agents
- **WHEN** a client calls `GET /marketplace/agents?q=translation`
- **THEN** agents whose name, description, or capabilities contain "translation" are returned
#### Scenario: Capability filter returns matching agents
- **WHEN** a client calls `GET /marketplace/agents?capability=nlp`
- **THEN** only agents with "nlp" in their capabilities array are returned
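The listing, search, and pagination requirements above can be illustrated with a pure function over an in-memory array. This is a sketch only: the default `page` and `limit` values, the case-insensitive publisher match, and the exact-match capability filter are assumptions, and the actual implementation would presumably run these filters in SQL.

```typescript
interface MarketplaceAgent {
  agentId: string;
  name: string;
  description: string;
  capabilities: string[];
  publisherName: string;
  did: string;
  createdAt: string;
  isPublic: boolean;
}

interface MarketplaceQuery {
  q?: string;
  capability?: string;
  publisher?: string;
  page?: number;
  limit?: number;
}

interface MarketplacePage {
  items: Omit<MarketplaceAgent, "isPublic">[];
  totalCount: number;
  page: number;
  totalPages: number;
}

export function listMarketplaceAgents(agents: MarketplaceAgent[], query: MarketplaceQuery): MarketplacePage {
  const page = query.page ?? 1; // assumed default
  const limit = query.limit ?? 20; // assumed default
  const needle = query.q?.toLowerCase();

  const matches = agents.filter((agent) => {
    if (!agent.isPublic) return false; // private agents never appear
    if (needle !== undefined) {
      const haystack = [agent.name, agent.description, ...agent.capabilities].join("\n").toLowerCase();
      if (!haystack.includes(needle)) return false;
    }
    if (query.capability !== undefined && !agent.capabilities.includes(query.capability)) return false;
    if (query.publisher !== undefined &&
        agent.publisherName.toLowerCase() !== query.publisher.toLowerCase()) return false;
    return true;
  });

  const start = (page - 1) * limit;
  const items = matches.slice(start, start + limit).map((agent) => {
    const { isPublic, ...listing } = agent; // strip the internal flag from the response
    return listing;
  });
  return { items, totalCount: matches.length, page, totalPages: Math.ceil(matches.length / limit) };
}
```

Filters combine with AND semantics here, which is one reasonable reading of the requirement; the spec does not say how `q`, `capability`, and `publisher` interact when supplied together.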
### Requirement: Marketplace detail endpoint returns agent with DID document
The system SHALL expose `GET /marketplace/agents/:agentId` returning the full public agent profile including the W3C DID document and AGNTCY agent card. The endpoint SHALL be unauthenticated.
#### Scenario: Agent detail includes DID document
- **WHEN** a client calls `GET /marketplace/agents/:agentId` for a public agent
- **THEN** the response includes `agentId`, `name`, `description`, `capabilities`, `did`, `didDocument`, `agentCard`, and `publisherName`
#### Scenario: Private agent returns 404 on marketplace
- **WHEN** a client calls `GET /marketplace/agents/:agentId` for an agent with `isPublic: false`
- **THEN** the response is HTTP 404
### Requirement: Agents can be published to or withdrawn from the marketplace
The system SHALL allow authenticated tenant users to set `isPublic: true` on an agent via `PATCH /agents/:agentId` (`{ "isPublic": true }`), making it appear in the marketplace. Setting `isPublic: false` removes it from marketplace listings without deleting the agent.
#### Scenario: Agent published to marketplace
- **WHEN** an authenticated user calls `PATCH /agents/:agentId` with `{ "isPublic": true }`
- **THEN** the agent appears in `GET /marketplace/agents` results
#### Scenario: Agent withdrawn from marketplace
- **WHEN** an authenticated user calls `PATCH /agents/:agentId` with `{ "isPublic": false }`
- **THEN** the agent no longer appears in `GET /marketplace/agents` results


@@ -0,0 +1,60 @@
## ADDED Requirements
### Requirement: Per-tenant usage is tracked for API calls, active agents, and token issuances
The system SHALL track the following usage metrics per tenant per day: `api_calls` (every authenticated API request), `token_issuances` (every successful `POST /oauth2/token`), `active_agents` (count of non-revoked agents at end of day). Usage SHALL be aggregated in memory and flushed to a `usage_events` PostgreSQL table every 60 seconds.
#### Scenario: API call increments usage counter
- **WHEN** an authenticated tenant makes any API request
- **THEN** the tenant's `api_calls` counter for the current day is incremented
#### Scenario: Usage is persisted to database on flush interval
- **WHEN** 60 seconds elapse since the last flush
- **THEN** all in-memory counters are written to the `usage_events` table and reset to zero
### Requirement: Free tier limits are enforced per tenant
The system SHALL enforce free tier limits: 10 active agents maximum, 1,000 API calls per day. When a limit is exceeded, the offending request SHALL be rejected with HTTP 429 and a response body indicating which limit was reached and how to upgrade. Limit summaries SHALL be cached in Redis with a 60-second TTL.
#### Scenario: Agent registration blocked at free tier limit
- **WHEN** a free-tier tenant with 10 active agents calls `POST /agents`
- **THEN** the response is HTTP 429 with `{ "error": "free_tier_limit", "limit": "agents", "max": 10, "upgradeUrl": "..." }`
#### Scenario: API call blocked after daily limit
- **WHEN** a free-tier tenant has made 1,000 API calls today and makes another request
- **THEN** the response is HTTP 429 with `{ "error": "free_tier_limit", "limit": "api_calls", "max": 1000, "upgradeUrl": "..." }`
#### Scenario: Paid tenant is not rate limited by usage tiers
- **WHEN** a paid-tier tenant exceeds free tier thresholds
- **THEN** the request is processed normally with no usage-based rejection
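The three enforcement scenarios above reduce to a single decision function. The sketch below assumes the usage summary has already been fetched (for example from the Redis cache) and that the API-call limit is checked before the agent limit; the function and type names are hypothetical. The `upgradeUrl` is passed in by the caller because the spec leaves its value unspecified.

```typescript
type LimitName = "agents" | "api_calls";

interface TenantUsage {
  tier: "free" | "paid";
  activeAgents: number;
  apiCallsToday: number;
}

interface LimitRejection {
  status: 429;
  body: { error: "free_tier_limit"; limit: LimitName; max: number; upgradeUrl: string };
}

// Limits taken from the requirement text: 10 agents, 1,000 API calls per day.
const FREE_TIER_MAX: Record<LimitName, number> = { agents: 10, api_calls: 1000 };

function rejection(limit: LimitName, upgradeUrl: string): LimitRejection {
  return { status: 429, body: { error: "free_tier_limit", limit, max: FREE_TIER_MAX[limit], upgradeUrl } };
}

// Returns null when the request may proceed, or the 429 payload when it may not.
export function checkFreeTier(
  usage: TenantUsage,
  request: { method: string; path: string },
  options: { billingEnabled: boolean; upgradeUrl: string },
): LimitRejection | null {
  // BILLING_ENABLED=false and paid tenants both bypass enforcement entirely.
  if (!options.billingEnabled || usage.tier === "paid") return null;
  if (usage.apiCallsToday >= FREE_TIER_MAX.api_calls) {
    return rejection("api_calls", options.upgradeUrl);
  }
  const isAgentCreate = request.method === "POST" && request.path === "/agents";
  if (isAgentCreate && usage.activeAgents >= FREE_TIER_MAX.agents) {
    return rejection("agents", options.upgradeUrl);
  }
  return null;
}
```

Keeping the decision pure makes each scenario directly unit-testable; the middleware then only has to fetch the summary and translate a non-null result into a response.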
### Requirement: Stripe Checkout initiates paid tier subscription
The system SHALL expose `POST /billing/checkout` (authenticated) that creates a Stripe Checkout session for the paid tier plan and returns a `checkoutUrl`. The tenant is redirected to Stripe Checkout to complete payment. On success, Stripe sends a `customer.subscription.created` webhook event.
#### Scenario: Checkout session created
- **WHEN** an authenticated tenant calls `POST /billing/checkout`
- **THEN** the response is HTTP 200 with `{ "checkoutUrl": "https://checkout.stripe.com/..." }`
#### Scenario: Duplicate subscription prevented
- **WHEN** a tenant with an active paid subscription calls `POST /billing/checkout`
- **THEN** the response is HTTP 409 with `{ "error": "already_subscribed" }`
### Requirement: Stripe webhooks update tenant subscription state
The system SHALL expose `POST /billing/webhook` (Stripe webhook endpoint) that verifies the `stripe-signature` header using `stripe.webhooks.constructEvent()` and processes: `customer.subscription.created` (set tenant to paid), `invoice.payment_succeeded` (extend subscription period), `customer.subscription.deleted` (revert tenant to free tier). All events without valid signatures SHALL be rejected with HTTP 400.
#### Scenario: Webhook without valid signature is rejected
- **WHEN** `POST /billing/webhook` is called with an invalid or missing `stripe-signature` header
- **THEN** the response is HTTP 400 and no state is changed
#### Scenario: Subscription created webhook activates paid tier
- **WHEN** Stripe sends a valid `customer.subscription.created` event for a tenant
- **THEN** the tenant's `subscriptionStatus` is updated to `active` and free tier limits no longer apply
#### Scenario: Subscription deleted webhook reverts to free tier
- **WHEN** Stripe sends a valid `customer.subscription.deleted` event
- **THEN** the tenant's `subscriptionStatus` is updated to `cancelled` and free tier limits are re-enforced
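Once a webhook has passed signature verification, the state changes listed above amount to a small reducer over the event type. The sketch below is illustrative: the `free` initial status, the flattened `periodEnd` field, and the type names are assumptions, and real Stripe event objects nest this data differently.

```typescript
type SubscriptionStatus = "free" | "active" | "cancelled";

interface TenantSubscription {
  subscriptionStatus: SubscriptionStatus;
  currentPeriodEnd: number | null; // Unix seconds
}

// Simplified stand-in for a verified Stripe event (hypothetical shape).
interface BillingEvent {
  type: string;
  periodEnd?: number;
}

export function applyBillingEvent(state: TenantSubscription, event: BillingEvent): TenantSubscription {
  switch (event.type) {
    case "customer.subscription.created":
      // Tenant moves to the paid tier.
      return { subscriptionStatus: "active", currentPeriodEnd: event.periodEnd ?? state.currentPeriodEnd };
    case "invoice.payment_succeeded":
      // Extend the paid period without changing status.
      return { ...state, currentPeriodEnd: event.periodEnd ?? state.currentPeriodEnd };
    case "customer.subscription.deleted":
      // Free tier limits apply again.
      return { subscriptionStatus: "cancelled", currentPeriodEnd: null };
    default:
      // Unhandled event types leave state untouched.
      return state;
  }
}

// Free tier limits apply to every tenant that is not currently active.
export function freeTierApplies(state: TenantSubscription): boolean {
  return state.subscriptionStatus !== "active";
}
```

Returning a new state object rather than mutating makes it straightforward to persist the result to `tenant_subscriptions` in one write per event.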
### Requirement: Billing is feature-flag gated
All billing enforcement and Stripe integration SHALL be gated behind the `BILLING_ENABLED` environment variable. When `BILLING_ENABLED=false`, free tier limits are not enforced, all tenants have paid-tier access, and Stripe webhook endpoint returns HTTP 200 without processing. Usage metering continues regardless of this flag.
#### Scenario: Billing disabled — no limits enforced
- **WHEN** `BILLING_ENABLED=false` and a free-tier tenant has 11 active agents
- **THEN** agent registration succeeds without HTTP 429


@@ -0,0 +1,65 @@
## ADDED Requirements
### Requirement: sentryagent CLI is an installable npm package
The system SHALL provide a `sentryagent` CLI at `cli/` with its own `package.json`, built with `commander` and `chalk`, and published to npm as `sentryagent`. The CLI SHALL be installable globally via `npm install -g sentryagent`. The CLI binary SHALL be named `sentryagent`.
#### Scenario: CLI installs and shows help
- **WHEN** a user runs `npm install -g sentryagent` and then `sentryagent --help`
- **THEN** the command displays available subcommands and global options without errors
#### Scenario: CLI version flag works
- **WHEN** a user runs `sentryagent --version`
- **THEN** the CLI outputs its version number matching `package.json`
### Requirement: CLI persists configuration in ~/.sentryagent/config.json
The CLI SHALL store API endpoint (`apiUrl`) and credentials (`clientId`, `clientSecret`) in `~/.sentryagent/config.json`. The `sentryagent configure` command SHALL prompt for these values interactively and write them to the config file. All other commands SHALL read from this config file automatically.
#### Scenario: Configure command saves credentials
- **WHEN** a user runs `sentryagent configure` and enters an API URL, client ID, and client secret
- **THEN** `~/.sentryagent/config.json` is created or updated with the entered values
#### Scenario: Command fails gracefully when not configured
- **WHEN** a user runs any command before running `sentryagent configure`
- **THEN** the CLI outputs a human-readable error: "Not configured. Run `sentryagent configure` first."
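The configuration requirement above can be sketched with Node's `fs`, `os`, and `path` modules. The function names and the optional `home` parameter (which makes the functions testable) are hypothetical. The restrictive file modes are an assumption the spec does not state; they are included here because the file holds a client secret.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

interface CliConfig {
  apiUrl: string;
  clientId: string;
  clientSecret: string;
}

export function configPath(home: string = homedir()): string {
  return join(home, ".sentryagent", "config.json");
}

// What `sentryagent configure` would call after prompting for the values.
export function saveConfig(config: CliConfig, home: string = homedir()): void {
  const path = configPath(home);
  mkdirSync(dirname(path), { recursive: true, mode: 0o700 }); // assumption: owner-only directory
  writeFileSync(path, JSON.stringify(config, null, 2), { mode: 0o600 }); // assumption: owner-only file
}

// What every other command would call before making API requests.
export function loadConfig(home: string = homedir()): CliConfig {
  const path = configPath(home);
  if (!existsSync(path)) {
    throw new Error("Not configured. Run `sentryagent configure` first.");
  }
  const parsed = JSON.parse(readFileSync(path, "utf8")) as Partial<CliConfig>;
  if (!parsed.apiUrl || !parsed.clientId || !parsed.clientSecret) {
    throw new Error("Not configured. Run `sentryagent configure` first.");
  }
  return { apiUrl: parsed.apiUrl, clientId: parsed.clientId, clientSecret: parsed.clientSecret };
}
```

Treating an incomplete file the same as a missing one keeps the error surface to the single message the scenario specifies.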
### Requirement: register-agent command registers a new agent
The CLI SHALL provide `sentryagent register-agent --name <name> [--description <desc>]` that calls `POST /agents` and outputs the created agent's ID and name.
#### Scenario: Agent registered successfully
- **WHEN** a user runs `sentryagent register-agent --name "my-agent"`
- **THEN** the CLI outputs the new agent ID and confirms registration
### Requirement: list-agents command lists all agents
The CLI SHALL provide `sentryagent list-agents` that calls `GET /agents` and outputs a formatted table of agent ID, name, status, and creation date.
#### Scenario: Agents listed in table format
- **WHEN** a user runs `sentryagent list-agents`
- **THEN** the CLI outputs a formatted table with all agents for the authenticated tenant
### Requirement: issue-token command issues an OAuth2 token
The CLI SHALL provide `sentryagent issue-token --agent-id <id>` that calls `POST /oauth2/token` and outputs the access token and expiry.
#### Scenario: Token issued successfully
- **WHEN** a user runs `sentryagent issue-token --agent-id <id>`
- **THEN** the CLI outputs the access token and its expiry timestamp
### Requirement: rotate-credentials command rotates agent credentials
The CLI SHALL provide `sentryagent rotate-credentials --agent-id <id>` that calls `POST /agents/:id/credentials/rotate` and outputs the new client secret.
#### Scenario: Credentials rotated with confirmation prompt
- **WHEN** a user runs `sentryagent rotate-credentials --agent-id <id>`
- **THEN** the CLI prompts for confirmation ("This will invalidate the current secret. Continue? [y/N]") before rotating
### Requirement: tail-audit-log command polls and streams audit events
The CLI SHALL provide `sentryagent tail-audit-log [--agent-id <id>]` that polls `GET /audit/logs` every 5 seconds and streams new events to stdout in a human-readable format. The command SHALL run until the user presses Ctrl+C.
#### Scenario: Audit log events stream to stdout
- **WHEN** a user runs `sentryagent tail-audit-log`
- **THEN** new audit events appear in the terminal as they are created, one per line
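Polling a list endpoint will usually return events that were already printed, so a tail command has to deduplicate between polls. The sketch below shows that step along with a delay calculator. It assumes each event carries a unique `eventId`, that backoff applies after failed polls, and that the delay is capped at 60 seconds; the response shape of `GET /audit/logs` is not specified here, so `AuditEvent` is a hypothetical subset.

```typescript
interface AuditEvent {
  eventId: string;
  timestamp: string;
  action: string;
  outcome: string;
  agentId: string;
}

// One event per line, as the scenario requires.
export function formatEvent(event: AuditEvent): string {
  return `${event.timestamp}  ${event.action}  ${event.outcome}  agent=${event.agentId}`;
}

export class AuditTail {
  private readonly seen = new Set<string>();

  // Takes the batch returned by one poll and returns only lines not printed before.
  accept(batch: AuditEvent[]): string[] {
    const lines: string[] = [];
    for (const event of batch) {
      if (this.seen.has(event.eventId)) continue;
      this.seen.add(event.eventId);
      lines.push(formatEvent(event));
    }
    return lines;
  }
}

// 5s base interval, doubled per consecutive failure, capped (cap is an assumption).
export function nextDelayMs(consecutiveFailures: number, baseMs = 5000, maxMs = 60000): number {
  return Math.min(baseMs * 2 ** consecutiveFailures, maxMs);
}
```

The `seen` set grows for as long as the command runs; a long-lived tail would want to bound it, for example by tracking only the newest timestamp once ordering guarantees are known.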
### Requirement: CLI supports bash and zsh shell completion
The CLI SHALL provide `sentryagent completion bash` and `sentryagent completion zsh` commands that output shell completion scripts. Installation instructions SHALL be included in the CLI README.
#### Scenario: Bash completion script is generated
- **WHEN** a user runs `sentryagent completion bash`
- **THEN** a valid bash completion script is output to stdout


@@ -0,0 +1,48 @@
## ADDED Requirements
### Requirement: Public developer portal is a standalone Next.js 14 application
The system SHALL provide a public developer portal at `portal/` — a standalone Next.js 14 application with Tailwind CSS. The portal SHALL be deployable independently of the API (to Vercel, Cloudflare Pages, or any static host). The portal SHALL communicate with the API exclusively via the public REST API at `NEXT_PUBLIC_API_URL`. No server-side API secrets SHALL be stored in the portal.
#### Scenario: Portal builds and exports successfully
- **WHEN** `npm run build` is executed in `portal/`
- **THEN** the build completes without errors and produces a deployable `out/` or `.next/` directory
#### Scenario: API URL is configurable via environment variable
- **WHEN** `NEXT_PUBLIC_API_URL=https://api.sentryagent.ai` is set and the portal is built
- **THEN** all API calls in the portal use that base URL
### Requirement: Interactive API explorer is embedded in the portal
The portal SHALL embed a Swagger UI (`swagger-ui-react`) loaded from the existing OpenAPI spec at `/openapi.json` (served by the Express API). The explorer SHALL allow unauthenticated browsing of all endpoints and authenticated execution using a Bearer token entered by the user.
#### Scenario: API explorer loads the OpenAPI spec
- **WHEN** a user visits `/api-explorer`
- **THEN** Swagger UI loads and renders all endpoints from the OpenAPI spec without errors
#### Scenario: User executes authenticated request in explorer
- **WHEN** a user enters a Bearer token in the Swagger UI Authorize dialog and executes `GET /agents`
- **THEN** the request is sent with `Authorization: Bearer <token>` and the response is displayed
### Requirement: Agent registration onboarding wizard guides new users
The portal SHALL provide a guided wizard at `/get-started` covering: (1) sign up / log in, (2) create first agent, (3) generate credentials, (4) copy code snippet for their preferred SDK (Node.js, Python, Go, Java). The wizard SHALL complete in ≤ 4 steps.
#### Scenario: Wizard completes agent registration
- **WHEN** a new user completes all wizard steps
- **THEN** an agent is registered via the API, credentials are generated, and a ready-to-run code snippet is displayed
#### Scenario: SDK code snippet matches selected language
- **WHEN** a user selects "Python" as their preferred SDK in step 4
- **THEN** the displayed code snippet uses the Python SDK (`sentryagent-idp`) syntax
### Requirement: Free tier documentation page explains limits and upgrade path
The portal SHALL include a `/pricing` page documenting free tier limits (10 agents, 1,000 API calls/day), the paid tier capabilities (unlimited agents, higher rate limits, SLA), and a clear call-to-action to upgrade via Stripe Checkout. The page SHALL not require authentication to view.
#### Scenario: Pricing page is publicly accessible
- **WHEN** an unauthenticated user visits `/pricing`
- **THEN** the page renders with free tier limits and upgrade CTA without requiring login
### Requirement: Portal links to all four SDKs
The portal SHALL include an `/sdks` page listing all four officially supported SDKs (Node.js, Python, Go, Java) with: package name, installation command, a minimal working example, and a link to the SDK repository.
#### Scenario: SDK page displays all four SDKs
- **WHEN** a user visits `/sdks`
- **THEN** all four SDKs are listed with installation commands and code examples

@@ -0,0 +1,41 @@
## ADDED Requirements
### Requirement: register-agent Action registers an agent in CI using OIDC
The system SHALL provide a GitHub Action at `.github/actions/register-agent/action.yml` (`sentryagent/register-agent@v1`) that registers a new agent via the SentryAgent.ai API using GitHub OIDC token exchange. The Action SHALL accept inputs: `api-url` (required), `agent-name` (required), `agent-description` (optional). The Action SHALL output: `agent-id`. No long-lived API credentials SHALL be required.
#### Scenario: Agent registered in CI workflow
- **WHEN** a GitHub Actions workflow includes `uses: sentryagent/register-agent@v1` with valid `api-url` and `agent-name` inputs
- **THEN** the step completes successfully, an agent is registered in SentryAgent.ai, and `steps.<id>.outputs.agent-id` is populated
#### Scenario: OIDC exchange fails — action fails with clear message
- **WHEN** the GitHub OIDC token cannot be exchanged (e.g., trust policy not configured)
- **THEN** the action fails with an error message explaining how to configure the OIDC trust policy
### Requirement: issue-token Action issues an OAuth2 token in CI using OIDC
The system SHALL provide a GitHub Action at `.github/actions/issue-token/action.yml` (`sentryagent/issue-token@v1`) that issues an OAuth2 access token for an agent via OIDC exchange. The Action SHALL accept inputs: `api-url` (required), `agent-id` (required). The Action SHALL output: `access-token`, `expires-at`. The access token SHALL be masked in GitHub Actions logs.
#### Scenario: Token issued in CI workflow
- **WHEN** a GitHub Actions workflow includes `uses: sentryagent/issue-token@v1` with `api-url` and `agent-id`
- **THEN** the step completes and `steps.<id>.outputs.access-token` contains a valid Bearer token
#### Scenario: Access token is masked in logs
- **WHEN** the action issues a token
- **THEN** the token value is registered with `core.setSecret()` and does not appear in plaintext in the workflow log
### Requirement: GitHub OIDC trust policy is configurable via API
The system SHALL allow tenants to register a GitHub OIDC trust policy via `POST /oidc/trust-policies` specifying: `provider: "github"`, `repository` (e.g., `org/repo`), `branch` (optional), and `agentId`. Only workflows matching the trust policy SHALL be permitted to exchange GitHub OIDC tokens for SentryAgent.ai agent tokens.
#### Scenario: Trust policy restricts token exchange to specified repo
- **WHEN** a trust policy is registered for `org/repo-a` and a GitHub OIDC token from `org/repo-b` is presented
- **THEN** the token exchange is rejected with HTTP 403
#### Scenario: Trust policy permits token exchange for matching repo
- **WHEN** a trust policy is registered for `org/repo-a` and a valid GitHub OIDC token from `org/repo-a` is presented
- **THEN** the token exchange succeeds and an agent access token is returned
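The matching rule in the two scenarios above can be sketched as a pure function. This is a minimal illustration, not the shipped enforcement code: the `TrustPolicy` shape mirrors the fields named in the requirement, `repository` and `ref` are the claim names GitHub places in its OIDC tokens, and the function names are hypothetical.

```typescript
// Policy fields follow the requirement text; the interface itself is an assumption.
interface TrustPolicy {
  provider: "github";
  repository: string; // e.g. "org/repo-a"
  branch?: string; // optional; when omitted, any branch is accepted
  agentId: string;
}

// Subset of GitHub OIDC token claims used for matching.
interface GitHubOidcClaims {
  repository: string; // e.g. "org/repo-a"
  ref: string; // e.g. "refs/heads/main"
}

// Returns the first policy that matches the token's claims, or undefined.
function findMatchingPolicy(
  policies: TrustPolicy[],
  claims: GitHubOidcClaims,
): TrustPolicy | undefined {
  return policies.find(
    (p) =>
      p.provider === "github" &&
      p.repository === claims.repository &&
      (p.branch === undefined || claims.ref === `refs/heads/${p.branch}`),
  );
}

// A caller would answer HTTP 403 when this returns false.
function isExchangeAllowed(policies: TrustPolicy[], claims: GitHubOidcClaims): boolean {
  return findMatchingPolicy(policies, claims) !== undefined;
}
```

Keeping the match as a pure function makes the repo-a versus repo-b scenarios straightforward to unit test without a database or a signed token.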
### Requirement: Both Actions include README with setup instructions
Each Action directory SHALL include a `README.md` with: purpose, prerequisites (OIDC trust policy setup), inputs table, outputs table, a minimal workflow example, and a link to full documentation on the developer portal.
#### Scenario: README is present and complete
- **WHEN** a developer reads `register-agent/README.md`
- **THEN** they can configure the OIDC trust policy and add the action to their workflow without external documentation

@@ -0,0 +1,29 @@
## ADDED Requirements
### Requirement: Rate limiter hit counter is exposed as Prometheus metric
The system SHALL expose an `agentidp_rate_limit_hits_total` counter (labels: `endpoint`, `tenant_id`) incremented each time a request is rejected by the Redis-backed rate limiter (HTTP 429). This metric SHALL be available at `GET /metrics` alongside existing metrics.
#### Scenario: Rate limit rejection increments counter
- **WHEN** a client is rejected by the rate limiter on `POST /oauth2/token`
- **THEN** `agentidp_rate_limit_hits_total{endpoint="/oauth2/token"}` is incremented by 1
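To make the counter semantics concrete, here is a hand-rolled labelled counter that renders Prometheus text exposition lines. It is only a sketch: the `LabelledCounter` class is hypothetical, a real service would normally rely on a Prometheus client library, and label-value escaping is left out for brevity.

```typescript
// Illustrative counter; not the project's metrics implementation.
class LabelledCounter {
  private readonly name: string;
  private readonly values = new Map<string, number>();

  constructor(name: string) {
    this.name = name;
  }

  // Increments the series identified by the given label set by 1.
  inc(labels: Record<string, string>): void {
    const key = this.seriesKey(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + 1);
  }

  // Renders every series as one line of text exposition format.
  render(): string {
    const lines: string[] = [];
    this.values.forEach((value, key) => {
      lines.push(`${this.name}{${key}} ${value}`);
    });
    return lines.join("\n");
  }

  // Sorts label names so the same label set always maps to the same series.
  private seriesKey(labels: Record<string, string>): string {
    return Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`)
      .join(",");
  }
}

// One rejected request on POST /oauth2/token for tenant "org-abc".
const rateLimitHits = new LabelledCounter("agentidp_rate_limit_hits_total");
rateLimitHits.inc({ endpoint: "/oauth2/token", tenant_id: "org-abc" });
```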
### Requirement: Database connection pool saturation is exposed as Prometheus metric
The system SHALL expose `agentidp_db_pool_active_connections` (gauge) and `agentidp_db_pool_waiting_requests` (gauge) reflecting the current number of active database connections and queued requests waiting for a free connection.
#### Scenario: Pool metrics reflect current state
- **WHEN** 15 of 20 pool connections are in use and 2 requests are queued
- **THEN** `agentidp_db_pool_active_connections` reads 15 and `agentidp_db_pool_waiting_requests` reads 2
### Requirement: Per-tenant API call rate is exposed as Prometheus metric
The system SHALL expose `agentidp_tenant_api_calls_total` counter (label: `tenant_id`) incremented on every authenticated API request. This enables per-tenant traffic analysis in Grafana.
#### Scenario: Per-tenant counter increments on authenticated request
- **WHEN** tenant `org-abc` makes an authenticated API call
- **THEN** `agentidp_tenant_api_calls_total{tenant_id="org-abc"}` is incremented by 1
### Requirement: Usage tier enforcement rejections are tracked as Prometheus metric
The system SHALL expose `agentidp_billing_limit_rejections_total` counter (labels: `tenant_id`, `limit_type`) incremented each time a request is rejected due to a free tier limit (`agents` or `api_calls`).
#### Scenario: Agent limit rejection increments counter
- **WHEN** a free-tier tenant is rejected from creating an agent due to the 10-agent limit
- **THEN** `agentidp_billing_limit_rejections_total{limit_type="agents"}` is incremented by 1
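The `limit_type` label implies a small decision step before the counter is incremented. The sketch below shows one way to derive it from the documented free tier limits (10 agents, 1,000 API calls/day). The `TenantUsage` shape, the function name, and the exact boundary handling (blocking once the count has reached the limit) are assumptions.

```typescript
type LimitType = "agents" | "api_calls";

// Hypothetical snapshot of a tenant's current usage.
interface TenantUsage {
  tier: "free" | "paid";
  activeAgents: number;
  apiCallsToday: number;
}

// Limits taken from the free tier description in this change.
const FREE_TIER_LIMITS = { agents: 10, apiCallsPerDay: 1000 };

// Returns the limit that blocks the request, or null when it may proceed.
function blockedBy(usage: TenantUsage, isAgentCreation: boolean): LimitType | null {
  if (usage.tier !== "free") return null;
  if (usage.apiCallsToday >= FREE_TIER_LIMITS.apiCallsPerDay) return "api_calls";
  if (isAgentCreation && usage.activeAgents >= FREE_TIER_LIMITS.agents) return "agents";
  return null;
}
```

A middleware could use the returned value directly as the `limit_type` label when incrementing `agentidp_billing_limit_rejections_total`.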

@@ -0,0 +1,53 @@
## ADDED Requirements
### Requirement: Redis-backed distributed rate limiting replaces in-memory limiter
The system SHALL use `ioredis` + `rate-limiter-flexible` to enforce rate limits across all Express instances using a Redis sliding window algorithm. The in-memory `express-rate-limit` store SHALL be removed. Rate limit configuration SHALL be injectable via environment variables (`RATE_LIMIT_WINDOW_MS`, `RATE_LIMIT_MAX_REQUESTS`). When `REDIS_RATE_LIMIT_ENABLED=false`, the system SHALL fall back to an in-memory limiter for local development.
#### Scenario: Rate limit enforced across multiple instances
- **WHEN** two Express instances are running behind a load balancer and a client sends requests alternating between instances
- **THEN** the rate limit counter is shared across both instances via Redis and the client is rejected after the combined limit is reached
#### Scenario: Redis unavailable — graceful fallback
- **WHEN** Redis is unreachable and `REDIS_RATE_LIMIT_ENABLED=true`
- **THEN** the system SHALL log a warning and fall back to in-memory limiting rather than rejecting all requests
#### Scenario: Rate limit exceeded
- **WHEN** a client exceeds the configured request limit within the window
- **THEN** the system SHALL respond with HTTP 429 and a `Retry-After` header indicating when the window resets
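The sliding-window behaviour and the `Retry-After` value can be illustrated with an in-memory log of request timestamps. This is a sketch of the general technique only: the requirement calls for `rate-limiter-flexible` backed by Redis, and the `SlidingWindowLimiter` class below is a hypothetical stand-in that keeps state in process memory.

```typescript
interface LimitResult {
  allowed: boolean;
  retryAfterSeconds: number; // 0 when the request is allowed
}

// Sliding-window log: keeps the timestamps of recent hits per key.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  consume(key: string, nowMs: number): LimitResult {
    const windowStart = nowMs - this.windowMs;
    // Drop hits that have already slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      // A slot frees up when the oldest remaining hit leaves the window.
      const oldest = recent[0] ?? nowMs;
      const waitMs = oldest + this.windowMs - nowMs;
      return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return { allowed: true, retryAfterSeconds: 0 };
  }
}
```

A middleware would respond with HTTP 429 and set `Retry-After` to `retryAfterSeconds` whenever `allowed` is false. Passing the clock in as `nowMs` keeps the logic deterministic in tests.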
### Requirement: Database connection pool is explicitly configured
The system SHALL configure `pg` connection pool with explicit `max`, `min`, `idleTimeoutMillis`, and `connectionTimeoutMillis` parameters via environment variables (`DB_POOL_MAX`, `DB_POOL_MIN`, `DB_POOL_IDLE_TIMEOUT_MS`, `DB_POOL_CONNECTION_TIMEOUT_MS`). Defaults SHALL be: max=20, min=2, idleTimeout=30000ms, connectionTimeout=5000ms.
#### Scenario: Pool exhaustion under load
- **WHEN** all pool connections are in use and a new query is requested
- **THEN** the system SHALL queue the request and resolve it within `DB_POOL_CONNECTION_TIMEOUT_MS`, or reject with a 503 if timeout is exceeded
#### Scenario: Idle connections are reaped
- **WHEN** a connection has been idle for longer than `DB_POOL_IDLE_TIMEOUT_MS`
- **THEN** the pool SHALL close the connection and reduce active pool size toward `min`
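Loading the pool settings from the environment with the stated defaults might look like the following. The helper names are hypothetical, and the environment is passed in as a plain object so the sketch does not depend on Node typings; the returned keys reuse the option names listed in the requirement.

```typescript
interface PoolConfig {
  max: number;
  min: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

// Parses a non-negative integer, falling back to the default when unset.
function intFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`Invalid integer value: ${value}`);
  }
  return parsed;
}

// Defaults follow the requirement: max=20, min=2, idle=30000ms, connect=5000ms.
function loadPoolConfig(env: Record<string, string | undefined>): PoolConfig {
  const config: PoolConfig = {
    max: intFromEnv(env["DB_POOL_MAX"], 20),
    min: intFromEnv(env["DB_POOL_MIN"], 2),
    idleTimeoutMillis: intFromEnv(env["DB_POOL_IDLE_TIMEOUT_MS"], 30000),
    connectionTimeoutMillis: intFromEnv(env["DB_POOL_CONNECTION_TIMEOUT_MS"], 5000),
  };
  if (config.min > config.max) {
    throw new Error("DB_POOL_MIN must not exceed DB_POOL_MAX");
  }
  return config;
}
```

Failing fast on a malformed value is a design choice made here for illustration; the requirement itself does not say how invalid values should be handled.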
### Requirement: Detailed health endpoint reports per-service status
The system SHALL expose `GET /health/detailed` returning a JSON object with individual status for each dependency: `database`, `redis`, `vault` (if configured), `opa` (if configured). Each service SHALL report `status` (`healthy` | `degraded` | `unreachable`), `latencyMs`, and an optional `message`. The overall response status SHALL be HTTP 200 if all services are healthy, HTTP 207 if any are degraded, and HTTP 503 if any are unreachable.
#### Scenario: All services healthy
- **WHEN** all dependencies respond within acceptable latency
- **THEN** `GET /health/detailed` returns HTTP 200 with all services reporting `status: "healthy"`
#### Scenario: Redis unreachable
- **WHEN** Redis does not respond within 2000ms
- **THEN** `GET /health/detailed` returns HTTP 503 with `redis.status: "unreachable"` and overall `status: "unhealthy"`
#### Scenario: Vault degraded
- **WHEN** Vault responds but with latency exceeding 1000ms
- **THEN** `GET /health/detailed` returns HTTP 207 with `vault.status: "degraded"` and a latency measurement
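The mapping from per-service status to the overall HTTP status can be sketched as below. The 1000ms degraded threshold comes from the Vault scenario; treating `unreachable` as taking precedence over `degraded` when both occur is an assumption, since the requirement does not cover that combination. Function names are hypothetical.

```typescript
type ServiceStatus = "healthy" | "degraded" | "unreachable";

interface ServiceHealth {
  status: ServiceStatus;
  latencyMs: number;
  message?: string;
}

// latencyMs is null when the check timed out or raised an error.
function classify(latencyMs: number | null): ServiceStatus {
  if (latencyMs === null) return "unreachable";
  return latencyMs > 1000 ? "degraded" : "healthy";
}

// 503 if any service is unreachable, else 207 if any is degraded, else 200.
function overallHttpStatus(services: Record<string, ServiceHealth>): 200 | 207 | 503 {
  const statuses = Object.keys(services).map((name) => services[name]?.status);
  if (statuses.some((s) => s === "unreachable")) return 503;
  if (statuses.some((s) => s === "degraded")) return 207;
  return 200;
}
```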
### Requirement: k6 load test suite validates production readiness
The system SHALL include a k6 load test suite at `tests/load/` covering: agent registration under load (100 virtual users, 60s), token issuance under load (1000 virtual users, 60s), and credential rotation under load (50 virtual users, 60s). Each scenario SHALL define pass/fail thresholds: p95 response time < 500ms, error rate < 1%.
#### Scenario: Token issuance load test passes thresholds
- **WHEN** the k6 load test `token-issuance.js` runs with 1000 virtual users for 60 seconds
- **THEN** p95 response time SHALL be below 500ms and error rate SHALL be below 1%
#### Scenario: Load test threshold failure surfaces clearly
- **WHEN** a k6 threshold is breached during the load test run
- **THEN** the k6 process SHALL exit with a non-zero exit code, making CI failure explicit
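The pass/fail rule behind the thresholds (p95 below 500ms, error rate below 1%) can be illustrated outside k6. The sketch below is not a k6 script: it uses a nearest-rank percentile, which may differ slightly from the way k6 computes percentiles, and the `Sample` shape and function names are hypothetical.

```typescript
interface Sample {
  durationMs: number;
  failed: boolean;
}

// Nearest-rank percentile; p is a whole number such as 95.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1] ?? 0;
}

// Returns the exit code a CI step would see: 0 on pass, 1 on a breached threshold.
function exitCodeFor(samples: Sample[]): 0 | 1 {
  const p95 = percentile(samples.map((s) => s.durationMs), 95);
  const failures = samples.filter((s) => s.failed).length;
  const errorRate = samples.length === 0 ? 0 : failures / samples.length;
  return p95 < 500 && errorRate < 0.01 ? 0 : 1;
}
```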

@@ -0,0 +1,23 @@
## ADDED Requirements
### Requirement: Usage dashboard tab displays per-tenant metering data
The web dashboard SHALL include a "Usage" tab in the main navigation displaying the current billing period's usage: API calls used / daily limit, active agents count / agent limit, token issuances this period. Usage data SHALL be fetched from `GET /billing/usage` (new authenticated endpoint). The tab SHALL update on page load and on a 60-second polling interval.
#### Scenario: Usage tab shows current metrics
- **WHEN** an authenticated user navigates to the Usage tab
- **THEN** the dashboard displays current API call count, agent count, and token issuance count for the current billing period
#### Scenario: Free tier warning shown when approaching limit
- **WHEN** a free-tier tenant has used ≥ 80% of their daily API call limit
- **THEN** a warning banner is displayed with a link to the upgrade/pricing page
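The banner condition reduces to a single comparison. A minimal sketch, assuming a hypothetical `shouldShowLimitWarning` helper in the dashboard code; integer arithmetic is used so the 80% boundary is exact.

```typescript
// True when a free-tier tenant has used at least 80% of the daily API call limit.
function shouldShowLimitWarning(
  tier: "free" | "paid",
  callsUsed: number,
  dailyLimit: number,
): boolean {
  if (tier !== "free" || dailyLimit <= 0) return false;
  // Equivalent to callsUsed / dailyLimit >= 0.8 without floating-point rounding.
  return callsUsed * 100 >= dailyLimit * 80;
}
```

With the documented free tier limit of 1,000 calls per day, the banner would first appear at 800 calls.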
### Requirement: Billing status panel shows subscription tier and upgrade CTA
The web dashboard Usage tab SHALL include a billing status panel showing: current tier (Free / Paid), subscription status (active / cancelled / trial), and, for free-tier tenants, an "Upgrade" button that starts the Stripe Checkout flow via `POST /billing/checkout`.
#### Scenario: Free tier tenant sees upgrade CTA
- **WHEN** a free-tier tenant views the Usage tab
- **THEN** an "Upgrade to Paid" button is visible and initiates Stripe Checkout when clicked
#### Scenario: Paid tier tenant sees subscription status
- **WHEN** a paid-tier tenant views the Usage tab
- **THEN** the panel shows "Paid" tier with subscription status and next renewal date, with no upgrade CTA

@@ -0,0 +1,122 @@
## 1. WS1: Production Hardening — Redis Rate Limiting
- [x] 1.1 Install `ioredis` and `rate-limiter-flexible` — add to package.json dependencies
- [x] 1.2 Create `src/infrastructure/redisClient.ts` — singleton ioredis client with connection error handling and `REDIS_RATE_LIMIT_ENABLED` env var guard
- [x] 1.3 Replace in-memory `express-rate-limit` with `RateLimiterRedis` from `rate-limiter-flexible` — sliding window, configurable via `RATE_LIMIT_WINDOW_MS` and `RATE_LIMIT_MAX_REQUESTS`
- [x] 1.4 Implement graceful fallback to `RateLimiterMemory` when Redis is unreachable
- [x] 1.5 Add `agentidp_rate_limit_hits_total` Prometheus counter (labels: `endpoint`) — increment on HTTP 429
- [x] 1.6 Update rate limiter middleware to set `Retry-After` header on rejection
- [x] 1.7 Write unit tests for rate limiter middleware — Redis path, fallback path, 429 response shape
## 2. WS1: Production Hardening — Database Pool & Health
- [x] 2.1 Add `DB_POOL_MAX`, `DB_POOL_MIN`, `DB_POOL_IDLE_TIMEOUT_MS`, `DB_POOL_CONNECTION_TIMEOUT_MS` env vars to `.env.example` and database config
- [x] 2.2 Configure `pg.Pool` with explicit pool parameters; defaults: max=20, min=2, idle=30000ms, conn timeout=5000ms
- [x] 2.3 Expose `agentidp_db_pool_active_connections` gauge and `agentidp_db_pool_waiting_requests` gauge — update on pool events
- [x] 2.4 Create `GET /health/detailed` route and controller — check database, Redis, Vault (if configured), OPA (if configured)
- [x] 2.5 Implement per-service health checks with latency measurement — `healthy` / `degraded` (>1000ms) / `unreachable` (timeout/error)
- [x] 2.6 Return HTTP 200 (all healthy), HTTP 207 (any degraded), HTTP 503 (any unreachable)
- [x] 2.7 Write unit tests for health controller — all healthy, degraded, unreachable scenarios
## 3. WS1: Production Hardening — Load Tests
- [x] 3.1 Install k6 and create `tests/load/` directory with `README.md` explaining how to run tests
- [x] 3.2 Write `tests/load/agent-registration.js` — 100 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
- [x] 3.3 Write `tests/load/token-issuance.js` — 1000 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
- [x] 3.4 Write `tests/load/credential-rotation.js` — 50 VUs, 60s, threshold: p95 < 500ms, error rate < 1%
- [x] 3.5 Add `npm run load-test` script to package.json running all three k6 scenarios sequentially
## 4. WS2: Developer Portal — Setup & Core Pages
- [x] 4.1 Scaffold `portal/` as a standalone Next.js 14 app with Tailwind CSS — `npx create-next-app@latest portal --typescript --tailwind`
- [x] 4.2 Add `NEXT_PUBLIC_API_URL` env var support — create `portal/.env.example`
- [x] 4.3 Create portal home page (`portal/app/page.tsx`) — hero, product description, CTA to `/get-started`
- [x] 4.4 Create `/pricing` page with free tier limits table (10 agents, 1,000 calls/day) and paid tier CTA
- [x] 4.5 Create `/sdks` page listing all 4 SDKs with installation commands and minimal code examples
- [x] 4.6 Create shared nav component with links to: Home, API Explorer, Get Started, SDKs, Pricing
## 5. WS2: Developer Portal — API Explorer & Onboarding Wizard
- [x] 5.1 Install `swagger-ui-react` in `portal/` — add to portal package.json
- [x] 5.2 Create `/api-explorer` page embedding Swagger UI loaded from `NEXT_PUBLIC_API_URL/openapi.json`
- [x] 5.3 Configure Swagger UI with `persistAuthorization: true` and Bearer token auth scheme
- [x] 5.4 Create `/get-started` wizard — Step 1: account setup instructions
- [x] 5.5 Create wizard Step 2: agent name input → calls `POST /agents` via API → displays agent ID
- [x] 5.6 Create wizard Step 3: generate credentials → calls credentials endpoint → displays client ID/secret with copy buttons
- [x] 5.7 Create wizard Step 4: SDK selection → displays ready-to-run code snippet for chosen SDK (Node.js / Python / Go / Java)
- [x] 5.8 Wizard state management using React `useState` — no external state library needed
- [x] 5.9 Build `portal/`: `npm run build` passes with no build or TypeScript errors
## 6. WS3: CLI Tool — Setup & Configuration
- [x] 6.1 Scaffold `cli/` directory with `package.json` (name: `sentryagent`, bin: `sentryagent`) — TypeScript with `commander` and `chalk`
- [x] 6.2 Create `cli/src/config.ts` — read/write `~/.sentryagent/config.json` with `apiUrl`, `clientId`, `clientSecret`
- [x] 6.3 Implement `sentryagent configure` command — prompts for API URL, client ID, client secret using `readline` — writes to config file
- [x] 6.4 Implement config validation helper — fail with "Not configured. Run `sentryagent configure` first." if config missing
- [x] 6.5 Implement `sentryagent --version` outputting version from package.json
- [x] 6.6 Implement `sentryagent --help` showing all available commands
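The config validation helper from task 6.4 could be sketched as follows. The function takes the raw file contents (or `null` when the file does not exist) so it stays free of file-system calls; the `requireConfig` name and the choice to treat malformed or incomplete files the same as a missing file are assumptions.

```typescript
interface CliConfig {
  apiUrl: string;
  clientId: string;
  clientSecret: string;
}

const NOT_CONFIGURED = "Not configured. Run `sentryagent configure` first.";

// raw is the content of ~/.sentryagent/config.json, or null if the file is absent.
function requireConfig(raw: string | null): CliConfig {
  if (raw === null) throw new Error(NOT_CONFIGURED);
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(NOT_CONFIGURED);
  }
  if (typeof parsed !== "object" || parsed === null) throw new Error(NOT_CONFIGURED);
  const fields = parsed as Record<string, unknown>;
  // Every field must be a non-empty string.
  const read = (name: string): string => {
    const value = fields[name];
    if (typeof value !== "string" || value === "") throw new Error(NOT_CONFIGURED);
    return value;
  };
  return {
    apiUrl: read("apiUrl"),
    clientId: read("clientId"),
    clientSecret: read("clientSecret"),
  };
}
```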
## 7. WS3: CLI Tool — Agent Commands
- [x] 7.1 Implement `sentryagent register-agent --name <name> [--description <desc>]` — calls `POST /agents`, outputs agent ID
- [x] 7.2 Implement `sentryagent list-agents` — calls `GET /agents`, outputs formatted table with chalk
- [x] 7.3 Implement `sentryagent issue-token --agent-id <id>` — calls `POST /oauth2/token`, outputs access token and expiry
- [x] 7.4 Implement `sentryagent rotate-credentials --agent-id <id>` — prompts for confirmation, calls rotate endpoint, outputs new secret
- [x] 7.5 Implement `sentryagent tail-audit-log [--agent-id <id>]` — polls `GET /audit/logs` every 5s, streams new events to stdout, runs until Ctrl+C
- [x] 7.6 Implement `sentryagent completion bash` and `sentryagent completion zsh` — output shell completion scripts
- [x] 7.7 Write `cli/README.md` — installation, configuration, all commands with examples, shell completion setup
- [x] 7.8 Build CLI — `npm run build` in `cli/` passes; `node dist/index.js --help` works
## 8. WS4: Agent Marketplace
- [x] 8.1 Add `is_public` boolean column (default false) to `agents` table — create migration `006_add_agent_marketplace.sql`
- [x] 8.2 Update `PATCH /agents/:id` to accept `isPublic` field — update AgentService and AgentController
- [x] 8.3 Create `MarketplaceService` with `listPublicAgents(filters, pagination)` and `getPublicAgent(agentId)` methods
- [x] 8.4 Create `GET /marketplace/agents` endpoint — unauthenticated, paginated, supports `?q=`, `?capability=`, `?publisher=` filters
- [x] 8.5 Create `GET /marketplace/agents/:agentId` endpoint — unauthenticated, returns agent with DID document and agent card
- [x] 8.6 Add `agentidp_tenant_api_calls_total` Prometheus counter (label: `tenant_id`) — increment on authenticated requests
- [x] 8.7 Add `MARKETPLACE_ENABLED` feature flag — return 404 on all marketplace routes when disabled
- [x] 8.8 Write unit tests for MarketplaceService — list, filter, get, public/private visibility
- [x] 8.9 Update OpenAPI spec to document `/marketplace/agents` endpoints
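The listing behaviour of `listPublicAgents` (public agents only, optional `q`, `capability`, and `publisher` filters, pagination) can be sketched over an in-memory array. The `Agent` field names other than `isPublic`, the case-insensitive name match for `q`, and the default page size are assumptions; the real service presumably runs the equivalent query against the database.

```typescript
// Hypothetical agent shape; only isPublic is named in the tasks above.
interface Agent {
  agentId: string;
  name: string;
  publisher: string;
  capabilities: string[];
  isPublic: boolean;
}

interface MarketplaceFilters {
  q?: string;
  capability?: string;
  publisher?: string;
}

interface Page<T> {
  items: T[];
  total: number;
}

// Private agents are never returned, whatever filters are supplied.
function listPublicAgents(
  agents: Agent[],
  filters: MarketplaceFilters,
  page = 1,
  pageSize = 20,
): Page<Agent> {
  const q = filters.q?.toLowerCase();
  const matches = agents.filter(
    (a) =>
      a.isPublic &&
      (q === undefined || a.name.toLowerCase().indexOf(q) !== -1) &&
      (filters.capability === undefined ||
        a.capabilities.some((c) => c === filters.capability)) &&
      (filters.publisher === undefined || a.publisher === filters.publisher),
  );
  const start = (page - 1) * pageSize;
  return { items: matches.slice(start, start + pageSize), total: matches.length };
}
```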
## 9. WS5: GitHub Actions
- [x] 9.1 Create `.github/actions/register-agent/action.yml` — inputs: `api-url`, `agent-name`, `agent-description`; outputs: `agent-id`
- [x] 9.2 Implement register-agent Action script (`action.js`) — exchange GitHub OIDC token via `POST /oidc/token`, then call `POST /agents`
- [x] 9.3 Implement OIDC token exchange error handling in register-agent — clear error message with trust policy setup link
- [x] 9.4 Create `.github/actions/issue-token/action.yml` — inputs: `api-url`, `agent-id`; outputs: `access-token`, `expires-at`
- [x] 9.5 Implement issue-token Action script — exchange GitHub OIDC token, call `POST /oauth2/token`, mask token with `core.setSecret()`
- [x] 9.6 Create `POST /oidc/trust-policies` endpoint — accepts `provider`, `repository`, `branch`, `agentId` — stores trust policy
- [x] 9.7 Enforce trust policy on GitHub OIDC token exchange — reject tokens from repos not matching a registered policy with HTTP 403
- [x] 9.8 Write `register-agent/README.md` — purpose, OIDC trust policy setup, inputs, outputs, example workflow
- [x] 9.9 Write `issue-token/README.md` — same structure as register-agent README
## 10. WS6: Billing & Usage Metering
- [x] 10.1 Create migration `007_add_billing.sql`: `tenant_subscriptions` table (tenant_id, status, stripe_customer_id, stripe_subscription_id, current_period_end) and `usage_events` table (tenant_id, date, metric_type, count)
- [x] 10.2 Install `stripe` npm package — add to package.json
- [x] 10.3 Create `UsageMeteringMiddleware` — increments in-memory per-tenant counters on every authenticated request; flushes to `usage_events` every 60s
- [x] 10.4 Create `UsageService` with `getDailyUsage(tenantId, date)` and `getActiveAgentCount(tenantId)` methods
- [x] 10.5 Create `FreeTierEnforcementMiddleware` — checks usage cache (Redis, 60s TTL) before agent creation and API calls; rejects with HTTP 429 when limit exceeded; skips when `BILLING_ENABLED=false`
- [x] 10.6 Add `agentidp_billing_limit_rejections_total` Prometheus counter (labels: `tenant_id`, `limit_type`)
- [x] 10.7 Create `BillingService` with `createCheckoutSession(tenantId)`, `handleWebhookEvent(event)`, `getSubscriptionStatus(tenantId)` methods
- [x] 10.8 Create `POST /billing/checkout` endpoint — creates Stripe Checkout session, returns `checkoutUrl`
- [x] 10.9 Create `POST /billing/webhook` endpoint — verifies Stripe signature, processes subscription events, updates `tenant_subscriptions`
- [x] 10.10 Create `GET /billing/usage` endpoint (authenticated) — returns current period usage summary for tenant
- [x] 10.11 Add `BILLING_ENABLED` env var — disable enforcement and Stripe processing when false; document in `.env.example`
- [x] 10.12 Write unit tests for UsageService, BillingService, FreeTierEnforcementMiddleware — free tier block, paid tier pass-through, webhook processing
- [x] 10.13 Update web dashboard — add "Usage" tab to navigation with billing status panel and usage metrics from `GET /billing/usage`
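The metering pattern in task 10.3 (count in memory, flush periodically) can be sketched as a counter that is drained into rows shaped like the `usage_events` columns. The `UsageCounters` class, the `"api_calls"` metric type value, and the camelCase field names are assumptions; the 60-second timer and the database write are left to the caller.

```typescript
// Row shape mirroring the usage_events columns (tenant_id, date, metric_type, count).
interface UsageRow {
  tenantId: string;
  date: string; // e.g. "2026-04-02"
  metricType: string;
  count: number;
}

class UsageCounters {
  private counts = new Map<string, number>();

  // Called once per authenticated request.
  increment(tenantId: string): void {
    this.counts.set(tenantId, (this.counts.get(tenantId) ?? 0) + 1);
  }

  // Returns one row per tenant seen since the last drain, then resets the counters.
  drain(date: string): UsageRow[] {
    const rows: UsageRow[] = [];
    this.counts.forEach((count, tenantId) => {
      rows.push({ tenantId, date, metricType: "api_calls", count });
    });
    this.counts = new Map<string, number>();
    return rows;
  }
}
```

On each flush the caller would add `count` to the existing row for the same tenant, date, and metric type, which is where an upsert fits. Counts held in memory are lost if the process exits between flushes, a trade-off this pattern accepts for lower write volume.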
## 11. QA & Release
- [x] 11.1 Run full TypeScript check across all packages (`tsc --noEmit`) — zero errors
- [x] 11.2 Run all unit tests (`npm test`) — all pass, coverage ≥ 80%
- [x] 11.3 Run k6 load tests — all thresholds pass (p95 < 500ms, error rate < 1%)
- [x] 11.4 Verify `GET /health/detailed` returns correct status for all dependency states
- [x] 11.5 Verify marketplace endpoints are unauthenticated and return correct data
- [x] 11.6 Verify Stripe webhook signature rejection on invalid signature
- [x] 11.7 Verify free tier limit enforcement with `BILLING_ENABLED=true`
- [x] 11.8 Verify `BILLING_ENABLED=false` disables enforcement without breaking metering
- [x] 11.9 Build portal — `npm run build` passes in `portal/`
- [x] 11.10 Build CLI — `npm run build` passes in `cli/`; `sentryagent --help` works
- [x] 11.11 Commit all Phase 4 work on `main` — conventional commit message per workstream

@@ -0,0 +1,50 @@
## ADDED Requirements
### Requirement: API reference exists at docs/developers/api-reference.md
The system SHALL provide a human-readable API reference at `docs/developers/api-reference.md` covering all 14 endpoints across the four services: Agent Registry, OAuth 2.0 Token, Credential Management, and Audit Log.
#### Scenario: Developer finds any endpoint within 10 seconds
- **WHEN** the developer opens the API reference
- **THEN** they SHALL find a table of contents at the top linking to each of the four service sections
### Requirement: Every endpoint is documented with method, path, description, and auth requirements
For each of the 14 endpoints, the reference SHALL document: HTTP method, path, one-sentence description, and whether Bearer token auth is required.
#### Scenario: Developer knows which endpoints require authentication
- **WHEN** the developer scans the reference
- **THEN** they SHALL clearly see which endpoints require a Bearer token (all except POST /token) and which do not
### Requirement: Every endpoint includes a complete curl example
For each endpoint, the reference SHALL include at least one complete, runnable curl example with real placeholder values.
#### Scenario: Developer copies a curl example and runs it
- **WHEN** the developer copies a curl example from the reference
- **THEN** the command SHALL be complete — no ellipses, no `...`, no missing flags — requiring only substitution of their own agentId, token, and base URL
### Requirement: Every endpoint documents all request parameters and body fields
For each endpoint that accepts a request body or query parameters, the reference SHALL list every field with: name, type, required/optional, description, and validation constraints.
#### Scenario: Developer knows what fields are required for POST /agents
- **WHEN** the developer reads the POST /agents section
- **THEN** they SHALL see a table listing every field, its type, whether it is required, and any constraints (e.g. email format, max length)
### Requirement: Every endpoint documents all response codes and response body schemas
For each endpoint, the reference SHALL document every possible HTTP response code (2xx and 4xx/5xx) with a description and example response body.
#### Scenario: Developer understands a 429 response
- **WHEN** the developer reads the rate limit error documentation
- **THEN** they SHALL understand what triggered it, what the X-RateLimit-* headers mean, and when they can retry
### Requirement: API reference includes a base URL and versioning section
The reference SHALL include a section at the top explaining the base URL convention, port configuration, and that all endpoints are unversioned in Phase 1.
#### Scenario: Developer knows where to send requests
- **WHEN** the developer reads the base URL section
- **THEN** they SHALL see the default base URL (http://localhost:3000), how to change the port via environment variable, and a note that versioning will be introduced in Phase 2
### Requirement: API reference includes an errors section
The reference SHALL include a dedicated errors section listing all standard error response shapes, all custom error codes, and their HTTP status code mappings.
#### Scenario: Developer handles an AgentNotFoundError
- **WHEN** the developer reads the errors section
- **THEN** they SHALL see the exact JSON shape of the error response, the error code string, and the HTTP status (404)

@@ -0,0 +1,43 @@
## ADDED Requirements
### Requirement: Core concepts guide exists at docs/developers/concepts.md
The system SHALL provide a concepts guide at `docs/developers/concepts.md` that explains the AgentIdP model in plain English with no assumed prior knowledge of AGNTCY or OAuth 2.0.
#### Scenario: Developer understands what AgentIdP is
- **WHEN** a developer reads the concepts guide
- **THEN** they SHALL be able to explain in one sentence what SentryAgent.ai AgentIdP does and why they need it
### Requirement: Concepts guide explains what an AI agent identity is
The guide SHALL explain in plain English what it means to give an AI agent an identity — how it differs from a human user account and why agents need their own identity model.
#### Scenario: Agent identity vs human identity distinction is clear
- **WHEN** the developer reads the agent identity section
- **THEN** they SHALL understand that agents are non-human, machine-operated identities that need persistent, auditable credentials — not session-based logins
### Requirement: Concepts guide explains the AGNTCY alignment
The guide SHALL explain what AGNTCY is (Linux Foundation standard), why SentryAgent.ai aligns to it, and what benefit that gives the developer — without requiring the developer to read the AGNTCY specification.
#### Scenario: Developer understands AGNTCY without external reading
- **WHEN** the developer reads the AGNTCY section
- **THEN** they SHALL understand that AGNTCY-aligned agent IDs are interoperable across the AI agent ecosystem, and that SentryAgent.ai implements this for free
### Requirement: Concepts guide explains the agent lifecycle
The guide SHALL explain the lifecycle states of an agent (active, suspended, decommissioned) and what each state means for credential and token behaviour.
#### Scenario: Developer understands what happens when an agent is decommissioned
- **WHEN** the developer reads the lifecycle section
- **THEN** they SHALL understand that decommissioning is irreversible, all credentials are revoked, and no new tokens can be issued
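The lifecycle can be pictured as a small state machine over the states the requirement names. In this sketch the irreversibility of `decommissioned` comes from the scenario above, while the other transitions and the rule that only `active` agents may obtain tokens are assumptions made for illustration.

```typescript
type AgentState = "active" | "suspended" | "decommissioned";

// Allowed transitions; decommissioned has none because it is irreversible.
const ALLOWED_TRANSITIONS: Record<AgentState, AgentState[]> = {
  active: ["suspended", "decommissioned"],
  suspended: ["active", "decommissioned"],
  decommissioned: [],
};

function canTransition(from: AgentState, to: AgentState): boolean {
  return ALLOWED_TRANSITIONS[from].some((s) => s === to);
}

// Assumption: suspended agents, like decommissioned ones, cannot get new tokens.
function canIssueTokens(state: AgentState): boolean {
  return state === "active";
}
```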
### Requirement: Concepts guide explains OAuth 2.0 Client Credentials in plain English
The guide SHALL explain the Client Credentials grant in plain English — no RFC references, no formal OAuth jargon — focused on how agents use it to authenticate.
#### Scenario: Developer understands client_id and client_secret without prior OAuth knowledge
- **WHEN** the developer reads the OAuth section
- **THEN** they SHALL understand that client_id identifies the agent and client_secret proves it — analogous to a username and password for machines
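In practice the agent sends its `client_id` and `client_secret` in a single token request. The sketch below builds such a request without sending it. The form-encoded body is the standard shape for the Client Credentials grant; the `/oauth2/token` path is taken from elsewhere in this change set, and the `TokenRequest` type and function name are hypothetical.

```typescript
interface TokenRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the request an agent would send to obtain an access token.
function buildTokenRequest(clientId: string, clientSecret: string): TokenRequest {
  const fields: Array<[string, string]> = [
    ["grant_type", "client_credentials"],
    ["client_id", clientId], // identifies the agent
    ["client_secret", clientSecret], // proves the agent is who it claims to be
  ];
  const body = fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return {
    method: "POST",
    path: "/oauth2/token",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}
```

Percent-encoding the values matters because generated secrets can contain characters such as `&` or `=` that would otherwise corrupt the form body.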
### Requirement: Concepts guide explains the free-tier limits
The guide SHALL document all free-tier limits (100 agents, 10,000 tokens/month, 100 req/min, 90-day audit retention) in a clear table.
#### Scenario: Developer knows the limits before hitting them
- **WHEN** the developer reads the free-tier section
- **THEN** they SHALL see a table with all four limits and a note on what happens when each is exceeded

@@ -0,0 +1,4 @@
## ADDED Requirements
### Requirement: Database doc exists at docs/devops/database.md
The system SHALL provide `docs/devops/database.md` documenting the 4-table schema (agents, credentials, audit_events, token_revocations), the migration runner, and exact commands to apply and verify migrations.

@@ -42,3 +42,8 @@ terraform/
- [ ] PostgreSQL and Redis not publicly accessible — VPC-internal only
- [ ] `docs/devops/deployment.md` — end-to-end deployment walkthrough for AWS and GCP
- [ ] `terraform.tfvars.example` provided for both environments — no secrets in version control
## ADDED Requirements
### Requirement: Local development guide exists at docs/devops/local-development.md
The system SHALL provide `docs/devops/local-development.md` documenting the complete local setup using docker-compose for infrastructure and npm for the application server, including all service ports, health check verification, and the Dockerfile gap note.

@@ -0,0 +1,56 @@
## ADDED Requirements
### Requirement: Developer guides index exists at docs/developers/guides/README.md
The system SHALL provide a guides index at `docs/developers/guides/README.md` listing all available guides with one-line descriptions and links.
#### Scenario: Developer finds the right guide quickly
- **WHEN** the developer opens the guides folder
- **THEN** they SHALL see a list of all guides with descriptions so they can choose the one relevant to their task
### Requirement: Agent registration guide exists at docs/developers/guides/register-an-agent.md
The system SHALL provide a step-by-step guide for registering an agent, including all required and optional fields, validation rules, and how to handle the response.
#### Scenario: Developer registers their first agent
- **WHEN** the developer follows the registration guide
- **THEN** they SHALL successfully create an agent and understand what `agentId`, `clientId`, and `status` mean in the response
#### Scenario: Developer understands registration validation errors
- **WHEN** the guide covers validation
- **THEN** it SHALL show examples of common validation errors (missing required fields, invalid email format) and how to fix them
### Requirement: Credential management guide exists at docs/developers/guides/manage-credentials.md
The system SHALL provide a guide covering all four credential operations: generate, list, rotate, and revoke — with curl examples and explanation of when to use each.
#### Scenario: Developer rotates a compromised credential
- **WHEN** the developer follows the rotation section
- **THEN** they SHALL understand that rotation replaces the secret while keeping the same `credentialId`, and the old secret is immediately invalid
#### Scenario: Developer understands credential revocation vs agent decommission
- **WHEN** the developer reads the guide
- **THEN** they SHALL understand the difference: revoking a credential leaves the agent active with other credentials; decommissioning the agent revokes everything permanently
### Requirement: Token guide exists at docs/developers/guides/issue-and-revoke-tokens.md
The system SHALL provide a guide covering token issuance, introspection, and revocation — explaining the JWT structure, expiry, and how to use the Bearer token in API requests.
#### Scenario: Developer uses a token to authenticate a request
- **WHEN** the developer follows the token guide
- **THEN** they SHALL see an example of using the issued token as a Bearer token in an Authorization header on a subsequent API call
#### Scenario: Developer introspects a token to check validity
- **WHEN** the developer reads the introspection section
- **THEN** they SHALL understand what `active: true/false` means and what fields are returned
#### Scenario: Developer revokes a token
- **WHEN** the developer follows the revocation section
- **THEN** they SHALL understand that revoked tokens are immediately invalid even if not yet expired
### Requirement: Audit log guide exists at docs/developers/guides/query-audit-logs.md
The system SHALL provide a guide for querying the audit log — covering available filters (agentId, action, outcome, date range), pagination, and how to interpret audit events.
#### Scenario: Developer queries audit events for a specific agent
- **WHEN** the developer follows the audit guide
- **THEN** they SHALL see a curl example filtering by `agentId` and understand the structure of each audit event
#### Scenario: Developer understands audit log retention
- **WHEN** the developer reads the guide
- **THEN** they SHALL understand that free-tier audit logs are retained for 90 days and what happens after that window


@@ -0,0 +1,370 @@
# AGNTCY Federation — Specification
**Workstream**: 4 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Enable cross-instance agent identity federation using signed JWT assertions. Operators register trusted remote AgentIdP instances as federation partners. When an agent presents a token issued by a trusted partner instance, the local AgentIdP can verify it by fetching and caching the partner's JWKS. This enables multi-organization agent identity interoperability aligned with AGNTCY standards.
Federation is opt-in per organization. Only tokens from explicitly registered, trusted partners are accepted.
---
## API Endpoints
### POST /federation/trust
Register a new federation trust partner. Requires `admin:orgs` scope.
```yaml
POST /federation/trust
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [name, issuer, jwksUri]
properties:
name:
type: string
minLength: 2
maxLength: 100
description: Human-readable name for this federation partner
example: "Contoso AgentIdP"
issuer:
type: string
format: uri
description: OIDC issuer URL of the partner instance (must match iss claim in tokens)
example: "https://agentidp.contoso.com"
jwksUri:
type: string
format: uri
description: URL of the partner's JWKS endpoint
example: "https://agentidp.contoso.com/.well-known/jwks.json"
allowedOrganizations:
type: array
items:
type: string
description: Optional list of organization IDs in the partner instance whose tokens are accepted. Empty means all partner orgs are trusted.
example: ["org_contoso_engineering"]
expiresAt:
type: string
format: date-time
description: Optional expiry for this trust relationship. If omitted, trust does not expire automatically.
Responses:
201 Created:
schema:
$ref: '#/components/schemas/FederationPartner'
example:
partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
name: "Contoso AgentIdP"
issuer: "https://agentidp.contoso.com"
jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
status: "active"
allowedOrganizations: []
trustedSince: "2026-03-29T12:00:00Z"
expiresAt: null
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
examples:
duplicate_issuer:
code: "DUPLICATE_ISSUER"
message: "A trust relationship with this issuer already exists"
unreachable_jwks:
code: "JWKS_UNREACHABLE"
message: "Could not fetch JWKS from the provided jwksUri"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### GET /federation/partners
List all registered federation partners for the caller's organization. Requires `admin:orgs` scope.
```yaml
GET /federation/partners
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
status:
type: string
enum: [active, suspended, expired]
page:
type: integer
default: 1
limit:
type: integer
default: 20
maximum: 100
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/FederationPartner'
total:
type: integer
page:
type: integer
limit:
type: integer
example:
data:
- partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
name: "Contoso AgentIdP"
issuer: "https://agentidp.contoso.com"
jwksUri: "https://agentidp.contoso.com/.well-known/jwks.json"
status: "active"
trustedSince: "2026-03-29T12:00:00Z"
expiresAt: null
total: 1
page: 1
limit: 20
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /federation/partners/:partnerId
Remove a federation trust relationship. Requires `admin:orgs` scope.
```yaml
DELETE /federation/partners/{partnerId}
Authorization: Bearer <token with admin:orgs scope>
Path Parameters:
  partnerId:
    type: string
Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### POST /federation/verify
Verify a token issued by a federated partner AgentIdP instance. The caller presents the token; this endpoint resolves the issuer, loads the partner's JWKS (from the cache when present, otherwise by fetching it), and verifies the signature and claims.
```yaml
POST /federation/verify
Authorization: Bearer <local access_token with agents:read scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [token]
properties:
token:
type: string
description: The JWT token issued by the remote AgentIdP instance to verify
expectedIssuer:
type: string
format: uri
description: Optional — if provided, verification fails if token issuer does not match
expectedOrganizationId:
type: string
description: Optional — if provided, verification fails if token organization_id does not match
Responses:
200 OK:
schema:
type: object
properties:
valid:
type: boolean
claims:
type: object
description: Decoded JWT claims from the verified token
properties:
sub:
type: string
iss:
type: string
iat:
type: integer
exp:
type: integer
agent_id:
type: string
agent_type:
type: string
organization_id:
type: string
capabilities:
type: array
items:
type: string
did:
type: string
partner:
type: object
description: The federation partner record that vouches for this token
properties:
partnerId:
type: string
name:
type: string
issuer:
type: string
example:
valid: true
claims:
sub: "agt_contoso_abc123"
iss: "https://agentidp.contoso.com"
iat: 1743249600
exp: 1743253200
agent_id: "agt_contoso_abc123"
agent_type: "classifier"
organization_id: "org_contoso_engineering"
capabilities: ["text-classification"]
did: "did:web:agentidp.contoso.com:agents:agt_contoso_abc123"
partner:
partnerId: "fed_01HXK7Z9P3FKWABCDEF33333"
name: "Contoso AgentIdP"
issuer: "https://agentidp.contoso.com"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized (local token invalid):
schema:
$ref: '#/components/schemas/ErrorResponse'
422 Unprocessable Entity (token invalid or untrusted issuer):
schema:
type: object
properties:
valid:
type: boolean
example: false
reason:
type: string
enum:
- TOKEN_EXPIRED
- INVALID_SIGNATURE
- UNTRUSTED_ISSUER
- JWKS_FETCH_FAILED
- ORGANIZATION_NOT_ALLOWED
message:
type: string
example:
valid: false
reason: "UNTRUSTED_ISSUER"
message: "No trust relationship registered for issuer https://unknown.example.com"
```
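The trust checks that come before signature verification can be sketched as a pure function. This is a minimal illustration, not the FederationService implementation: the `resolveTrust` name and the type shapes are hypothetical, and mapping a suspended or expired partner to `UNTRUSTED_ISSUER` is an assumption, since the spec does not name a reason code for that case.

```typescript
// Hypothetical shapes; field names follow the examples in this spec.
interface FederationPartner {
  partnerId: string;
  issuer: string;
  status: "active" | "suspended" | "expired";
  allowedOrganizations: string[]; // empty means all partner orgs are trusted
}

interface FederatedClaims {
  iss: string;
  organization_id: string;
}

type TrustResult =
  | { trusted: true; partner: FederationPartner }
  | { trusted: false; reason: "UNTRUSTED_ISSUER" | "ORGANIZATION_NOT_ALLOWED" };

function resolveTrust(
  claims: FederatedClaims,
  partners: FederationPartner[],
): TrustResult {
  // The issuer must match a registered partner exactly (it must equal the iss claim).
  const partner = partners.find((p) => p.issuer === claims.iss);

  // Assumption: inactive partners are reported the same way as unknown issuers.
  if (!partner || partner.status !== "active") {
    return { trusted: false, reason: "UNTRUSTED_ISSUER" };
  }

  // An empty allow-list trusts every organization of the partner.
  const allowed = partner.allowedOrganizations;
  if (allowed.length > 0 && !allowed.includes(claims.organization_id)) {
    return { trusted: false, reason: "ORGANIZATION_NOT_ALLOWED" };
  }

  return { trusted: true, partner };
}
```

Signature and expiry checks would still have to run after this step; a `trusted: true` result only says that a partner vouches for the issuer and organization.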
---
## Database Schema Changes
### New Table: federation_partners
```sql
CREATE TABLE federation_partners (
partner_id VARCHAR(40) PRIMARY KEY,
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
name VARCHAR(100) NOT NULL,
issuer VARCHAR(255) NOT NULL,
jwks_uri VARCHAR(255) NOT NULL,
allowed_organizations JSONB NOT NULL DEFAULT '[]',
status VARCHAR(20) NOT NULL DEFAULT 'active',
trusted_since TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ,
last_jwks_fetch TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT federation_partners_status_check CHECK (status IN ('active', 'suspended', 'expired')),
UNIQUE (organization_id, issuer)
);
CREATE INDEX idx_federation_partners_org_id ON federation_partners(organization_id);
CREATE INDEX idx_federation_partners_issuer ON federation_partners(issuer);
CREATE INDEX idx_federation_partners_status ON federation_partners(status);
```
### Redis: JWKS Cache
Partner JWKS documents are cached in Redis with a TTL:
```
Key: federation:jwks:<issuer_url_sha256>
Value: JSON string of the JWKS document
TTL: 1 hour (configurable via FEDERATION_JWKS_CACHE_TTL_SECONDS)
```
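One way to derive the cache key is sketched below. It assumes that `<issuer_url_sha256>` means the hex-encoded SHA-256 digest of the issuer URL exactly as registered; the `jwksCacheKey` helper name and the hex encoding are assumptions, not part of the spec.

```typescript
import { createHash } from "node:crypto";

const JWKS_KEY_PREFIX = "federation:jwks:";

// Builds the Redis key for a partner's cached JWKS document.
// Assumption: the digest is hex-encoded and the issuer is hashed as-is.
function jwksCacheKey(issuer: string): string {
  const digest = createHash("sha256").update(issuer, "utf8").digest("hex");
  return `${JWKS_KEY_PREFIX}${digest}`;
}

console.log(jwksCacheKey("https://agentidp.contoso.com"));
```

Because the issuer is hashed without normalization, `https://agentidp.contoso.com` and `https://agentidp.contoso.com/` would map to different keys. Hashing the same string that is stored in `federation_partners.issuer` avoids that mismatch.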
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `FEDERATION_ENABLED` | Enable federation endpoints | `true` |
| `FEDERATION_JWKS_CACHE_TTL_SECONDS` | Redis TTL for cached partner JWKS | `3600` |
| `FEDERATION_JWKS_FETCH_TIMEOUT_MS` | HTTP timeout for fetching partner JWKS | `5000` |
| `FEDERATION_MAX_PARTNERS_PER_ORG` | Max federation partners per organization | `50` |
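
The table above could be loaded as follows. This is a sketch under stated assumptions: the `loadFederationConfig` and `positiveInt` helpers are hypothetical, any value other than `false` is treated as enabled, and invalid numbers raise an error rather than falling back to the default.

```typescript
interface FederationConfig {
  enabled: boolean;
  jwksCacheTtlSeconds: number;
  jwksFetchTimeoutMs: number;
  maxPartnersPerOrg: number;
}

// Parses a positive integer or returns the documented default when unset.
function positiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadFederationConfig(env: Record<string, string | undefined>): FederationConfig {
  return {
    // Assumption: only the literal "false" (any casing) disables federation.
    enabled: (env.FEDERATION_ENABLED ?? "true").toLowerCase() !== "false",
    jwksCacheTtlSeconds: positiveInt(
      env.FEDERATION_JWKS_CACHE_TTL_SECONDS, 3600, "FEDERATION_JWKS_CACHE_TTL_SECONDS"),
    jwksFetchTimeoutMs: positiveInt(
      env.FEDERATION_JWKS_FETCH_TIMEOUT_MS, 5000, "FEDERATION_JWKS_FETCH_TIMEOUT_MS"),
    maxPartnersPerOrg: positiveInt(
      env.FEDERATION_MAX_PARTNERS_PER_ORG, 50, "FEDERATION_MAX_PARTNERS_PER_ORG"),
  };
}
```

Passing the environment in as an argument (for example `loadFederationConfig(process.env)`) keeps the function easy to unit test.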
---
## Dependencies
No new npm packages. Federation uses `jsonwebtoken` (already present) for JWT verification and the existing HTTP client for JWKS fetches.
---
## Security Considerations
- Only tokens from explicitly registered, active federation partners are accepted in `POST /federation/verify`
- JWKS are cached to prevent JWKS endpoint hammering; cache is invalidated when a partner is updated
- Token signature verification uses the partner's JWKS; `alg: none` is always rejected
- `allowedOrganizations` field enables fine-grained trust: a partner can be trusted but only for tokens from specific organizations within that partner
- Expired federation partners (`expiresAt` in the past) are automatically treated as status `expired` — their tokens are rejected
- `POST /federation/verify` does not grant any local permissions — it is a verification-only endpoint. Callers must make their own access control decisions based on the returned claims.
- Clock skew tolerance: `exp` claim verification allows 30 seconds of clock skew (standard JWT practice)
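
The two time-based rules in this list (partner expiry and the 30-second skew on `exp`) can be sketched as small pure functions. The function names are hypothetical, and the exact boundary behavior (a token is still accepted at `exp + 30` and rejected one second later) is an assumption.

```typescript
const CLOCK_SKEW_SECONDS = 30;

type PartnerStatus = "active" | "suspended" | "expired";

// A token is expired only once "now" has passed exp plus the allowed skew.
function isTokenExpired(
  exp: number,
  nowSeconds: number,
  skewSeconds: number = CLOCK_SKEW_SECONDS,
): boolean {
  return nowSeconds > exp + skewSeconds;
}

// A partner whose expiresAt is in the past is treated as expired,
// regardless of the status stored in the database.
function effectivePartnerStatus(
  stored: PartnerStatus,
  expiresAt: Date | null,
  now: Date,
): PartnerStatus {
  if (expiresAt !== null && expiresAt.getTime() <= now.getTime()) {
    return "expired";
  }
  return stored;
}
```

Deriving the status at read time means no background job is needed to flip rows to `expired` before their tokens start being rejected.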
---
## Acceptance Criteria
- [ ] `POST /federation/trust` registers a partner and fetches JWKS; returns 400 if JWKS unreachable
- [ ] `POST /federation/verify` returns `valid: true` for a correctly signed token from a trusted partner
- [ ] `POST /federation/verify` returns `valid: false` with `reason: UNTRUSTED_ISSUER` for unknown issuers
- [ ] `POST /federation/verify` returns `valid: false` with `reason: TOKEN_EXPIRED` for expired tokens
- [ ] Expired trust relationships (past `expiresAt`) are rejected automatically
- [ ] JWKS cache hit is used on second verification request for same issuer (Redis key present)
- [ ] TypeScript strict, zero `any`, >80% test coverage on FederationService


@@ -0,0 +1,444 @@
# Multi-Tenancy — Specification
**Workstream**: 1 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Introduce an Organization model so a single AgentIdP instance serves multiple isolated organizations. Each organization has its own namespace of agents, credentials, audit events, and rate limits. Row-level tenancy in PostgreSQL is enforced by both application-layer `organization_id` filtering and PostgreSQL Row-Level Security (RLS) policies.
All existing endpoints that operate on agents, credentials, or audit events are augmented to be organization-scoped. A new Admin API provides organization lifecycle management. Organization membership controls which agents a caller can manage.
---
## API Endpoints
### POST /organizations
Create a new organization. Requires system-admin scope (`admin:orgs`).
```yaml
POST /organizations
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [name, slug]
properties:
name:
type: string
minLength: 2
maxLength: 100
description: Display name of the organization
example: "Acme AI Platform"
slug:
type: string
minLength: 2
maxLength: 50
pattern: "^[a-z0-9-]+$"
description: URL-safe unique identifier
example: "acme-ai"
planTier:
type: string
enum: [free, pro, enterprise]
default: free
maxAgents:
type: integer
minimum: 1
default: 100
maxTokensPerMonth:
type: integer
minimum: 1
default: 10000
Responses:
201 Created:
schema:
$ref: '#/components/schemas/Organization'
example:
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
name: "Acme AI Platform"
slug: "acme-ai"
planTier: "free"
maxAgents: 100
maxTokensPerMonth: 10000
status: "active"
createdAt: "2026-03-29T12:00:00Z"
updatedAt: "2026-03-29T12:00:00Z"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "VALIDATION_ERROR"
message: "slug must be unique"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "INSUFFICIENT_SCOPE"
message: "admin:orgs scope required"
```
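The `name` and `slug` constraints from the request schema can be checked with a few lines of validation. This is a minimal sketch: the `validateCreateOrg` helper and its error strings are hypothetical, and uniqueness of the slug still has to be enforced by the database.

```typescript
const SLUG_PATTERN = /^[a-z0-9-]+$/;

// Returns a list of validation errors; an empty list means the body is acceptable.
function validateCreateOrg(body: { name?: unknown; slug?: unknown }): string[] {
  const errors: string[] = [];
  const { name, slug } = body;

  if (typeof name !== "string" || name.length < 2 || name.length > 100) {
    errors.push("name must be a string of 2 to 100 characters");
  }

  if (typeof slug !== "string" || slug.length < 2 || slug.length > 50) {
    errors.push("slug must be a string of 2 to 50 characters");
  } else if (!SLUG_PATTERN.test(slug)) {
    errors.push("slug may contain only lowercase letters, digits, and hyphens");
  }

  return errors;
}
```

A handler would typically map a non-empty result to the `400 Bad Request` response with code `VALIDATION_ERROR`.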
---
### GET /organizations
List all organizations. Requires `admin:orgs` scope.
```yaml
GET /organizations
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
status:
type: string
enum: [active, suspended, deleted]
page:
type: integer
minimum: 1
default: 1
limit:
type: integer
minimum: 1
maximum: 100
default: 20
Responses:
200 OK:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/Organization'
total:
type: integer
page:
type: integer
limit:
type: integer
example:
data:
- organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
name: "Acme AI Platform"
slug: "acme-ai"
planTier: "free"
status: "active"
createdAt: "2026-03-29T12:00:00Z"
updatedAt: "2026-03-29T12:00:00Z"
total: 1
page: 1
limit: 20
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
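The `page` and `limit` parameters translate to a SQL `LIMIT`/`OFFSET` pair. The sketch below clamps out-of-range values to the documented bounds; that is an assumption, and an implementation could equally reject them with `400 Bad Request`. The `parsePagination` name is hypothetical.

```typescript
interface Page {
  page: number;
  limit: number;
  offset: number;
}

// Parses an integer query value, falling back when it is missing or not numeric.
function toInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

function parsePagination(query: { page?: string; limit?: string }): Page {
  const page = clamp(toInt(query.page, 1), 1, Number.MAX_SAFE_INTEGER);
  const limit = clamp(toInt(query.limit, 20), 1, 100);
  // Pages are 1-based, so page 1 starts at offset 0.
  return { page, limit, offset: (page - 1) * limit };
}
```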
---
### GET /organizations/:orgId
Get a single organization. Requires `admin:orgs` scope or membership in the organization.
```yaml
GET /organizations/{orgId}
Authorization: Bearer <token>
Path Parameters:
  orgId:
    type: string
    description: Organization ID (org_... prefix)
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/Organization'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_NOT_FOUND"
      message: "Organization not found"
```
---
### PATCH /organizations/:orgId
Partially update an organization. Requires `admin:orgs` scope.
```yaml
PATCH /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
properties:
name:
type: string
minLength: 2
maxLength: 100
planTier:
type: string
enum: [free, pro, enterprise]
maxAgents:
type: integer
minimum: 1
maxTokensPerMonth:
type: integer
minimum: 1
status:
type: string
enum: [active, suspended]
Responses:
200 OK:
schema:
$ref: '#/components/schemas/Organization'
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /organizations/:orgId
Soft-delete an organization (sets status to `deleted`). Requires `admin:orgs` scope. Hard deletion is not supported — data is retained for compliance.
```yaml
DELETE /organizations/{orgId}
Authorization: Bearer <token with admin:orgs scope>
Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  409 Conflict:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "ORG_HAS_ACTIVE_AGENTS"
      message: "Organization has active agents; decommission all agents before deleting"
```
---
### POST /organizations/:orgId/members
Add a member (an already-registered agent) to an organization. Requires `admin:orgs` scope.
```yaml
POST /organizations/{orgId}/members
Authorization: Bearer <token with admin:orgs scope>
Content-Type: application/json
Request Body:
schema:
type: object
required: [agentId, role]
properties:
agentId:
type: string
description: ID of an already-registered agent to add as a member
role:
type: string
enum: [member, admin]
description: Role within the organization
Responses:
201 Created:
schema:
$ref: '#/components/schemas/OrgMember'
example:
memberId: "mem_01HXK7Z9P3FKWABCDEF99999"
organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
role: "member"
joinedAt: "2026-03-29T12:00:00Z"
400 Bad Request:
schema:
$ref: '#/components/schemas/ErrorResponse'
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
403 Forbidden:
schema:
$ref: '#/components/schemas/ErrorResponse'
404 Not Found:
schema:
$ref: '#/components/schemas/ErrorResponse'
409 Conflict:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "ALREADY_MEMBER"
message: "Agent is already a member of this organization"
```
---
### Modified: All /agents, /audit endpoints
All existing agent, credential, and audit endpoints now operate within the caller's organization context (extracted from `organization_id` claim in JWT). No URL changes — the scoping is transparent to callers already using the API.
---
## Database Schema Changes
### New Table: organizations
```sql
CREATE TABLE organizations (
organization_id VARCHAR(40) PRIMARY KEY, -- org_... prefixed ULID
name VARCHAR(100) NOT NULL,
slug VARCHAR(50) NOT NULL UNIQUE,
plan_tier VARCHAR(20) NOT NULL DEFAULT 'free',
max_agents INTEGER NOT NULL DEFAULT 100,
max_tokens_per_month INTEGER NOT NULL DEFAULT 10000,
status VARCHAR(20) NOT NULL DEFAULT 'active',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT organizations_status_check CHECK (status IN ('active', 'suspended', 'deleted')),
CONSTRAINT organizations_plan_check CHECK (plan_tier IN ('free', 'pro', 'enterprise'))
);
CREATE INDEX idx_organizations_slug ON organizations(slug);
CREATE INDEX idx_organizations_status ON organizations(status);
```
### New Table: organization_members
```sql
CREATE TABLE organization_members (
member_id VARCHAR(40) PRIMARY KEY,
organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
agent_id VARCHAR(40) NOT NULL REFERENCES agents(agent_id),
role VARCHAR(20) NOT NULL DEFAULT 'member',
joined_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT organization_members_role_check CHECK (role IN ('member', 'admin')),
UNIQUE (organization_id, agent_id)
);
CREATE INDEX idx_org_members_org_id ON organization_members(organization_id);
CREATE INDEX idx_org_members_agent_id ON organization_members(agent_id);
```
### Modified: agents table
```sql
ALTER TABLE agents
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_agents_organization_id ON agents(organization_id);
-- RLS
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
CREATE POLICY agents_org_isolation ON agents
USING (organization_id = current_setting('app.organization_id', true));
```
### Modified: credentials table
```sql
ALTER TABLE credentials
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_credentials_organization_id ON credentials(organization_id);
ALTER TABLE credentials ENABLE ROW LEVEL SECURITY;
CREATE POLICY credentials_org_isolation ON credentials
USING (organization_id = current_setting('app.organization_id', true));
```
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
ADD COLUMN organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id) DEFAULT 'org_system';
CREATE INDEX idx_audit_logs_organization_id ON audit_logs(organization_id);
ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_logs_org_isolation ON audit_logs
USING (organization_id = current_setting('app.organization_id', true));
```
### Seed: Default system organization
```sql
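-- Run this seed before the ALTER TABLE statements above: their DEFAULT 'org_system'
-- foreign key needs this row to exist when the tables already contain rows.
INSERT INTO organizations (organization_id, name, slug, plan_tier, max_agents, max_tokens_per_month, status)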
VALUES ('org_system', 'System', 'system', 'enterprise', 999999, 999999999, 'active');
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MULTI_TENANCY_ENABLED` | Enable organization enforcement (set false for single-tenant mode) | `true` |
| `DEFAULT_ORG_ID` | Organization ID to assign pre-tenancy data during migration | `org_system` |
| `MAX_ORGS_PER_INSTANCE` | Hard cap on number of organizations per instance | `1000` |
---
## Dependencies
No new npm packages. Row-level tenancy uses existing PostgreSQL client (`pg`) and query patterns.
---
## Security Considerations
- PostgreSQL RLS is enabled as defense-in-depth — even accidental omission of `organization_id` filter at application layer is caught by the database
- `SET LOCAL app.organization_id` is called at the start of every database transaction
- The `admin:orgs` scope is a new privileged scope — only system-level agent credentials carry it
- Organization slugs are public-facing but organization IDs are internal — never expose organization IDs in public URLs where avoidable
- `DELETE /organizations/:orgId` is soft-delete only; hard deletion requires a separate admin runbook to prevent accidental data loss
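
The per-transaction organization context could be set with a small wrapper like the one below. It is a sketch, not the project's data-access layer: `Queryable` is a hypothetical minimal interface that a `pg` client satisfies, and `withOrgContext` is an illustrative name. It uses PostgreSQL's `set_config(name, value, is_local)` because a plain `SET LOCAL` statement does not accept bind parameters.

```typescript
// Hypothetical minimal interface; a pg Client or PoolClient has a compatible query method.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose RLS context is pinned to one organization.
async function withOrgContext<T>(
  client: Queryable,
  organizationId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true scopes the setting to this transaction, like SET LOCAL.
    await client.query("SELECT set_config('app.organization_id', $1, true)", [
      organizationId,
    ]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Note that PostgreSQL does not apply RLS policies to superusers or, by default, to the table owner. For the policies above to act as defense-in-depth, the application would need to connect as a non-owner role, or the tables would need `FORCE ROW LEVEL SECURITY`.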
---
## Acceptance Criteria
- [ ] Single AgentIdP instance can serve 2+ organizations with zero cross-organization data leakage
- [ ] All agent/credential/audit operations are scoped to caller's organization_id from JWT
- [ ] PostgreSQL RLS policies verified: direct DB query without app.organization_id setting returns 0 rows
- [ ] Organization CRUD endpoints return correct 403 when caller lacks admin:orgs scope
- [ ] Pre-existing agents assigned to default system organization without data loss
- [ ] TypeScript strict, zero `any`, >80% test coverage on OrgService

openspec/specs/oidc/spec.md Normal file

@@ -0,0 +1,366 @@
# OpenID Connect (OIDC) — Specification
**Workstream**: 3 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Add a full OIDC 1.0 layer on top of the existing OAuth 2.0 `client_credentials` implementation using the certified `oidc-provider` npm library. The OIDC layer exposes Discovery and JWKS endpoints, extends the token endpoint to return ID tokens with agent claims, and provides an `/agent-info` endpoint (the agent-identity equivalent of OIDC's `/userinfo`).
The existing `POST /oauth2/token` endpoint is extended, not replaced. Callers that do not request the `openid` scope continue to receive standard OAuth 2.0 responses unchanged.
---
## API Endpoints
### GET /.well-known/openid-configuration
OIDC Discovery document. No authentication required. This is the standard discovery endpoint defined by OpenID Connect Discovery 1.0; RFC 8414 defines the closely related OAuth 2.0 Authorization Server Metadata format.
```yaml
GET /.well-known/openid-configuration
No authentication required
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: OIDC Discovery document per OpenID Connect Discovery 1.0
example:
issuer: "https://idp.sentryagent.ai"
authorization_endpoint: "https://idp.sentryagent.ai/oauth2/authorize"
token_endpoint: "https://idp.sentryagent.ai/oauth2/token"
jwks_uri: "https://idp.sentryagent.ai/.well-known/jwks.json"
userinfo_endpoint: "https://idp.sentryagent.ai/agent-info"
introspection_endpoint: "https://idp.sentryagent.ai/oauth2/introspect"
revocation_endpoint: "https://idp.sentryagent.ai/oauth2/revoke"
response_types_supported:
- "token"
grant_types_supported:
- "client_credentials"
subject_types_supported:
- "public"
id_token_signing_alg_values_supported:
- "RS256"
- "ES256"
scopes_supported:
- "openid"
- "agents:read"
- "agents:write"
- "tokens:read"
- "audit:read"
claims_supported:
- "sub"
- "iss"
- "iat"
- "exp"
- "agent_id"
- "agent_type"
- "organization_id"
- "capabilities"
- "deployment_env"
- "owner"
token_endpoint_auth_methods_supported:
- "client_secret_post"
- "client_secret_basic"
500 Internal Server Error:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
---
### GET /.well-known/jwks.json
JSON Web Key Set. Contains the public keys used to sign ID tokens and access tokens. No authentication required. Clients use this endpoint to verify token signatures.
```yaml
GET /.well-known/jwks.json
No authentication required
Responses:
200 OK:
Content-Type: application/json
Cache-Control: public, max-age=3600
schema:
type: object
required: [keys]
properties:
keys:
type: array
items:
type: object
description: JSON Web Key (RFC 7517)
properties:
kty:
type: string
example: "RSA"
use:
type: string
example: "sig"
kid:
type: string
description: Key ID — matches `kid` header in issued JWTs
alg:
type: string
example: "RS256"
n:
type: string
description: RSA modulus (base64url)
e:
type: string
description: RSA exponent (base64url)
example:
keys:
- kty: "RSA"
use: "sig"
kid: "key-2026-03-29-01"
alg: "RS256"
n: "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAt..."
e: "AQAB"
500 Internal Server Error:
schema:
$ref: '#/components/schemas/ErrorResponse'
```
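A client-side check against this JWKS can be sketched with Node's built-in `crypto` module alone. The example below is an illustration under assumptions: `signRs256` and `verifyWithJwks` are hypothetical helpers, only RS256 is handled, and only the signature is checked (a real client would also validate `iss`, `aud`, and `exp`). The key pair is generated locally purely for the demo.

```typescript
import {
  createPublicKey,
  generateKeyPairSync,
  sign,
  verify,
  type JsonWebKey,
  type KeyObject,
} from "node:crypto";

// Demo helper: produces a compact RS256 JWT. A real issuer would use a JWT library.
function signRs256(payload: object, kid: string, privateKey: KeyObject): string {
  const header = Buffer.from(JSON.stringify({ alg: "RS256", typ: "JWT", kid })).toString("base64url");
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const signature = sign("RSA-SHA256", Buffer.from(`${header}.${body}`), privateKey).toString("base64url");
  return `${header}.${body}.${signature}`;
}

// Verifies the signature with the JWKS key whose kid matches the JWT header.
function verifyWithJwks(token: string, jwks: { keys: JsonWebKey[] }): Record<string, unknown> {
  const [header, body, signature] = token.split(".");
  if (!header || !body || !signature) throw new Error("malformed JWT");

  const { alg, kid } = JSON.parse(Buffer.from(header, "base64url").toString("utf8")) as {
    alg?: string;
    kid?: string;
  };
  // Pinning the algorithm also rejects alg "none".
  if (alg !== "RS256") throw new Error(`unsupported alg: ${alg}`);

  const jwk = jwks.keys.find((key) => key.kid === kid);
  if (!jwk) throw new Error(`no JWKS key with kid ${kid}`);

  const publicKey = createPublicKey({ key: jwk, format: "jwk" });
  const signedPart = Buffer.from(`${header}.${body}`);
  if (!verify("RSA-SHA256", signedPart, publicKey, Buffer.from(signature, "base64url"))) {
    throw new Error("invalid signature");
  }
  return JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as Record<string, unknown>;
}

// Demo with a throwaway key pair; a real JWKS only ever contains public keys.
const { publicKey: demoPublicKey, privateKey: demoPrivateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
});
const demoJwks = {
  keys: [{ ...demoPublicKey.export({ format: "jwk" }), kid: "key-demo-01", alg: "RS256", use: "sig" }],
};
const demoToken = signRs256({ sub: "agt_demo" }, "key-demo-01", demoPrivateKey);
console.log(verifyWithJwks(demoToken, demoJwks).sub);
```

Selecting the key by `kid` is what lets a retired key and the current key coexist in the same JWKS during rotation.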
---
### POST /oauth2/token (extended)
The existing token endpoint is extended to return an `id_token` when the `openid` scope is requested. All existing behavior is preserved when `openid` is not in the scope list.
```yaml
POST /oauth2/token
Content-Type: application/x-www-form-urlencoded
Request Body:
schema:
type: object
required: [grant_type, client_id, client_secret]
properties:
grant_type:
type: string
enum: [client_credentials]
client_id:
type: string
client_secret:
type: string
scope:
type: string
description: Space-separated scopes. Include "openid" to receive an id_token.
example: "openid agents:read"
Responses:
200 OK (with openid scope):
schema:
type: object
properties:
access_token:
type: string
token_type:
type: string
example: "Bearer"
expires_in:
type: integer
scope:
type: string
id_token:
type: string
description: Signed JWT ID token containing agent identity claims. Only present when openid scope was requested.
example:
access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
token_type: "Bearer"
expires_in: 3600
scope: "openid agents:read"
id_token: "eyJhbGciOiJSUzI1NiIsImtpZCI6ImtleS0yMDI2LTAzLTI5LTAxIn0..."
200 OK (without openid scope):
schema:
type: object
properties:
access_token:
type: string
token_type:
type: string
expires_in:
type: integer
scope:
type: string
example:
access_token: "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
token_type: "Bearer"
expires_in: 3600
scope: "agents:read"
400 Bad Request:
schema:
$ref: '#/components/schemas/OAuthErrorResponse'
example:
error: "invalid_client"
error_description: "Invalid client credentials"
401 Unauthorized:
schema:
$ref: '#/components/schemas/OAuthErrorResponse'
```
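The backward-compatibility rule (an `id_token` only when `openid` is requested) can be sketched as follows. The `buildTokenResponse` helper and its argument names are hypothetical; how the access token and ID token are actually minted is outside this sketch.

```typescript
interface TokenResponse {
  access_token: string;
  token_type: "Bearer";
  expires_in: number;
  scope: string;
  id_token?: string;
}

// Splits a space-separated scope string into individual scopes.
function parseScopes(scope: string | undefined): string[] {
  return (scope ?? "").split(" ").filter((s) => s.length > 0);
}

function buildTokenResponse(args: {
  scope?: string;
  accessToken: string;
  expiresIn: number;
  issueIdToken: () => string; // called only when the openid scope is present
}): TokenResponse {
  const scopes = parseScopes(args.scope);
  const response: TokenResponse = {
    access_token: args.accessToken,
    token_type: "Bearer",
    expires_in: args.expiresIn,
    scope: scopes.join(" "),
  };
  // Exact match on "openid": callers that omit it get the unchanged OAuth 2.0 shape.
  if (scopes.includes("openid")) {
    response.id_token = args.issueIdToken();
  }
  return response;
}
```

Omitting the `id_token` property entirely, instead of setting it to `null`, keeps the non-OIDC response identical to what existing callers already receive.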
#### ID Token Claims
When `openid` scope is requested, the ID token (a signed JWT) contains the following claims:
```json
{
"iss": "https://idp.sentryagent.ai",
"sub": "agt_01HXK7Z9P3FKWABCDEF67890",
"aud": "agt_01HXK7Z9P3FKWABCDEF67890",
"iat": 1743249600,
"exp": 1743253200,
"agent_id": "agt_01HXK7Z9P3FKWABCDEF67890",
"agent_type": "orchestrator",
"organization_id": "org_01HXK7Z9P3FKWABCDEF12345",
"capabilities": ["task-planning", "tool-use"],
"deployment_env": "production",
"owner": "acme-ai",
"did": "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
}
```
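Assembling these claims from an agent record might look like the sketch below. The `AgentRecord` shape and `buildIdTokenClaims` name are hypothetical. Two details are assumptions inferred from the example above rather than stated rules: `aud` mirrors the agent ID, and the `did` is derived from the issuer host as `did:web:<host>:agents:<agentId>`.

```typescript
// Hypothetical in-memory view of a registered agent.
interface AgentRecord {
  agentId: string;
  agentType: string;
  organizationId: string;
  capabilities: string[];
  deploymentEnv: string;
  owner: string;
}

function buildIdTokenClaims(
  agent: AgentRecord,
  issuer: string,
  nowSeconds: number,
  ttlSeconds: number = 3600, // matches the OIDC_ID_TOKEN_TTL_SECONDS default
) {
  // did:web requires a port separator to be percent-encoded, hence encodeURIComponent.
  const didHost = encodeURIComponent(new URL(issuer).host);
  return {
    iss: issuer,
    sub: agent.agentId,
    aud: agent.agentId, // assumption: mirrors the example in this spec
    iat: nowSeconds,
    exp: nowSeconds + ttlSeconds,
    agent_id: agent.agentId,
    agent_type: agent.agentType,
    organization_id: agent.organizationId,
    capabilities: agent.capabilities,
    deployment_env: agent.deploymentEnv,
    owner: agent.owner,
    did: `did:web:${didHost}:agents:${agent.agentId}`,
  };
}
```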
---
### GET /agent-info
Returns claims about the authenticated agent identity. This is the agent-first equivalent of the OIDC `/userinfo` endpoint. Requires authentication with any valid access token.
```yaml
GET /agent-info
Authorization: Bearer <access_token>
Responses:
200 OK:
Content-Type: application/json
schema:
type: object
description: Agent identity claims (subset of registered agent data)
properties:
sub:
type: string
description: Subject — agentId
agent_id:
type: string
agent_type:
type: string
organization_id:
type: string
capabilities:
type: array
items:
type: string
deployment_env:
type: string
owner:
type: string
version:
type: string
status:
type: string
did:
type: string
description: W3C DID for this agent (if DID workstream is active)
created_at:
type: string
format: date-time
example:
sub: "agt_01HXK7Z9P3FKWABCDEF67890"
agent_id: "agt_01HXK7Z9P3FKWABCDEF67890"
agent_type: "orchestrator"
organization_id: "org_01HXK7Z9P3FKWABCDEF12345"
capabilities: ["task-planning", "tool-use"]
deployment_env: "production"
owner: "acme-ai"
version: "1.2.0"
status: "active"
did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
created_at: "2026-03-29T12:00:00Z"
401 Unauthorized:
schema:
$ref: '#/components/schemas/ErrorResponse'
example:
code: "UNAUTHORIZED"
message: "Invalid or expired access token"
```
---
## Database Schema Changes
### New Table: oidc_keys
Stores metadata for the RSA/EC key pairs used for ID token signing. Private keys are stored in Vault (referenced by `vault_key_path`); only the public key JWK is stored in PostgreSQL, where it backs the JWKS endpoint.
```sql
CREATE TABLE oidc_keys (
key_id VARCHAR(40) PRIMARY KEY,
kid VARCHAR(100) NOT NULL UNIQUE, -- Key ID in JWKS
algorithm VARCHAR(10) NOT NULL,
use_purpose VARCHAR(10) NOT NULL DEFAULT 'sig',
public_key_jwk JSONB NOT NULL,
vault_key_path VARCHAR(255) NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
retired_at TIMESTAMPTZ,
CONSTRAINT oidc_keys_alg_check CHECK (algorithm IN ('RS256', 'ES256')),
CONSTRAINT oidc_keys_use_check CHECK (use_purpose IN ('sig', 'enc'))
);
CREATE INDEX idx_oidc_keys_is_current ON oidc_keys(is_current) WHERE is_current = TRUE;
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `OIDC_ISSUER` | OIDC issuer URL (must match token `iss` claim) | `https://${HOST}` |
| `OIDC_ID_TOKEN_TTL_SECONDS` | ID token lifetime | `3600` |
| `OIDC_SIGNING_ALG` | ID token signing algorithm | `RS256` |
| `OIDC_JWKS_CACHE_TTL_SECONDS` | JWKS response cache TTL | `3600` |
| `OIDC_KEY_ROTATION_DAYS` | Days between signing key rotations | `90` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `oidc-provider` | `^8.4.6` | Certified OIDC server library (OpenID Foundation conformant) |
---
## Security Considerations
- ID token signing keys are stored in Vault; public keys only are served via JWKS
- JWKS endpoint is cached in Redis (`OIDC_JWKS_CACHE_TTL_SECONDS`) to prevent key-hammering
- Key rotation: when a new signing key is created, the old key remains in JWKS until all tokens signed with it have expired
- The `openid` scope is only issued to callers explicitly requesting it — not included by default
- `GET /agent-info` returns the ID token's agent claims plus non-sensitive registry fields (`version`, `status`, `created_at`); it exposes no secrets
- ID tokens for agent credentials must not contain client secrets or internal system paths
- `alg: none` is explicitly rejected — all ID tokens must be signed
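
The key-rotation rule in this list can be sketched as a filter over the `oidc_keys` rows. The `SigningKey` shape and `keysToPublish` name are hypothetical, and the sketch assumes that the last token signed with a retired key was issued no later than `retired_at`, so the key may leave the JWKS once `retired_at` plus the maximum token lifetime has passed.

```typescript
// Hypothetical projection of an oidc_keys row.
interface SigningKey {
  kid: string;
  isCurrent: boolean;
  retiredAt: Date | null;
}

// Returns the keys that must still appear in the JWKS response.
function keysToPublish(
  keys: SigningKey[],
  now: Date,
  maxTokenTtlSeconds: number,
): SigningKey[] {
  return keys.filter((key) => {
    if (key.isCurrent || key.retiredAt === null) return true;
    // Tokens signed just before retirement stay valid for up to one TTL.
    const lastTokenExpiry = key.retiredAt.getTime() + maxTokenTtlSeconds * 1000;
    return now.getTime() < lastTokenExpiry;
  });
}
```

If access tokens and ID tokens have different lifetimes, the larger of the two would be the safe value for `maxTokenTtlSeconds`.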
---
## Acceptance Criteria
- [ ] `/.well-known/openid-configuration` passes OIDC Discovery conformance validation
- [ ] `/.well-known/jwks.json` returns valid JWKS with current signing public key
- [ ] ID token returned when `openid` scope is in token request; not returned otherwise
- [ ] ID token is verifiable against JWKS endpoint using standard JWT libraries
- [ ] ID token claims match agent record (agent_type, capabilities, organization_id, did)
- [ ] `/agent-info` returns correct claims for authenticated agent
- [ ] Key rotation: old JWKS key is kept until all signed tokens expire
- [ ] TypeScript strict, zero `any`, >80% test coverage on OIDCService


@@ -0,0 +1,7 @@
## ADDED Requirements
### Requirement: Security guide exists at docs/devops/security.md
The system SHALL provide `docs/devops/security.md` documenting RSA keypair generation, key rotation procedure, CORS configuration, and secret storage guidance.
### Requirement: Operations runbook exists at docs/devops/operations.md
The system SHALL provide `docs/devops/operations.md` covering startup procedure, graceful shutdown (SIGTERM/SIGINT), log interpretation, and troubleshooting for the most common operational failures.


@@ -0,0 +1,45 @@
## ADDED Requirements
### Requirement: Quick-start guide exists at docs/developers/quick-start.md
The system SHALL provide a quick-start guide at `docs/developers/quick-start.md` that enables a bedroom developer to register their first agent and issue an OAuth 2.0 access token in under 5 minutes.
#### Scenario: Developer completes quick-start from zero
- **WHEN** a developer with no prior AgentIdP knowledge follows the quick-start guide
- **THEN** they SHALL have a registered agent, a valid credential, and a working access token by the end
### Requirement: Quick-start lists exact prerequisites
The quick-start guide SHALL list all prerequisites at the top before any steps, so the developer knows what they need before starting.
#### Scenario: Prerequisites are minimal and explicit
- **WHEN** the developer reads the prerequisites section
- **THEN** they SHALL see exactly: Docker (for running PostgreSQL and Redis) and curl (for API calls) — nothing else required
### Requirement: Quick-start provides a working docker-compose startup command
The quick-start guide SHALL include a single command to start the required infrastructure (PostgreSQL + Redis) using the project's `docker-compose.yml`.
#### Scenario: Developer starts infrastructure
- **WHEN** the developer runs the provided docker-compose command
- **THEN** the guide SHALL confirm what services are started and what ports they run on
### Requirement: Quick-start covers the full 4-step workflow
The quick-start guide SHALL cover exactly these four steps in order, each with a working curl command and the expected response:
1. Start the AgentIdP server
2. Register an agent (`POST /agents`)
3. Generate a credential (`POST /agents/{agentId}/credentials`)
4. Issue an access token (`POST /token`)
#### Scenario: Each step has a copy-pasteable curl command
- **WHEN** the developer reads any step
- **THEN** they SHALL find a complete curl command with real placeholder values they can substitute
#### Scenario: Each step shows the expected JSON response
- **WHEN** the developer runs a curl command from the guide
- **THEN** the guide SHALL show them what a successful response looks like so they can verify their output
### Requirement: Quick-start ends with a next-steps section
The quick-start guide SHALL end with a "What's Next" section linking to: core-concepts.md, developer-guides.md, and api-reference.md.
#### Scenario: Developer knows where to go after quick-start
- **WHEN** the developer reaches the end of the quick-start
- **THEN** they SHALL see at least 3 links to deeper documentation

openspec/specs/soc2/spec.md Normal file

@@ -0,0 +1,335 @@
# SOC 2 Type II Preparation — Specification
**Workstream**: 6 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Implement the technical controls required for SOC 2 Type II audit readiness. SOC 2 Type II certifies that security controls operate continuously over a defined period — not just that they exist. Controls are implemented in code, not just documented.
This workstream cuts across all other Phase 3 workstreams. It delivers: encryption at rest for sensitive columns, TLS enforcement middleware, automated secrets rotation, security event alerting, and audit log immutability via a Merkle hash chain. A compliance documentation package (controls matrix and runbook) is produced for auditors.
---
## Technical Controls
### Control C1: Encryption at Rest (Column-Level Encryption)
Sensitive columns in PostgreSQL are encrypted using `pgcrypto` symmetric encryption. The encryption key is stored in Vault and fetched at application startup, never written to disk.
**Columns encrypted**:
- `credentials.secret_hash` — encrypted with AES-256-CBC
- `credentials.vault_path` — encrypted with AES-256-CBC
- `webhook_subscriptions.vault_secret_path` — encrypted with AES-256-CBC
- `agent_did_keys.vault_key_path` — encrypted with AES-256-CBC
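**Implementation**: An `EncryptionService` wraps the `pgcrypto` functions `pgp_sym_encrypt` / `pgp_sym_decrypt`. The key is a 256-bit symmetric key stored at `secret/agentidp/encryption/column-key` in Vault. All INSERT/SELECT operations for encrypted columns go through `EncryptionService`. Note that `pgp_sym_encrypt` follows OpenPGP (CFB mode) and defaults to AES-128, so the `cipher-algo=aes256` option must be passed to get AES-256; if CBC mode specifically is required, pgcrypto's raw `encrypt` / `decrypt` functions with `aes-cbc` are the alternative.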
---
### Control C2: TLS Enforcement
Plain-HTTP requests are never served in production. This is enforced at two levels:
1. Express middleware: `TLSEnforcementMiddleware` checks `X-Forwarded-Proto`; if it is not `https` and `NODE_ENV=production`, the middleware responds with a `301 Moved Permanently` redirect to the HTTPS URL.
2. Terraform: Load balancers (Phase 2 Terraform modules) already enforce TLS; TLS enforcement middleware provides defense-in-depth.
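
The middleware check described above can be sketched as follows. This is an illustration under stated assumptions: the `Req`, `Res`, and `Next` types are minimal stand-ins so the snippet runs without Express installed, and the `tlsEnforcement` factory name and its handling of `TLS_ENFORCEMENT_ENABLED` are hypothetical.

```typescript
// Minimal structural stand-ins for Express's Request / Response / NextFunction.
interface Req {
  headers: Record<string, string | undefined>;
  originalUrl: string;
}
interface Res {
  redirect(status: number, url: string): void;
}
type Next = () => void;

interface TlsEnv {
  NODE_ENV?: string;
  TLS_ENFORCEMENT_ENABLED?: string;
}

function tlsEnforcement(env: TlsEnv) {
  // Assumption: enforcement applies only in production and can be switched off explicitly.
  const enforce = env.NODE_ENV === "production" && env.TLS_ENFORCEMENT_ENABLED !== "false";
  return (req: Req, res: Res, next: Next): void => {
    // Proxies may append values; the first entry is the client-facing protocol.
    const [first = ""] = (req.headers["x-forwarded-proto"] ?? "").split(",");
    const proto = first.trim().toLowerCase();
    if (!enforce || proto === "https") {
      next();
      return;
    }
    const host = req.headers["host"] ?? "";
    res.redirect(301, `https://${host}${req.originalUrl}`);
  };
}
```

Trusting `X-Forwarded-Proto` is only safe when the service sits behind a load balancer that sets the header itself, which is the deployment the Terraform modules above describe.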
---
### Control C3: Automated Secrets Rotation
A scheduled job (`SecretsRotationJob`) runs on a configurable cron schedule. It:
1. Identifies credentials whose `expires_at` is within `ROTATION_WARNING_DAYS` days
2. Emits a Prometheus metric `agentidp_credentials_expiring_soon_total` (labelled by `org_id`, `days_remaining`)
3. Renews Vault leases for all active credentials
4. Sends a webhook event `credential.expiring_soon` to subscribers who have opted in
This does not automatically rotate credentials without operator action — it alerts and prepares. Forced rotation requires an operator call to the existing `POST /agents/:id/credentials/:credId/rotate` endpoint.
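
Step 1 of the job, selecting credentials inside the warning window, might look like the sketch below. The `Credential` shape and the `findExpiringSoon` helper are hypothetical; the real job would read `expires_at` from the database and feed the result into the Prometheus metric and webhook steps.

```typescript
// Hypothetical credential shape; only orgId and expiresAt matter for this sketch.
interface Credential {
  credentialId: string;
  orgId: string;
  expiresAt: Date;
}

interface ExpiringCredential extends Credential {
  daysRemaining: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns credentials that are not yet expired but expire within warningDays.
function findExpiringSoon(credentials: Credential[], warningDays: number, now: Date): ExpiringCredential[] {
  const windowMs = warningDays * MS_PER_DAY;
  return credentials
    .map((c) => ({ credential: c, msLeft: c.expiresAt.getTime() - now.getTime() }))
    .filter(({ msLeft }) => msLeft > 0 && msLeft <= windowMs)
    .map(({ credential, msLeft }) => ({
      ...credential,
      // Rounded up so a credential expiring in 6.5 days is labelled days_remaining="7".
      daysRemaining: Math.ceil(msLeft / MS_PER_DAY),
    }));
}
```

Already-expired credentials are deliberately excluded here; whether they should raise a separate alert is left open by the spec.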
---
### Control C4: Audit Log Immutability (Merkle Hash Chain)
Every `audit_logs` row carries two new columns:
- `hash`: SHA-256 of `(eventId || timestamp.toISOString() || action || outcome || agentId || organizationId || previousHash)`
- `previous_hash`: hash of the immediately preceding `audit_logs` row (by `created_at` order), or the genesis string `"GENESIS"` for the first row
A PostgreSQL trigger prevents `UPDATE` and `DELETE` on `audit_logs`.
A new admin endpoint `GET /audit/verify` runs a sequential chain verification pass and returns the integrity status.
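
A minimal sketch of the hash computation and the sequential verification pass, using Node's built-in `crypto` module. The `AuditRow` shape and the `computeHash`, `appendRow`, and `verifyChain` names are assumptions for illustration; the fields are concatenated with no delimiter, exactly as the formula above is written.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory view of an audit_logs row.
interface AuditRow {
  eventId: string;
  timestamp: Date;
  action: string;
  outcome: string;
  agentId: string;
  organizationId: string;
  hash: string;
  previousHash: string;
}
type AuditEvent = Omit<AuditRow, "hash" | "previousHash">;

const GENESIS = "GENESIS";

function computeHash(e: AuditEvent, previousHash: string): string {
  const input = [
    e.eventId, e.timestamp.toISOString(), e.action, e.outcome,
    e.agentId, e.organizationId, previousHash,
  ].join("");
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Links a new event to the tail of the chain (or to GENESIS for the first row).
function appendRow(chain: AuditRow[], e: AuditEvent): AuditRow[] {
  const last = chain[chain.length - 1];
  const previousHash = last ? last.hash : GENESIS;
  return [...chain, { ...e, previousHash, hash: computeHash(e, previousHash) }];
}

// Walks rows in created_at order and stops at the first broken link.
function verifyChain(rows: AuditRow[]): { valid: boolean; rowsVerified: number; brokenAtEventId: string | null } {
  let previousHash = GENESIS;
  let rowsVerified = 0;
  for (const row of rows) {
    if (row.previousHash !== previousHash || row.hash !== computeHash(row, previousHash)) {
      return { valid: false, rowsVerified, brokenAtEventId: row.eventId };
    }
    previousHash = row.hash;
    rowsVerified += 1;
  }
  return { valid: true, rowsVerified, brokenAtEventId: null };
}
```

Because each hash covers the previous one, editing any row invalidates that row and every row after it, which is what lets verification report the first `eventId` where the chain breaks.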
---
### Control C5: Security Event Alerting
Prometheus alerting rules are written for the following security events:
| Alert | Condition | Severity |
|-------|-----------|---------|
| `AuthFailureSpike` | >50 `auth.failed` events in 5 minutes | Warning |
| `RateLimitExhaustion` | >80% of org rate limit consumed in 1 minute | Warning |
| `AnomalousTokenIssuance` | Token issuance rate exceeds 3x the 7-day average | Warning |
| `WebhookDeadLetterAccumulating` | `agentidp_webhook_dead_letters_total` increases by >10 in 1 hour | Warning |
| `AuditChainIntegrityFailed` | `agentidp_audit_chain_integrity` metric is 0 | Critical |
| `CredentialExpiryApproaching` | `agentidp_credentials_expiring_soon_total{days_remaining="7"}` > 0 | Info |
---
## API Endpoints
### GET /audit/verify
Verify the Merkle hash chain integrity of the audit log. Requires `admin:orgs` scope. This is a potentially expensive operation on large audit logs — it is rate-limited to once per 5 minutes per organization.
```yaml
GET /audit/verify
Authorization: Bearer <token with admin:orgs scope>
Query Parameters:
  fromDate:
    type: string
    format: date-time
    description: Start of verification range. If omitted, verifies from genesis.
  toDate:
    type: string
    format: date-time
    description: End of verification range. If omitted, verifies to the latest row.
Responses:
  200 OK:
    schema:
      type: object
      properties:
        valid:
          type: boolean
          description: True if the chain is intact across the entire range
        rowsVerified:
          type: integer
          description: Number of audit rows verified
        firstEventId:
          type: string
        lastEventId:
          type: string
        firstTimestamp:
          type: string
          format: date-time
        lastTimestamp:
          type: string
          format: date-time
        verifiedAt:
          type: string
          format: date-time
        brokenAtEventId:
          type: string
          nullable: true
          description: Present only if valid=false — the first eventId where the chain breaks
    example:
      valid: true
      rowsVerified: 15420
      firstEventId: "evt_genesis_00001"
      lastEventId: "evt_01HXK7Z9P3FKWABCDEFZZZZZ"
      firstTimestamp: "2026-01-01T00:00:00Z"
      lastTimestamp: "2026-03-29T12:00:00Z"
      verifiedAt: "2026-03-29T14:00:00Z"
      brokenAtEventId: null
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  429 Too Many Requests:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "RATE_LIMITED"
      message: "Audit verification can be run at most once per 5 minutes"
```
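
The once-per-5-minutes limit on this endpoint could be sketched as below. This is an in-memory illustration only: the `VerifyRateLimiter` class is a hypothetical name, and a multi-instance deployment would more plausibly keep the timestamp in Redis so the limit holds across processes.

```typescript
const VERIFY_INTERVAL_MS = 5 * 60 * 1000;

// Tracks the last accepted verification run per organization.
class VerifyRateLimiter {
  private readonly lastRun = new Map<string, number>();

  // Returns true and records the run if allowed; false means respond 429 RATE_LIMITED.
  tryAcquire(organizationId: string, nowMs: number): boolean {
    const last = this.lastRun.get(organizationId);
    if (last !== undefined && nowMs - last < VERIFY_INTERVAL_MS) {
      return false;
    }
    this.lastRun.set(organizationId, nowMs);
    return true;
  }
}
```

A rejected call does not reset the window, so a client polling aggressively cannot lock itself out indefinitely.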
---
### GET /compliance/controls
Returns the current status of all SOC 2 technical controls. Requires `admin:orgs` scope. Used by auditors and compliance dashboards.
```yaml
GET /compliance/controls
Authorization: Bearer <token with admin:orgs scope>
Responses:
  200 OK:
    schema:
      type: object
      properties:
        generatedAt:
          type: string
          format: date-time
        controls:
          type: array
          items:
            type: object
            properties:
              controlId:
                type: string
              name:
                type: string
              status:
                type: string
                enum: [pass, fail, warning, not_applicable]
              description:
                type: string
              lastChecked:
                type: string
                format: date-time
    example:
      generatedAt: "2026-03-29T14:00:00Z"
      controls:
        - controlId: "C1"
          name: "Encryption at Rest"
          status: "pass"
          description: "Column-level encryption active for all sensitive columns"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C2"
          name: "TLS Enforcement"
          status: "pass"
          description: "All non-TLS requests redirected to HTTPS in production"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C3"
          name: "Secrets Rotation"
          status: "warning"
          description: "3 credentials expiring within 7 days"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C4"
          name: "Audit Log Immutability"
          status: "pass"
          description: "Merkle chain intact — last verified 2026-03-29T13:55:00Z"
          lastChecked: "2026-03-29T14:00:00Z"
        - controlId: "C5"
          name: "Security Event Alerting"
          status: "pass"
          description: "All 6 alerting rules active in Prometheus"
          lastChecked: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
## Database Schema Changes
### Modified: audit_logs table
```sql
ALTER TABLE audit_logs
  ADD COLUMN hash VARCHAR(64),           -- SHA-256 hex string of chain node
  ADD COLUMN previous_hash VARCHAR(64);  -- Hash of preceding row, or "GENESIS"

-- Back-fill genesis hash for existing rows (one-time migration)
-- Migration script computes chain in order of created_at
-- The back-fill must run before the trigger below is created, because it UPDATEs existing rows

-- Prevent updates and deletes (immutability trigger)
CREATE OR REPLACE FUNCTION prevent_audit_modification()
RETURNS TRIGGER AS $$
BEGIN
  RAISE EXCEPTION 'audit_logs rows are immutable — modification is not permitted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_logs_immutability
  BEFORE UPDATE OR DELETE ON audit_logs
  FOR EACH ROW EXECUTE FUNCTION prevent_audit_modification();
```
### Modified: credentials table
```sql
-- Columns remain same type; application now stores encrypted values
-- No DDL change — encryption is transparent at application layer
-- Add comment for documentation
COMMENT ON COLUMN credentials.secret_hash IS 'AES-256-CBC encrypted via EncryptionService (pgcrypto). Not a plain bcrypt hash.';
COMMENT ON COLUMN credentials.vault_path IS 'AES-256-CBC encrypted via EncryptionService.';
```
### New Table: compliance_check_log
```sql
CREATE TABLE compliance_check_log (
  check_id        VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  control_id      VARCHAR(10) NOT NULL,
  status          VARCHAR(20) NOT NULL,
  details         JSONB NOT NULL DEFAULT '{}',
  checked_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_compliance_check_org ON compliance_check_log(organization_id, checked_at DESC);
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `SOC2_CONTROLS_ENABLED` | Enable SOC 2 controls enforcement | `true` |
| `TLS_ENFORCEMENT_ENABLED` | Enforce HTTPS in production | `true` in production, `false` in development |
| `COLUMN_ENCRYPTION_KEY_PATH` | Vault path for AES-256 column encryption key | `secret/agentidp/encryption/column-key` |
| `ROTATION_WARNING_DAYS` | Days before expiry to emit rotation warning | `30` |
| `SECRETS_ROTATION_CRON` | Cron schedule for rotation check job | `0 3 * * *` (daily at 3 AM UTC) |
| `AUDIT_CHAIN_VERIFY_CRON` | Cron schedule for automated chain verification | `0 2 * * *` (daily at 2 AM UTC) |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `node-forge` | `^1.3.1` | AES-256-CBC column-level encryption primitives |
Note: `pgcrypto` PostgreSQL extension must be enabled: `CREATE EXTENSION IF NOT EXISTS pgcrypto;`
---
## Compliance Documentation
The following documents are produced as part of this workstream:
| Document | Path | Description |
|----------|------|-------------|
| Controls Matrix | `docs/compliance/soc2-controls-matrix.md` | Maps SOC 2 Trust Services Criteria to implemented controls |
| Encryption Runbook | `docs/compliance/encryption-runbook.md` | Key rotation procedure, Vault key path map |
| Audit Log Runbook | `docs/compliance/audit-log-runbook.md` | How to run chain verification, interpret results |
| Incident Response | `docs/compliance/incident-response.md` | Security event response procedures |
| Secrets Rotation Guide | `docs/compliance/secrets-rotation.md` | Operator guide for credential and key rotation |
---
## Security Considerations
- Column encryption key is fetched from Vault at startup and held in process memory — never written to disk or logged
- Key rotation: a migration re-encrypts all sensitive columns with the new encryption key; the old key is retained in Vault history
- The immutability trigger on `audit_logs` prevents application-layer modification, but a `SUPERUSER` can still bypass triggers, and row-level triggers do not fire on `TRUNCATE`. Document this in the controls matrix as a residual risk requiring compensating controls (e.g., read-only replica verification)
- `GET /audit/verify` is rate-limited to prevent denial-of-service via repeated expensive sequential scans
- `GET /compliance/controls` never returns raw secrets or key material — only control status
---
## Acceptance Criteria
- [ ] `pgcrypto` extension enabled; sensitive columns are encrypted at rest (verified: plaintext not visible in direct DB query)
- [ ] TLS enforcement middleware redirects HTTP to HTTPS in production; passthrough in development
- [ ] `SecretsRotationJob` runs on schedule; emits Prometheus metric for expiring credentials
- [ ] Audit log immutability trigger prevents UPDATE/DELETE on `audit_logs` table
- [ ] `GET /audit/verify` returns `valid: true` for an unmodified chain
- [ ] `GET /audit/verify` returns `valid: false` with `brokenAtEventId` after a row is manually tampered with (test scenario)
- [ ] All 6 Prometheus alerting rules are present in `monitoring/prometheus/alerts.yml`
- [ ] `GET /compliance/controls` returns correct status for all 5 controls
- [ ] Compliance documentation written and reviewed
- [ ] TypeScript strict, zero `any`, >80% test coverage on SOC2 control implementations


@@ -0,0 +1,10 @@
## ADDED Requirements
### Requirement: System overview exists at docs/devops/README.md
The system SHALL provide a `docs/devops/README.md` that serves as the entry point for DevOps engineers, including an index of all DevOps docs and a brief system overview.
### Requirement: Architecture doc exists at docs/devops/architecture.md
The system SHALL provide `docs/devops/architecture.md` documenting all components (Express server, PostgreSQL, Redis), their roles, ports, and data flow.
### Requirement: Environment variable reference exists at docs/devops/environment-variables.md
The system SHALL provide `docs/devops/environment-variables.md` documenting every environment variable with name, type, required/optional, default, and example value.


@@ -0,0 +1,353 @@
# W3C Decentralized Identifiers (DIDs) — Specification
**Workstream**: 2 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Issue a W3C `did:web` identifier for every registered agent and serve DID Documents over HTTPS. The AgentIdP instance itself has a root DID Document at `/.well-known/did.json`. Each agent has an individual DID Document at `/agents/:id/did`. A DID resolution endpoint wraps the standard resolution workflow. Agent cards in AGNTCY format are derivable from DID Documents.
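The `did:web` method resolves to `https://<host>/.well-known/did.json` (instance) and `https://<host>/agents/<agentId>/did` (per-agent). All DID Documents are W3C DID Core 1.0 compliant. Note that the did:web method specification maps a path-based DID such as `did:web:<host>:agents:<agentId>` to `https://<host>/agents/<agentId>/did.json`, so standard resolvers (including `web-did-resolver`) will request that URL; the per-agent document should also be reachable there for external resolution to succeed.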
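
To make the identifier-to-URL mapping concrete, here is a small sketch of DID construction and of the URL a standard did:web resolver would derive. The `agentDid` and `didWebToUrl` helpers are hypothetical names; the URL rule follows the did:web method specification, under which a path-based DID resolves to `<path>/did.json` rather than to the `/agents/<agentId>/did` route.

```typescript
// Builds the per-agent DID. A port separator in the domain is percent-encoded,
// as the did:web method requires.
function agentDid(domain: string, agentId: string): string {
  return `did:web:${domain.replace(":", "%3A")}:agents:${agentId}`;
}

// Derives the HTTPS URL that a spec-following did:web resolver would fetch.
function didWebToUrl(did: string): string {
  const parts = did.split(":");
  if (parts.length < 3 || parts[0] !== "did" || parts[1] !== "web") {
    throw new Error(`Not a did:web identifier: ${did}`);
  }
  const [host, ...path] = parts.slice(2).map((segment) => decodeURIComponent(segment));
  return path.length === 0
    ? `https://${host}/.well-known/did.json`
    : `https://${host}/${path.join("/")}/did.json`;
}
```

Serving the same document at both `/agents/<agentId>/did` and `/agents/<agentId>/did.json` would be one way to keep the API route while remaining resolvable by off-the-shelf resolvers.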
---
## API Endpoints
### GET /.well-known/did.json
Root DID Document for the AgentIdP instance. No authentication required — this is a public discovery endpoint.
```yaml
GET /.well-known/did.json
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant DID Document
      required: [id, "@context", verificationMethod, authentication]
      properties:
        "@context":
          type: array
          items:
            type: string
          example:
            - "https://www.w3.org/ns/did/v1"
            - "https://w3id.org/security/suites/jws-2020/v1"
        id:
          type: string
          description: DID for this AgentIdP instance
          example: "did:web:idp.sentryagent.ai"
        controller:
          type: string
          example: "did:web:idp.sentryagent.ai"
        verificationMethod:
          type: array
          items:
            $ref: '#/components/schemas/VerificationMethod'
        authentication:
          type: array
          items:
            type: string
          description: References to verification methods for authentication
        assertionMethod:
          type: array
          items:
            type: string
        service:
          type: array
          items:
            $ref: '#/components/schemas/DIDService'
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
      id: "did:web:idp.sentryagent.ai"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU"
            y: "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
      authentication:
        - "did:web:idp.sentryagent.ai#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai#agent-registry"
          type: "AgentIdentityProvider"
          serviceEndpoint: "https://idp.sentryagent.ai"
  500 Internal Server Error:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /agents/:id/did
Per-agent DID Document. No authentication required — DID Documents are public.
```yaml
GET /agents/{agentId}/did
No authentication required
Path Parameters:
  agentId:
    type: string
    description: Agent ID
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: W3C DID Core 1.0 compliant per-agent DID Document
    example:
      "@context":
        - "https://www.w3.org/ns/did/v1"
        - "https://w3id.org/agntcy/v1"
      id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      controller: "did:web:idp.sentryagent.ai"
      verificationMethod:
        - id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
          type: "JsonWebKey2020"
          controller: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
          publicKeyJwk:
            kty: "EC"
            crv: "P-256"
            x: "abc123"
            y: "def456"
      authentication:
        - "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#key-1"
      service:
        - id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890#agent-card"
          type: "AgentCard"
          serviceEndpoint: "https://idp.sentryagent.ai/agents/agt_01HXK7Z9P3FKWABCDEF67890/did/card"
      agntcy:
        agentId: "agt_01HXK7Z9P3FKWABCDEF67890"
        agentType: "orchestrator"
        capabilities:
          - "task-planning"
          - "tool-use"
        deploymentEnv: "production"
        owner: "acme-ai"
        version: "1.2.0"
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "AGENT_NOT_FOUND"
      message: "Agent not found"
  410 Gone:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "AGENT_DECOMMISSIONED"
      message: "Agent has been decommissioned — DID Document is no longer active"
```
---
### GET /agents/:id/did/resolve
DID resolution endpoint: resolves the agent's `did:web` DID and returns the result in W3C DID Resolution format. This enables external systems to use AgentIdP as a resolver for agent DIDs. Authentication required (`agents:read` scope).
```yaml
GET /agents/{agentId}/did/resolve
Authorization: Bearer <token with agents:read scope>
Path Parameters:
  agentId:
    type: string
Responses:
  200 OK:
    Content-Type: application/ld+json;profile="https://w3id.org/did-resolution"
    schema:
      type: object
      required: [didDocument, didDocumentMetadata, didResolutionMetadata]
      properties:
        didDocument:
          type: object
          description: The resolved DID Document
        didDocumentMetadata:
          type: object
          properties:
            created:
              type: string
              format: date-time
            updated:
              type: string
              format: date-time
            deactivated:
              type: boolean
        didResolutionMetadata:
          type: object
          properties:
            contentType:
              type: string
              example: "application/did+ld+json"
            retrieved:
              type: string
              format: date-time
    example:
      didDocument:
        "@context": ["https://www.w3.org/ns/did/v1"]
        id: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      didDocumentMetadata:
        created: "2026-03-29T12:00:00Z"
        updated: "2026-03-29T12:00:00Z"
        deactivated: false
      didResolutionMetadata:
        contentType: "application/did+ld+json"
        retrieved: "2026-03-29T14:00:00Z"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /agents/:id/did/card
AGNTCY-format agent card derived from DID Document. Returns a JSON object representing the agent's identity and capabilities in the AGNTCY agent card format. No authentication required.
```yaml
GET /agents/{agentId}/did/card
No authentication required
Responses:
  200 OK:
    Content-Type: application/json
    schema:
      type: object
      description: AGNTCY-format agent card
      properties:
        did:
          type: string
        name:
          type: string
        agentType:
          type: string
        capabilities:
          type: array
          items:
            type: string
        owner:
          type: string
        version:
          type: string
        deploymentEnv:
          type: string
        identityProvider:
          type: string
          description: DID of the issuing AgentIdP instance
        issuedAt:
          type: string
          format: date-time
    example:
      did: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
      name: "acme-orchestrator"
      agentType: "orchestrator"
      capabilities: ["task-planning", "tool-use"]
      owner: "acme-ai"
      version: "1.2.0"
      deploymentEnv: "production"
      identityProvider: "did:web:idp.sentryagent.ai"
      issuedAt: "2026-03-29T12:00:00Z"
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
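
Deriving the card from an agent record might be sketched as a plain mapping. The `AgentRecord` shape and `toAgentCard` helper are hypothetical, and mapping `issuedAt` to the agent's `did_created_at` value is an assumption; the spec does not state which timestamp the card carries.

```typescript
// Hypothetical agent record, limited to the fields the card exposes.
interface AgentRecord {
  did: string;
  name: string;
  agentType: string;
  capabilities: string[];
  owner: string;
  version: string;
  deploymentEnv: string;
  didCreatedAt: Date;
}

interface AgentCard {
  did: string;
  name: string;
  agentType: string;
  capabilities: string[];
  owner: string;
  version: string;
  deploymentEnv: string;
  identityProvider: string;
  issuedAt: string;
}

// Copies registration data only; no field outside the agent record is added.
function toAgentCard(agent: AgentRecord, instanceDid: string): AgentCard {
  return {
    did: agent.did,
    name: agent.name,
    agentType: agent.agentType,
    capabilities: [...agent.capabilities],
    owner: agent.owner,
    version: agent.version,
    deploymentEnv: agent.deploymentEnv,
    identityProvider: instanceDid,
    issuedAt: agent.didCreatedAt.toISOString(), // assumption: card issue time = DID creation time
  };
}
```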
---
## Database Schema Changes
### New Table: agent_did_keys
Stores the public key and the Vault reference for the key pair used to sign each agent's DID Document. The private key is stored in Vault; only the public key JWK is stored in PostgreSQL.
```sql
CREATE TABLE agent_did_keys (
  key_id          VARCHAR(40) PRIMARY KEY,
  agent_id        VARCHAR(40) NOT NULL UNIQUE REFERENCES agents(agent_id),
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  public_key_jwk  JSONB NOT NULL,
  vault_key_path  VARCHAR(255) NOT NULL,  -- Vault path where private key is stored
  key_type        VARCHAR(20) NOT NULL DEFAULT 'EC',
  curve           VARCHAR(10) NOT NULL DEFAULT 'P-256',
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  rotated_at      TIMESTAMPTZ,
  CONSTRAINT agent_did_keys_key_type_check CHECK (key_type IN ('EC', 'RSA'))
);
CREATE INDEX idx_agent_did_keys_agent_id ON agent_did_keys(agent_id);
CREATE INDEX idx_agent_did_keys_org_id ON agent_did_keys(organization_id);
```
### New Column: agents.did
```sql
ALTER TABLE agents
ADD COLUMN did VARCHAR(255),
ADD COLUMN did_created_at TIMESTAMPTZ;
-- Populated automatically on agent creation
-- Example value: "did:web:idp.sentryagent.ai:agents:agt_01HXK7Z9P3FKWABCDEF67890"
```
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `DID_WEB_DOMAIN` | Domain name for `did:web` construction | Derived from `HOST` if not set |
| `DID_KEY_TYPE` | Cryptographic key type for DID keys | `EC` |
| `DID_KEY_CURVE` | Elliptic curve for EC keys | `P-256` |
| `DID_DOCUMENT_CACHE_TTL_SECONDS` | How long to cache DID Documents in Redis | `300` |
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `did-resolver` | `^4.1.0` | W3C DID resolution interface |
| `web-did-resolver` | `^2.0.27` | DID:WEB method resolver |
---
## Security Considerations
- DID Document endpoints are public: no authentication is required, and they expose no sensitive data
- Private keys for DID signing are stored in Vault; never written to PostgreSQL
- DID Document cache in Redis has a TTL — stale documents are evicted automatically
- Decommissioned agents return HTTP 410 Gone with `deactivated: true` in DID Document metadata
- DID rotation: when a credential is rotated, the DID Document key can optionally be rotated; the old key is retained in history
- `GET /agents/:id/did/card` exposes only data already present in the agent registration — no new sensitive fields
---
## Acceptance Criteria
- [ ] Every new agent registration automatically generates a `did:web` DID and key pair
- [ ] Root DID Document at `/.well-known/did.json` is W3C DID Core 1.0 compliant (validated by `did-resolver`)
- [ ] Per-agent DID Document returns correct `did:web` identifier and public key JWK
- [ ] DID resolution endpoint returns W3C DID Resolution format
- [ ] Decommissioned agent DID Document returns 410 Gone with `deactivated: true`
- [ ] Agent card at `/agents/:id/did/card` matches AGNTCY agent card format
- [ ] Private keys never appear in any API response or log
- [ ] TypeScript strict, zero `any`, >80% test coverage on DIDService


@@ -0,0 +1,476 @@
# Webhooks and Event Streaming — Specification
**Workstream**: 5 of 6
**Phase**: 3 — Enterprise
**Author**: Virtual Architect
**Date**: 2026-03-29
---
## Overview
Real-time event notifications for agent lifecycle events via HTTP webhooks. Operators create webhook subscriptions specifying a target URL, the events they want to receive, and a secret for HMAC-SHA256 signature verification. Delivery is asynchronous via a Redis-backed `bull` queue with exponential backoff retry (max 10 attempts). All deliveries are logged for observability.
Supported events: `agent.created`, `agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`, `credential.generated`, `credential.rotated`, `credential.revoked`, `token.issued`, `token.revoked`.
An optional Kafka/NATS adapter enables high-throughput event streaming alongside webhook delivery.
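
The retry policy can be illustrated with a small delay calculator. Only the exponential shape and the 10-attempt limit come from the text above; the 1-second base delay, the 1-hour cap, and the `nextRetryDelayMs` name are assumptions of this sketch (in practice the `bull` queue's own backoff options would carry these values).

```typescript
const MAX_ATTEMPTS = 10;

// attemptCount is the number of delivery attempts already made (1 or more).
// Returns the delay before the next attempt, or null when the delivery
// should be moved to dead_letter instead of being retried.
function nextRetryDelayMs(
  attemptCount: number,
  baseMs: number = 1000,
  capMs: number = 60 * 60 * 1000,
): number | null {
  if (attemptCount >= MAX_ATTEMPTS) {
    return null;
  }
  // Doubles after every failed attempt: base, 2x base, 4x base, ...
  return Math.min(capMs, baseMs * 2 ** (attemptCount - 1));
}
```

Capping the delay keeps late retries from drifting days apart, and adding random jitter would be a reasonable extension if many subscriptions fail at once.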
---
## API Endpoints
### POST /webhooks
Create a new webhook subscription. Requires `agents:write` scope.
```yaml
POST /webhooks
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    required: [url, events, secret]
    properties:
      url:
        type: string
        format: uri
        description: HTTPS endpoint to deliver events to
        example: "https://app.example.com/hooks/agentidp"
      events:
        type: array
        items:
          type: string
          enum:
            - agent.created
            - agent.updated
            - agent.suspended
            - agent.reactivated
            - agent.decommissioned
            - credential.generated
            - credential.rotated
            - credential.revoked
            - token.issued
            - token.revoked
            - "*"
        minItems: 1
        description: List of event types to subscribe to. Use ["*"] to subscribe to all events.
        example: ["agent.created", "credential.rotated"]
      secret:
        type: string
        minLength: 16
        description: Secret used to compute HMAC-SHA256 signature. Store securely — it is returned only once.
        example: "whsec_super_secret_value_here"
      description:
        type: string
        maxLength: 255
        description: Optional human-readable description for this subscription
      active:
        type: boolean
        default: true
Responses:
  201 Created:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
    example:
      subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
      organizationId: "org_01HXK7Z9P3FKWABCDEF12345"
      url: "https://app.example.com/hooks/agentidp"
      events: ["agent.created", "credential.rotated"]
      description: "Production event sink"
      active: true
      createdAt: "2026-03-29T12:00:00Z"
      updatedAt: "2026-03-29T12:00:00Z"
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    examples:
      invalid_url:
        code: "VALIDATION_ERROR"
        message: "url must be a valid HTTPS URI"
      invalid_event:
        code: "VALIDATION_ERROR"
        message: "Unknown event type: agent.unknown"
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
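
Validating the `events` array and matching an emitted event against a subscription, including the `"*"` wildcard, could look like the sketch below. The `firstUnknownEvent` and `isSubscribed` helpers are hypothetical names; the event list is copied from the enum above.

```typescript
const EVENT_TYPES = [
  "agent.created", "agent.updated", "agent.suspended", "agent.reactivated",
  "agent.decommissioned", "credential.generated", "credential.rotated",
  "credential.revoked", "token.issued", "token.revoked",
] as const;
type EventType = (typeof EVENT_TYPES)[number];

const KNOWN_EVENTS = new Set<string>(EVENT_TYPES);

// Returns the first entry that is neither a known event type nor "*",
// or null when the whole list is valid.
function firstUnknownEvent(events: readonly string[]): string | null {
  for (const event of events) {
    if (event !== "*" && !KNOWN_EVENTS.has(event)) {
      return event;
    }
  }
  return null;
}

// True when the subscription should receive the given event.
function isSubscribed(events: readonly string[], eventType: EventType): boolean {
  return events.indexOf("*") !== -1 || events.indexOf(eventType) !== -1;
}
```

A request handler could turn a non-null result from `firstUnknownEvent` into the `VALIDATION_ERROR` response shown in the `invalid_event` example.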
---
### GET /webhooks
List webhook subscriptions for the caller's organization. Requires `agents:read` scope.
```yaml
GET /webhooks
Authorization: Bearer <token with agents:read scope>
Query Parameters:
  active:
    type: boolean
    description: Filter by active/inactive subscriptions
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 20
    maximum: 100
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/WebhookSubscription'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id
Get a single webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:read scope>
Path Parameters:
  subscriptionId:
    type: string
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
    example:
      code: "WEBHOOK_NOT_FOUND"
      message: "Webhook subscription not found"
```
---
### PATCH /webhooks/:id
Update a webhook subscription (e.g., pause/resume, change events). Requires `agents:write` scope.
```yaml
PATCH /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Content-Type: application/json
Request Body:
  schema:
    type: object
    properties:
      url:
        type: string
        format: uri
      events:
        type: array
        items:
          type: string
      description:
        type: string
        maxLength: 255
      active:
        type: boolean
Responses:
  200 OK:
    schema:
      $ref: '#/components/schemas/WebhookSubscription'
  400 Bad Request:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### DELETE /webhooks/:id
Delete a webhook subscription. Requires `agents:write` scope.
```yaml
DELETE /webhooks/{subscriptionId}
Authorization: Bearer <token with agents:write scope>
Responses:
  204 No Content: {}
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  403 Forbidden:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
---
### GET /webhooks/:id/deliveries
List delivery attempts for a specific webhook subscription. Requires `agents:read` scope.
```yaml
GET /webhooks/{subscriptionId}/deliveries
Authorization: Bearer <token with agents:read scope>
Query Parameters:
  status:
    type: string
    enum: [pending, success, failed, dead_letter]
  eventType:
    type: string
    description: Filter by event type
  fromDate:
    type: string
    format: date-time
  toDate:
    type: string
    format: date-time
  page:
    type: integer
    default: 1
  limit:
    type: integer
    default: 50
    maximum: 200
Responses:
  200 OK:
    schema:
      type: object
      properties:
        data:
          type: array
          items:
            $ref: '#/components/schemas/WebhookDelivery'
        total:
          type: integer
        page:
          type: integer
        limit:
          type: integer
    example:
      data:
        - deliveryId: "del_01HXK7Z9P3FKWABCDEF77777"
          subscriptionId: "wh_01HXK7Z9P3FKWABCDEF55555"
          eventType: "agent.created"
          eventId: "evt_01HXK7Z9P3FKWABCDEF99999"
          status: "success"
          httpStatusCode: 200
          attemptCount: 1
          nextRetryAt: null
          deliveredAt: "2026-03-29T12:00:05Z"
          createdAt: "2026-03-29T12:00:00Z"
      total: 1
      page: 1
      limit: 50
  401 Unauthorized:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
  404 Not Found:
    schema:
      $ref: '#/components/schemas/ErrorResponse'
```
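A client can assemble the query string for this endpoint from the documented parameters. The sketch below is illustrative only: `buildDeliveriesUrl` and `DeliveryFilter` are hypothetical names, and the base URL is assumed to already contain any API prefix the deployment uses. The `limit` value is clamped to the documented maximum of 200.

```typescript
type DeliveryStatus = "pending" | "success" | "failed" | "dead_letter";

// Hypothetical filter type mirroring the documented query parameters.
interface DeliveryFilter {
  status?: DeliveryStatus;
  eventType?: string;
  fromDate?: string; // ISO 8601 date-time
  toDate?: string; // ISO 8601 date-time
  page?: number;
  limit?: number;
}

const MAX_LIMIT = 200; // documented maximum for `limit`

function buildDeliveriesUrl(
  baseUrl: string,
  subscriptionId: string,
  filter: DeliveryFilter = {},
): string {
  const params = new URLSearchParams();
  if (filter.status !== undefined) params.set("status", filter.status);
  if (filter.eventType !== undefined) params.set("eventType", filter.eventType);
  if (filter.fromDate !== undefined) params.set("fromDate", filter.fromDate);
  if (filter.toDate !== undefined) params.set("toDate", filter.toDate);
  if (filter.page !== undefined) {
    params.set("page", String(Math.max(1, Math.floor(filter.page))));
  }
  if (filter.limit !== undefined) {
    const clamped = Math.min(MAX_LIMIT, Math.max(1, Math.floor(filter.limit)));
    params.set("limit", String(clamped));
  }

  // Strip trailing slashes so the path is joined with exactly one separator.
  const path = `${baseUrl.replace(/\/+$/, "")}/webhooks/${encodeURIComponent(subscriptionId)}/deliveries`;
  const query = params.toString();
  return query === "" ? path : `${path}?${query}`;
}
```

Omitting `page` and `limit` leaves the server defaults (1 and 50) in effect.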
---
## Webhook Payload Format
Every webhook delivery uses this envelope format:
```json
{
  "id": "evt_01HXK7Z9P3FKWABCDEF99999",
  "type": "agent.created",
  "organizationId": "org_01HXK7Z9P3FKWABCDEF12345",
  "timestamp": "2026-03-29T12:00:00Z",
  "data": {
    "agentId": "agt_01HXK7Z9P3FKWABCDEF67890",
    "agentType": "orchestrator",
    "status": "active",
    "owner": "acme-ai",
    "version": "1.0.0",
    "deploymentEnv": "production"
  }
}
```
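On the receiving side, the envelope can be modelled as a typed structure with a runtime guard. This is a sketch under assumptions: the names `WebhookEnvelope` and `isWebhookEnvelope` are hypothetical, and the shape of `data` is left generic because it presumably varies by event type.

```typescript
// Hypothetical receiver-side type for the envelope shown above.
interface WebhookEnvelope<T = Record<string, unknown>> {
  id: string;
  type: string;
  organizationId: string;
  timestamp: string; // ISO 8601 date-time
  data: T;
}

// Runtime guard: checks the envelope fields only, not the event-specific `data` contents.
function isWebhookEnvelope(value: unknown): value is WebhookEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const timestamp = v.timestamp;
  const data = v.data;
  return (
    typeof v.id === "string" &&
    typeof v.type === "string" &&
    typeof v.organizationId === "string" &&
    typeof timestamp === "string" &&
    !Number.isNaN(Date.parse(timestamp)) &&
    typeof data === "object" &&
    data !== null
  );
}
```

A handler would typically call the guard right after `JSON.parse` and before dispatching on `type`.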
### HMAC-SHA256 Signature
Every delivery includes the following HTTP headers:
```
X-AgentIdP-Event: agent.created
X-AgentIdP-Delivery-Id: del_01HXK7Z9P3FKWABCDEF77777
X-AgentIdP-Timestamp: 1774785600
X-AgentIdP-Signature-256: sha256=<HMAC-SHA256 of timestamp.payload using subscription secret>
```
Signature computation:
```
signed_content = timestamp + "." + JSON.stringify(payload)
signature = HMAC-SHA256(secret, signed_content)
header_value = "sha256=" + hex(signature)
```
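The computation above maps directly onto Node's built-in `crypto` module. The following is a minimal sketch rather than the service's actual code: the function names are hypothetical, and the 5-minute tolerance is taken from the replay guidance in Security Considerations. A receiver should sign the raw request body exactly as received, since re-serializing parsed JSON may not reproduce the same bytes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Builds the X-AgentIdP-Signature-256 header value: "sha256=" + hex(HMAC-SHA256(secret, timestamp.body)).
function computeSignatureHeader(secret: string, timestamp: string, rawBody: string): string {
  const signedContent = `${timestamp}.${rawBody}`;
  const hex = createHmac("sha256", secret).update(signedContent, "utf8").digest("hex");
  return `sha256=${hex}`;
}

// Compares the received header with the expected one in constant time.
function verifySignatureHeader(
  secret: string,
  timestamp: string,
  rawBody: string,
  receivedHeader: string,
): boolean {
  const expected = Buffer.from(computeSignatureHeader(secret, timestamp, rawBody), "utf8");
  const received = Buffer.from(receivedHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Replay check on the X-AgentIdP-Timestamp value (Unix seconds).
// Assumption: timestamps too far in the future are rejected as well as stale ones.
function isTimestampFresh(
  timestamp: string,
  nowMs: number = Date.now(),
  toleranceSeconds = 300,
): boolean {
  const seconds = Number(timestamp);
  if (!Number.isInteger(seconds)) return false;
  return Math.abs(nowMs / 1000 - seconds) <= toleranceSeconds;
}
```

A receiver would run `isTimestampFresh` first and then `verifySignatureHeader`, rejecting the delivery if either returns `false`.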
---
## Database Schema Changes
### New Table: webhook_subscriptions
```sql
CREATE TABLE webhook_subscriptions (
  subscription_id VARCHAR(40) PRIMARY KEY,
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  url VARCHAR(2048) NOT NULL,
  events JSONB NOT NULL DEFAULT '[]',
  secret_hash VARCHAR(255) NOT NULL, -- bcrypt hash of secret; plain text stored in Vault
  vault_secret_path VARCHAR(255) NOT NULL,
  description VARCHAR(255),
  active BOOLEAN NOT NULL DEFAULT TRUE,
  failure_count INTEGER NOT NULL DEFAULT 0,
  last_delivery_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_webhook_subs_org_id ON webhook_subscriptions(organization_id);
CREATE INDEX idx_webhook_subs_active ON webhook_subscriptions(active) WHERE active = TRUE;
```
### New Table: webhook_deliveries
```sql
CREATE TABLE webhook_deliveries (
  delivery_id VARCHAR(40) PRIMARY KEY,
  subscription_id VARCHAR(40) NOT NULL REFERENCES webhook_subscriptions(subscription_id),
  organization_id VARCHAR(40) NOT NULL REFERENCES organizations(organization_id),
  event_id VARCHAR(40) NOT NULL,
  event_type VARCHAR(100) NOT NULL,
  payload JSONB NOT NULL,
  status VARCHAR(20) NOT NULL DEFAULT 'pending',
  http_status_code SMALLINT,
  response_body TEXT,
  attempt_count SMALLINT NOT NULL DEFAULT 0,
  next_retry_at TIMESTAMPTZ,
  delivered_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT webhook_deliveries_status_check CHECK (status IN ('pending', 'success', 'failed', 'dead_letter'))
);

CREATE INDEX idx_webhook_deliveries_sub_id ON webhook_deliveries(subscription_id);
CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status);
CREATE INDEX idx_webhook_deliveries_org_id ON webhook_deliveries(organization_id);
CREATE INDEX idx_webhook_deliveries_created ON webhook_deliveries(created_at);
```
---
## Retry Schedule
```
Attempt 1: immediate
Attempt 2: 1 minute after failure
Attempt 3: 5 minutes after failure
Attempt 4: 15 minutes after failure
Attempt 5: 1 hour after failure
Attempt 6: 4 hours after failure
Attempt 7: 12 hours after failure
Attempt 8: 24 hours after failure
Attempt 9: 48 hours after failure
Attempt 10: 72 hours after failure
After attempt 10: status = dead_letter; operator alerted via Prometheus metric
```
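The schedule above can be expressed as a lookup table that yields the value for `next_retry_at`. This sketch assumes each delay is measured from the failure of the previous attempt, which the schedule does not state explicitly; `nextRetryAt` and `RETRY_DELAYS_MS` are hypothetical names.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Delay before attempt N+1, indexed by the number of attempts already made (0..9).
const RETRY_DELAYS_MS: readonly number[] = [
  0, // attempt 1: immediate
  1 * MINUTE_MS, // attempt 2
  5 * MINUTE_MS, // attempt 3
  15 * MINUTE_MS, // attempt 4
  1 * HOUR_MS, // attempt 5
  4 * HOUR_MS, // attempt 6
  12 * HOUR_MS, // attempt 7
  24 * HOUR_MS, // attempt 8
  48 * HOUR_MS, // attempt 9
  72 * HOUR_MS, // attempt 10
];

// Returns when the next attempt should run, or null once all attempts are used
// (the delivery should then be marked dead_letter).
function nextRetryAt(attemptsMade: number, lastFailureAt: Date): Date | null {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 0) {
    throw new RangeError("attemptsMade must be a non-negative integer");
  }
  if (attemptsMade >= RETRY_DELAYS_MS.length) return null;
  return new Date(lastFailureAt.getTime() + RETRY_DELAYS_MS[attemptsMade]);
}
```

With this indexing, a delivery whose first attempt just failed (`attemptsMade = 1`) is rescheduled one minute after the failure, and a delivery with ten failed attempts gets `null`.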
---
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `WEBHOOKS_ENABLED` | Enable webhook functionality | `true` |
| `WEBHOOK_DELIVERY_TIMEOUT_MS` | HTTP delivery request timeout | `10000` |
| `WEBHOOK_MAX_RETRIES` | Maximum delivery attempts before dead-letter | `10` |
| `WEBHOOK_WORKER_CONCURRENCY` | Number of concurrent delivery workers | `5` |
| `KAFKA_BROKERS` | Comma-separated Kafka broker list (optional; activates Kafka adapter) | `""` |
| `KAFKA_TOPIC_PREFIX` | Prefix for Kafka topic names | `agentidp` |
| `NATS_URL` | NATS server URL (optional; activates NATS adapter) | `""` |
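
The table above can be read into a typed configuration object at startup. The sketch below is one possible loader, not the project's actual one: `WebhookConfig`, `loadWebhookConfig` and the parsing rules (for example treating only the literal `false` as disabled) are assumptions.

```typescript
// Hypothetical typed view of the environment variables in the table above.
interface WebhookConfig {
  enabled: boolean;
  deliveryTimeoutMs: number;
  maxRetries: number;
  workerConcurrency: number;
  kafkaBrokers: string[]; // empty list: Kafka adapter inactive
  kafkaTopicPrefix: string;
  natsUrl: string | null; // null: NATS adapter inactive
}

// Parses an integer, falling back to the documented default when unset or malformed.
function intFromEnv(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadWebhookConfig(env: Record<string, string | undefined>): WebhookConfig {
  return {
    // Assumption: anything other than "false" (case-insensitive) keeps webhooks enabled.
    enabled: (env.WEBHOOKS_ENABLED ?? "true").toLowerCase() !== "false",
    deliveryTimeoutMs: intFromEnv(env.WEBHOOK_DELIVERY_TIMEOUT_MS, 10000),
    maxRetries: intFromEnv(env.WEBHOOK_MAX_RETRIES, 10),
    workerConcurrency: intFromEnv(env.WEBHOOK_WORKER_CONCURRENCY, 5),
    kafkaBrokers: (env.KAFKA_BROKERS ?? "")
      .split(",")
      .map((broker) => broker.trim())
      .filter((broker) => broker !== ""),
    kafkaTopicPrefix: env.KAFKA_TOPIC_PREFIX ?? "agentidp",
    natsUrl: env.NATS_URL ? env.NATS_URL : null,
  };
}
```

In a Node process this would be called as `loadWebhookConfig(process.env)`.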
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| `bull` | `^4.16.3` | Redis-backed async job queue for webhook delivery |
| `kafkajs` | `^2.2.4` | Kafka producer adapter (optional) |
---
## Security Considerations
- Webhook secrets are stored in Vault; PostgreSQL keeps only a bcrypt hash of each secret, which is used for in-memory comparison
- All deliveries must be to HTTPS endpoints — HTTP endpoints are rejected at subscription creation
- Private/internal IP ranges (RFC 1918, loopback) are blocked at delivery time to prevent SSRF
- HMAC signature allows the receiving server to verify the delivery is authentic
- Replay attacks are mitigated by including a timestamp in the signed content; receivers should reject deliveries with timestamps older than 5 minutes
- Dead-letter events generate a Prometheus metric `agentidp_webhook_dead_letters_total` for alerting
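
The SSRF rule in the list above can be illustrated with a check on the resolved destination address. This is a deliberately narrow sketch: it covers only the IPv4 ranges the list names (RFC 1918 and loopback), the name `isBlockedIPv4` is hypothetical, and anything that is not a valid IPv4 literal is treated as blocked so the sketch fails closed. A real delivery worker would also need to handle IPv6 and apply the check to the address actually connected to after DNS resolution.

```typescript
import { isIPv4 } from "node:net";

// Returns true when a resolved IPv4 address must not receive webhook deliveries.
function isBlockedIPv4(address: string): boolean {
  // Assumption: non-IPv4 input (hostnames, IPv6) is out of scope here and is blocked.
  if (!isIPv4(address)) return true;

  const [a, b] = address.split(".").map(Number);
  return (
    a === 10 || // 10.0.0.0/8 (RFC 1918)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168) || // 192.168.0.0/16 (RFC 1918)
    a === 127 // 127.0.0.0/8 (loopback)
  );
}
```

Running the check at delivery time, as the list specifies, rather than only at subscription creation guards against a hostname that later resolves to an internal address.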
---
## Acceptance Criteria
- [ ] `POST /webhooks` creates a subscription; secret stored in Vault, not returned after creation
- [ ] Webhook delivery occurs within 30 seconds of event generation for healthy subscribers
- [ ] Delivery includes correct `X-AgentIdP-Signature-256` header verifiable with provided secret
- [ ] Failed delivery is retried per schedule; status updates in `webhook_deliveries` table
- [ ] After max retries, status is `dead_letter` and metric is incremented
- [ ] Delivery to HTTP (non-HTTPS) URL is rejected at subscription creation
- [ ] Delivery to private IP range is rejected (SSRF protection)
- [ ] `GET /webhooks/:id/deliveries` returns accurate delivery history
- [ ] TypeScript strict, zero `any`, >80% test coverage on WebhookService

802
package-lock.json generated

File diff suppressed because it is too large

@@ -12,34 +12,47 @@
     "test:integration": "jest tests/integration",
     "db:migrate": "ts-node scripts/migrate.ts",
     "lint": "eslint src --ext .ts",
-    "format": "prettier --write src/**/*.ts"
+    "format": "prettier --write src/**/*.ts",
+    "load-test": "k6 run tests/load/agent-registration.js && k6 run tests/load/token-issuance.js && k6 run tests/load/credential-rotation.js"
   },
   "dependencies": {
     "@open-policy-agent/opa-wasm": "^1.10.0",
     "bcryptjs": "^2.4.3",
+    "bull": "^4.16.5",
     "cors": "^2.8.5",
+    "did-resolver": "^4.1.0",
     "dotenv": "^16.4.5",
     "express": "^4.18.3",
     "helmet": "^7.1.0",
+    "ioredis": "^5.10.1",
     "joi": "^17.12.3",
     "jsonwebtoken": "^9.0.2",
+    "kafkajs": "^2.2.4",
     "morgan": "^1.10.0",
+    "node-forge": "^1.4.0",
     "node-vault": "^0.12.0",
+    "oidc-provider": "^9.7.1",
     "pg": "^8.11.3",
     "pino": "^8.19.0",
     "pino-http": "^9.0.0",
     "prom-client": "^15.1.3",
+    "rate-limiter-flexible": "^5.0.5",
     "redis": "^4.6.13",
-    "uuid": "^9.0.1"
+    "stripe": "^21.0.1",
+    "ulid": "^3.0.2",
+    "uuid": "^9.0.1",
+    "web-did-resolver": "^2.0.32"
   },
   "devDependencies": {
     "@types/bcryptjs": "^2.4.6",
+    "@types/bull": "^3.15.9",
     "@types/cors": "^2.8.17",
     "@types/express": "^4.17.21",
     "@types/jest": "^29.5.12",
     "@types/jsonwebtoken": "^9.0.6",
     "@types/morgan": "^1.9.9",
     "@types/node": "^20.12.7",
+    "@types/node-forge": "^1.3.14",
     "@types/node-vault": "^0.9.1",
     "@types/pg": "^8.11.5",
     "@types/supertest": "^6.0.2",

@@ -69,6 +69,30 @@ normalise_path(path) := "/api/v1/audit" if {
     path == "/api/v1/audit"
 }
+normalise_path(path) := "/api/v1/organizations/:id/members" if {
+    regex.match(`^/api/v1/organizations/[^/]+/members$`, path)
+}
+normalise_path(path) := "/api/v1/organizations/:id" if {
+    regex.match(`^/api/v1/organizations/[^/]+$`, path)
+}
+normalise_path(path) := "/api/v1/organizations" if {
+    path == "/api/v1/organizations"
+}
+normalise_path(path) := "/api/v1/federation/partners/:id" if {
+    regex.match(`^/api/v1/federation/partners/[^/]+$`, path)
+}
+normalise_path(path) := "/api/v1/federation/partners" if {
+    path == "/api/v1/federation/partners"
+}
+normalise_path(path) := "/api/v1/federation/trust" if {
+    path == "/api/v1/federation/trust"
+}
 # ─── Core allow rule ──────────────────────────────────────────────────────────
 # allow = true if every required scope for the endpoint is present in input.scopes.

Some files were not shown because too many files have changed in this diff