SentryAgent.ai Developer 8cabc0191c docs: commit all Phase 6 documentation updates and OpenSpec archives

- devops docs: 8 files updated for Phase 6 state; field-trial.md added (946-line runbook)
- developer docs: api-reference (50+ endpoints), quick-start, 5 existing guides updated, 5 new guides added
- engineering docs: all 12 files updated (services, architecture, SDK guide, testing, overview)
- OpenSpec archives: phase-7-devops-field-trial, developer-docs-phase6-update, engineering-docs-phase6-update
- VALIDATOR.md + scripts/start-validator.sh: V&V Architect tooling added
- .gitignore: exclude session artifacts, build artifacts, and agent workspaces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-07 02:24:24 +00:00

Service Deep Dives

AgentService

Purpose: Manages the full lifecycle of AI agent identities — registration, retrieval, updates, and decommissioning.

Responsibility boundary: AgentService does not handle HTTP, credential secrets, token issuance, or audit log queries. It delegates all data access to AgentRepository and CredentialRepository, and all audit logging to AuditService. It enforces free-tier limits and domain rules before any data is written.

Public interface (key methods):

Method	Parameters	Returns	Description
`registerAgent`	`data: ICreateAgentRequest, ipAddress: string, userAgent: string`	`Promise<IAgent>`	Checks the free-tier 100-agent limit, enforces email uniqueness, creates the agent record, writes an `agent.created` audit event, and increments the `agentidp_agents_registered_total` Prometheus counter
`getAgentById`	`agentId: string`	`Promise<IAgent>`	Retrieves a single agent by UUID; throws `AgentNotFoundError` if not found
`listAgents`	`filters: IAgentListFilters`	`Promise<IPaginatedAgentsResponse>`	Returns a paginated, optionally filtered list; filters include `owner`, `agentType`, `status`, `page`, `limit`
`updateAgent`	`agentId: string, data: IUpdateAgentRequest, ipAddress: string, userAgent: string`	`Promise<IAgent>`	Partially updates agent metadata; rejects updates to decommissioned agents; determines the correct audit action (`agent.updated`, `agent.suspended`, `agent.reactivated`, `agent.decommissioned`) based on status transition
`decommissionAgent`	`agentId: string, ipAddress: string, userAgent: string`	`Promise<void>`	Soft-deletes the agent (sets `status = 'decommissioned'`); revokes all active credentials by calling `credentialRepository.revokeAllForAgent(agentId)` before decommissioning
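
The status-transition rule described for `updateAgent` can be sketched as a small pure function. This is an illustration only, not the service's code: the `AgentStatus` type and the `resolveAuditAction` helper are hypothetical names, and mapping an unchanged or omitted status to `agent.updated` is an assumption. The rejection of updates to decommissioned agents is left out.

```typescript
// Hypothetical types; the three status values are the ones named in this document.
type AgentStatus = "active" | "suspended" | "decommissioned";
type AgentAuditAction =
  | "agent.updated"
  | "agent.suspended"
  | "agent.reactivated"
  | "agent.decommissioned";

// Picks the audit action from the status transition requested by an update.
// Assumption: no status change (or no status field at all) is a plain update.
function resolveAuditAction(
  previous: AgentStatus,
  next: AgentStatus | undefined,
): AgentAuditAction {
  if (next === undefined || next === previous) {
    return "agent.updated";
  }
  switch (next) {
    case "suspended":
      return "agent.suspended";
    case "decommissioned":
      return "agent.decommissioned";
    case "active":
      return "agent.reactivated";
  }
}
```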

Database / storage schema:

Table agents: agent_id (UUID PK), email (UNIQUE), agent_type, version, capabilities (text array), owner, deployment_env, status, created_at, updated_at.
No Redis usage — AgentService is PostgreSQL-only.

Error types:

FreeTierLimitError (403) — 100-agent limit reached
AgentAlreadyExistsError (409) — email already registered
AgentNotFoundError (404) — agent UUID not found
AgentAlreadyDecommissionedError (409) — agent is already decommissioned

Configuration: None — AgentService reads no environment variables. The free-tier limit (FREE_TIER_MAX_AGENTS = 100) is a module-level constant.
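
To make the order of the pre-write checks in `registerAgent` concrete, here is a minimal in-memory sketch. `InMemoryAgentStore` is a hypothetical stand-in for `AgentRepository`, the `httpStatus` field is an illustrative way to carry the documented status codes, and the initial `active` status is an assumption. Whether decommissioned agents count towards the limit is not stated above; this sketch counts every stored record. The audit write and the Prometheus counter are omitted.

```typescript
const FREE_TIER_MAX_AGENTS = 100;

class FreeTierLimitError extends Error {
  readonly httpStatus = 403;
}

class AgentAlreadyExistsError extends Error {
  readonly httpStatus = 409;
}

interface AgentRecord {
  agentId: string;
  email: string;
  status: string;
}

// Hypothetical in-memory replacement for AgentRepository.
class InMemoryAgentStore {
  private readonly byEmail = new Map<string, AgentRecord>();

  count(): number {
    return this.byEmail.size;
  }

  findByEmail(email: string): AgentRecord | undefined {
    return this.byEmail.get(email);
  }

  insert(agent: AgentRecord): void {
    this.byEmail.set(agent.email, agent);
  }
}

// Domain rules run before anything is written: limit first, then uniqueness.
function registerAgent(
  store: InMemoryAgentStore,
  agentId: string,
  email: string,
): AgentRecord {
  if (store.count() >= FREE_TIER_MAX_AGENTS) {
    throw new FreeTierLimitError("free-tier agent limit reached");
  }
  if (store.findByEmail(email) !== undefined) {
    throw new AgentAlreadyExistsError("email already registered");
  }
  const agent: AgentRecord = { agentId, email, status: "active" };
  store.insert(agent);
  return agent;
}
```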

OAuth2Service

Purpose: Issues, introspects, and revokes RS256 JWT access tokens via the OAuth 2.0 Client Credentials grant.

Responsibility boundary: OAuth2Service does not know about HTTP or routing. It receives already-extracted values (clientId, clientSecret, scope) from the controller, resolves credential verification (Vault or bcrypt), enforces the 10,000 tokens/month free-tier limit, and returns a typed ITokenResponse. All audit writes on high-throughput paths (issue, introspect, revoke) are fire-and-forget (void) to keep token endpoint latency low.

Public interface (key methods):

Method	Parameters	Returns	Description
`issueToken`	`clientId: string, clientSecret: string, scope: string, ipAddress: string, userAgent: string`	`Promise<ITokenResponse>`	Verifies credentials (Vault or bcrypt), checks agent status, enforces the 10k/month limit, and signs an RS256 JWT; the monthly counter increment and the audit event write are fire-and-forget
`introspectToken`	`token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string`	`Promise<IIntrospectResponse>`	Verifies JWT signature and checks Redis revocation list; always returns 200 with `active: true/false` per RFC 7662
`revokeToken`	`token: string, callerPayload: ITokenPayload, ipAddress: string, userAgent: string`	`Promise<void>`	Decodes token without verification; enforces that caller can only revoke their own tokens (`decoded.sub === callerPayload.sub`); adds JTI to Redis revocation list with TTL matching token expiry

Database / storage schema:

Redis key revoked:{jti} — value 1, TTL = seconds until token expiry. Written on revocation; read on every authenticated request via authMiddleware.
Redis key monthly:tokens:{agentId}:{yyyy-mm} — integer counter, incremented on every successful token issuance. Read to enforce the 10k/month free-tier limit.
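
The two key formats and the revocation TTL can be derived with a few pure helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the use of UTC for the `yyyy-mm` segment is an assumption, since the document does not say which time zone the service uses.

```typescript
// Key for the revocation list entry of a single token.
function revokedKey(jti: string): string {
  return `revoked:${jti}`;
}

// Key for the monthly issuance counter. Assumption: the month is taken in UTC.
function monthlyTokensKey(agentId: string, now: Date): string {
  const yyyy = now.getUTCFullYear();
  const mm = String(now.getUTCMonth() + 1).padStart(2, "0");
  return `monthly:tokens:${agentId}:${yyyy}-${mm}`;
}

// Seconds until the token expires, given the JWT `exp` claim (epoch seconds).
// Never negative: an already-expired token yields 0.
function revocationTtlSeconds(expEpochSeconds: number, now: Date): number {
  const nowSeconds = Math.floor(now.getTime() / 1000);
  return Math.max(0, expEpochSeconds - nowSeconds);
}
```

A TTL of 0 means the token has already expired, so a caller would presumably skip the Redis write in that case rather than set a key with a zero expiry.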

Error types:

AuthenticationError (401) — agent not found, or no active credential matches the provided secret
AuthorizationError (403) — agent is suspended or decommissioned; or caller attempts to revoke another agent's token
FreeTierLimitError (403) — 10,000 tokens/month limit reached

Configuration:

JWT_PRIVATE_KEY — PEM-encoded RSA private key, required, read at app startup in src/app.ts
JWT_PUBLIC_KEY — PEM-encoded RSA public key, required, read at app startup and in authMiddleware
VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT — optional; when set, Vault is used for credential verification instead of bcrypt

CredentialService

Purpose: Manages the full lifecycle of agent credentials — generation, listing, rotation, and revocation.

Responsibility boundary: CredentialService does not know about HTTP or token issuance. It enforces that credentials can only be generated for active agents. It delegates secret storage to either VaultClient (Phase 2) or bcrypt (Phase 1 fallback). The plain-text clientSecret is generated here and returned once in the response. It is never logged and never written to PostgreSQL: only the bcrypt hash or the Vault path is persisted there, and in Vault mode the secret itself lives only in Vault.

Public interface (key methods):

Method	Parameters	Returns	Description
`generateCredential`	`agentId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string`	`Promise<ICredentialWithSecret>`	Verifies agent exists and is `active`; generates a cryptographically random secret via `generateClientSecret()`; writes to Vault (when configured) or hashes with bcrypt; returns `ICredentialWithSecret` — the only time the plain-text secret is returned
`listCredentials`	`agentId: string, filters: ICredentialListFilters`	`Promise<IPaginatedCredentialsResponse>`	Returns paginated credentials for an agent; `clientSecret` is never included in list responses
`rotateCredential`	`agentId: string, credentialId: string, data: IGenerateCredentialRequest, ipAddress: string, userAgent: string`	`Promise<ICredentialWithSecret>`	Generates a new secret for the same `credentialId`; overwrites Vault entry (new KV v2 version) or updates bcrypt hash; old secret is immediately invalidated; returns new `ICredentialWithSecret` once
`revokeCredential`	`agentId: string, credentialId: string, ipAddress: string, userAgent: string`	`Promise<void>`	Sets credential `status = 'revoked'`; permanently deletes the Vault secret via `vaultClient.deleteSecret()` when Vault is configured; rejects already-revoked credentials with `CredentialAlreadyRevokedError`, so a repeated call never changes state

Database / storage schema:

Table credentials: credential_id (UUID PK), client_id (= agentId, FK to agents), secret_hash (bcrypt hash; empty string when Vault path is set), vault_path (nullable — KV v2 data path), status, created_at, expires_at (nullable), revoked_at (nullable).

Error types:

AgentNotFoundError (404) — agent UUID not found
CredentialError (400) — agent is not in active status (code: AGENT_NOT_ACTIVE)
CredentialNotFoundError (404) — credential not found or belongs to a different agent
CredentialAlreadyRevokedError (409) — credential is already revoked

Configuration:

VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT — optional; when set, new credentials are stored in Vault KV v2 instead of bcrypt. Existing bcrypt-based credentials continue to work unchanged.

AuditService

Purpose: Creates and queries immutable audit events for compliance and observability.

Responsibility boundary: AuditService does not know about HTTP, tokens, or agents. It receives already-assembled event data from other services and delegates all persistence to AuditRepository. It enforces the 90-day free-tier retention window on all query and retrieval operations — events older than 90 days are treated as non-existent.

Public interface (key methods):

Method	Parameters	Returns	Description
`logEvent`	`agentId: string, action: AuditAction, outcome: AuditOutcome, ipAddress: string, userAgent: string, metadata: Record<string, unknown>`	`Promise<IAuditEvent>`	Writes an immutable audit row to PostgreSQL. For token endpoints, callers use `void` (fire-and-forget). For CRUD operations, callers `await` this method.
`queryEvents`	`filters: IAuditListFilters`	`Promise<IPaginatedAuditEventsResponse>`	Returns paginated, filtered audit events; enforces the 90-day retention window by computing the cutoff date and rejecting queries with `fromDate` before the cutoff; validates that `fromDate <= toDate`
`getEventById`	`eventId: string`	`Promise<IAuditEvent>`	Retrieves a single event by UUID; throws `AuditEventNotFoundError` for both genuinely missing events and events outside the 90-day retention window (indistinguishable by design)

Database / storage schema:

Table audit_events: event_id (UUID PK), agent_id (text FK to agents), action (text — one of the AuditAction union type values), outcome (success or failure), ip_address (text), user_agent (text), metadata (JSONB), timestamp (timestamptz, NOT NULL, indexed).
No Redis usage — AuditService is PostgreSQL-only.

Error types:

AuditEventNotFoundError (404) — event not found or outside retention window
RetentionWindowError (400) — query fromDate is before the 90-day retention cutoff
ValidationError (400) — fromDate is after toDate

Configuration: None — the retention window (FREE_TIER_RETENTION_DAYS = 90) is a module-level constant.
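
The retention rules above reduce to one cutoff computation and two comparisons. The following is a minimal sketch, not the service's implementation: the helper names, the `kind` discriminator, and the order in which the two checks run are assumptions.

```typescript
const FREE_TIER_RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

class RetentionWindowError extends Error {
  readonly httpStatus = 400;
  readonly kind = "retention"; // hypothetical discriminator for this sketch
}

class ValidationError extends Error {
  readonly httpStatus = 400;
  readonly kind = "validation"; // hypothetical discriminator for this sketch
}

// Oldest timestamp that is still visible to queries.
function retentionCutoff(now: Date): Date {
  return new Date(now.getTime() - FREE_TIER_RETENTION_DAYS * MS_PER_DAY);
}

// Mirrors the two documented query checks: fromDate <= toDate, and
// fromDate not earlier than the retention cutoff.
function validateDateRange(now: Date, fromDate?: Date, toDate?: Date): void {
  if (fromDate && toDate && fromDate.getTime() > toDate.getTime()) {
    throw new ValidationError("fromDate must not be after toDate");
  }
  if (fromDate && fromDate.getTime() < retentionCutoff(now).getTime()) {
    throw new RetentionWindowError("fromDate is before the retention cutoff");
  }
}

// Events older than the cutoff are treated as if they did not exist.
function isWithinRetention(eventTimestamp: Date, now: Date): boolean {
  return eventTimestamp.getTime() >= retentionCutoff(now).getTime();
}
```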

VaultClient

Purpose: Wraps HashiCorp Vault KV v2 operations for credential secret storage and verification.

Responsibility boundary: VaultClient is a client adapter — it knows only about Vault API calls. It has no knowledge of business rules, HTTP, or PostgreSQL. It is injected into CredentialService and OAuth2Service via constructor injection. When VAULT_ADDR is not set, createVaultClientFromEnv() returns null and the bcrypt code path is used unchanged.

Public methods:

Method	Parameters	Returns	Description
`writeSecret`	`agentId: string, credentialId: string, plainSecret: string`	`Promise<string>`	Writes the plain-text secret to the KV v2 data path; returns the path; creates a new KV v2 version on subsequent calls (used for rotation)
`readSecret`	`agentId: string, credentialId: string`	`Promise<string>`	Reads and returns the plain-text secret from Vault; throws `CredentialError` if the path is not found or the read fails
`verifySecret`	`agentId: string, credentialId: string, candidateSecret: string`	`Promise<boolean>`	Reads the stored secret via `readSecret`, then compares using `crypto.timingSafeEqual` to prevent timing-based side-channel attacks; returns `false` on any Vault error rather than throwing
`deleteSecret`	`agentId: string, credentialId: string`	`Promise<void>`	Permanently deletes all versions of a credential secret by issuing a `DELETE` request against the KV v2 metadata path (`DELETE {mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}`)

KV v2 path structure:

Data path: {mount}/data/agentidp/agents/{agentId}/credentials/{credentialId}
Metadata path (for permanent deletion): {mount}/metadata/agentidp/agents/{agentId}/credentials/{credentialId}
Default mount: secret (overridable via VAULT_MOUNT)
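
The path templates above translate directly into two string builders. The function names are hypothetical; only the path layout and the default mount come from this document.

```typescript
const DEFAULT_VAULT_MOUNT = "secret";

// KV v2 data path: used for writing and reading the secret value.
function credentialDataPath(
  agentId: string,
  credentialId: string,
  mount: string = DEFAULT_VAULT_MOUNT,
): string {
  return `${mount}/data/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

// KV v2 metadata path: a DELETE here removes every version of the secret.
function credentialMetadataPath(
  agentId: string,
  credentialId: string,
  mount: string = DEFAULT_VAULT_MOUNT,
): string {
  return `${mount}/metadata/agentidp/agents/${agentId}/credentials/${credentialId}`;
}
```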

Opt-in configuration:

VAULT_ADDR — Vault server address (e.g. http://127.0.0.1:8200) — required to enable Vault mode
VAULT_TOKEN — Vault authentication token — required to enable Vault mode
VAULT_MOUNT — KV v2 mount path — optional, defaults to secret
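
The opt-in behaviour can be sketched as a function that turns environment variables into either a config object or `null`. `readVaultConfig` and `VaultConfig` are hypothetical names, not the signature of `createVaultClientFromEnv()`. Returning `null` when `VAULT_ADDR` is missing matches the behaviour described above; treating a missing `VAULT_TOKEN` the same way is an assumption.

```typescript
interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

// Returns null when Vault mode is not enabled, so callers can fall back to bcrypt.
function readVaultConfig(
  env: Record<string, string | undefined>,
): VaultConfig | null {
  const addr = env.VAULT_ADDR;
  const token = env.VAULT_TOKEN;
  if (!addr || !token) {
    // Assumption: both variables are needed before Vault mode switches on.
    return null;
  }
  return { addr, token, mount: env.VAULT_MOUNT || "secret" };
}
```

In a Node.js process this would be called as `readVaultConfig(process.env)`.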

Constant-time comparison rationale: The verifySecret method uses Node.js crypto.timingSafeEqual instead of === to prevent attackers from inferring the length or content of stored secrets by measuring how long the comparison takes. When the stored and candidate secrets differ in length, a dummy timingSafeEqual call is still performed to eliminate the timing signal from the early-exit path.
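
A minimal sketch of this comparison follows. It is not the body of `verifySecret`: the function name is hypothetical and the Vault read is left out. `crypto.timingSafeEqual` throws when its two buffers differ in length, which is why the length-mismatch branch compares the stored buffer with itself as the dummy call.

```typescript
import { timingSafeEqual } from "node:crypto";

// Compares two secrets without an early exit on the first differing byte.
function constantTimeEquals(stored: string, candidate: string): boolean {
  const storedBuf = Buffer.from(stored, "utf8");
  const candidateBuf = Buffer.from(candidate, "utf8");

  if (storedBuf.length !== candidateBuf.length) {
    // Dummy comparison so the mismatch path still performs a comparison
    // of the same cost before returning.
    timingSafeEqual(storedBuf, storedBuf);
    return false;
  }
  return timingSafeEqual(storedBuf, candidateBuf);
}
```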

OPA Policy Engine

Purpose: Enforces scope-based authorisation on every protected HTTP request without requiring a code deployment to change access rules.

Responsibility boundary: The OPA policy engine (src/middleware/opa.ts) is a middleware layer — it does not know about business rules, credentials, or audit events. It receives the HTTP method, full request path, and caller scopes from req.user, and returns allow or deny. All policy logic lives in policies/authz.rego and policies/data/scopes.json.

Policy file locations:

policies/authz.rego — Rego policy defining normalise_path, lookup_key, and the allow rule. Evaluated by the Wasm bundle when compiled; replicated in TypeScript for the fallback path.
policies/data/scopes.json — JSON map of "METHOD:/path/pattern" → [required_scopes]. Loaded as data into the Wasm policy and used directly by the TypeScript fallback.
policies/authz.wasm — compiled Wasm bundle (not committed to source control; built from authz.rego using the OPA CLI). When present, the Wasm path is used; when absent, the TypeScript fallback reads scopes.json.

How opaMiddleware evaluates input:

createOpaMiddleware() is called once at app startup in src/app.ts.
It attempts to load policies/authz.wasm. If found, loadPolicy(wasmBuffer) is called and scopes.json data is injected via loaded.setData(parsed).
If no Wasm bundle is found, scopes.json is loaded into scopesMap as the TypeScript fallback.
On every request, the middleware builds an OpaInput object: { method: req.method, path: req.baseUrl + req.path, scopes: req.user.scope.split(' ') }.
evaluate(input) checks the Wasm policy (if loaded) or applies normalisePath + scope-intersection logic against scopesMap. Returns false if neither is loaded (fail-closed).
If evaluate returns false, the middleware calls next(new AuthorizationError()).
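
The TypeScript fallback path of `evaluate` can be approximated as follows. This is a sketch under several assumptions that the text above does not confirm: that `normalisePath` replaces UUID segments with `:id`, that a route missing from `scopes.json` is denied, and that "scope intersection" means at least one required scope must be present. The Wasm path is omitted.

```typescript
type ScopesMap = Record<string, string[]>;

interface OpaInput {
  method: string;
  path: string;
  scopes: string[];
}

const UUID_SEGMENT =
  /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

// Assumption: path normalisation collapses UUID segments into ":id".
function normalisePath(path: string): string {
  return path.replace(UUID_SEGMENT, ":id");
}

// Fallback evaluation against scopes.json; fail-closed when nothing is loaded.
function evaluate(input: OpaInput, scopesMap: ScopesMap | null): boolean {
  if (scopesMap === null) {
    return false;
  }
  const key = `${input.method}:${normalisePath(input.path)}`;
  const required = scopesMap[key];
  if (required === undefined) {
    return false; // assumption: unknown routes are denied
  }
  // Assumption: one matching scope is enough to allow the request.
  return required.some((scope) => input.scopes.includes(scope));
}
```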

How to write a new policy rule:

Add the new endpoint's scope requirement to policies/data/scopes.json:
```
"GET:/api/v1/reports": ["reports:read"]
```
Add "reports:read" to the OAuthScope union type in src/types/index.ts.
If Wasm mode is in use, recompile authz.rego to authz.wasm using the OPA CLI: opa build policies/authz.rego -d policies/data/ -o policies/authz.wasm.
Send SIGHUP to the running process to hot-reload: kill -HUP <pid>.

How to test a policy rule:

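# Using the OPA CLI directly. The --input flag of opa eval expects a file path,
# so the JSON input document is piped in via --stdin-input instead.
echo '{"method":"GET","path":"/api/v1/agents","scopes":["agents:read"]}' | \
opa eval --stdin-input \
         --data policies/data/scopes.json \
         --bundle policies/ \
         'data.authz.allow'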

Expected output: true. Replace method/path/scopes to test deny cases.

Hot-reload via SIGHUP: When SIGHUP is received by the Node.js process, server.ts calls reloadOpaPolicy(). This re-executes the same startup loading logic: tries to load the Wasm bundle, falls back to scopes.json. The in-memory wasmPolicy and scopesMap module-level variables are replaced atomically. No requests are dropped.

Web Dashboard

Purpose: Provides a browser-based UI for human operators to manage agents, credentials, and audit logs without writing API calls directly.

Responsibility boundary: The dashboard is a pure client-side React SPA. It has no server-side logic. It calls the AgentIdP REST API using the @sentryagent/idp-sdk TokenManager for authentication and a typed ApiClient from dashboard/src/lib/client.ts for all API calls. It never stores the access_token in localStorage — only client_id, client_secret, and baseUrl are stored in sessionStorage (cleared on tab close).

React component structure:

dashboard/src/
├── main.tsx               # React root — mounts App into #root, wraps with BrowserRouter
├── App.tsx                # Route definitions — AuthProvider, RequireAuth, AppShell
├── lib/
│   ├── auth.tsx           # AuthContext, AuthProvider, useAuth hook, sessionStorage helpers
│   └── client.ts          # Typed ApiClient class — wraps fetch with TokenManager token injection
├── components/
│   ├── RequireAuth.tsx    # Route guard — redirects to /dashboard/login if not authenticated
│   └── layout/AppShell.tsx # Persistent sidebar navigation + Outlet for page content
└── pages/
    ├── Login.tsx          # Login form — calls auth.login(), redirects to /dashboard/agents
    ├── Agents.tsx         # Paginated agents list with status filter and search
    ├── AgentDetail.tsx    # Single agent view — status, metadata, update, decommission actions
    ├── Credentials.tsx    # Credential list for an agent — generate, rotate, revoke actions
    ├── AuditLog.tsx       # Paginated audit log with date range and action filters
    └── Health.tsx         # /health endpoint response — PostgreSQL and Redis status display

Authentication flow with sessionStorage:

On Login.tsx form submit, auth.login(creds) is called.
validateCredentials(creds) creates a TokenManager and calls getToken() — if this succeeds, the credentials are valid.
saveCredentials(creds) stores { clientId, clientSecret, baseUrl } in sessionStorage under key agentidp_credentials.
On every subsequent API call, getClient() in lib/client.ts reads credentials from sessionStorage, creates a TokenManager, and injects the current access_token into the Authorization: Bearer header. The TokenManager handles automatic token refresh when the token is expired.
auth.logout() calls clearCredentials() (removes the sessionStorage key) and navigates to /dashboard/login.
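
The sessionStorage helpers named in this flow could look roughly like the sketch below. It is not the content of `dashboard/src/lib/auth.tsx`: the `KeyValueStore` interface and the `MemoryStore` class are hypothetical, introduced so the example runs outside a browser, and the store is passed in explicitly for the same reason. In the dashboard, `window.sessionStorage` already satisfies the interface.

```typescript
interface StoredCredentials {
  clientId: string;
  clientSecret: string;
  baseUrl: string;
}

// The subset of the Web Storage API that the helpers need.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "agentidp_credentials";

function saveCredentials(store: KeyValueStore, creds: StoredCredentials): void {
  store.setItem(STORAGE_KEY, JSON.stringify(creds));
}

// Returns null when nothing is stored or the stored value is malformed.
function loadCredentials(store: KeyValueStore): StoredCredentials | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<StoredCredentials>;
    if (
      typeof parsed.clientId === "string" &&
      typeof parsed.clientSecret === "string" &&
      typeof parsed.baseUrl === "string"
    ) {
      return {
        clientId: parsed.clientId,
        clientSecret: parsed.clientSecret,
        baseUrl: parsed.baseUrl,
      };
    }
    return null;
  } catch {
    return null;
  }
}

function clearCredentials(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}

// Hypothetical in-memory store, only for running the sketch under Node.js.
class MemoryStore implements KeyValueStore {
  private readonly items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
  removeItem(key: string): void {
    this.items.delete(key);
  }
}
```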

Main views and their API calls:

Agents — GET /api/v1/agents?page=N&limit=20 — paginated list with status filter
AgentDetail — GET /api/v1/agents/:id, PATCH /api/v1/agents/:id, DELETE /api/v1/agents/:id
Credentials — GET /api/v1/agents/:id/credentials, POST /api/v1/agents/:id/credentials, POST /api/v1/agents/:id/credentials/:credId/rotate, DELETE /api/v1/agents/:id/credentials/:credId
AuditLog — GET /api/v1/audit?page=N&limit=20&fromDate=...&toDate=...
Health — GET /health

Local development:

cd dashboard
npm install
npm run dev    # Vite dev server with HMR — dashboard available at http://localhost:5173/dashboard

The Vite dev server proxies /api/ calls to the Express server at http://localhost:3000. The Express server must be running separately for API calls to work.

Prometheus/Grafana Monitoring

Purpose: Provides operational visibility into AgentIdP's HTTP traffic, token issuance rates, agent registration rates, database latency, and Redis command latency.

Responsibility boundary: The metrics middleware (src/middleware/metrics.ts) and the metrics registry (src/metrics/registry.ts) are observability concerns only; they do not affect business logic. Metrics are exposed at GET /metrics via createMetricsRouter() using metricsRegistry.metrics() from prom-client. The /metrics endpoint is unauthenticated: it is intended for scraping by Prometheus only and is not meant to be reachable from the public internet.

Key metrics with labels:

Metric Name	Type	Labels	Description
`agentidp_http_requests_total`	Counter	`method`, `route`, `status_code`	Total HTTP requests received; route is normalised (UUIDs replaced with `:id`)
`agentidp_http_request_duration_seconds`	Histogram	`method`, `route`, `status_code`	HTTP request duration; buckets from 5ms to 2.5s
`agentidp_tokens_issued_total`	Counter	`scope`	Total OAuth 2.0 access tokens successfully issued
`agentidp_agents_registered_total`	Counter	`deployment_env`	Total AI agents successfully registered
`agentidp_db_query_duration_seconds`	Histogram	`operation`	PostgreSQL query duration; buckets from 1ms to 1s
`agentidp_redis_command_duration_seconds`	Histogram	`command`	Redis command duration; buckets from 0.5ms to 250ms

How to add a new Counter:

Open src/metrics/registry.ts.

Add a new Counter export:

export const myNewCounter = new Counter({
  name: 'agentidp_my_new_counter_total',
  help: 'Description of what this counts.',
  labelNames: ['label_one'] as const,
  registers: [metricsRegistry],
});

Import and call myNewCounter.inc({ label_one: value }) in the service or middleware where the event occurs.

How to add a new Histogram:

Open src/metrics/registry.ts.

Add a new Histogram export with appropriate buckets:

export const myDurationHistogram = new Histogram({
  name: 'agentidp_my_operation_duration_seconds',
  help: 'Duration of my operation in seconds.',
  labelNames: ['operation'] as const,
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
  registers: [metricsRegistry],
});

Use const end = myDurationHistogram.startTimer({ operation: 'name' }); ... end(); around the operation being measured.

Grafana access in local Docker:

Start the monitoring overlay:

docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up

Prometheus: http://localhost:9090
Grafana: http://localhost:3001 — default credentials: admin / agentidp

Grafana is pre-provisioned with a Prometheus data source pointing to http://prometheus:9090 and dashboard JSON files from monitoring/grafana/dashboards/. No manual configuration is needed after startup.

AnalyticsService

Purpose: Records daily aggregated analytics events (token issuances, agent activity) and exposes query methods for token trends, agent activity heatmaps, and per-agent usage summaries. All query methods scope results strictly to the supplied tenantId. The recordEvent method is fire-and-forget — it catches all errors internally and never propagates them to the caller, so analytics writes never block primary request paths.

Public methods:

Method	Parameters	Returns	Description
`recordEvent`	`tenantId: string, metricType: string`	`Promise<void>`	Upserts a daily counter row in `analytics_events` via `INSERT ... ON CONFLICT DO UPDATE SET count = count + 1`. Catches and swallows all errors; safe to call with `void` on hot paths.
`getTokenTrend`	`tenantId: string, days: number`	`Promise<ITokenTrendEntry[]>`	Returns daily token issuance counts for the last N days (clamped to 90). Uses `generate_series` + `LEFT JOIN` so that days with no events appear as `count: 0`. Results sorted ascending by date.
`getAgentActivity`	`tenantId: string`	`Promise<IAgentActivityEntry[]>`	Returns agent activity bucketed by day-of-week (0=Sun…6=Sat) and hour-of-day for the last 30 days. Reads only rows whose `metric_type` matches the pattern `agent:<agentId>:<metricType>`.
`getAgentUsageSummary`	`tenantId: string`	`Promise<IAgentUsageSummaryEntry[]>`	Returns per-agent token issuance totals for the current calendar month, joined with the agent name (`owner` field). Sorted descending by `token_count`. Excludes decommissioned agents.

Dependencies: PostgreSQL connection pool (Pool from pg). No Redis usage.

Configuration: None. MAX_TREND_DAYS = 90 is a module-level constant.

DB tables:

analytics_events: organization_id (UUID FK to organizations), date (DATE), metric_type (text — e.g. 'token_issued', 'agent:<agentId>:token_issued'), count (integer). Unique constraint on (organization_id, date, metric_type).
agents: read in getAgentUsageSummary to join owner and filter by organization_id.
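
The effect of `generate_series` plus `LEFT JOIN` in `getTokenTrend` can be illustrated with an in-memory analogue in TypeScript. This is not the SQL the service runs. The helper name is hypothetical, and two details are assumptions: that the window ends on and includes today (UTC), and that `days` is clamped to a minimum of 1.

```typescript
const MAX_TREND_DAYS = 90;

interface TokenTrendEntry {
  date: string; // YYYY-MM-DD
  count: number;
}

// Produces one entry per day in the window, oldest first, with 0 for
// days that have no row, mirroring what the LEFT JOIN achieves in SQL.
function fillTokenTrend(
  rows: TokenTrendEntry[],
  days: number,
  today: Date,
): TokenTrendEntry[] {
  const span = Math.min(Math.max(1, Math.floor(days)), MAX_TREND_DAYS);
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.date, row.count);
  }

  const result: TokenTrendEntry[] = [];
  for (let offset = span - 1; offset >= 0; offset--) {
    // Date.UTC normalises day values below 1 into the previous month.
    const day = new Date(
      Date.UTC(
        today.getUTCFullYear(),
        today.getUTCMonth(),
        today.getUTCDate() - offset,
      ),
    );
    const key = day.toISOString().slice(0, 10);
    result.push({ date: key, count: counts.get(key) ?? 0 });
  }
  return result;
}
```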

TierService

Purpose: Single authority for all subscription tier business logic — fetches current tier and live usage, initiates Stripe Checkout sessions for upgrades, applies confirmed upgrades to the organizations table, and enforces per-tier agent count limits. Controllers and middleware delegate all tier decisions to this service; no tier logic lives elsewhere.

Public methods:

Method	Parameters	Returns	Description
`getStatus`	`orgId: string`	`Promise<ITierStatus>`	Returns current `tier`, per-tier `limits` (from `TIER_CONFIG`), live `usage` (Redis counters + DB agent count), and `resetAt` (ISO 8601 next UTC midnight). Falls back to `0` for Redis counters when Redis is unavailable.
`initiateUpgrade`	`orgId: string, targetTier: TierName`	`Promise<IUpgradeInitiation>`	Validates that `targetTier` is strictly higher rank than current tier. Creates a Stripe Checkout Session with `mode: 'subscription'`, `metadata: { orgId, targetTier }`, and the price ID from `STRIPE_PRICE_ID_<TIER>` env var. Returns `{ checkoutUrl }`.
`applyUpgrade`	`orgId: string, tier: TierName`	`Promise<void>`	Sets `organizations.tier` and `organizations.tier_updated_at = NOW()`. Called by the Stripe webhook handler after `checkout.session.completed`.
`fetchTier`	`orgId: string`	`Promise<TierName>`	Queries `organizations.tier` for the given org. Returns `'free'` as a safe default when no row is found or the stored value is not a valid `TierName`.
`enforceAgentLimit`	`orgId: string, tier: TierName`	`Promise<void>`	Counts non-decommissioned agents for the org and throws `TierLimitError` when count is at or over `TIER_CONFIG[tier].maxAgents`. No-op for Enterprise (infinite limit). Called by `AgentService` before creating a new agent.

Dependencies: PostgreSQL (Pool), Redis (RedisClientType), Stripe client (Stripe). Imports TIER_CONFIG and TIER_RANK from src/config/tiers.ts.

Configuration:

STRIPE_PRICE_ID_PRO — Stripe price ID for the Pro tier
STRIPE_PRICE_ID_ENTERPRISE — Stripe price ID for the Enterprise tier
STRIPE_PRICE_ID — Fallback Stripe price ID when tier-specific vars are not set
STRIPE_SUCCESS_URL — Redirect URL on successful checkout (default: APP_BASE_URL/dashboard?billing=success)
STRIPE_CANCEL_URL — Redirect URL when checkout is cancelled (default: APP_BASE_URL/dashboard?billing=cancel)
APP_BASE_URL — Base URL for redirect URL construction (default: http://localhost:3000)

Redis keys:

rate:tier:calls:<orgId> — integer, daily API call counter; TTL set at next UTC midnight. Read in getStatus.
rate:tier:tokens:<orgId> — integer, daily token issuance counter; same TTL. Read in getStatus.

DB tables:

organizations: organization_id (UUID PK), tier (text — 'free'|'pro'|'enterprise'), tier_updated_at (timestamptz). Read in fetchTier; written in applyUpgrade.
agents: read in enforceAgentLimit and getStatus to count non-decommissioned agents per org.

Error types:

ValidationError (400) — target tier is not higher than current tier
TierLimitError (429) — agent count limit reached for the current tier
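
The three tier rules described above (strictly higher rank for upgrades, per-tier agent limit, reset at the next UTC midnight) can be sketched as pure functions. The numeric values in `TIER_RANK` and `TIER_CONFIG` below are illustrative placeholders, not the contents of `src/config/tiers.ts`; only the infinite Enterprise limit is stated in this document. The functions take an agent count directly instead of querying PostgreSQL.

```typescript
type TierName = "free" | "pro" | "enterprise";

// Placeholder values for illustration only.
const TIER_RANK: Record<TierName, number> = { free: 0, pro: 1, enterprise: 2 };
const TIER_CONFIG: Record<TierName, { maxAgents: number }> = {
  free: { maxAgents: 100 },
  pro: { maxAgents: 1000 }, // hypothetical
  enterprise: { maxAgents: Infinity },
};

class TierLimitError extends Error {
  readonly httpStatus = 429;
}

class ValidationError extends Error {
  readonly httpStatus = 400;
}

// An upgrade target must rank strictly above the current tier.
function assertUpgradeAllowed(current: TierName, target: TierName): void {
  if (TIER_RANK[target] <= TIER_RANK[current]) {
    throw new ValidationError("target tier must be higher than current tier");
  }
}

// Throws when the org is at or over its limit; never throws for Enterprise.
function enforceAgentLimit(agentCount: number, tier: TierName): void {
  if (agentCount >= TIER_CONFIG[tier].maxAgents) {
    throw new TierLimitError("agent limit reached for the current tier");
  }
}

// ISO 8601 timestamp of the next UTC midnight, as used for resetAt.
function nextUtcMidnight(now: Date): string {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1),
  ).toISOString();
}
```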

ComplianceService

Purpose: Generates AGNTCY-standard compliance reports and exports agent cards for a tenant. Reports cover two sections: agent-identity (DID presence and credential expiry checks) and audit-trail (cryptographic hash chain verification). Reports are cached in Redis for 5 minutes to avoid repeated expensive DB queries. Agent card export returns all active agents in AGNTCY-standard JSON format.

Public methods:

Method	Parameters	Returns	Description
`generateReport`	`tenantId: string`	`Promise<IComplianceReport>`	Attempts to read `compliance:report:<tenantId>` from Redis; if found, returns it with `from_cache: true`. Otherwise builds the report by running `buildAgentIdentitySection` and `buildAuditTrailSection` in parallel, rolls up the overall status (fail > warn > pass), caches the result for 300 seconds, and returns it.
`exportAgentCards`	`tenantId: string`	`Promise<IAgentCard[]>`	Queries all non-decommissioned agents for the tenant and maps each to an AGNTCY agent card with `id` (DID or agent UUID), `name`, `capabilities`, `endpoint`, `created_at`, and `agntcy_schema_version: '1.0'`.

Dependencies: PostgreSQL (Pool), Redis (RedisClientType). Internally instantiates AuditVerificationService for hash chain verification.

Configuration: None. CACHE_TTL_SECONDS = 300 and AGNTCY_SCHEMA_VERSION = '1.0' are module-level constants.

Redis keys:

compliance:report:<tenantId> — JSON-serialised IComplianceReport, TTL 300 seconds. Written by generateReport; read on every call within the cache window.

DB tables:

agents: queried in both buildAgentIdentitySection (checks DID presence) and exportAgentCards.
credentials: queried in buildAgentIdentitySection to check active credential expiry per agent.
audit_events: read via AuditVerificationService in buildAuditTrailSection to verify hash chain integrity.

Error types: None thrown directly. Internal errors in section builders produce status: 'fail' sections rather than exceptions.

Report structure:

agent-identity section: fail when any active agent is missing a DID or has expired credentials; warn when any credential expires within 7 days; pass otherwise.
audit-trail section: fail when AuditVerificationService.verifyChain() returns verified: false; pass otherwise.
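
The section rules and the fail > warn > pass roll-up can be expressed compactly. This sketch uses hypothetical names and a simplified input shape (`AgentIdentityInput`), and it assumes the caller passes only active agents. Treating a credential that expires exactly now as expired, and an empty section list as `pass`, are assumptions.

```typescript
type SectionStatus = "pass" | "warn" | "fail";

const SEVERITY: Record<SectionStatus, number> = { pass: 0, warn: 1, fail: 2 };
const WARN_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// Overall status is the most severe section status.
function rollUpStatus(sections: SectionStatus[]): SectionStatus {
  let worst: SectionStatus = "pass";
  for (const status of sections) {
    if (SEVERITY[status] > SEVERITY[worst]) worst = status;
  }
  return worst;
}

// Simplified view of one active agent; null means the credential never expires.
interface AgentIdentityInput {
  hasDid: boolean;
  credentialExpiries: (Date | null)[];
}

function agentIdentityStatus(
  agents: AgentIdentityInput[],
  now: Date,
): SectionStatus {
  let status: SectionStatus = "pass";
  for (const agent of agents) {
    if (!agent.hasDid) return "fail";
    for (const expiry of agent.credentialExpiries) {
      if (expiry === null) continue;
      if (expiry.getTime() <= now.getTime()) return "fail";
      if (expiry.getTime() <= now.getTime() + WARN_WINDOW_MS) status = "warn";
    }
  }
  return status;
}
```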

FederationService

Purpose: Manages trusted federation partners and cross-IdP JWT token verification. At partner registration, the partner's JWKS endpoint is validated and the keys are cached in Redis. At token verification, the service fetches (or reuses cached) partner JWKS, verifies the JWT signature and standard claims, enforces the partner's allowed_organizations filter, and rejects tokens from suspended or expired partners.

Public methods:

Method	Parameters	Returns	Description
`registerPartner`	`req: ICreatePartnerRequest`	`Promise<IFederationPartner>`	Validates the `jwks_uri` is reachable (5-second timeout) and returns valid JWKS. Inserts the partner row into `federation_partners`. Caches the JWKS in Redis under `federation:jwks:<issuer>`.
`listPartners`	(none)	`Promise<IFederationPartner[]>`	Updates any partners past `expires_at` to `status = 'expired'` before returning all rows ordered by `created_at DESC`.
`getPartner`	`id: string`	`Promise<IFederationPartner>`	Applies the same expiry update, then returns the partner row. Throws `FederationPartnerNotFoundError` (404) when not found.
`updatePartner`	`id: string, req: IUpdatePartnerRequest`	`Promise<IFederationPartner>`	Applies a partial update. When `jwks_uri` changes, invalidates the old issuer's JWKS cache entry (`DEL federation:jwks:<oldIssuer>`).
`deletePartner`	`id: string`	`Promise<void>`	Deletes the partner row and invalidates the JWKS cache.
`verifyFederatedToken`	`req: IFederationVerifyRequest`	`Promise<IFederationVerifyResult>`	Decodes token header/payload without verification, rejects `alg:none`, looks up partner by `iss`, checks partner status and expiry, fetches JWKS (cache-first), finds matching key by `kid`, converts JWK to PEM, verifies signature via `jsonwebtoken.verify` (RS256 or ES256), enforces `allowed_organizations` filter. Returns `{ valid, issuer, subject, organization_id, claims }`.

Dependencies: PostgreSQL (Pool), Redis (RedisClientType). Uses jsonwebtoken for JWT decoding/verification and Node.js crypto.createPublicKey for JWK-to-PEM conversion.

Configuration:

FEDERATION_JWKS_CACHE_TTL_SECONDS — TTL for cached partner JWKS in Redis (default: 3600)

Redis keys:

federation:jwks:<issuer> — JSON-serialised IJWKSKey[], TTL from FEDERATION_JWKS_CACHE_TTL_SECONDS. Written on partner registration and on cache miss during token verification; deleted when a partner is updated (JWKS URI changed) or deleted.

DB tables:

federation_partners: id (UUID PK), name (text), issuer (text — the IdP's issuer URL), jwks_uri (text), allowed_organizations (text[] — empty means all orgs allowed), status (active|suspended|expired), created_at, updated_at, expires_at (nullable timestamptz).

Error types:

FederationPartnerError (400) — JWKS endpoint unreachable or returns invalid JWKS
FederationPartnerNotFoundError (404) — partner UUID not found
FederationVerificationError (401) — token malformed, alg:none, unknown issuer, partner suspended/expired, signature invalid, org not in allow list
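
The non-cryptographic checks in `verifyFederatedToken` can be sketched as a single guard function. This is an illustration with hypothetical names and a reduced partner shape. JWKS lookup and signature verification are deliberately omitted; in the flow described above, the `allowed_organizations` filter runs after the signature has been verified, whereas this sketch groups all rule checks together. The `reason` strings are invented for the example.

```typescript
interface PartnerRow {
  status: "active" | "suspended" | "expired";
  expires_at: Date | null;
  allowed_organizations: string[]; // empty means all organizations are allowed
}

class FederationVerificationError extends Error {
  readonly httpStatus = 401;
  readonly reason: string;
  constructor(reason: string) {
    super(`federated token rejected: ${reason}`);
    this.reason = reason;
  }
}

// Throws on the first rule the token or its issuing partner violates.
function assertTokenAcceptable(
  alg: string | undefined,
  partner: PartnerRow | undefined,
  organizationId: string | undefined,
  now: Date,
): void {
  if (alg === undefined || alg.toLowerCase() === "none") {
    throw new FederationVerificationError("alg-none");
  }
  if (alg !== "RS256" && alg !== "ES256") {
    throw new FederationVerificationError("unsupported-alg");
  }
  if (partner === undefined) {
    throw new FederationVerificationError("unknown-issuer");
  }
  if (partner.status !== "active") {
    throw new FederationVerificationError("partner-not-active");
  }
  if (partner.expires_at !== null && partner.expires_at.getTime() <= now.getTime()) {
    throw new FederationVerificationError("partner-expired");
  }
  const allowed = partner.allowed_organizations;
  if (
    allowed.length > 0 &&
    (organizationId === undefined || !allowed.includes(organizationId))
  ) {
    throw new FederationVerificationError("org-not-allowed");
  }
}
```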

DIDService

Purpose: Manages W3C DID Core 1.0 document generation, EC P-256 key pair creation, and AGNTCY agent card export. Generates per-agent did:web identifiers, stores private keys in HashiCorp Vault (or records a dev:no-vault marker in dev mode), and caches DID documents in Redis. Builds both an instance-level DID document (for AgentIdP itself) and per-agent DID documents with AGNTCY extension properties.

Public methods:

Method	Parameters	Returns	Description
`generateDIDForAgent`	`agentId: string, organizationId: string`	`Promise<{ did: string; publicKeyJwk: IPublicKeyJwk }>`	Generates an EC P-256 key pair. Stores the private key PEM in Vault KV v2 at `{mount}/data/agentidp/agents/{agentId}/did-key`. Encrypts the vault path via `EncryptionService` (when configured). Inserts a row into `agent_did_keys`. Updates `agents.did` and `agents.did_created_at`. Returns the `did:web` identifier and public key JWK.
`buildInstanceDIDDocument`	(none)	`Promise<IDIDDocument>`	Builds the root instance DID document for AgentIdP (format: `did:web:{DID_WEB_DOMAIN}`). Cached in Redis under `did:doc:instance`.
`buildAgentDIDDocument`	`agentId: string`	`Promise<IAgentDIDDocumentResult>`	Builds a per-agent DID document (format: `did:web:{DID_WEB_DOMAIN}:agents:{agentId}`). Decommissioned agents get a deactivated document with an `AgentStatus: decommissioned` service entry. Cached in Redis under `did:doc:{agentId}` for active agents only. Throws `AgentNotFoundError` if the agent does not exist.
`buildResolutionResult`	`agentId: string`	`Promise<IDIDResolutionResult>`	Wraps `buildAgentDIDDocument` with W3C DID Resolution metadata (`didDocumentMetadata`, `didResolutionMetadata`).
`buildAgentCard`	`agentId: string`	`Promise<IAgentCard>`	Returns an AGNTCY-format agent card with `did`, `name` (agent email), `agentType`, `capabilities`, `owner`, `version`, `deploymentEnv`, `identityProvider`, and `issuedAt`.
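As a rough illustration of the key generation step in `generateDIDForAgent`, the sketch below creates an EC P-256 key pair with Node.js `crypto` and builds the per-agent `did:web` identifier in the documented format. The function names and the example agent UUID are hypothetical, and the Vault write, encryption, and database inserts are deliberately omitted.

```typescript
import { generateKeyPairSync } from 'node:crypto';

// Builds the per-agent identifier in the documented format.
function agentDid(domain: string, agentId: string): string {
  return `did:web:${domain}:agents:${agentId}`;
}

// Hypothetical helper: returns the public half as a JWK and the private half
// as a PKCS#8 PEM string (the form that would be handed to a secret store).
function generateP256KeyPair() {
  const { publicKey, privateKey } = generateKeyPairSync('ec', { namedCurve: 'P-256' });
  return {
    publicKeyJwk: publicKey.export({ format: 'jwk' }),
    privateKeyPem: privateKey.export({ type: 'pkcs8', format: 'pem' }).toString(),
  };
}

// Usage with a made-up agent UUID.
const exampleDid = agentDid('idp.sentryagent.ai', '3f2b1c9e-0000-4000-8000-000000000001');
const exampleKeys = generateP256KeyPair();
console.log(exampleDid, exampleKeys.publicKeyJwk.crv);
```

Only the public JWK would be returned to callers; the private PEM should never leave the storage path.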

Dependencies: PostgreSQL (Pool), Redis (RedisClientType), optional VaultClient, optional EncryptionService. Uses node-vault directly for DID private key storage.

Configuration:

DID_WEB_DOMAIN — required; the domain for did:web DID construction (e.g. idp.sentryagent.ai)
DID_DOCUMENT_CACHE_TTL_SECONDS — Redis cache TTL for DID documents (default: 300)
VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT — when set, private keys are stored in Vault; otherwise dev:no-vault marker is used

Redis keys:

did:doc:instance — JSON-serialised instance IDIDDocument, TTL from DID_DOCUMENT_CACHE_TTL_SECONDS
did:doc:<agentId> — JSON-serialised per-agent IDIDDocument, same TTL. Not cached for decommissioned agents.

DB tables:

agents: did (text — did:web:...), did_created_at (timestamptz). Written by generateDIDForAgent; read in all document-building methods.
agent_did_keys: key_id (UUID PK), agent_id (UUID FK), organization_id (UUID FK), public_key_jwk (JSONB), vault_key_path (text — Vault KV v2 path or dev:no-vault), key_type ('EC'), curve ('P-256'), created_at. Written by generateDIDForAgent.

Error types:

AgentNotFoundError (404) — agent UUID not found in buildAgentDIDDocument, buildResolutionResult, buildAgentCard
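The wrapping performed by `buildResolutionResult` might look roughly like the sketch below. The shapes of `IDIDDocument` and `IDIDResolutionResult` are not shown in this document, so the interfaces here are assumptions modelled on the W3C DID Core resolution metadata properties (`contentType`, `deactivated`, `created`); the function name `wrapResolution` is hypothetical.

```typescript
// Assumed shapes, loosely following W3C DID Core; not the project's own types.
interface DidDocument { id: string; [prop: string]: unknown }

interface DidResolutionResult {
  didDocument: DidDocument;
  didDocumentMetadata: { deactivated?: boolean; created?: string };
  didResolutionMetadata: { contentType: string };
}

// DID Core timestamps are UTC without sub-second precision.
function toDidTimestamp(date: Date): string {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

function wrapResolution(
  doc: DidDocument,
  opts: { deactivated: boolean; createdAt?: Date },
): DidResolutionResult {
  const didDocumentMetadata: DidResolutionResult['didDocumentMetadata'] = {};
  if (opts.createdAt) didDocumentMetadata.created = toDidTimestamp(opts.createdAt);
  // Only emit the flag when true, as for a decommissioned agent.
  if (opts.deactivated) didDocumentMetadata.deactivated = true;
  return {
    didDocument: doc,
    didDocumentMetadata,
    didResolutionMetadata: { contentType: 'application/did+ld+json' },
  };
}

// Usage: a decommissioned agent resolves to a deactivated result.
const resolved = wrapResolution(
  { id: 'did:web:idp.sentryagent.ai:agents:example-agent' },
  { deactivated: true, createdAt: new Date(Date.UTC(2026, 0, 15, 12, 0, 0)) },
);
console.log(JSON.stringify(resolved.didDocumentMetadata));
```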

WebhookService

Purpose: Manages webhook subscriptions and their delivery history for a tenant organisation. HMAC signing secrets are stored in HashiCorp Vault KV v2 (when configured) or bcrypt-hashed in PostgreSQL in local mode. The raw secret is only returned once at subscription creation time. vault_secret_path is encrypted at rest via EncryptionService (AES-256-CBC) before being written to PostgreSQL (SOC 2 CC6.1 compliance).
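For readers unfamiliar with HMAC-signed webhooks, the sketch below shows how a payload could be signed on delivery and checked by a receiver. The hash function (SHA-256), the hex encoding of the signature, and the function names are assumptions; this document does not state which algorithm or header the delivery worker uses.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumption: HMAC-SHA256 over the raw request body, hex-encoded.
function signPayload(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body, 'utf8').digest('hex');
}

// Receiver side: recompute the signature and compare in constant time.
function verifySignature(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), 'hex');
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Usage with a made-up secret and event body.
const exampleBody = JSON.stringify({ event: 'agent.created', agentId: 'example' });
const exampleSignature = signPayload('example-secret', exampleBody);
console.log(verifySignature('example-secret', exampleBody, exampleSignature));
```

Signing the exact bytes that are sent matters: re-serialising the JSON on the receiving side can change whitespace or key order and break the comparison.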

Public methods:

Method	Parameters	Returns	Description
`createSubscription`	`orgId: string, req: ICreateWebhookRequest`	`Promise<IWebhookSubscription & { secret: string }>`	Generates a 32-byte random hex HMAC secret. Stores in Vault at `secret/data/agentidp/webhooks/{orgId}/{id}/secret` (Vault mode) or bcrypt-hashes and stores in `secret_hash` (local mode). Encrypts `vault_secret_path` via `EncryptionService`. Returns the subscription including the one-time `secret`. Validates URL must use `https://` and events array must be non-empty.
`listSubscriptions`	`orgId: string`	`Promise<IWebhookSubscription[]>`	Returns all subscriptions for the org, ordered by `created_at DESC`. No secret fields are included.
`getSubscription`	`id: string, orgId: string`	`Promise<IWebhookSubscription>`	Returns a single subscription. Verifies org ownership.
`updateSubscription`	`id: string, orgId: string, req: IUpdateWebhookRequest`	`Promise<IWebhookSubscription>`	Partially updates `name`, `url`, `events`, or `active` fields. Validates `https://` if URL is changing.
`deleteSubscription`	`id: string, orgId: string`	`Promise<void>`	Permanently deletes the subscription and all deliveries (via PostgreSQL CASCADE).
`getSubscriptionSecret`	`subscriptionId: string, orgId: string`	`Promise<string>`	Retrieves the raw HMAC secret from Vault (Vault mode only). Throws `WebhookValidationError` in local mode since the secret cannot be recovered after creation.
`listDeliveries`	`subscriptionId: string, orgId: string, limit: number, offset: number`	`Promise<IPaginatedDeliveriesResponse>`	Returns paginated delivery records for a subscription. Verifies org ownership before querying.
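The validation and secret generation performed by `createSubscription` can be sketched as below. The error class is redefined locally so the example is self-contained, and the helper names are hypothetical; storage in Vault or as a bcrypt hash is out of scope here.

```typescript
import { randomBytes } from 'node:crypto';

// Local stand-in for the service's WebhookValidationError (400).
class WebhookValidationError extends Error {
  readonly status = 400;
}

// Enforces the two documented rules: https:// URL and non-empty events.
function validateCreateRequest(req: { url: string; events: string[] }): void {
  let parsed: URL;
  try {
    parsed = new URL(req.url);
  } catch {
    throw new WebhookValidationError('url is not a valid URL');
  }
  if (parsed.protocol !== 'https:') {
    throw new WebhookValidationError('url must use https://');
  }
  if (req.events.length === 0) {
    throw new WebhookValidationError('events must be a non-empty array');
  }
}

// 32 random bytes rendered as 64 hex characters.
function generateWebhookSecret(): string {
  return randomBytes(32).toString('hex');
}

// Usage
validateCreateRequest({ url: 'https://hooks.example.com/agentidp', events: ['agent.created'] });
const newSecret = generateWebhookSecret();
console.log(newSecret.length);
```

Parsing with `URL` rather than a string prefix check also rejects malformed inputs that merely begin with `https://`.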

Dependencies: PostgreSQL (Pool), optional VaultClient, Redis (RedisClientType — reserved for future caching), optional EncryptionService.

Configuration: Inherits Vault configuration from VaultClient (VAULT_ADDR, VAULT_TOKEN, VAULT_MOUNT). EncryptionService requires ENCRYPTION_KEY env var (see EncryptionService docs).

DB tables:

webhook_subscriptions: id (UUID PK), organization_id (UUID FK), name (text), url (text — https only), events (JSONB — WebhookEventType[]), secret_hash (text — bcrypt hash in local mode, 'vault' in Vault mode), vault_secret_path (text — encrypted Vault path or 'local'), active (boolean), failure_count (integer), created_at, updated_at.
webhook_deliveries: id (UUID PK), subscription_id (UUID FK), event_type (text), payload (JSONB), status (pending|delivered|failed|dead_letter), http_status_code (integer nullable), attempt_count (integer), next_retry_at (timestamptz nullable), delivered_at (timestamptz nullable), created_at, updated_at. Cascades on subscription delete.

Error types:

WebhookNotFoundError (404) — subscription not found or belongs to another org
WebhookValidationError (400) — invalid URL scheme, empty events array, or secret not recoverable in local mode

BillingService

Purpose: Manages Stripe billing integration — creates Checkout Sessions for tenant subscriptions, processes incoming Stripe webhook events (subscription lifecycle and checkout completion), and retrieves current subscription status. When a checkout.session.completed event carries { orgId, targetTier } in its metadata, delegates to TierService.applyUpgrade to update the organisation's tier.

Public methods:

Method	Parameters	Returns	Description
`createCheckoutSession`	`tenantId: string, successUrl: string, cancelUrl: string`	`Promise<string>`	Creates a Stripe Checkout Session with `mode: 'subscription'`, `client_reference_id: tenantId`, and the price from `STRIPE_PRICE_ID`. Returns the checkout URL. Throws if Stripe does not return a URL.
`handleWebhookEvent`	`rawBody: Buffer, sig: string, webhookSecret: string`	`Promise<void>`	Verifies the Stripe webhook signature via `stripe.webhooks.constructEvent`. Handles `customer.subscription.created/updated/deleted` (upserts `tenant_subscriptions`) and `checkout.session.completed` (applies tier upgrade via `TierService` when metadata contains `orgId` and `targetTier`).
`getSubscriptionStatus`	`tenantId: string`	`Promise<ISubscriptionStatus>`	Queries `tenant_subscriptions` for the given tenant. Returns `{ tenantId, status: 'free', currentPeriodEnd: null, stripeSubscriptionId: null }` when no row exists.
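The event routing inside `handleWebhookEvent` and the default returned by `getSubscriptionStatus` can be illustrated without the Stripe SDK. In the sketch below, `VerifiedEvent` is a simplified, hypothetical stand-in for an event that has already passed `stripe.webhooks.constructEvent`, and `planBillingAction` is an illustrative name, not a method of the service.

```typescript
// Simplified stand-in for an already-verified Stripe event (assumption).
interface VerifiedEvent {
  type: string;
  metadata?: Record<string, string> | null;
}

type BillingAction =
  | { kind: 'upsertSubscription'; eventType: string }
  | { kind: 'applyUpgrade'; orgId: string; targetTier: string }
  | { kind: 'ignore' };

// Decide what to do with an event; side effects are left to the caller.
function planBillingAction(event: VerifiedEvent): BillingAction {
  switch (event.type) {
    case 'customer.subscription.created':
    case 'customer.subscription.updated':
    case 'customer.subscription.deleted':
      return { kind: 'upsertSubscription', eventType: event.type };
    case 'checkout.session.completed': {
      const metadata: Record<string, string | undefined> = event.metadata ?? {};
      const { orgId, targetTier } = metadata;
      // A tier upgrade applies only when both metadata fields are present.
      return orgId && targetTier
        ? { kind: 'applyUpgrade', orgId, targetTier }
        : { kind: 'ignore' };
    }
    default:
      return { kind: 'ignore' };
  }
}

// Documented default when no tenant_subscriptions row exists.
function defaultSubscriptionStatus(tenantId: string) {
  return { tenantId, status: 'free' as const, currentPeriodEnd: null, stripeSubscriptionId: null };
}

console.log(planBillingAction({ type: 'checkout.session.completed', metadata: { orgId: 'org-1', targetTier: 'pro' } }));
```

Separating the decision from the database and `TierService` calls keeps the routing logic unit-testable with plain objects.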

Dependencies: PostgreSQL (Pool), Stripe client (Stripe), optional TierService.

Configuration:

STRIPE_PRICE_ID — Stripe price ID for subscription checkout sessions
STRIPE_WEBHOOK_SECRET — Stripe webhook endpoint secret (whsec_...); passed by the webhook controller, not read directly by the service

DB tables:

tenant_subscriptions: tenant_id (UUID PK or unique), status (text — 'free'|'active'|'past_due'|'canceled'), stripe_customer_id (text), stripe_subscription_id (text), current_period_end (timestamptz nullable), updated_at. Upserted on subscription lifecycle events.

Error types: None defined in the service. Stripe signature failures raise Error from stripe.webhooks.constructEvent; these propagate to the error handler as 400 responses.

OIDCService (A2A / OIDC Provider)

Note: There is no standalone src/services/OIDCService.ts file. OIDC provider functionality is handled by the oidc-provider npm package, configured in src/app.ts and related route files. The service boundary for OIDC-related business logic is the DelegationService.

Purpose: The OIDC/A2A subsystem provides agent-to-agent (A2A) delegation using the oidc-provider library (v9.7.x). The provider is mounted as a sub-application at /oidc and issues short-lived delegation tokens scoped to a specific delegatee_id. The DelegationService (src/services/DelegationService.ts) manages the delegation_chains table for auditing.

Key endpoints exposed by the OIDC provider:

POST /oidc/token — issues delegation tokens via client_credentials or custom grant
GET /oidc/.well-known/openid-configuration — OIDC discovery document
GET /oidc/jwks — public JWK Set for verifying delegation tokens

DelegationService public methods (from src/services/DelegationService.ts):

Method	Parameters	Returns	Description
`createDelegation`	`delegatorId: string, delegateeId: string, scope: string, expiresAt?: Date`	`Promise<IDelegationChain>`	Inserts a delegation chain record into `delegation_chains`. Validates both agents exist and are active.
`verifyDelegation`	`token: string, delegateeId: string`	`Promise<IDelegationVerifyResult>`	Verifies the delegation token signature and checks the chain record is active and not expired.
`revokeDelegation`	`chainId: string, delegatorId: string`	`Promise<void>`	Sets `delegation_chains.status = 'revoked'` and `revoked_at = NOW()`. Validates the delegator owns the chain.

DB tables:

delegation_chains: chain_id (UUID PK), delegator_id (UUID), delegatee_id (UUID), scope (text), status (active|revoked|expired), created_at, expires_at (nullable), revoked_at (nullable), token (text — the delegation JWT).
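The chain-record check that `verifyDelegation` performs after signature verification might be sketched as follows. The row type mirrors the `delegation_chains` columns listed above; the function name `isChainUsable` and the delegatee match are assumptions, since this document only states that the record must be active and not expired.

```typescript
type ChainStatus = 'active' | 'revoked' | 'expired';

// Mirrors the documented delegation_chains columns (token omitted).
interface DelegationChainRow {
  chain_id: string;
  delegator_id: string;
  delegatee_id: string;
  scope: string;
  status: ChainStatus;
  expires_at: Date | null;
  revoked_at: Date | null;
}

// Returns true only for an active, unexpired chain issued to this delegatee.
function isChainUsable(chain: DelegationChainRow, delegateeId: string, now: Date = new Date()): boolean {
  if (chain.status !== 'active') return false;
  // Assumption: the presenting agent must be the recorded delegatee.
  if (chain.delegatee_id !== delegateeId) return false;
  // A null expires_at means the chain has no expiry.
  if (chain.expires_at !== null && chain.expires_at.getTime() <= now.getTime()) return false;
  return true;
}

// Usage with made-up identifiers.
const exampleChain: DelegationChainRow = {
  chain_id: 'chain-1',
  delegator_id: 'agent-a',
  delegatee_id: 'agent-b',
  scope: 'read:reports',
  status: 'active',
  expires_at: new Date(Date.UTC(2030, 0, 1)),
  revoked_at: null,
};
console.log(isChainUsable(exampleChain, 'agent-b', new Date(Date.UTC(2026, 0, 1))));
```

Checking `expires_at` at verification time covers rows whose `status` has not yet been flipped to `expired` by a background job.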

Configuration:

A2A_ENABLED — when set to 'false', A2A/delegation endpoints return 404
OIDC_ISSUER — issuer URL for the OIDC provider
