Spec — WS1: DevOps Documentation Update
Change: phase-7-devops-field-trial
Workstream: WS1
Status: Approved
Written: 2026-04-04
Purpose
Specify exactly what must be updated in each docs/devops/ file to reflect the Phase 6 state
of the codebase. The existing docs were written during Phase 2. Phases 3–6 added 14 DB
migrations, new services, feature flags, Stripe billing, Prometheus metrics, Redis key patterns,
and the Next.js portal — none of which appear in the current docs.
The Developer implementing this spec must update each listed file precisely as described. No other content in those files should be changed unless explicitly stated.
File: docs/devops/environment-variables.md
Section: Required Variables — update DATABASE_URL description
Replace the sentence:
The application uses `pg.Pool` with this connection string. Connection pool size uses the `pg` default (10 connections).
With:
The application uses `pg.Pool` with this connection string. Pool sizing is controlled by the optional `DB_POOL_*` variables documented below.
Section: Required Variables — add STRIPE_SECRET_KEY after DATABASE_URL
Insert a new required variable block:
Note on Billing:
`STRIPE_SECRET_KEY`, `STRIPE_WEBHOOK_SECRET`, and `STRIPE_PRICE_ID` are required when `BILLING_ENABLED=true`. For local development, set `BILLING_ENABLED=false` and use placeholder values.
Section: Optional Variables — add all Phase 3–6 variables
Add each of the following variable blocks in this order, after the existing VAULT_MOUNT block
and before POLICY_DIR:
BILLING_ENABLED
| Required | No |
| Default | false |
| Values | true, false |
| Example | BILLING_ENABLED=false |
Gates Stripe billing integration and free-tier agent limit enforcement. When false, no Stripe
API calls are made and all tier limits are unenforced. Set to false for in-house testing.
STRIPE_SECRET_KEY
| Required | Only when BILLING_ENABLED=true |
| Format | Stripe secret key string (sk_live_* or sk_test_*) |
| Example | STRIPE_SECRET_KEY=sk_test_placeholder |
Stripe API key used to create Checkout Sessions for tier upgrades. Never use a live key in development.
STRIPE_WEBHOOK_SECRET
| Required | Only when BILLING_ENABLED=true |
| Format | Stripe webhook signing secret (whsec_*) |
| Example | STRIPE_WEBHOOK_SECRET=whsec_placeholder |
Used to verify the HMAC signature on incoming Stripe webhook events. Without this, the billing webhook endpoint will reject all events.
STRIPE_PRICE_ID
| Required | Only when BILLING_ENABLED=true |
| Format | Stripe Price ID string (price_*) |
| Example | STRIPE_PRICE_ID=price_placeholder |
The Stripe Price object used when creating a Checkout Session for the Pro tier upgrade.
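The conditional requirement on the three Stripe variables could be checked at startup along these lines. This is a minimal sketch, not the application's actual validation code: the `missingBillingVars` helper and its `Env` type are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not part of the codebase described in this spec.
type Env = Record<string, string | undefined>;

const STRIPE_VARS = [
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "STRIPE_PRICE_ID",
] as const;

/** Returns the Stripe variables that are unset or blank while billing is enabled. */
function missingBillingVars(env: Env): string[] {
  // BILLING_ENABLED defaults to false, so only the literal "true" enables billing.
  if (env.BILLING_ENABLED !== "true") return [];
  return STRIPE_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

A caller would typically pass `process.env` and refuse to start when the returned list is non-empty.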
ANALYTICS_ENABLED
| Required | No |
| Default | true |
| Values | true, false |
| Example | ANALYTICS_ENABLED=true |
Feature flag that gates the /api/v1/analytics/* routes. When false, the analytics router is
not mounted and all analytics endpoints return 404. Events are still recorded internally
regardless of this flag.
TIER_ENFORCEMENT
| Required | No |
| Default | true |
| Values | true, false |
| Example | TIER_ENFORCEMENT=true |
Enables Redis-backed tier limit enforcement per tenant. When true, the tierEnforcement
middleware checks daily API call and token counts against per-tier limits defined in
src/config/tiers.ts. Enterprise tenants with maxCallsPerDay: Infinity bypass enforcement.
When false, no tier limits are enforced.
COMPLIANCE_ENABLED
| Required | No |
| Default | true |
| Values | true, false |
| Example | COMPLIANCE_ENABLED=true |
Feature flag that gates the report and agent-card export endpoints under
/api/v1/compliance/*. When false, those endpoints return 404. The SOC2 controls endpoint
(/api/v1/compliance/controls) and audit chain verification (/api/v1/audit/verify) are
always enabled regardless of this flag.
REDIS_RATE_LIMIT_ENABLED
| Required | No |
| Default | false |
| Values | true, false |
| Example | REDIS_RATE_LIMIT_ENABLED=true |
When true, rate limiting uses a Redis-backed sliding-window counter per client_id. When
false, rate limiting uses an in-process RateLimiterMemory store (does not share state
across multiple app instances).
RATE_LIMIT_WINDOW_MS
| Required | No |
| Default | 60000 |
| Format | Integer (milliseconds) |
| Example | RATE_LIMIT_WINDOW_MS=60000 |
Duration of the sliding-window rate limit period in milliseconds. Only effective when
REDIS_RATE_LIMIT_ENABLED=true.
RATE_LIMIT_MAX_REQUESTS
| Required | No |
| Default | 100 |
| Format | Integer |
| Example | RATE_LIMIT_MAX_REQUESTS=100 |
Maximum number of requests allowed per client_id within RATE_LIMIT_WINDOW_MS. Requests
exceeding this limit receive 429 RATE_LIMIT_EXCEEDED.
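How the window and the request cap interact can be seen in a small in-memory counter. This sketch makes two assumptions: it uses fixed time buckets rather than a true sliding window, and it derives the bucket index as `floor(now / windowMs)`. The `createWindowCounter` name is hypothetical, and the real limiter may behave differently at bucket boundaries.

```typescript
// Illustrative only: a fixed-bucket approximation of a per-client rate limit.
// Old buckets are never evicted here; a Redis-backed version would rely on key TTLs.
function createWindowCounter(windowMs = 60_000, maxRequests = 100) {
  const counts = new Map<string, number>();

  /** Returns true if the request is allowed, false if it should receive a 429. */
  return function allow(clientId: string, nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / windowMs);
    const key = `rate:${clientId}:${windowIndex}`;
    const next = (counts.get(key) ?? 0) + 1;
    counts.set(key, next);
    return next <= maxRequests;
  };
}
```

Counts are tracked per `client_id`, so one client exhausting its budget does not affect another, and the count starts again in the next bucket.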
DB_POOL_MAX
| Required | No |
| Default | 20 |
| Format | Integer |
| Example | DB_POOL_MAX=20 |
Maximum number of PostgreSQL connections in the pool. Increase for high-throughput production
deployments. Ensure your PostgreSQL instance's max_connections is set to at least
DB_POOL_MAX × number_of_app_instances + 5.
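As a worked instance of the sizing rule above, the following sketch computes the minimum `max_connections` value. The function name and the `headroom` parameter are illustrative, not part of the codebase.

```typescript
/** Minimum PostgreSQL max_connections for a given pool size and instance count. */
function requiredMaxConnections(
  poolMax: number,
  appInstances: number,
  headroom = 5, // spare connections for migrations, psql sessions, monitoring
): number {
  return poolMax * appInstances + headroom;
}

// DB_POOL_MAX=20 across 3 app instances: 20 * 3 + 5 = 65
const minimum = requiredMaxConnections(20, 3);
```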
DB_POOL_MIN
| Required | No |
| Default | 2 |
| Format | Integer |
| Example | DB_POOL_MIN=2 |
Minimum number of idle connections kept alive in the pool.
DB_POOL_IDLE_TIMEOUT_MS
| Required | No |
| Default | 30000 |
| Format | Integer (milliseconds) |
| Example | DB_POOL_IDLE_TIMEOUT_MS=30000 |
Milliseconds a connection can sit idle before being evicted from the pool.
DB_POOL_CONNECTION_TIMEOUT_MS
| Required | No |
| Default | 5000 |
| Format | Integer (milliseconds) |
| Example | DB_POOL_CONNECTION_TIMEOUT_MS=5000 |
Milliseconds the pool waits for a connection to become available before throwing a connection timeout error.
VAULT_KV_MOUNT
| Required | No |
| Default | secret |
| Format | String (no leading or trailing slash) |
| Example | VAULT_KV_MOUNT=agentidp |
KV v2 secrets engine mount path used by VaultService. Equivalent to the existing VAULT_MOUNT
variable — note that .env.example uses VAULT_KV_MOUNT; the underlying service reads either.
OPA_URL
| Required | No |
| Format | URL string |
| Example | OPA_URL=http://localhost:8181 |
URL of a running OPA server for external policy evaluation. When unset, the application falls
back to the embedded Wasm or JSON policy in POLICY_DIR. Used for health check reporting.
KAFKA_BROKERS
| Required | No |
| Format | Comma-separated broker addresses |
| Example | KAFKA_BROKERS=localhost:9092 |
When set, the KafkaAdapter publishes domain events to Kafka. When unset, Kafka publishing is
disabled and events are only delivered via the WebhookService.
ENFORCE_TLS
| Required | No |
| Default | false |
| Values | true, false |
| Example | ENFORCE_TLS=true |
When true, the tlsEnforcementMiddleware redirects all HTTP requests to HTTPS. Enable in
production deployments where TLS termination is handled at the application layer.
Section: Complete .env Example — replace entirely
Replace the entire existing .env Example section with the following complete example that
reflects all Phase 1–6 variables:
# ── Server ──────────────────────────────────────────────────────────────────
NODE_ENV=development
PORT=3000
CORS_ORIGIN=http://localhost:3001
# ── Database ─────────────────────────────────────────────────────────────────
DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp
DB_POOL_MAX=20
DB_POOL_MIN=2
DB_POOL_IDLE_TIMEOUT_MS=30000
DB_POOL_CONNECTION_TIMEOUT_MS=5000
# ── Redis ────────────────────────────────────────────────────────────────────
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_ENABLED=true
RATE_LIMIT_WINDOW_MS=60000
RATE_LIMIT_MAX_REQUESTS=100
# ── JWT Keys (generate with openssl — see docs/devops/security.md) ──────────
JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\nMIIEow...\n-----END RSA PRIVATE KEY-----"
JWT_PUBLIC_KEY="-----BEGIN PUBLIC KEY-----\nMIIBIj...\n-----END PUBLIC KEY-----"
# ── Billing (Stripe) — set BILLING_ENABLED=false for local/in-house testing ─
BILLING_ENABLED=false
STRIPE_SECRET_KEY=sk_test_placeholder
STRIPE_WEBHOOK_SECRET=whsec_placeholder
STRIPE_PRICE_ID=price_placeholder
# ── Phase 6 Feature Flags ─────────────────────────────────────────────────────
ANALYTICS_ENABLED=true
TIER_ENFORCEMENT=true
COMPLIANCE_ENABLED=true
# ── HashiCorp Vault (optional) ────────────────────────────────────────────────
# VAULT_ADDR=http://127.0.0.1:8200
# VAULT_TOKEN=hvs.XXXXXXXXXXXXXXXXXXXXXX
# VAULT_KV_MOUNT=secret
# ── OPA (optional) ───────────────────────────────────────────────────────────
# POLICY_DIR=/etc/sentryagent/policies
# OPA_URL=http://localhost:8181
# ── Kafka (optional) ─────────────────────────────────────────────────────────
# KAFKA_BROKERS=localhost:9092
# ── TLS ──────────────────────────────────────────────────────────────────────
# ENFORCE_TLS=true
Section: Variable Validation at Startup — add note on feature flags
Append after the existing validation list:
Feature flags (`BILLING_ENABLED`, `ANALYTICS_ENABLED`, `TIER_ENFORCEMENT`, `COMPLIANCE_ENABLED`) are read at startup. `ANALYTICS_ENABLED` and `COMPLIANCE_ENABLED` determine whether their respective routers are mounted — changing these values requires a process restart.
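Reading the four flags once at startup, with the defaults listed in this spec, might look like the sketch below. The `FeatureFlags` shape and both function names are hypothetical. Treating any value other than the literal `true` as false is an assumption about how the application parses these variables.

```typescript
// Hypothetical flag reader; defaults follow the variable blocks in this spec.
interface FeatureFlags {
  billing: boolean;
  analytics: boolean;
  tierEnforcement: boolean;
  compliance: boolean;
}

function parseFlag(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === "") return fallback;
  return raw.trim() === "true";
}

function readFeatureFlags(env: Record<string, string | undefined>): FeatureFlags {
  return {
    billing: parseFlag(env.BILLING_ENABLED, false),
    analytics: parseFlag(env.ANALYTICS_ENABLED, true),
    tierEnforcement: parseFlag(env.TIER_ENFORCEMENT, true),
    compliance: parseFlag(env.COMPLIANCE_ENABLED, true),
  };
}
```

Because the result is computed once, a router that was not mounted at startup stays unmounted until the process restarts, which matches the restart requirement described above.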
File: docs/devops/database.md
Section: Schema Overview — replace diagram
Replace:
agents
└── credentials (FK: client_id → agents.agent_id, CASCADE DELETE)
audit_events (no FK — append-only, agent_id is informational)
token_revocations (no FK — independent revocation store)
With:
organizations
├── agents (FK: organization_id → organizations.org_id)
│ ├── credentials (FK: client_id → agents.agent_id, CASCADE DELETE)
│ └── agent_did_keys (FK: agent_id → agents.agent_id)
└── audit_events (FK: organization_id — informational, no cascade)
token_revocations (no FK — independent revocation store)
oidc_keys (standalone — OIDC signing key rotation)
federation_partners (standalone — cross-tenant identity)
webhook_subscriptions → webhook_deliveries (FK: subscription_id)
agent_marketplace (standalone — agent discovery catalog)
github_oidc_trust_policies (standalone — CI/CD trust)
billing (FK: org_id → organizations.org_id — one row per org)
delegation_chains (standalone — A2A delegation records)
analytics_events (FK: organization_id — append-only)
tenant_tiers (FK: org_id → organizations.org_id — one row per org)
Section: Tables — add new table entries
After the existing token_revocations table section, add the following new table definitions:
organizations
Created by migration 006_create_organizations_table.sql.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `org_id` | `UUID` | No | Primary key |
| `name` | `VARCHAR(255)` | No | Organisation display name |
| `slug` | `VARCHAR(64)` | No | URL-safe unique identifier |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
agent_did_keys
Created by migration 012_create_agent_did_keys_table.sql.
Stores the DID document key material for each agent. One agent may have multiple keys for rotation purposes.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `agent_id` | `UUID` | No | FK → agents.agent_id |
| `key_id` | `VARCHAR(255)` | No | DID key fragment identifier |
| `public_key_jwk` | `JSONB` | No | Public key in JWK format |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
DID columns on agents
Added by migration 013_add_did_columns_to_agents.sql:
- `did` — `VARCHAR(512)` nullable — the `did:web` identifier for this agent
- `did_document` — `JSONB` nullable — full DID document
oidc_keys
Created by migration 014_create_oidc_keys_table.sql.
Stores RSA key pairs used for OIDC ID token signing. Supports key rotation — active key is determined by the most recently created row.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `kid` | `VARCHAR(128)` | No | Key ID — referenced in JWKS |
| `private_key_pem` | `TEXT` | No | Encrypted RSA private key (pgcrypto) |
| `public_key_pem` | `TEXT` | No | RSA public key |
| `algorithm` | `VARCHAR(16)` | No | Always RS256 |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
federation_partners
Created by migration 015_create_federation_partners_table.sql.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | Owning organisation |
| `partner_name` | `VARCHAR(255)` | No | Display name |
| `partner_jwks_url` | `TEXT` | No | URL to partner's JWKS endpoint |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
webhook_subscriptions
Created by migration 016_create_webhook_subscriptions_table.sql.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | Owning organisation |
| `event_type` | `VARCHAR(128)` | No | Event type filter (e.g. agent.created) |
| `target_url` | `TEXT` | No | HTTPS delivery endpoint |
| `secret` | `VARCHAR(255)` | Yes | HMAC signing secret for delivery verification |
| `active` | `BOOLEAN` | No | Default: true |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
webhook_deliveries
Created by migration 017_create_webhook_deliveries_table.sql.
Records each delivery attempt for a webhook event, including the dead-letter queue entries.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `subscription_id` | `UUID` | No | FK → webhook_subscriptions.id |
| `event_type` | `VARCHAR(128)` | No | Event type delivered |
| `payload` | `JSONB` | No | Full event payload |
| `status` | `VARCHAR(32)` | No | pending, delivered, failed, dead_letter |
| `response_status` | `INTEGER` | Yes | HTTP status from delivery endpoint |
| `attempt_count` | `INTEGER` | No | Default: 0 |
| `last_attempted_at` | `TIMESTAMPTZ` | Yes | |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
Dead-letter queue: After 3 failed delivery attempts, the row status is set to dead_letter
and the agentidp_webhook_dead_letters_total Prometheus counter is incremented. The Prometheus
metric label is event_type.
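The status transition described above can be sketched as a pure function. This is an illustration under stated assumptions, not the `WebhookService` implementation: `recordAttempt`, `AttemptResult`, and `MAX_ATTEMPTS` are hypothetical names, and the sketch assumes the third consecutive failure is the one that moves a row to `dead_letter`.

```typescript
// Hypothetical model of a webhook_deliveries row after one more delivery attempt.
type DeliveryStatus = "pending" | "delivered" | "failed" | "dead_letter";

const MAX_ATTEMPTS = 3; // "After 3 failed delivery attempts" in the text above

interface AttemptResult {
  status: DeliveryStatus;
  attemptCount: number;
}

function recordAttempt(previousAttempts: number, succeeded: boolean): AttemptResult {
  const attemptCount = previousAttempts + 1;
  if (succeeded) return { status: "delivered", attemptCount };
  // A real implementation would also increment
  // agentidp_webhook_dead_letters_total when this returns "dead_letter".
  return {
    status: attemptCount >= MAX_ATTEMPTS ? "dead_letter" : "failed",
    attemptCount,
  };
}
```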
pgcrypto extension
Enabled by migration 018_enable_pgcrypto.sql. Used for encrypting sensitive columns in
oidc_keys and credential data.
agent_marketplace
Created by migration 021_add_agent_marketplace.sql.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `agent_id` | `UUID` | No | FK → agents.agent_id |
| `listing_name` | `VARCHAR(255)` | No | Display name in marketplace |
| `description` | `TEXT` | Yes | Markdown description |
| `tags` | `TEXT[]` | No | Searchable tags. Default: {} |
| `published` | `BOOLEAN` | No | Default: false |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
github_oidc_trust_policies
Created by migration 022_add_github_oidc_trust_policies.sql.
Maps GitHub Actions OIDC claims to agent identities for CI/CD token exchange.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | Owning organisation |
| `repository` | `VARCHAR(512)` | No | GitHub repository slug (owner/repo) |
| `branch` | `VARCHAR(255)` | Yes | Branch filter (null = any branch) |
| `agent_id` | `UUID` | No | Agent to issue a token for on match |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
billing
Created by migration 023_add_billing.sql.
One row per organisation. Tracks the org's Stripe customer and subscription state.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | FK → organizations.org_id (UNIQUE) |
| `stripe_customer_id` | `VARCHAR(255)` | Yes | Stripe Customer ID |
| `stripe_subscription_id` | `VARCHAR(255)` | Yes | Stripe Subscription ID |
| `status` | `VARCHAR(64)` | No | Stripe subscription status or none |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
delegation_chains
Created by migration 024_add_delegation_chains.sql.
Records A2A delegation grants created via the delegation API.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `delegator_agent_id` | `UUID` | No | Agent granting the delegation |
| `delegate_agent_id` | `UUID` | No | Agent receiving the delegation |
| `scopes` | `TEXT[]` | No | Scopes being delegated |
| `expires_at` | `TIMESTAMPTZ` | Yes | Optional expiry |
| `created_at` | `TIMESTAMPTZ` | No | Default: NOW() |
analytics_events
Created by migration 025_add_analytics_events.sql.
Append-only event store for analytics. Supports token trend, agent activity, and usage summary queries.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `organization_id` | `UUID` | No | Owning organisation |
| `date` | `DATE` | No | Calendar date of the event (UTC) |
| `metric_type` | `VARCHAR(64)` | No | e.g. token_issued, agent_called |
| `count` | `INTEGER` | No | Event count for this date+type |
Index: (organization_id, date DESC) for fast time-series queries.
tenant_tiers
Created by migration 026_add_tenant_tiers.sql.
One row per organisation. Stores the current tier; the limits for that tier are enforced by the
tierEnforcement middleware.
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | FK → organizations.org_id (UNIQUE) |
| `tier` | `ENUM('free','pro','enterprise')` | No | Current tier. Default: free |
| `updated_at` | `TIMESTAMPTZ` | No | Last tier change. Default: NOW() |
Tier limits (from src/config/tiers.ts):
| Tier | Max Agents | Max API Calls/Day | Max Tokens/Day |
|---|---|---|---|
| free | 10 | 1,000 | 1,000 |
| pro | 100 | 50,000 | 50,000 |
| enterprise | unlimited | unlimited | unlimited |
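The limits table can be expressed as a typed lookup, which shows why an `Infinity` limit bypasses enforcement without a special case. This is a sketch only: apart from `maxCallsPerDay`, which this spec mentions, the field names and the `isWithinDailyCallLimit` helper are assumptions and may not match `src/config/tiers.ts`.

```typescript
// Illustrative shape; the real src/config/tiers.ts may differ.
type Tier = "free" | "pro" | "enterprise";

interface TierLimits {
  maxAgents: number;
  maxCallsPerDay: number;
  maxTokensPerDay: number;
}

const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { maxAgents: 10, maxCallsPerDay: 1_000, maxTokensPerDay: 1_000 },
  pro: { maxAgents: 100, maxCallsPerDay: 50_000, maxTokensPerDay: 50_000 },
  enterprise: { maxAgents: Infinity, maxCallsPerDay: Infinity, maxTokensPerDay: Infinity },
};

/** True if one more API call is allowed given today's count so far. */
function isWithinDailyCallLimit(tier: Tier, callsSoFarToday: number): boolean {
  // Any finite count is below Infinity, so enterprise tenants always pass.
  return callsSoFarToday < TIER_LIMITS[tier].maxCallsPerDay;
}
```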
Section: Migration Runner — update "Running migrations" example output
Replace the example output block (showing 4 migrations) with:
Running database migrations...
✓ Applied: 001_create_agents.sql
✓ Applied: 002_create_credentials.sql
...
✓ Applied: 025_add_analytics_events.sql
✓ Applied: 026_add_tenant_tiers.sql
Migrations complete. 26 migration(s) applied.
Replace the "Verifying applied migrations" expected output to show 26 rows rather than 4.
Section: Connection Pool — replace entirely
Replace:
The application uses `pg.Pool` with default settings (max 10 connections). The pool is a singleton — one pool per process instance.

To override pool size, modify `src/db/pool.ts`. In production, ensure `DATABASE_URL` includes connection pool parameters if using PgBouncer or a managed connection pooler.
With:
The application uses `pg.Pool` with settings read from environment variables. The pool is a singleton — one pool per process instance.

| Variable | Default | Description |
|---|---|---|
| `DB_POOL_MAX` | `20` | Maximum connections |
| `DB_POOL_MIN` | `2` | Minimum idle connections |
| `DB_POOL_IDLE_TIMEOUT_MS` | `30000` | Idle eviction timeout (ms) |
| `DB_POOL_CONNECTION_TIMEOUT_MS` | `5000` | Acquisition timeout (ms) |

Pool size is exposed as Prometheus metrics: `agentidp_db_pool_active_connections` and `agentidp_db_pool_waiting_requests`. Monitor these in production to detect pool exhaustion.
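One way the four variables could be turned into pool options is sketched below. The helper names are hypothetical and this is not the contents of `src/db/pool.ts`. The option names follow node-postgres pool configuration (`max`, `idleTimeoutMillis`, `connectionTimeoutMillis`). Support for `min` depends on the installed `pg` version and is an assumption here.

```typescript
// Hypothetical mapping from environment variables to pool options.
interface PoolSettings {
  max: number;
  min: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

/** Parses a base-10 integer, falling back when the variable is unset or invalid. */
function intFromEnv(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function poolSettingsFromEnv(env: Record<string, string | undefined>): PoolSettings {
  return {
    max: intFromEnv(env.DB_POOL_MAX, 20),
    min: intFromEnv(env.DB_POOL_MIN, 2),
    idleTimeoutMillis: intFromEnv(env.DB_POOL_IDLE_TIMEOUT_MS, 30_000),
    connectionTimeoutMillis: intFromEnv(env.DB_POOL_CONNECTION_TIMEOUT_MS, 5_000),
  };
}
```

The returned object could then be spread into the pool constructor alongside `connectionString`.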
File: docs/devops/architecture.md
Section: Component Overview — replace the ASCII diagram
Replace the existing ASCII diagram with:
┌───────────────────────────────────────────┐
│ Next.js Portal (port 3001) │
│ portal/ — Next.js 14 │
│ /login /agents /credentials /audit │
│ /analytics /settings/tier /compliance │
│ /webhooks /marketplace │
└────────────────┬──────────────────────────┘
│ HTTP (localhost:3000)
┌────────────────▼──────────────────────────┐
│ AgentIdP Application │
│ Node.js / Express (port 3000) │
│ │
│ TLS MW → Helmet → CORS → Morgan │
│ Metrics MW → OrgContext MW │
│ UsageMetering MW → TierEnforcement MW │
│ Auth MW → OPA MW → Routes │
│ ↓ │
│ Controllers → Services → Repos │
└──────────┬───────────────┬────────────────┘
│ │
┌────────────────▼──┐ ┌────────▼────────┐
│ PostgreSQL 14 │ │ Redis 7 │
│ Port 5432 │ │ Port 6379 │
│ │ │ │
│ 26 migrations │ │ Rate limits │
│ (001–026) │ │ Token revoke │
│ organizations │ │ Monthly counts │
│ agents + DID keys │ │ Tier counters │
│ credentials │ │ Compliance cache│
│ audit_events │ │ │
│ token_revocations │ └──────────────────┘
│ oidc_keys │
│ federation_partne-│ ┌──────────────────┐
│ rs │ │ HashiCorp Vault │
│ webhook_subscript-│ │ (optional) │
│ ions + deliveries │ │ KV v2 — creds │
│ agent_marketplace │ └──────────────────┘
│ github_oidc_trust │
│ billing │ ┌──────────────────┐
│ delegation_chains │ │ Stripe │
│ analytics_events │ │ (optional) │
│ tenant_tiers │ │ Billing/upgrades │
└────────────────────┘ └──────────────────┘
Section: Internal layers table — update
Replace:
| Layer | Responsibility |
|---|---|
| Routes | Wire HTTP methods and paths to controllers |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| Rate limit middleware | Redis sliding-window counter per client_id |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic — no direct DB access |
| Repositories | All SQL queries — no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |
With:
| Layer | Responsibility |
|---|---|
| Routes | Wire HTTP methods and paths to controllers |
| TLS middleware | Redirect HTTP → HTTPS when ENFORCE_TLS=true |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| OrgContext middleware | Resolve organization_id from JWT and attach to req |
| UsageMetering middleware | Fire-and-forget analytics event recording |
| TierEnforcement middleware | Enforce daily API call and token limits via Redis (when TIER_ENFORCEMENT=true) |
| OPA middleware | Scope-based authorization via embedded Wasm or JSON policy |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic — no direct DB access |
| Repositories | All SQL queries — no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |
Section: Service Map — replace entirely
Replace the existing 4-row service map table with the complete Phase 6 service map:
| Route prefix | Controller | Service(s) | Repository/ies |
|---|---|---|---|
| `/api/v1/agents` | `AgentController` | `AgentService` | `AgentRepository` |
| `/api/v1/credentials` | `CredentialController` | `CredentialService` | `CredentialRepository` |
| `/api/v1/token` | `TokenController` | `OAuth2Service` | `TokenRepository`, `CredentialRepository`, `AgentRepository` |
| `/api/v1/audit` | `AuditController` | `AuditService` | `AuditRepository` |
| `/api/v1/organizations` | `OrgController` | `OrgService` | `OrgRepository` |
| `/api/v1/compliance/*` | `ComplianceController` | `ComplianceService` | `AuditRepository` |
| `/api/v1/analytics/*` | `AnalyticsController` | `AnalyticsService` | direct pool queries |
| `/api/v1/tiers/*` | `TierController` | `TierService` | pool queries, Stripe SDK |
| `/api/v1/webhooks` | `WebhookController` | `WebhookService` | `WebhookRepository` |
| `/api/v1/federation` | `FederationController` | `FederationService` | direct pool queries |
| `/api/v1/marketplace` | `MarketplaceController` | `MarketplaceService` | direct pool queries |
| `/api/v1/billing` | `BillingController` | `BillingService` | direct pool queries |
| `/.well-known/did.json`, `/api/v1/did/*` | `DIDController` | `DIDService` | `AgentRepository` |
| `/.well-known/openid-configuration`, `/api/v1/oidc/*` | `OIDCController` | `OIDCKeyService`, `IDTokenService` | direct pool queries |
| `/api/v1/oidc/trust-policies` | `OIDCTrustPolicyController` | `OIDCTrustPolicyService` | direct pool queries |
| `/api/v1/delegation` | `DelegationController` | `DelegationService` | direct pool queries |
| `/api/v1/scaffold` | `ScaffoldController` | `ScaffoldService` | — |
| `/health` | inline | — | pool, redis |
| `/metrics` | inline | — | prom-client |
Section: Redis — update key patterns table
Replace the existing 3-row Redis key patterns table with:
| Key pattern | Example | Purpose | TTL |
|---|---|---|---|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |
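The "Until midnight UTC" TTL for the daily tier counters can be derived from the current time. The sketch below builds a counter key and computes that TTL in seconds. Both function names are hypothetical, and the application may compute expiry differently (for example with an absolute expiry timestamp).

```typescript
// Illustrative helpers for the daily tier counter keys in the table above.
function tierCallsKey(tenantId: string): string {
  return `rate:tier:calls:${tenantId}`;
}

/** Seconds from `now` until the next 00:00:00 UTC, rounded up. */
function secondsUntilMidnightUtc(now: Date): number {
  // Date.UTC normalises day overflow, so the last day of a month rolls over correctly.
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1,
  );
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}
```

A counter created at 23:59:00 UTC would therefore expire after 60 seconds, and one created exactly at midnight after 86400 seconds.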
Section: New Services — add after the existing component descriptions
Add a new ## New Services (Phases 3–6) section:
## New Services (Phases 3–6)
| Service | Source file | Responsibility |
|---------|------------|----------------|
| `AnalyticsService` | `src/services/AnalyticsService.ts` | Fire-and-forget `recordEvent`, time-series `getTokenTrend`, heatmap `getAgentActivity`, per-agent `getAgentUsageSummary` |
| `TierService` | `src/services/TierService.ts` | `getStatus` (reads `tenant_tiers`), `initiateUpgrade` (creates Stripe Checkout Session), `applyUpgrade` (handles Stripe webhook), `enforceAgentLimit` |
| `ComplianceService` | `src/services/ComplianceService.ts` | `generateReport` (Redis-cached 5 min), `exportAgentCards` (AGNTCY format) |
| `DelegationService` | `src/services/DelegationService.ts` | A2A delegation chain creation and verification |
| `DIDService` | `src/services/DIDService.ts` | `did:web` identifier generation and DID document management |
| `OIDCKeyService` | `src/services/OIDCKeyService.ts` | OIDC key rotation, JWKS endpoint |
| `IDTokenService` | `src/services/IDTokenService.ts` | OIDC ID token issuance |
| `FederationService` | `src/services/FederationService.ts` | Cross-tenant agent identity federation |
| `WebhookService` | `src/services/WebhookService.ts` | Event subscriptions, delivery with retry, dead-letter queue |
| `VaultService` | `src/services/VaultService.ts` | HashiCorp Vault KV v2 read/write for credential storage |
| `BillingService` | `src/services/BillingService.ts` | Stripe customer and subscription management |
| `MarketplaceService` | `src/services/MarketplaceService.ts` | Agent listing and discovery |
| `OIDCTrustPolicyService` | `src/services/OIDCTrustPolicyService.ts` | GitHub OIDC trust policy management |
| `EventPublisher` | `src/services/EventPublisher.ts` | Routes domain events to webhook delivery and Kafka (if configured) |
Section: Ports — update table
Replace:
| Service | Internal port | Exposed port (local dev) |
|---|---|---|
| AgentIdP app | 3000 | 3000 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |
With:
| Service | Internal port | Exposed port (local dev) |
|---|---|---|
| AgentIdP app | 3000 | 3000 |
| Next.js portal | 3001 | 3001 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |
Section: Add new section — API Routes
Add at the end of the file:
## API Routes (Phase 6 complete)
Base path: `/api/v1`
| Route | Method(s) | Auth | Feature flag |
|-------|----------|------|-------------|
| `/api/v1/agents` | GET, POST, PATCH, DELETE | Bearer JWT | always on |
| `/api/v1/credentials` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/token` | POST | none (client credentials) | always on |
| `/api/v1/audit` | GET | Bearer JWT | always on |
| `/api/v1/audit/verify` | GET | Bearer JWT | always on |
| `/api/v1/organizations` | GET, POST | Bearer JWT | always on |
| `/api/v1/compliance/controls` | GET | none | always on |
| `/api/v1/compliance/report` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/compliance/agent-cards` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/analytics/token-trend` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/agent-activity` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/usage-summary` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/tiers/status` | GET | Bearer JWT | always on |
| `/api/v1/tiers/upgrade` | POST | Bearer JWT | always on |
| `/api/v1/webhooks` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/federation` | GET, POST | Bearer JWT | always on |
| `/api/v1/delegation` | GET, POST | Bearer JWT | always on |
| `/api/v1/marketplace` | GET | none | always on |
| `/api/v1/billing` | GET, POST | Bearer JWT | always on |
| `/api/v1/did/*` | GET | none | always on |
| `/api/v1/oidc/*` | GET, POST | mixed | always on |
| `/.well-known/openid-configuration` | GET | none | always on |
| `/.well-known/jwks.json` | GET | none | always on |
| `/.well-known/did.json` | GET | none | always on |
| `/health` | GET | none | always on |
| `/metrics` | GET | none | always on |
File: docs/devops/local-development.md
Section: Prerequisites table — update
Replace the existing 3-row prerequisites table with:
| Tool | Minimum version | Purpose |
|---|---|---|
| Docker | 24+ | Container runtime |
| Docker Compose | 2.20+ | Multi-container orchestration |
| Node.js | 18.0.0 | Run the application, portal, and migrations |
| npm | 9+ | Package management and scripts |
| nvm | any | Recommended for managing Node.js versions |
| openssl | any | RSA key generation |
Add after the table:
nvm activation: If using nvm, activate it before running any Node.js commands:
export NVM_DIR="$HOME/.nvm" && source "$NVM_DIR/nvm.sh"
Section: Step 1 — clone and install — update
After npm install (which installs the backend), add:
# Install portal dependencies
cd portal && npm install && cd ..
Section: Step 4 — Start infrastructure services — update the note
Replace:
The `app` service in `docker-compose.yml` requires a `Dockerfile` which has not been written yet. This is a Phase 1 P1 pending item. The commands below will work once the Dockerfile exists.
With:
The full Docker Compose stack (including the `app` container) is available for field trial deployments — see the field trial guide. For day-to-day development, start only the infrastructure services and run the application directly.
Section: Step 5 — Run database migrations — update expected output
Replace the expected output showing 4 migrations with:
Running database migrations...
✓ Applied: 001_create_agents.sql
...
✓ Applied: 026_add_tenant_tiers.sql
Migrations complete. 26 migration(s) applied.
Section: Add new Step 7 — Start the Next.js portal
Add after Step 6 (Start the application):
## Step 7 — Start the Next.js portal (optional)
The portal is a Next.js 14 application in the `portal/` directory. It communicates with the
AgentIdP backend at `http://localhost:3000`.
Start the portal development server:
```bash
cd portal && npm run dev
```
The portal starts on port 3001 by default. Open http://localhost:3001.
Available routes:
| Route | Description |
|---|---|
| `/login` | OAuth 2.0 login page |
| `/agents` | Agent registry |
| `/credentials` | Credential management |
| `/audit` | Audit log viewer |
| `/analytics` | Token trend and agent activity charts |
| `/settings/tier` | Tier status and upgrade |
| `/compliance` | AGNTCY compliance report |
| `/webhooks` | Webhook subscription management |
| `/marketplace` | Agent marketplace |
Build the portal for production:
cd portal && npm run build
npm start # serves the production build (run from portal/)
Ensure CORS_ORIGIN in your .env includes http://localhost:3001:
CORS_ORIGIN=http://localhost:3001
File: docs/devops/operations.md
Section: Startup checklist — update
Replace the existing 4-step checklist with a checklist that reflects Docker Compose full-stack
operation and includes the portal:
```bash
# 1. Start the full stack
docker compose up --build -d
# 2. Verify all three services are healthy
docker compose ps
# app, postgres, and redis must all show "healthy"
# 3. Run migrations
docker compose exec app npm run db:migrate
# 4. Verify application health
curl http://localhost:3000/health
# Expected: {"status":"ok"}
# 5. (Optional) Start the portal for local dev
cd portal && npm run dev
```
Section: Redis Key Patterns — update table
Replace the 3-row table with the complete 6-row table (same as the architecture.md update above):
| Key pattern | Example | Purpose | TTL |
|---|---|---|---|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |
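
When inspecting a per-window rate-limit key by hand, it can help to reconstruct the key name first. The sketch below is an illustration only: `rate_key` is a hypothetical helper, and it assumes the `<window>` segment is the Unix time in milliseconds divided by `RATE_LIMIT_WINDOW_MS` (rounded down). The table does not state this derivation; it is inferred from the shape of the example value.

```bash
# Hypothetical helper (not part of the codebase): prints rate:<client_id>:<window>.
# ASSUMPTION: <window> = floor(epoch_ms / window_ms). The spec does not confirm this.
# Arguments: client_id, epoch time in milliseconds, window size in ms (default 60000).
rate_key() {
  rk_client_id="$1"
  rk_epoch_ms="$2"
  rk_window_ms="${3:-60000}"
  printf 'rate:%s:%s\n' "$rk_client_id" "$(( rk_epoch_ms / rk_window_ms ))"
}

# Example usage (requires a reachable Redis instance):
#   redis-cli GET "$(rate_key "<client_id>" "$(( $(date +%s) * 1000 ))")"
```

If the application derives the window differently, only the arithmetic inside the helper needs to change.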
Add the following new inspection commands at the end of the "Inspect keys" section:
```bash
# Check tier API call counter for a tenant
redis-cli GET "rate:tier:calls:<org_id>"
# Check tier token counter for a tenant
redis-cli GET "rate:tier:tokens:<org_id>"
# Check cached compliance report for a tenant
redis-cli GET "compliance:report:<org_id>"
redis-cli TTL "compliance:report:<org_id>"
```
### Section: Monitoring — update Metrics Exposed table
Replace the existing 6-row metrics table with the complete 19-metric table:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP latency |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |
| `agentidp_webhook_dead_letters_total` | Counter | `event_type` | Webhook deliveries moved to dead-letter queue |
| `agentidp_credentials_expiring_soon_total` | Gauge | — | Credentials expiring within 7 days |
| `agentidp_audit_chain_integrity` | Gauge | — | 1 if audit chain is intact, 0 if broken |
| `agentidp_rate_limit_hits_total` | Counter | `client_id` | Rate limit rejections |
| `agentidp_db_pool_active_connections` | Gauge | — | Active PostgreSQL connections |
| `agentidp_db_pool_waiting_requests` | Gauge | — | Requests waiting for a pool connection |
| `agentidp_tenant_api_calls_total` | Counter | `org_id`, `tier` | API calls per tenant per tier |
| `agentidp_billing_limit_rejections_total` | Counter | `org_id`, `limit_type` | Tier limit enforcement rejections |
| `agentidp_did_documents_generated_total` | Counter | — | DID documents generated |
| `agentidp_oidc_tokens_issued_total` | Counter | — | OIDC ID tokens issued |
| `agentidp_federation_events_total` | Counter | `event_type` | Federation partner events |
| `agentidp_delegation_chains_created_total` | Counter | — | A2A delegation chains created |
| `agentidp_compliance_reports_generated_total` | Counter | — | Compliance reports generated |
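
One way to verify that a running instance exposes every metric in the table is to scan the Prometheus exposition text for the corresponding `# TYPE` lines. The sketch below is a minimal illustration: `missing_metrics` is a hypothetical helper, and the `/metrics` path in the usage comment is an assumption based on Prometheus convention rather than something this spec states.

```bash
# Hypothetical helper: reads Prometheus exposition text on stdin and prints
# each metric name given as an argument that has no "# TYPE <name> ..." line.
missing_metrics() {
  mm_body="$(cat)"
  for mm_name in "$@"; do
    printf '%s\n' "$mm_body" | grep -q "^# TYPE ${mm_name} " \
      || printf '%s\n' "$mm_name"
  done
}

# Example usage (ASSUMPTION: metrics are served at /metrics on port 3000):
#   curl -s http://localhost:3000/metrics | missing_metrics \
#     agentidp_tokens_issued_total agentidp_audit_chain_integrity
```

Empty output means every listed metric was declared. Matching on `# TYPE` lines rather than sample lines avoids false negatives for histograms, whose samples carry `_bucket`, `_sum`, and `_count` suffixes.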
### Section: Troubleshooting — add new entries
Append the following troubleshooting entries:
#### Tier limit rejected — 429 with `tier_limit_exceeded` code
**Symptom:** `429 TOO_MANY_REQUESTS` with body `{"code":"tier_limit_exceeded","message":"..."}`
Check the tenant's current tier counter:
```bash
# Check API call counter
docker compose exec redis redis-cli GET "rate:tier:calls:<org_id>"
# Check the tenant's tier
psql "$DATABASE_URL" -c "SELECT org_id, tier FROM tenant_tiers WHERE org_id = '<org_id>';"
```
If the org is on the free tier and has hit 1,000 calls/day, upgrade the tier or wait until
midnight UTC for the counter to reset.
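
To tell a tenant how long the wait is, the time remaining until the midnight UTC reset can be computed from the current Unix time. This is a minimal sketch; `seconds_until_utc_midnight` is a hypothetical helper, and it assumes the counter resets exactly at 00:00 UTC as described above.

```bash
# Hypothetical helper: prints the number of seconds until the next 00:00 UTC.
# Optional argument: a Unix timestamp in seconds (defaults to the current time).
# Unix time has exactly 86400 seconds per day, so the remainder of the division
# is the number of seconds elapsed since the last UTC midnight.
seconds_until_utc_midnight() {
  su_now="${1:-$(date +%s)}"
  echo $(( 86400 - su_now % 86400 ))
}

# Example usage:
#   echo "Counter resets in $(seconds_until_utc_midnight) seconds"
```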
#### Analytics endpoints return 404
**Cause:** `ANALYTICS_ENABLED` is set to `false` in `.env`.
**Fix:** Set `ANALYTICS_ENABLED=true` and restart the application.
#### Compliance report returns 404
**Cause:** `COMPLIANCE_ENABLED` is set to `false` in `.env`.
**Fix:** Set `COMPLIANCE_ENABLED=true` and restart the application.
#### Portal CORS error
**Symptom:** Browser console shows an `Access-Control-Allow-Origin` error on requests to
`http://localhost:3000`.
**Fix:** Ensure `CORS_ORIGIN` in `.env` includes `http://localhost:3001`:
```
CORS_ORIGIN=http://localhost:3001
```
Restart the application after changing this variable.
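
The fix can be confirmed from the command line by sending a request with the portal's `Origin` header and reading back the `Access-Control-Allow-Origin` response header. The sketch below is illustrative: `allowed_origin` is a hypothetical helper, and it assumes the backend applies its CORS middleware to `/health`, which this spec does not confirm.

```bash
# Hypothetical helper: reads raw HTTP response headers on stdin and prints the
# value of the Access-Control-Allow-Origin header, if present.
allowed_origin() {
  tr -d '\r' | awk '{
    i = index($0, ":")                      # split on the first colon only
    if (i && tolower(substr($0, 1, i - 1)) == "access-control-allow-origin") {
      v = substr($0, i + 1)
      sub(/^[ \t]+/, "", v)                 # trim leading whitespace
      print v
    }
  }'
}

# Example usage (ASSUMPTION: /health is covered by the CORS middleware):
#   curl -s -o /dev/null -D - -H 'Origin: http://localhost:3001' \
#     http://localhost:3000/health | allowed_origin
```

An empty result suggests the origin is still not allowed, or that the chosen endpoint does not emit CORS headers.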
## File: `docs/devops/deployment.md`
### Section: Environment Variable Reference (Section 6) — update Quick Reference table
Add the following rows to the existing quick reference table:
| Variable | Required | Source (AWS) | Source (GCP) |
|---|---|---|---|
| `BILLING_ENABLED` | No | Task definition env var | Cloud Run env var |
| `STRIPE_SECRET_KEY` | Only if billing enabled | Secrets Manager: `/<project>/<env>/stripe-secret-key` | Secret Manager: `<name-prefix>-stripe-secret-key` |
| `STRIPE_WEBHOOK_SECRET` | Only if billing enabled | Secrets Manager: `/<project>/<env>/stripe-webhook-secret` | Secret Manager: `<name-prefix>-stripe-webhook-secret` |
| `STRIPE_PRICE_ID` | Only if billing enabled | Task definition env var | Cloud Run env var |
| `ANALYTICS_ENABLED` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `TIER_ENFORCEMENT` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `COMPLIANCE_ENABLED` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `REDIS_RATE_LIMIT_ENABLED` | No | Task definition env var | Cloud Run env var |
| `RATE_LIMIT_WINDOW_MS` | No | Task definition env var (default: `60000`) | Cloud Run env var |
| `RATE_LIMIT_MAX_REQUESTS` | No | Task definition env var (default: `100`) | Cloud Run env var |
| `DB_POOL_MAX` | No | Task definition env var (default: `20`) | Cloud Run env var |
| `DB_POOL_MIN` | No | Task definition env var (default: `2`) | Cloud Run env var |
| `DB_POOL_IDLE_TIMEOUT_MS` | No | Task definition env var (default: `30000`) | Cloud Run env var |
| `DB_POOL_CONNECTION_TIMEOUT_MS` | No | Task definition env var (default: `5000`) | Cloud Run env var |
| `KAFKA_BROKERS` | No | Task definition env var | Cloud Run env var |
| `ENFORCE_TLS` | No | Task definition env var | Cloud Run env var |
| `OPA_URL` | No | Task definition env var | Cloud Run env var |
| `VAULT_KV_MOUNT` | No | Task definition env var (default: `secret`) | Cloud Run env var |
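
The "Only if billing enabled" rows describe a conditional requirement that is easy to miss at deploy time. A pre-deploy check along the following lines could catch it. This is a sketch under stated assumptions: `check_billing_env` is a hypothetical helper, and treating an unset `BILLING_ENABLED` as `false` is an assumption, since the table gives no default for it.

```bash
# Hypothetical pre-deploy check: when BILLING_ENABLED=true, all three STRIPE_*
# variables must be non-empty. Returns 0 on success, 1 if any are missing.
# ASSUMPTION: an unset BILLING_ENABLED is treated as "false".
check_billing_env() {
  [ "${BILLING_ENABLED:-false}" = "true" ] || return 0
  cb_missing=0
  for cb_var in STRIPE_SECRET_KEY STRIPE_WEBHOOK_SECRET STRIPE_PRICE_ID; do
    eval "cb_val=\${$cb_var:-}"             # indirect lookup of the variable
    if [ -z "$cb_val" ]; then
      echo "missing: $cb_var" >&2
      cb_missing=1
    fi
  done
  return "$cb_missing"
}

# Example usage in a deploy script:
#   check_billing_env || { echo "Billing is enabled but Stripe config is incomplete" >&2; exit 1; }
```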
### Section: Step 2.8 / Step 3.7 — Run Database Migrations — update migration count
In the migration command output examples in sections 2.8 and 3.7, update migration count references from "4 migration(s)" to "26 migration(s)".
## File: `docs/devops/security.md`
No structural changes required. Append the following note at the end of the "JWT Key Management" section:
OIDC keys are separate from the main JWT keys. OIDC signing keys are stored in the
`oidc_keys` PostgreSQL table (created by migration `014_create_oidc_keys_table.sql`), encrypted at rest using pgcrypto (enabled by migration `018_enable_pgcrypto.sql`). The `OIDCKeyService` manages rotation. OIDC keys do not need to be set as environment variables; they are provisioned automatically on first startup.
## File: `docs/devops/vault-setup.md`
### Section: Add note on `VAULT_KV_MOUNT` alias
After the `VAULT_MOUNT` variable description, add:
Note: The `.env.example` file uses `VAULT_KV_MOUNT` as the variable name. The application reads both `VAULT_KV_MOUNT` and `VAULT_MOUNT`; prefer `VAULT_KV_MOUNT` in new configurations for consistency with the current `.env.example`.
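
For deploy scripts that need the same value the application would use, the alias can be mirrored with nested parameter defaults. The sketch below is hypothetical: `resolve_vault_mount` is not part of the codebase, and the precedence shown (`VAULT_KV_MOUNT` first, then `VAULT_MOUNT`, then `secret`) is an assumption. The spec states only that both names are read and that `VAULT_KV_MOUNT` defaults to `secret`.

```bash
# Hypothetical helper: prints the effective Vault KV mount.
# ASSUMED precedence: VAULT_KV_MOUNT, then VAULT_MOUNT, then the default "secret".
resolve_vault_mount() {
  printf '%s\n' "${VAULT_KV_MOUNT:-${VAULT_MOUNT:-secret}}"
}

# Example usage:
#   echo "Using Vault mount: $(resolve_vault_mount)"
```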
## File: `docs/devops/README.md`
### Section: Document index — add `field-trial.md` entry
Add `field-trial.md` to the document index table:
| Document | Audience | Contents |
|---|---|---|
| ... existing entries ... | | |
| field-trial.md | DevOps engineers, QA | In-house Docker Compose field trial execution playbook |
## Acceptance Criteria
- `environment-variables.md` documents all 11 new variables from Phases 3–6
- `environment-variables.md` complete `.env` example includes all Phase 6 flags
- `database.md` schema overview reflects all 26 migrations (001–026)
- `database.md` documents all 10 new tables added in Phases 3–6
- `database.md` connection pool section references `DB_POOL_*` env vars
- `architecture.md` diagram shows Next.js portal at port 3001
- `architecture.md` service map covers all 19 route prefixes
- `architecture.md` Redis table covers all 6 key patterns
- `architecture.md` new services section documents all 13 Phase 3–6 services
- `architecture.md` API routes section covers all 25 routes
- `local-development.md` includes portal setup (Step 7) with all 9 portal routes
- `operations.md` startup checklist uses `docker compose` (not `docker-compose`)
- `operations.md` Redis table covers all 6 key patterns with correct TTLs
- `operations.md` metrics table covers all 19 Prometheus metrics
- `operations.md` troubleshooting covers tier limits, feature flag 404s, portal CORS
- `deployment.md` variable quick reference includes all Phase 3–6 variables
- `security.md` note on OIDC keys added
- `vault-setup.md` note on `VAULT_KV_MOUNT` alias added
- `README.md` index includes `field-trial.md`