sentryagent-idp/openspec/changes/archive/phase-7-devops-field-trial/specs/devops-update/spec.md
SentryAgent.ai Developer 8cabc0191c docs: commit all Phase 6 documentation updates and OpenSpec archives
- devops docs: 8 files updated for Phase 6 state; field-trial.md added (946-line runbook)
- developer docs: api-reference (50+ endpoints), quick-start, 5 existing guides updated, 5 new guides added
- engineering docs: all 12 files updated (services, architecture, SDK guide, testing, overview)
- OpenSpec archives: phase-7-devops-field-trial, developer-docs-phase6-update, engineering-docs-phase6-update
- VALIDATOR.md + scripts/start-validator.sh: V&V Architect tooling added
- .gitignore: exclude session artifacts, build artifacts, and agent workspaces

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 02:24:24 +00:00


Spec — WS1: DevOps Documentation Update

Change: phase-7-devops-field-trial
Workstream: WS1
Status: Approved
Written: 2026-04-04

Purpose

Specify exactly what must be updated in each docs/devops/ file to reflect the Phase 6 state of the codebase. The existing docs were written during Phase 2. Phases 3-6 added 14 DB migrations, new services, feature flags, Stripe billing, Prometheus metrics, Redis key patterns, and the Next.js portal, none of which appear in the current docs.

The Developer implementing this spec must update each listed file precisely as described. No other content in those files should be changed unless explicitly stated.


File: docs/devops/environment-variables.md

Section: Required Variables — update DATABASE_URL description

Replace the sentence:

The application uses pg.Pool with this connection string. Connection pool size uses the pg default (10 connections).

With:

The application uses pg.Pool with this connection string. Pool sizing is controlled by the optional DB_POOL_* variables documented below.

Section: Required Variables — add STRIPE_SECRET_KEY after DATABASE_URL

Insert a new required variable block:

Note on Billing: STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET, and STRIPE_PRICE_ID are required when BILLING_ENABLED=true. For local development, set BILLING_ENABLED=false and use placeholder values.

Section: Optional Variables — add all Phase 3-6 variables

Add each of the following variable blocks in this order, after the existing VAULT_MOUNT block and before POLICY_DIR:


BILLING_ENABLED

Required No
Default false
Values true, false
Example BILLING_ENABLED=false

Gates Stripe billing integration and free-tier agent limit enforcement. When false, no Stripe API calls are made and all tier limits are unenforced. Set to false for in-house testing.


STRIPE_SECRET_KEY

Required Only when BILLING_ENABLED=true
Format Stripe secret key string (sk_live_* or sk_test_*)
Example STRIPE_SECRET_KEY=sk_test_placeholder

Stripe API key used to create Checkout Sessions for tier upgrades. Never use a live key in development.


STRIPE_WEBHOOK_SECRET

Required Only when BILLING_ENABLED=true
Format Stripe webhook signing secret (whsec_*)
Example STRIPE_WEBHOOK_SECRET=whsec_placeholder

Used to verify the HMAC signature on incoming Stripe webhook events. Without this, the billing webhook endpoint will reject all events.
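To make the HMAC check concrete, here is a minimal sketch using only Node's built-in crypto module. It assumes Stripe's documented signature scheme: a Stripe-Signature header of the form `t=<timestamp>,v1=<hex>`, signed over `<timestamp>.<raw body>` with HMAC-SHA256. The function names and the 300-second tolerance are illustrative and not taken from the AgentIdP codebase, which may instead rely on the official Stripe library's verification helper.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Parses a header of the form "t=<unix seconds>,v1=<hex signature>[,v1=...]".
function parseSignatureHeader(header: string): { timestamp: string; signatures: string[] } {
  let timestamp = "";
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") signatures.push(value);
  }
  return { timestamp, signatures };
}

// Hypothetical helper: returns true when any v1 signature matches the HMAC-SHA256
// of "<timestamp>.<rawBody>" and the timestamp is within the tolerance.
function verifyWebhookSignature(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300,
): boolean {
  const { timestamp, signatures } = parseSignatureHeader(header);
  const ts = Number(timestamp);
  if (!timestamp || !Number.isFinite(ts)) return false;
  if (Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig, "hex");
    // Length check first: timingSafeEqual throws on buffers of different length.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

The check must run against the raw request body, before any JSON parsing, because re-serialising the payload can change the bytes that were signed.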


STRIPE_PRICE_ID

Required Only when BILLING_ENABLED=true
Format Stripe Price ID string (price_*)
Example STRIPE_PRICE_ID=price_placeholder

The Stripe Price object used when creating a Checkout Session for the Pro tier upgrade.
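The conditional requirement on the three Stripe variables can be sketched as a small startup check. This is an illustration only: `missingBillingVars` is a hypothetical helper, and the spec does not say how the application performs this validation.

```typescript
type Env = Record<string, string | undefined>;

const STRIPE_VARS = ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "STRIPE_PRICE_ID"] as const;

// Returns the names of Stripe variables that are required but missing or empty.
// When BILLING_ENABLED is anything other than "true", nothing is required.
function missingBillingVars(env: Env): string[] {
  if (env.BILLING_ENABLED !== "true") return [];
  return STRIPE_VARS.filter((name) => !env[name] || env[name]!.trim() === "");
}
```

A caller might abort startup when the returned list is non-empty, for example `missingBillingVars(process.env)` in a Node.js entry point.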


ANALYTICS_ENABLED

Required No
Default true
Values true, false
Example ANALYTICS_ENABLED=true

Feature flag that gates the /api/v1/analytics/* routes. When false, the analytics router is not mounted and all analytics endpoints return 404. Events are still recorded internally regardless of this flag.


TIER_ENFORCEMENT

Required No
Default true
Values true, false
Example TIER_ENFORCEMENT=true

Enables Redis-backed tier limit enforcement per tenant. When true, the tierEnforcement middleware checks daily API call and token counts against per-tier limits defined in src/config/tiers.ts. Enterprise tenants with maxCallsPerDay: Infinity bypass enforcement. When false, no tier limits are enforced.


COMPLIANCE_ENABLED

Required No
Default true
Values true, false
Example COMPLIANCE_ENABLED=true

Feature flag that gates the report and agent-card export endpoints under /api/v1/compliance/*. When false, those endpoints return 404. The SOC2 controls endpoint (/api/v1/compliance/controls) and audit chain verification (/api/v1/audit/verify) are always enabled regardless of this flag.


REDIS_RATE_LIMIT_ENABLED

Required No
Default false
Values true, false
Example REDIS_RATE_LIMIT_ENABLED=true

When true, rate limiting uses a Redis-backed sliding-window counter per client_id. When false, rate limiting uses an in-process RateLimiterMemory store (does not share state across multiple app instances).


RATE_LIMIT_WINDOW_MS

Required No
Default 60000
Format Integer (milliseconds)
Example RATE_LIMIT_WINDOW_MS=60000

Duration of the sliding-window rate limit period in milliseconds. Only effective when REDIS_RATE_LIMIT_ENABLED=true.


RATE_LIMIT_MAX_REQUESTS

Required No
Default 100
Format Integer
Example RATE_LIMIT_MAX_REQUESTS=100

Maximum number of requests allowed per client_id within RATE_LIMIT_WINDOW_MS. Requests exceeding this limit receive 429 RATE_LIMIT_EXCEEDED.
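As an illustration of how RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX_REQUESTS interact, the sketch below counts requests per client per window index. It is an in-memory stand-in, not the Redis implementation, and it uses a simple fixed-window counter keyed in the `rate:<client_id>:<window>` style rather than a true sliding window. The class and function names are hypothetical.

```typescript
// Window index = number of whole windows elapsed since the Unix epoch.
function rateLimitKey(clientId: string, nowMs: number, windowMs: number): string {
  return `rate:${clientId}:${Math.floor(nowMs / windowMs)}`;
}

// In-memory stand-in for the Redis counter; state is not shared across processes
// and old windows are never evicted (Redis would expire them via a TTL).
class WindowCounter {
  private readonly counts = new Map<string, number>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs = 60000, maxRequests = 100) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true if the request is allowed, false if it should receive a 429.
  allow(clientId: string, nowMs: number): boolean {
    const key = rateLimitKey(clientId, nowMs, this.windowMs);
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next <= this.maxRequests;
  }
}
```

With the defaults, the 101st request from one client inside the same 60-second window is rejected, and the count starts again when the window index advances.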


DB_POOL_MAX

Required No
Default 20
Format Integer
Example DB_POOL_MAX=20

Maximum number of PostgreSQL connections in the pool. Increase for high-throughput production deployments. Ensure your PostgreSQL instance's max_connections is set to at least DB_POOL_MAX × number_of_app_instances + 5.
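As a worked example of the sizing rule, three app instances at the default DB_POOL_MAX=20 need max_connections of at least 20 × 3 + 5 = 65. The helper below is a hypothetical illustration of that arithmetic; the reserve of 5 is the headroom named in the formula.

```typescript
// Minimum PostgreSQL max_connections for a deployment:
// DB_POOL_MAX × number_of_app_instances + reserve.
function requiredMaxConnections(poolMax: number, appInstances: number, reserve = 5): number {
  return poolMax * appInstances + reserve;
}
```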


DB_POOL_MIN

Required No
Default 2
Format Integer
Example DB_POOL_MIN=2

Minimum number of idle connections kept alive in the pool.


DB_POOL_IDLE_TIMEOUT_MS

Required No
Default 30000
Format Integer (milliseconds)
Example DB_POOL_IDLE_TIMEOUT_MS=30000

Milliseconds a connection can sit idle before being evicted from the pool.


DB_POOL_CONNECTION_TIMEOUT_MS

Required No
Default 5000
Format Integer (milliseconds)
Example DB_POOL_CONNECTION_TIMEOUT_MS=5000

Milliseconds the pool waits for a connection to become available before throwing a connection timeout error.
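The four DB_POOL_* variables could be turned into pool settings along the following lines. This is a sketch under assumptions: the property names follow node-postgres pool options (`max`, `idleTimeoutMillis`, `connectionTimeoutMillis`), whether `min` is honoured depends on the installed pg version, and silently falling back to the default on an invalid value is a design choice of this example rather than documented behaviour of src/db/pool.ts.

```typescript
type Env = Record<string, string | undefined>;

interface PoolSettings {
  max: number;
  min: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

// Parses a non-negative integer, falling back to the default when unset or invalid.
function intFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Maps the documented variables and defaults onto a plain settings object.
function poolSettingsFromEnv(env: Env): PoolSettings {
  return {
    max: intFromEnv(env.DB_POOL_MAX, 20),
    min: intFromEnv(env.DB_POOL_MIN, 2),
    idleTimeoutMillis: intFromEnv(env.DB_POOL_IDLE_TIMEOUT_MS, 30000),
    connectionTimeoutMillis: intFromEnv(env.DB_POOL_CONNECTION_TIMEOUT_MS, 5000),
  };
}
```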


VAULT_KV_MOUNT

Required No
Default secret
Format String (no leading or trailing slash)
Example VAULT_KV_MOUNT=agentidp

KV v2 secrets engine mount path used by VaultService. Equivalent to the existing VAULT_MOUNT variable — note that .env.example uses VAULT_KV_MOUNT; the underlying service reads either.


OPA_URL

Required No
Format URL string
Example OPA_URL=http://localhost:8181

URL of a running OPA server for external policy evaluation. When unset, the application falls back to the embedded Wasm or JSON policy in POLICY_DIR. Used for health check reporting.


KAFKA_BROKERS

Required No
Format Comma-separated broker addresses
Example KAFKA_BROKERS=localhost:9092

When set, the KafkaAdapter publishes domain events to Kafka. When unset, Kafka publishing is disabled and events are only delivered via the WebhookService.


ENFORCE_TLS

Required No
Default false
Values true, false
Example ENFORCE_TLS=true

When true, the tlsEnforcementMiddleware redirects all HTTP requests to HTTPS. Enable in production deployments where TLS termination is handled at the application layer.


Section: Complete .env Example — replace entirely

Replace the entire existing .env Example section with the following complete example that reflects all Phase 1-6 variables:

# ── Server ──────────────────────────────────────────────────────────────────
NODE_ENV=development
PORT=3000
CORS_ORIGIN=http://localhost:3001

# ── Database ─────────────────────────────────────────────────────────────────
DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp
DB_POOL_MAX=20
DB_POOL_MIN=2
DB_POOL_IDLE_TIMEOUT_MS=30000
DB_POOL_CONNECTION_TIMEOUT_MS=5000

# ── Redis ────────────────────────────────────────────────────────────────────
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_ENABLED=true
RATE_LIMIT_WINDOW_MS=60000
RATE_LIMIT_MAX_REQUESTS=100

# ── JWT Keys (generate with openssl — see docs/devops/security.md) ──────────
JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\nMIIEow...\n-----END RSA PRIVATE KEY-----"
JWT_PUBLIC_KEY="-----BEGIN PUBLIC KEY-----\nMIIBIj...\n-----END PUBLIC KEY-----"

# ── Billing (Stripe) — set BILLING_ENABLED=false for local/in-house testing ─
BILLING_ENABLED=false
STRIPE_SECRET_KEY=sk_test_placeholder
STRIPE_WEBHOOK_SECRET=whsec_placeholder
STRIPE_PRICE_ID=price_placeholder

# ── Phase 6 Feature Flags ─────────────────────────────────────────────────────
ANALYTICS_ENABLED=true
TIER_ENFORCEMENT=true
COMPLIANCE_ENABLED=true

# ── HashiCorp Vault (optional) ────────────────────────────────────────────────
# VAULT_ADDR=http://127.0.0.1:8200
# VAULT_TOKEN=hvs.XXXXXXXXXXXXXXXXXXXXXX
# VAULT_KV_MOUNT=secret

# ── OPA (optional) ───────────────────────────────────────────────────────────
# POLICY_DIR=/etc/sentryagent/policies
# OPA_URL=http://localhost:8181

# ── Kafka (optional) ─────────────────────────────────────────────────────────
# KAFKA_BROKERS=localhost:9092

# ── TLS ──────────────────────────────────────────────────────────────────────
# ENFORCE_TLS=true

Section: Variable Validation at Startup — add note on feature flags

Append after the existing validation list:

Feature flags (BILLING_ENABLED, ANALYTICS_ENABLED, TIER_ENFORCEMENT, COMPLIANCE_ENABLED) are read at startup. ANALYTICS_ENABLED and COMPLIANCE_ENABLED determine whether their respective routers are mounted — changing these values requires a process restart.
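The restart requirement follows from reading the flags once at startup. The sketch below illustrates that behaviour with the documented defaults; `readFeatureFlags` and the field names are hypothetical, and the real application may parse the values differently.

```typescript
type Env = Record<string, string | undefined>;

interface FeatureFlags {
  billing: boolean;
  analytics: boolean;
  tierEnforcement: boolean;
  compliance: boolean;
}

// Unset variables take the documented default; only the string "true" enables a flag.
function flag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  return value.trim().toLowerCase() === "true";
}

function readFeatureFlags(env: Env): FeatureFlags {
  return {
    billing: flag(env.BILLING_ENABLED, false),
    analytics: flag(env.ANALYTICS_ENABLED, true),
    tierEnforcement: flag(env.TIER_ENFORCEMENT, true),
    compliance: flag(env.COMPLIANCE_ENABLED, true),
  };
}

// Flags are resolved once. Changing the environment afterwards has no effect
// on the frozen snapshot, which is why a process restart is needed.
const startupEnv: Env = { ANALYTICS_ENABLED: "true" };
const startupFlags = Object.freeze(readFeatureFlags(startupEnv));
startupEnv.ANALYTICS_ENABLED = "false";
console.log(startupFlags.analytics); // still true until the process restarts
```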


File: docs/devops/database.md

Section: Schema Overview — replace diagram

Replace:

agents
  └── credentials (FK: client_id → agents.agent_id, CASCADE DELETE)

audit_events (no FK — append-only, agent_id is informational)

token_revocations (no FK — independent revocation store)

With:

organizations
  ├── agents (FK: organization_id → organizations.org_id)
  │     ├── credentials (FK: client_id → agents.agent_id, CASCADE DELETE)
  │     └── agent_did_keys (FK: agent_id → agents.agent_id)
  └── audit_events (FK: organization_id — informational, no cascade)

token_revocations (no FK — independent revocation store)
oidc_keys (standalone — OIDC signing key rotation)
federation_partners (standalone — cross-tenant identity)
webhook_subscriptions → webhook_deliveries (FK: subscription_id)
agent_marketplace (standalone — agent discovery catalog)
github_oidc_trust_policies (standalone — CI/CD trust)
billing (FK: org_id → organizations.org_id — one row per org)
delegation_chains (standalone — A2A delegation records)
analytics_events (FK: organization_id — append-only)
tenant_tiers (FK: org_id → organizations.org_id — one row per org)

Section: Tables — add new table entries

After the existing token_revocations table section, add the following new table definitions:


organizations

Created by migration 006_create_organizations_table.sql.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| org_id | UUID | No | Primary key |
| name | VARCHAR(255) | No | Organisation display name |
| slug | VARCHAR(64) | No | URL-safe unique identifier |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

agent_did_keys

Created by migration 012_create_agent_did_keys_table.sql.

Stores the DID document key material for each agent. One agent may have multiple keys for rotation purposes.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| agent_id | UUID | No | FK → agents.agent_id |
| key_id | VARCHAR(255) | No | DID key fragment identifier |
| public_key_jwk | JSONB | No | Public key in JWK format |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

DID columns on agents

Added by migration 013_add_did_columns_to_agents.sql:

  • did (VARCHAR(512), nullable): the did:web identifier for this agent
  • did_document (JSONB, nullable): full DID document

oidc_keys

Created by migration 014_create_oidc_keys_table.sql.

Stores RSA key pairs used for OIDC ID token signing. Supports key rotation — active key is determined by the most recently created row.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| kid | VARCHAR(128) | No | Key ID — referenced in JWKS |
| private_key_pem | TEXT | No | Encrypted RSA private key (pgcrypto) |
| public_key_pem | TEXT | No | RSA public key |
| algorithm | VARCHAR(16) | No | Always RS256 |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

federation_partners

Created by migration 015_create_federation_partners_table.sql.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| org_id | UUID | No | Owning organisation |
| partner_name | VARCHAR(255) | No | Display name |
| partner_jwks_url | TEXT | No | URL to partner's JWKS endpoint |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

webhook_subscriptions

Created by migration 016_create_webhook_subscriptions_table.sql.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| org_id | UUID | No | Owning organisation |
| event_type | VARCHAR(128) | No | Event type filter (e.g. agent.created) |
| target_url | TEXT | No | HTTPS delivery endpoint |
| secret | VARCHAR(255) | Yes | HMAC signing secret for delivery verification |
| active | BOOLEAN | No | Default: true |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

webhook_deliveries

Created by migration 017_create_webhook_deliveries_table.sql.

Records each delivery attempt for a webhook event, including the dead-letter queue entries.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| subscription_id | UUID | No | FK → webhook_subscriptions.id |
| event_type | VARCHAR(128) | No | Event type delivered |
| payload | JSONB | No | Full event payload |
| status | VARCHAR(32) | No | pending, delivered, failed, dead_letter |
| response_status | INTEGER | Yes | HTTP status from delivery endpoint |
| attempt_count | INTEGER | No | Default: 0 |
| last_attempted_at | TIMESTAMPTZ | Yes | |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

Dead-letter queue: After 3 failed delivery attempts, the row status is set to dead_letter and the agentidp_webhook_dead_letters_total Prometheus counter is incremented. The Prometheus metric label is event_type.
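The status transitions implied by the dead-letter rule can be sketched as a pure function. The sketch assumes that the third failed attempt is the one that moves the row to dead_letter; the names `recordAttempt` and `DeliveryState` are hypothetical and not taken from WebhookService.

```typescript
type DeliveryStatus = "pending" | "delivered" | "failed" | "dead_letter";

// Matches the "3 failed delivery attempts" threshold stated above.
const MAX_ATTEMPTS = 3;

interface DeliveryState {
  status: DeliveryStatus;
  attemptCount: number;
}

// Applies the outcome of one delivery attempt and returns the new row state.
function recordAttempt(state: DeliveryState, succeeded: boolean): DeliveryState {
  const attemptCount = state.attemptCount + 1;
  if (succeeded) return { status: "delivered", attemptCount };
  return { status: attemptCount >= MAX_ATTEMPTS ? "dead_letter" : "failed", attemptCount };
}
```

In this model the dead-letter counter would be incremented exactly once, at the transition into dead_letter, with the delivery's event_type as the label.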


pgcrypto extension

Enabled by migration 018_enable_pgcrypto.sql. Used for encrypting sensitive columns in oidc_keys and credential data.


agent_marketplace

Created by migration 021_add_agent_marketplace.sql.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| agent_id | UUID | No | FK → agents.agent_id |
| listing_name | VARCHAR(255) | No | Display name in marketplace |
| description | TEXT | Yes | Markdown description |
| tags | TEXT[] | No | Searchable tags. Default: {} |
| published | BOOLEAN | No | Default: false |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

github_oidc_trust_policies

Created by migration 022_add_github_oidc_trust_policies.sql.

Maps GitHub Actions OIDC claims to agent identities for CI/CD token exchange.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| org_id | UUID | No | Owning organisation |
| repository | VARCHAR(512) | No | GitHub repository slug (owner/repo) |
| branch | VARCHAR(255) | Yes | Branch filter (null = any branch) |
| agent_id | UUID | No | Agent to issue a token for on match |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

billing

Created by migration 023_add_billing.sql.

One row per organisation. Tracks the org's Stripe customer and subscription state.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| org_id | UUID | No | FK → organizations.org_id (UNIQUE) |
| stripe_customer_id | VARCHAR(255) | Yes | Stripe Customer ID |
| stripe_subscription_id | VARCHAR(255) | Yes | Stripe Subscription ID |
| status | VARCHAR(64) | No | Stripe subscription status or none |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

delegation_chains

Created by migration 024_add_delegation_chains.sql.

Records A2A delegation grants created via the delegation API.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| delegator_agent_id | UUID | No | Agent granting the delegation |
| delegate_agent_id | UUID | No | Agent receiving the delegation |
| scopes | TEXT[] | No | Scopes being delegated |
| expires_at | TIMESTAMPTZ | Yes | Optional expiry |
| created_at | TIMESTAMPTZ | No | Default: NOW() |

analytics_events

Created by migration 025_add_analytics_events.sql.

Append-only event store for analytics. Supports token trend, agent activity, and usage summary queries.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| organization_id | UUID | No | Owning organisation |
| date | DATE | No | Calendar date of the event (UTC) |
| metric_type | VARCHAR(64) | No | e.g. token_issued, agent_called |
| count | INTEGER | No | Event count for this date+type |

Index: (organization_id, date DESC) for fast time-series queries.


tenant_tiers

Created by migration 026_add_tenant_tiers.sql.

One row per organisation. Stores the organisation's current tier, which the tierEnforcement middleware uses to enforce tier limits.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| id | UUID | No | Primary key |
| org_id | UUID | No | FK → organizations.org_id (UNIQUE) |
| tier | ENUM('free','pro','enterprise') | No | Current tier. Default: free |
| updated_at | TIMESTAMPTZ | No | Last tier change. Default: NOW() |

Tier limits (from src/config/tiers.ts):

| Tier | Max Agents | Max API Calls/Day | Max Tokens/Day |
|------|------------|-------------------|----------------|
| free | 10 | 1,000 | 1,000 |
| pro | 100 | 50,000 | 50,000 |
| enterprise | unlimited | unlimited | unlimited |
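The limits above can be expressed as a lookup in which "unlimited" is modelled as Infinity, so enterprise tenants pass every comparison without a special case. This is a sketch only: maxCallsPerDay is the one field name the spec mentions for src/config/tiers.ts, while `maxAgents`, `maxTokensPerDay`, and `callAllowed` are hypothetical names chosen for illustration.

```typescript
type Tier = "free" | "pro" | "enterprise";

interface TierLimits {
  maxAgents: number;
  maxCallsPerDay: number;
  maxTokensPerDay: number;
}

// Values copied from the tier limits table; Infinity stands for "unlimited".
const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { maxAgents: 10, maxCallsPerDay: 1000, maxTokensPerDay: 1000 },
  pro: { maxAgents: 100, maxCallsPerDay: 50000, maxTokensPerDay: 50000 },
  enterprise: { maxAgents: Infinity, maxCallsPerDay: Infinity, maxTokensPerDay: Infinity },
};

// True when one more API call today would stay within the tier's daily limit.
function callAllowed(tier: Tier, callsSoFarToday: number): boolean {
  return callsSoFarToday + 1 <= TIER_LIMITS[tier].maxCallsPerDay;
}
```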

Section: Migration Runner — update "Running migrations" example output

Replace the example output block (showing 4 migrations) with:

Running database migrations...
  ✓ Applied: 001_create_agents.sql
  ✓ Applied: 002_create_credentials.sql
  ...
  ✓ Applied: 025_add_analytics_events.sql
  ✓ Applied: 026_add_tenant_tiers.sql

Migrations complete. 26 migration(s) applied.

Replace the "Verifying applied migrations" expected output to show 26 rows rather than 4.
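For readers unfamiliar with numbered migrations, the sketch below shows one common way a runner decides what to apply: sort the files by their zero-padded prefix and skip any already recorded as applied. The spec does not describe the runner's internals, so `pendingMigrations` and the idea of tracking applied filenames are assumptions made for illustration.

```typescript
// Returns migration files not yet applied, in filename order (001_, 002_, ...).
// Zero-padded numeric prefixes make plain string ordering match numeric ordering.
function pendingMigrations(allFiles: string[], applied: string[]): string[] {
  const done = new Set(applied);
  return allFiles
    .filter((name) => name.endsWith(".sql") && !done.has(name))
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}
```

On a fresh database the applied list is empty, so all 26 files would be returned in order, which corresponds to the "26 migration(s) applied" output above.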


Section: Connection Pool — replace entirely

Replace:

The application uses pg.Pool with default settings (max 10 connections). The pool is a singleton — one pool per process instance.

To override pool size, modify src/db/pool.ts. In production, ensure DATABASE_URL includes connection pool parameters if using PgBouncer or a managed connection pooler.

With:

The application uses pg.Pool with settings read from environment variables. The pool is a singleton — one pool per process instance.

| Variable | Default | Description |
|----------|---------|-------------|
| DB_POOL_MAX | 20 | Maximum connections |
| DB_POOL_MIN | 2 | Minimum idle connections |
| DB_POOL_IDLE_TIMEOUT_MS | 30000 | Idle eviction timeout (ms) |
| DB_POOL_CONNECTION_TIMEOUT_MS | 5000 | Acquisition timeout (ms) |

Pool usage is exposed through two Prometheus metrics: agentidp_db_pool_active_connections and agentidp_db_pool_waiting_requests. Monitor these in production to detect pool exhaustion.


File: docs/devops/architecture.md

Section: Component Overview — replace the ASCII diagram

Replace the existing ASCII diagram with:

                    ┌───────────────────────────────────────────┐
                    │         Next.js Portal (port 3001)         │
                    │         portal/ — Next.js 14               │
                    │  /login /agents /credentials /audit        │
                    │  /analytics /settings/tier /compliance     │
                    │  /webhooks /marketplace                    │
                    └────────────────┬──────────────────────────┘
                                     │ HTTP (localhost:3000)
                    ┌────────────────▼──────────────────────────┐
                    │         AgentIdP Application               │
                    │         Node.js / Express (port 3000)      │
                    │                                            │
                    │  TLS MW → Helmet → CORS → Morgan           │
                    │  Metrics MW → OrgContext MW                │
                    │  UsageMetering MW → TierEnforcement MW     │
                    │  Auth MW → OPA MW → Routes                 │
                    │        ↓                                   │
                    │  Controllers → Services → Repos            │
                    └──────────┬───────────────┬────────────────┘
                               │               │
              ┌────────────────▼──┐   ┌────────▼────────┐
              │   PostgreSQL 14    │   │    Redis 7       │
              │    Port 5432       │   │   Port 6379      │
              │                    │   │                  │
              │  26 migrations     │   │  Rate limits     │
              │  (001-026)         │   │  Token revoke    │
              │  organizations     │   │  Monthly counts  │
              │  agents + DID keys │   │  Tier counters   │
              │  credentials       │   │  Compliance cache│
              │  audit_events      │   │                  │
              │  token_revocations │   └──────────────────┘
              │  oidc_keys         │
              │  federation_partne-│   ┌──────────────────┐
              │  rs                │   │  HashiCorp Vault  │
              │  webhook_subscript-│   │  (optional)       │
              │  ions + deliveries │   │  KV v2 — creds    │
              │  agent_marketplace │   └──────────────────┘
              │  github_oidc_trust │
              │  billing           │   ┌──────────────────┐
              │  delegation_chains │   │  Stripe           │
              │  analytics_events  │   │  (optional)       │
              │  tenant_tiers      │   │  Billing/upgrades │
              └────────────────────┘   └──────────────────┘

Section: Internal layers table — update

Replace:

| Layer | Responsibility |
|-------|----------------|
| Routes | Wire HTTP methods and paths to controllers |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| Rate limit middleware | Redis sliding-window counter per client_id |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic — no direct DB access |
| Repositories | All SQL queries — no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |

With:

| Layer | Responsibility |
|-------|----------------|
| Routes | Wire HTTP methods and paths to controllers |
| TLS middleware | Redirect HTTP → HTTPS when ENFORCE_TLS=true |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| OrgContext middleware | Resolve organization_id from JWT and attach to req |
| UsageMetering middleware | Fire-and-forget analytics event recording |
| TierEnforcement middleware | Enforce daily API call and token limits via Redis (when TIER_ENFORCEMENT=true) |
| OPA middleware | Scope-based authorization via embedded Wasm or JSON policy |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic — no direct DB access |
| Repositories | All SQL queries — no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |

Section: Service Map — replace entirely

Replace the existing 4-row service map table with the complete Phase 6 service map:

| Route prefix | Controller | Service(s) | Repository/ies |
|--------------|------------|------------|----------------|
| /api/v1/agents | AgentController | AgentService | AgentRepository |
| /api/v1/credentials | CredentialController | CredentialService | CredentialRepository |
| /api/v1/token | TokenController | OAuth2Service | TokenRepository, CredentialRepository, AgentRepository |
| /api/v1/audit | AuditController | AuditService | AuditRepository |
| /api/v1/organizations | OrgController | OrgService | OrgRepository |
| /api/v1/compliance/* | ComplianceController | ComplianceService | AuditRepository |
| /api/v1/analytics/* | AnalyticsController | AnalyticsService | direct pool queries |
| /api/v1/tiers/* | TierController | TierService | pool queries, Stripe SDK |
| /api/v1/webhooks | WebhookController | WebhookService | WebhookRepository |
| /api/v1/federation | FederationController | FederationService | direct pool queries |
| /api/v1/marketplace | MarketplaceController | MarketplaceService | direct pool queries |
| /api/v1/billing | BillingController | BillingService | direct pool queries |
| /.well-known/did.json, /api/v1/did/* | DIDController | DIDService | AgentRepository |
| /.well-known/openid-configuration, /api/v1/oidc/* | OIDCController | OIDCKeyService, IDTokenService | direct pool queries |
| /api/v1/oidc/trust-policies | OIDCTrustPolicyController | OIDCTrustPolicyService | direct pool queries |
| /api/v1/delegation | DelegationController | DelegationService | direct pool queries |
| /api/v1/scaffold | ScaffoldController | ScaffoldService | |
| /health | inline | | pool, redis |
| /metrics | inline | | prom-client |

Section: Redis — update key patterns table

Replace the existing 3-row Redis key patterns table with:

| Key pattern | Example | Purpose | TTL |
|-------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | RATE_LIMIT_WINDOW_MS |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |
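The "Until midnight UTC" TTL for the tier counters can be computed as shown below. This is a sketch: the key builders simply mirror the patterns in the table, and all function and constant names are hypothetical rather than taken from the codebase.

```typescript
// Seconds from `now` until the next 00:00:00 UTC (always between 1 and 86400).
function secondsUntilMidnightUtc(now: Date): number {
  // Date.UTC normalises a day-of-month overflow into the next month or year.
  const nextMidnight = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}

// Key builders mirroring the documented patterns.
const tierCallsKey = (tenantId: string): string => `rate:tier:calls:${tenantId}`;
const tierTokensKey = (tenantId: string): string => `rate:tier:tokens:${tenantId}`;
const complianceReportKey = (tenantId: string): string => `compliance:report:${tenantId}`;

// The compliance report cache lives for 5 minutes.
const COMPLIANCE_CACHE_TTL_SECONDS = 5 * 60;
```

Setting the expiry to this value when a daily counter is first created means every tenant's counters reset together at 00:00 UTC, regardless of when the first request of the day arrived.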

Section: New Services — add after the existing component descriptions

Add a new ## New Services (Phases 3-6) section:

## New Services (Phases 3-6)

| Service | Source file | Responsibility |
|---------|------------|----------------|
| `AnalyticsService` | `src/services/AnalyticsService.ts` | Fire-and-forget `recordEvent`, time-series `getTokenTrend`, heatmap `getAgentActivity`, per-agent `getAgentUsageSummary` |
| `TierService` | `src/services/TierService.ts` | `getStatus` (reads `tenant_tiers`), `initiateUpgrade` (creates Stripe Checkout Session), `applyUpgrade` (handles Stripe webhook), `enforceAgentLimit` |
| `ComplianceService` | `src/services/ComplianceService.ts` | `generateReport` (Redis-cached 5 min), `exportAgentCards` (AGNTCY format) |
| `DelegationService` | `src/services/DelegationService.ts` | A2A delegation chain creation and verification |
| `DIDService` | `src/services/DIDService.ts` | `did:web` identifier generation and DID document management |
| `OIDCKeyService` | `src/services/OIDCKeyService.ts` | OIDC key rotation, JWKS endpoint |
| `IDTokenService` | `src/services/IDTokenService.ts` | OIDC ID token issuance |
| `FederationService` | `src/services/FederationService.ts` | Cross-tenant agent identity federation |
| `WebhookService` | `src/services/WebhookService.ts` | Event subscriptions, delivery with retry, dead-letter queue |
| `VaultService` | `src/services/VaultService.ts` | HashiCorp Vault KV v2 read/write for credential storage |
| `BillingService` | `src/services/BillingService.ts` | Stripe customer and subscription management |
| `MarketplaceService` | `src/services/MarketplaceService.ts` | Agent listing and discovery |
| `OIDCTrustPolicyService` | `src/services/OIDCTrustPolicyService.ts` | GitHub OIDC trust policy management |
| `EventPublisher` | `src/services/EventPublisher.ts` | Routes domain events to webhook delivery and Kafka (if configured) |

Section: Ports — update table

Replace:

| Service | Internal port | Exposed port (local dev) |
|---------|---------------|--------------------------|
| AgentIdP app | 3000 | 3000 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |

With:

| Service | Internal port | Exposed port (local dev) |
|---------|---------------|--------------------------|
| AgentIdP app | 3000 | 3000 |
| Next.js portal | 3001 | 3001 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |

Section: Add new section — API Routes

Add at the end of the file:

## API Routes (Phase 6 complete)

Base path: `/api/v1`

| Route | Method(s) | Auth | Feature flag |
|-------|----------|------|-------------|
| `/api/v1/agents` | GET, POST, PATCH, DELETE | Bearer JWT | always on |
| `/api/v1/credentials` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/token` | POST | none (client credentials) | always on |
| `/api/v1/audit` | GET | Bearer JWT | always on |
| `/api/v1/audit/verify` | GET | Bearer JWT | always on |
| `/api/v1/organizations` | GET, POST | Bearer JWT | always on |
| `/api/v1/compliance/controls` | GET | none | always on |
| `/api/v1/compliance/report` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/compliance/agent-cards` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/analytics/token-trend` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/agent-activity` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/usage-summary` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/tiers/status` | GET | Bearer JWT | always on |
| `/api/v1/tiers/upgrade` | POST | Bearer JWT | always on |
| `/api/v1/webhooks` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/federation` | GET, POST | Bearer JWT | always on |
| `/api/v1/delegation` | GET, POST | Bearer JWT | always on |
| `/api/v1/marketplace` | GET | none | always on |
| `/api/v1/billing` | GET, POST | Bearer JWT | always on |
| `/api/v1/did/*` | GET | none | always on |
| `/api/v1/oidc/*` | GET, POST | mixed | always on |
| `/.well-known/openid-configuration` | GET | none | always on |
| `/.well-known/jwks.json` | GET | none | always on |
| `/.well-known/did.json` | GET | none | always on |
| `/health` | GET | none | always on |
| `/metrics` | GET | none | always on |

File: docs/devops/local-development.md

Section: Prerequisites table — update

Replace the existing 3-row prerequisites table with:

| Tool | Minimum version | Purpose |
|------|-----------------|---------|
| Docker | 24+ | Container runtime |
| Docker Compose | 2.20+ | Multi-container orchestration |
| Node.js | 18.0.0 | Run the application, portal, and migrations |
| npm | 9+ | Package management and scripts |
| nvm | any | Recommended for managing Node.js versions |
| openssl | any | RSA key generation |

Add after the table:

nvm activation: If using nvm, activate it before running any Node.js commands:

export NVM_DIR="$HOME/.nvm" && source "$NVM_DIR/nvm.sh"

Section: Step 1 — clone and install — update

After npm install (which installs the backend), add:

# Install portal dependencies
cd portal && npm install && cd ..

Section: Step 4 — Start infrastructure services — update the note

Replace:

The app service in docker-compose.yml requires a Dockerfile which has not been written yet. This is a Phase 1 P1 pending item. The commands below will work once the Dockerfile exists.

With:

The full Docker Compose stack (including the app container) is available for field trial deployments — see the field trial guide. For day-to-day development, start only the infrastructure services and run the application directly.

Section: Step 5 — Run database migrations — update expected output

Replace the expected output showing 4 migrations with:

Running database migrations...
  ✓ Applied: 001_create_agents.sql
  ...
  ✓ Applied: 026_add_tenant_tiers.sql

Migrations complete. 26 migration(s) applied.

Section: Add new Step 7 — Start the Next.js portal

Add after Step 6 (Start the application):

## Step 7 — Start the Next.js portal (optional)

The portal is a Next.js 14 application in the `portal/` directory. It communicates with the
AgentIdP backend at `http://localhost:3000`.

Start the portal development server:

```bash
cd portal && npm run dev
```

The portal starts on port 3001 by default. Open http://localhost:3001.

Available routes:

| Route | Description |
|-------|-------------|
| `/login` | OAuth 2.0 login page |
| `/agents` | Agent registry |
| `/credentials` | Credential management |
| `/audit` | Audit log viewer |
| `/analytics` | Token trend and agent activity charts |
| `/settings/tier` | Tier status and upgrade |
| `/compliance` | AGNTCY compliance report |
| `/webhooks` | Webhook subscription management |
| `/marketplace` | Agent marketplace |

Build the portal for production:

cd portal && npm run build
npm start  # serves the production build (run from inside portal/)

Ensure CORS_ORIGIN in your .env includes http://localhost:3001:

CORS_ORIGIN=http://localhost:3001

---

## File: `docs/devops/operations.md`

### Section: Startup checklist — update

Replace the existing 4-step checklist with a checklist that reflects Docker Compose full-stack
operation and includes the portal:

```bash
# 1. Start the full stack
docker compose up --build -d

# 2. Verify all three services are healthy
docker compose ps
# app, postgres, and redis must all show "healthy"

# 3. Run migrations
docker compose exec app npm run db:migrate

# 4. Verify application health
curl http://localhost:3000/health
# Expected: {"status":"ok"}

# 5. (Optional) Start the portal for local dev
cd portal && npm run dev
```

Section: Redis Key Patterns — update table

Replace the 3-row table with the complete 6-row table (same as the architecture.md update above):

| Key pattern | Example | Purpose | TTL |
|-------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | RATE_LIMIT_WINDOW_MS |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |

Add the following new inspection commands at the end of the "Inspect keys" section:

```bash
# Check tier API call counter for a tenant
redis-cli GET "rate:tier:calls:<org_id>"

# Check tier token counter for a tenant
redis-cli GET "rate:tier:tokens:<org_id>"

# Check cached compliance report for a tenant
redis-cli GET "compliance:report:<org_id>"
redis-cli TTL "compliance:report:<org_id>"
```
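When reading TTLs on the tier counters, it helps to know what "until midnight UTC" works out to at a given moment. The helpers below are a sketch under stated assumptions: the function names are hypothetical, and the `<window>` derivation (integer division of the current time in milliseconds by the window length) is an inference that happens to match the example key `rate:a1b2c3...:29086156` for a 60000 ms window, not something the spec confirms.

```bash
# seconds_until_midnight_utc [EPOCH_SECONDS]
# Remaining seconds in the current UTC day; defaults to the current time.
seconds_until_midnight_utc() {
  now=${1:-$(date +%s)}
  echo $(( 86400 - now % 86400 ))
}

# rate_window_key CLIENT_ID NOW_MS WINDOW_MS
# Assumption: the <window> suffix is floor(NOW_MS / WINDOW_MS).
rate_window_key() {
  echo "rate:$1:$(( $2 / $3 ))"
}

# Usage: the TTL reported by Redis should not exceed the time left in the UTC day.
#   redis-cli TTL "rate:tier:calls:<org_id>"
#   seconds_until_midnight_utc
```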

### Section: Monitoring — update Metrics Exposed table

Replace the existing 6-row metrics table with the complete 19-metric table:

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP latency |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |
| `agentidp_webhook_dead_letters_total` | Counter | `event_type` | Webhook deliveries moved to dead-letter queue |
| `agentidp_credentials_expiring_soon_total` | Gauge | | Credentials expiring within 7 days |
| `agentidp_audit_chain_integrity` | Gauge | | 1 if audit chain is intact, 0 if broken |
| `agentidp_rate_limit_hits_total` | Counter | `client_id` | Rate limit rejections |
| `agentidp_db_pool_active_connections` | Gauge | | Active PostgreSQL connections |
| `agentidp_db_pool_waiting_requests` | Gauge | | Requests waiting for a pool connection |
| `agentidp_tenant_api_calls_total` | Counter | `org_id`, `tier` | API calls per tenant per tier |
| `agentidp_billing_limit_rejections_total` | Counter | `org_id`, `limit_type` | Tier limit enforcement rejections |
| `agentidp_did_documents_generated_total` | Counter | | DID documents generated |
| `agentidp_oidc_tokens_issued_total` | Counter | | OIDC ID tokens issued |
| `agentidp_federation_events_total` | Counter | `event_type` | Federation partner events |
| `agentidp_delegation_chains_created_total` | Counter | | A2A delegation chains created |
| `agentidp_compliance_reports_generated_total` | Counter | | Compliance reports generated |
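One way to verify the table against a running instance is to look for each metric's `# TYPE` line in the Prometheus text exposition output. This is a sketch only: `missing_metrics` is a hypothetical helper, and the `/metrics` path in the usage comment is an assumption, since this spec does not name the scrape endpoint.

```bash
# missing_metrics NAME...
# Reads Prometheus text exposition format on stdin and prints each NAME
# that has no "# TYPE NAME ..." line. Returns non-zero if any name is missing.
missing_metrics() {
  exposition=$(cat)
  status=0
  for name in "$@"; do
    if ! printf '%s\n' "$exposition" | grep -q "^# TYPE $name "; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage (endpoint path is an assumption):
#   curl -fs http://localhost:3000/metrics | missing_metrics \
#     agentidp_tokens_issued_total agentidp_audit_chain_integrity
```

Matching on the `# TYPE` line rather than on sample lines keeps the check valid for histograms, whose samples carry `_bucket`, `_sum` and `_count` suffixes.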

### Section: Troubleshooting — add new entries

Append the following troubleshooting entries:


#### Tier limit rejected — 429 with `tier_limit_exceeded` code

Symptom: `429 TOO_MANY_REQUESTS` with body `{"code":"tier_limit_exceeded","message":"..."}`

Check the tenant's current tier counter:

```bash
# Check API call counter
docker compose exec redis redis-cli GET "rate:tier:calls:<org_id>"

# Check the tenant's tier
psql "$DATABASE_URL" -c "SELECT org_id, tier FROM tenant_tiers WHERE org_id = '<org_id>';"
```

If the org is on the free tier and has hit 1,000 calls/day, upgrade the tier or wait until midnight UTC for the counter to reset.


#### Analytics endpoints return 404

Cause: `ANALYTICS_ENABLED` is set to `false` in `.env`.

Fix: Set `ANALYTICS_ENABLED=true` and restart the application.


#### Compliance report returns 404

Cause: `COMPLIANCE_ENABLED` is set to `false` in `.env`.

Fix: Set `COMPLIANCE_ENABLED=true` and restart the application.
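Both 404 entries come down to a feature flag being `false` in `.env`, so a quick check of the file can confirm the cause before restarting anything. This sketch uses a hypothetical helper name, `flag_enabled`, and assumes that an unset flag falls back to the documented default of `true` and that only the literal value `false` disables a feature.

```bash
# flag_enabled ENV_FILE NAME
# Succeeds unless ENV_FILE explicitly sets NAME=false.
# Assumption: an unset flag falls back to the documented default of true.
flag_enabled() (
  # Last assignment wins; strip quotes and spaces around the value.
  value=$(grep -E "^$2=" "$1" 2>/dev/null | tail -n 1 | cut -d= -f2- | tr -d "\"' ")
  [ "$value" != "false" ]
)

# Usage:
#   for flag in ANALYTICS_ENABLED COMPLIANCE_ENABLED; do
#     flag_enabled .env "$flag" || echo "$flag is disabled"
#   done
```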


#### Portal CORS error

Symptom: Browser console shows `Access-Control-Allow-Origin` error on requests to `http://localhost:3000`.

Fix: Ensure `CORS_ORIGIN` in `.env` includes `http://localhost:3001`:

```
CORS_ORIGIN=http://localhost:3001
```

Restart the application after changing this variable.


---

## File: `docs/devops/deployment.md`

### Section: Environment Variable Reference (Section 6) — update Quick Reference table

Add the following rows to the existing quick reference table:

| Variable | Required | Source (AWS) | Source (GCP) |
|----------|----------|--------------|--------------|
| `BILLING_ENABLED` | No | Task definition env var | Cloud Run env var |
| `STRIPE_SECRET_KEY` | Only if billing enabled | Secrets Manager: `/<project>/<env>/stripe-secret-key` | Secret Manager: `<name-prefix>-stripe-secret-key` |
| `STRIPE_WEBHOOK_SECRET` | Only if billing enabled | Secrets Manager: `/<project>/<env>/stripe-webhook-secret` | Secret Manager: `<name-prefix>-stripe-webhook-secret` |
| `STRIPE_PRICE_ID` | Only if billing enabled | Task definition env var | Cloud Run env var |
| `ANALYTICS_ENABLED` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `TIER_ENFORCEMENT` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `COMPLIANCE_ENABLED` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `REDIS_RATE_LIMIT_ENABLED` | No | Task definition env var | Cloud Run env var |
| `RATE_LIMIT_WINDOW_MS` | No | Task definition env var (default: `60000`) | Cloud Run env var |
| `RATE_LIMIT_MAX_REQUESTS` | No | Task definition env var (default: `100`) | Cloud Run env var |
| `DB_POOL_MAX` | No | Task definition env var (default: `20`) | Cloud Run env var |
| `DB_POOL_MIN` | No | Task definition env var (default: `2`) | Cloud Run env var |
| `DB_POOL_IDLE_TIMEOUT_MS` | No | Task definition env var (default: `30000`) | Cloud Run env var |
| `DB_POOL_CONNECTION_TIMEOUT_MS` | No | Task definition env var (default: `5000`) | Cloud Run env var |
| `KAFKA_BROKERS` | No | Task definition env var | Cloud Run env var |
| `ENFORCE_TLS` | No | Task definition env var | Cloud Run env var |
| `OPA_URL` | No | Task definition env var | Cloud Run env var |
| `VAULT_KV_MOUNT` | No | Task definition env var (default: `secret`) | Cloud Run env var |
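The "Only if billing enabled" rows describe a conditional requirement that is easy to miss in a deployment. A pre-deploy guard along these lines could make it explicit. It is a sketch: `check_billing_env` is a hypothetical name, and treating an unset `BILLING_ENABLED` as disabled is an assumption, since the table gives no default for it.

```bash
# check_billing_env
# Prints each Stripe variable that is required but empty.
# Returns non-zero if billing is enabled and any of them is missing.
# Assumption: an unset BILLING_ENABLED means billing is disabled.
check_billing_env() {
  # Nothing to check unless billing is switched on.
  [ "${BILLING_ENABLED:-false}" = "true" ] || return 0
  missing=0
  for name in STRIPE_SECRET_KEY STRIPE_WEBHOOK_SECRET STRIPE_PRICE_ID; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_billing_env || exit 1
```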

### Section: Step 2.8 / Step 3.7 — Run Database Migrations — update migration count

In the migration command output examples in sections 2.8 and 3.7, update migration count references from "4 migration(s)" to "26 migration(s)".
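Before editing the count, it may be worth confirming it against the repository. The sketch below counts files that follow the `NNN_name.sql` naming seen elsewhere in this spec (for example `014_create_oidc_keys_table.sql`). The helper name is hypothetical, and the migrations directory is left as a placeholder because this spec does not state its path.

```bash
# count_migrations DIR
# Counts files in DIR named like 014_create_oidc_keys_table.sql.
count_migrations() {
  find "$1" -maxdepth 1 -type f -name '[0-9][0-9][0-9]_*.sql' | wc -l | tr -d ' '
}

# Usage (replace the placeholder with the real migrations directory):
#   [ "$(count_migrations <migrations-dir>)" = "26" ] || echo "migration count differs from docs"
```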


---

## File: `docs/devops/security.md`

No structural changes required. Append the following note at the end of the "JWT Key Management" section:

OIDC keys are separate from the main JWT keys. OIDC signing keys are stored in the `oidc_keys` PostgreSQL table (created by migration `014_create_oidc_keys_table.sql`), encrypted at rest using `pgcrypto` (enabled by migration `018_enable_pgcrypto.sql`). The `OIDCKeyService` manages rotation. OIDC keys do not need to be set as environment variables; they are provisioned automatically on first startup.


---

## File: `docs/devops/vault-setup.md`

### Section: Add note on `VAULT_KV_MOUNT` alias

After the VAULT_MOUNT variable description, add:

Note: The `.env.example` file uses `VAULT_KV_MOUNT` as the variable name. The application reads both `VAULT_KV_MOUNT` and `VAULT_MOUNT`. Prefer `VAULT_KV_MOUNT` in new configurations for consistency with the current `.env.example`.
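For operators scripting against Vault, the alias can be mirrored with a nested parameter expansion. This is a sketch under assumptions: the spec says the application reads both names but does not state which wins when both are set, so the precedence shown here (`VAULT_KV_MOUNT` first) is a guess, and `secret` is the default listed for `VAULT_KV_MOUNT` in the deployment quick reference. The function name is hypothetical.

```bash
# resolve_vault_mount
# Prints the KV mount a script should use.
# Assumptions: VAULT_KV_MOUNT wins when both are set; "secret" is the fallback.
resolve_vault_mount() {
  echo "${VAULT_KV_MOUNT:-${VAULT_MOUNT:-secret}}"
}

# Usage:
#   mount=$(resolve_vault_mount)
```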


---

## File: `docs/devops/README.md`

### Section: Document index — add `field-trial.md` entry

Add field-trial.md to the document index table:

| Document | Audience | Contents |
|----------|----------|----------|
| ... existing entries ... | | |
| `field-trial.md` | DevOps engineers, QA | In-house Docker Compose field trial execution playbook |

## Acceptance Criteria

- `environment-variables.md` documents all 11 new variables from Phases 3-6
- `environment-variables.md` complete `.env` example includes all Phase 6 flags
- `database.md` schema overview reflects all 26 migrations (001-026)
- `database.md` documents all 10 new tables added in Phases 3-6
- `database.md` connection pool section references `DB_POOL_*` env vars
- `architecture.md` diagram shows Next.js portal at port 3001
- `architecture.md` service map covers all 19 route prefixes
- `architecture.md` Redis table covers all 6 key patterns
- `architecture.md` new services section documents all 13 Phase 3-6 services
- `architecture.md` API routes section covers all 25 routes
- `local-development.md` includes portal setup (Step 7) with all 9 portal routes
- `operations.md` startup checklist uses `docker compose` (not `docker-compose`)
- `operations.md` Redis table covers all 6 key patterns with correct TTLs
- `operations.md` metrics table covers all 19 Prometheus metrics
- `operations.md` troubleshooting covers tier limits, feature flag 404s, portal CORS
- `deployment.md` variable quick reference includes all Phase 3-6 variables
- `security.md` note on OIDC keys added
- `vault-setup.md` note on `VAULT_KV_MOUNT` alias added
- `README.md` index includes `field-trial.md`