# Spec — WS1: DevOps Documentation Update

**Change:** phase-7-devops-field-trial
**Workstream:** WS1
**Status:** Approved
**Written:** 2026-04-04

## Purpose

Specify exactly what must be updated in each `docs/devops/` file to reflect the Phase 6 state of the codebase. The existing docs were written during Phase 2. Phases 3–6 added 14 DB migrations, new services, feature flags, Stripe billing, Prometheus metrics, Redis key patterns, and the Next.js portal — none of which appear in the current docs.

The Developer implementing this spec must update each listed file precisely as described. No other content in those files should be changed unless explicitly stated.

---

## File: `docs/devops/environment-variables.md`

### Section: Required Variables — update `DATABASE_URL` description

Replace the sentences:

> The application uses `pg.Pool` with this connection string. Connection pool size uses the `pg`
> default (10 connections).

With:

> The application uses `pg.Pool` with this connection string. Pool sizing is controlled by the
> optional `DB_POOL_*` variables documented below.

### Section: Required Variables — add `STRIPE_SECRET_KEY` after `DATABASE_URL`

Insert the following note:

> **Note on Billing:** `STRIPE_SECRET_KEY`, `STRIPE_WEBHOOK_SECRET`, and `STRIPE_PRICE_ID` are
> required when `BILLING_ENABLED=true`. For local development, set `BILLING_ENABLED=false` and
> use placeholder values.

### Section: Optional Variables — add all Phase 3–6 variables

Add each of the following variable blocks in this order, after the existing `VAULT_MOUNT` block and before `POLICY_DIR`:

---

#### `BILLING_ENABLED`

| | |
|-|-|
| **Required** | No |
| **Default** | `false` |
| **Values** | `true`, `false` |
| **Example** | `BILLING_ENABLED=false` |

Gates Stripe billing integration and free-tier agent limit enforcement. When `false`, no Stripe API calls are made and the free-tier agent limit is not enforced; daily API call and token limits are controlled separately by `TIER_ENFORCEMENT`. Set to `false` for in-house testing.
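
For the implementer's understanding only (not part of the block to add): the flag semantics above, and the billing note's rule that the three `STRIPE_*` variables become required when `BILLING_ENABLED=true`, can be sketched as below. `readBooleanFlag` and `assertBillingConfig` are hypothetical names, not code from the repository; accepting the values case-insensitively and throwing on anything other than `true`/`false` are assumptions.

```typescript
// Illustrative sketch only: hypothetical helpers, not taken from the repository.
type Env = Record<string, string | undefined>;

// Reads a boolean feature flag. An unset (or empty) value falls back to the documented
// default; anything other than "true"/"false" is rejected instead of being guessed at.
function readBooleanFlag(env: Env, name: string, defaultValue: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return defaultValue;
  }
  const value = raw.trim().toLowerCase();
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`${name} must be "true" or "false", got "${raw}"`);
}

// When BILLING_ENABLED=true, the three STRIPE_* variables become required.
function assertBillingConfig(env: Env): void {
  if (!readBooleanFlag(env, "BILLING_ENABLED", false)) {
    return; // billing off: placeholder Stripe values are acceptable
  }
  const required = ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET", "STRIPE_PRICE_ID"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`BILLING_ENABLED=true but missing: ${missing.join(", ")}`);
  }
}
```

At startup this would be called as `assertBillingConfig(process.env)`, so a misconfigured billing deployment fails fast instead of failing on the first Checkout Session.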
---

#### `STRIPE_SECRET_KEY`

| | |
|-|-|
| **Required** | Only when `BILLING_ENABLED=true` |
| **Format** | Stripe secret key string (`sk_live_*` or `sk_test_*`) |
| **Example** | `STRIPE_SECRET_KEY=sk_test_placeholder` |

Stripe API key used to create Checkout Sessions for tier upgrades. Never use a live key in development.

---

#### `STRIPE_WEBHOOK_SECRET`

| | |
|-|-|
| **Required** | Only when `BILLING_ENABLED=true` |
| **Format** | Stripe webhook signing secret (`whsec_*`) |
| **Example** | `STRIPE_WEBHOOK_SECRET=whsec_placeholder` |

Used to verify the HMAC signature on incoming Stripe webhook events. If it is unset or incorrect, the billing webhook endpoint rejects all events.

---

#### `STRIPE_PRICE_ID`

| | |
|-|-|
| **Required** | Only when `BILLING_ENABLED=true` |
| **Format** | Stripe Price ID string (`price_*`) |
| **Example** | `STRIPE_PRICE_ID=price_placeholder` |

ID of the Stripe Price object used when creating a Checkout Session for the Pro tier upgrade.

---

#### `ANALYTICS_ENABLED`

| | |
|-|-|
| **Required** | No |
| **Default** | `true` |
| **Values** | `true`, `false` |
| **Example** | `ANALYTICS_ENABLED=true` |

Feature flag that gates the `/api/v1/analytics/*` routes. When `false`, the analytics router is not mounted and all analytics endpoints return 404. Events are still recorded internally regardless of this flag.

---

#### `TIER_ENFORCEMENT`

| | |
|-|-|
| **Required** | No |
| **Default** | `true` |
| **Values** | `true`, `false` |
| **Example** | `TIER_ENFORCEMENT=true` |

Enables Redis-backed tier limit enforcement per tenant. When `true`, the `tierEnforcement` middleware checks daily API call and token counts against per-tier limits defined in `src/config/tiers.ts`. Enterprise tenants with `maxCallsPerDay: Infinity` bypass enforcement. When `false`, no tier limits are enforced.
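
For the implementer's understanding only (not part of the block to add): the daily call check described above, including the enterprise bypass, can be sketched as follows. The limit values mirror the tier limits table later in this spec; `TIER_LIMITS` and `exceedsDailyCallLimit` are hypothetical names, and comparing the counter value *after* the current request's increment is an assumption about how the middleware counts.

```typescript
// Illustrative sketch only. The real limits live in src/config/tiers.ts.
type Tier = "free" | "pro" | "enterprise";

interface TierLimits {
  maxCallsPerDay: number;
  maxTokensPerDay: number;
}

const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { maxCallsPerDay: 1000, maxTokensPerDay: 1000 },
  pro: { maxCallsPerDay: 50000, maxTokensPerDay: 50000 },
  enterprise: { maxCallsPerDay: Infinity, maxTokensPerDay: Infinity },
};

// Returns true when the request should be rejected.
// callsToday is assumed to be the daily counter value after incrementing it for this request.
function exceedsDailyCallLimit(enforcementEnabled: boolean, tier: Tier, callsToday: number): boolean {
  if (!enforcementEnabled) {
    return false; // TIER_ENFORCEMENT=false: nothing is enforced
  }
  const { maxCallsPerDay } = TIER_LIMITS[tier];
  if (maxCallsPerDay === Infinity) {
    return false; // enterprise tenants bypass enforcement
  }
  return callsToday > maxCallsPerDay;
}
```

Checking for `Infinity` explicitly keeps the bypass visible in code rather than relying on `n > Infinity` always being false.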
---

#### `COMPLIANCE_ENABLED`

| | |
|-|-|
| **Required** | No |
| **Default** | `true` |
| **Values** | `true`, `false` |
| **Example** | `COMPLIANCE_ENABLED=true` |

Feature flag that gates the report and agent-card export endpoints under `/api/v1/compliance/*`. When `false`, those endpoints return 404. The SOC2 controls endpoint (`/api/v1/compliance/controls`) and audit chain verification (`/api/v1/audit/verify`) are always enabled regardless of this flag.

---

#### `REDIS_RATE_LIMIT_ENABLED`

| | |
|-|-|
| **Required** | No |
| **Default** | `false` |
| **Values** | `true`, `false` |
| **Example** | `REDIS_RATE_LIMIT_ENABLED=true` |

When `true`, rate limiting uses a Redis-backed sliding-window counter per `client_id`. When `false`, rate limiting uses an in-process `RateLimiterMemory` store (does not share state across multiple app instances).

---

#### `RATE_LIMIT_WINDOW_MS`

| | |
|-|-|
| **Required** | No |
| **Default** | `60000` |
| **Format** | Integer (milliseconds) |
| **Example** | `RATE_LIMIT_WINDOW_MS=60000` |

Duration of the sliding-window rate limit period in milliseconds. Only effective when `REDIS_RATE_LIMIT_ENABLED=true`.

---

#### `RATE_LIMIT_MAX_REQUESTS`

| | |
|-|-|
| **Required** | No |
| **Default** | `100` |
| **Format** | Integer |
| **Example** | `RATE_LIMIT_MAX_REQUESTS=100` |

Maximum number of requests allowed per `client_id` within `RATE_LIMIT_WINDOW_MS`. Requests exceeding this limit receive `429 RATE_LIMIT_EXCEEDED`.

---

#### `DB_POOL_MAX`

| | |
|-|-|
| **Required** | No |
| **Default** | `20` |
| **Format** | Integer |
| **Example** | `DB_POOL_MAX=20` |

Maximum number of PostgreSQL connections in the pool. Increase for high-throughput production deployments. Ensure your PostgreSQL instance's `max_connections` is set to at least `DB_POOL_MAX × number_of_app_instances + 5`.
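
For the implementer's understanding only (not part of the block to add): the `max_connections` rule above is simple arithmetic, sketched here with a hypothetical `requiredMaxConnections` helper. The spec does not say what the 5 extra connections are reserved for, so the sketch treats them only as fixed headroom.

```typescript
// Minimum PostgreSQL max_connections per the DB_POOL_MAX guidance:
// DB_POOL_MAX × number_of_app_instances + 5 spare connections.
function requiredMaxConnections(poolMax: number, appInstances: number, headroom = 5): number {
  if (!Number.isInteger(poolMax) || poolMax < 1) {
    throw new Error("poolMax must be a positive integer");
  }
  if (!Number.isInteger(appInstances) || appInstances < 1) {
    throw new Error("appInstances must be a positive integer");
  }
  return poolMax * appInstances + headroom;
}

// Default pool size (20) across 3 app instances:
console.log(requiredMaxConnections(20, 3)); // 65
```

With PostgreSQL's default `max_connections` of 100, the default pool size fits at most four app instances (20 × 4 + 5 = 85, while 20 × 5 + 5 = 105 exceeds 100).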
---

#### `DB_POOL_MIN`

| | |
|-|-|
| **Required** | No |
| **Default** | `2` |
| **Format** | Integer |
| **Example** | `DB_POOL_MIN=2` |

Minimum number of idle connections kept alive in the pool.

---

#### `DB_POOL_IDLE_TIMEOUT_MS`

| | |
|-|-|
| **Required** | No |
| **Default** | `30000` |
| **Format** | Integer (milliseconds) |
| **Example** | `DB_POOL_IDLE_TIMEOUT_MS=30000` |

Milliseconds a connection can sit idle before being evicted from the pool.

---

#### `DB_POOL_CONNECTION_TIMEOUT_MS`

| | |
|-|-|
| **Required** | No |
| **Default** | `5000` |
| **Format** | Integer (milliseconds) |
| **Example** | `DB_POOL_CONNECTION_TIMEOUT_MS=5000` |

Milliseconds the pool waits for a connection to become available before throwing a connection timeout error.

---

#### `VAULT_KV_MOUNT`

| | |
|-|-|
| **Required** | No |
| **Default** | `secret` |
| **Format** | String (no leading or trailing slash) |
| **Example** | `VAULT_KV_MOUNT=agentidp` |

KV v2 secrets engine mount path used by `VaultService`. Equivalent to the existing `VAULT_MOUNT` variable: `.env.example` uses `VAULT_KV_MOUNT`, and `VaultService` reads either name.

---

#### `OPA_URL`

| | |
|-|-|
| **Required** | No |
| **Format** | URL string |
| **Example** | `OPA_URL=http://localhost:8181` |

URL of a running OPA server for external policy evaluation. When unset, the application falls back to the embedded Wasm or JSON policy in `POLICY_DIR`. Also used for health check reporting.

---

#### `KAFKA_BROKERS`

| | |
|-|-|
| **Required** | No |
| **Format** | Comma-separated broker addresses |
| **Example** | `KAFKA_BROKERS=localhost:9092` |

When set, the `KafkaAdapter` publishes domain events to Kafka. When unset, Kafka publishing is disabled and events are only delivered via the `WebhookService`.
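
For the implementer's understanding only (not part of the block to add): turning the comma-separated `KAFKA_BROKERS` value into a broker list, with "unset" meaning Kafka publishing is disabled, can be sketched as below. `parseKafkaBrokers` is a hypothetical name, and treating an empty or whitespace-only value the same as unset is an assumption.

```typescript
// Hypothetical helper: returns the broker list, or null when Kafka publishing is disabled.
function parseKafkaBrokers(raw: string | undefined): string[] | null {
  if (raw === undefined) {
    return null; // unset: events are delivered only through WebhookService
  }
  const brokers = raw
    .split(",")
    .map((broker) => broker.trim())
    .filter((broker) => broker.length > 0);
  // An empty or whitespace-only value is treated the same as unset (an assumption).
  return brokers.length > 0 ? brokers : null;
}
```

A `string[]` is the shape Kafka clients such as kafkajs expect for their `brokers` option; whether `KafkaAdapter` uses kafkajs is not stated in this spec.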
---

#### `ENFORCE_TLS`

| | |
|-|-|
| **Required** | No |
| **Default** | `false` |
| **Values** | `true`, `false` |
| **Example** | `ENFORCE_TLS=true` |

When `true`, the `tlsEnforcementMiddleware` redirects all HTTP requests to HTTPS. Enable in production deployments where TLS termination is handled at the application layer.

---

### Section: Complete `.env` Example — replace entirely

Replace the entire existing `.env` Example section with the following complete example that reflects all Phase 1–6 variables:

```
# ── Server ──────────────────────────────────────────────────────────────────
NODE_ENV=development
PORT=3000
CORS_ORIGIN=http://localhost:3001

# ── Database ─────────────────────────────────────────────────────────────────
DATABASE_URL=postgresql://sentryagent:sentryagent@localhost:5432/sentryagent_idp
DB_POOL_MAX=20
DB_POOL_MIN=2
DB_POOL_IDLE_TIMEOUT_MS=30000
DB_POOL_CONNECTION_TIMEOUT_MS=5000

# ── Redis ────────────────────────────────────────────────────────────────────
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_ENABLED=true
RATE_LIMIT_WINDOW_MS=60000
RATE_LIMIT_MAX_REQUESTS=100

# ── JWT Keys (generate with openssl — see docs/devops/security.md) ──────────
JWT_PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\nMIIEow...\n-----END RSA PRIVATE KEY-----"
JWT_PUBLIC_KEY="-----BEGIN PUBLIC KEY-----\nMIIBIj...\n-----END PUBLIC KEY-----"

# ── Billing (Stripe) — set BILLING_ENABLED=false for local/in-house testing ─
BILLING_ENABLED=false
STRIPE_SECRET_KEY=sk_test_placeholder
STRIPE_WEBHOOK_SECRET=whsec_placeholder
STRIPE_PRICE_ID=price_placeholder

# ── Phase 6 Feature Flags ─────────────────────────────────────────────────────
ANALYTICS_ENABLED=true
TIER_ENFORCEMENT=true
COMPLIANCE_ENABLED=true

# ── HashiCorp Vault (optional) ────────────────────────────────────────────────
# VAULT_ADDR=http://127.0.0.1:8200
# VAULT_TOKEN=hvs.XXXXXXXXXXXXXXXXXXXXXX
# VAULT_KV_MOUNT=secret

# ── OPA (optional) ───────────────────────────────────────────────────────────
# POLICY_DIR=/etc/sentryagent/policies
# OPA_URL=http://localhost:8181

# ── Kafka (optional) ─────────────────────────────────────────────────────────
# KAFKA_BROKERS=localhost:9092

# ── TLS ──────────────────────────────────────────────────────────────────────
# ENFORCE_TLS=true
```

### Section: Variable Validation at Startup — add note on feature flags

Append after the existing validation list:

> **Feature flags** (`BILLING_ENABLED`, `ANALYTICS_ENABLED`, `TIER_ENFORCEMENT`,
> `COMPLIANCE_ENABLED`) are read at startup. `ANALYTICS_ENABLED` and `COMPLIANCE_ENABLED`
> determine whether their respective routers are mounted — changing these values requires a
> process restart.

---

## File: `docs/devops/database.md`

### Section: Schema Overview — replace diagram

Replace:

```
agents
└── credentials         (FK: client_id → agents.agent_id, CASCADE DELETE)

audit_events            (no FK — append-only, agent_id is informational)
token_revocations       (no FK — independent revocation store)
```

With:

```
organizations
├── agents                  (FK: organization_id → organizations.org_id)
│   ├── credentials         (FK: client_id → agents.agent_id, CASCADE DELETE)
│   └── agent_did_keys      (FK: agent_id → agents.agent_id)
└── audit_events            (FK: organization_id — informational, no cascade)

token_revocations           (no FK — independent revocation store)
oidc_keys                   (standalone — OIDC signing key rotation)
federation_partners         (standalone — cross-tenant identity)
webhook_subscriptions
└── webhook_deliveries      (FK: subscription_id)
agent_marketplace           (standalone — agent discovery catalog)
github_oidc_trust_policies  (standalone — CI/CD trust)
billing                     (FK: org_id → organizations.org_id — one row per org)
delegation_chains           (standalone — A2A delegation records)
analytics_events            (FK: organization_id — append-only)
tenant_tiers                (FK: org_id → organizations.org_id — one row per org)
```

### Section: Tables — add new table entries

After the existing `token_revocations` table section, add the following new table definitions:

---

#### `organizations`

Created by migration `006_create_organizations_table.sql`.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `org_id` | `UUID` | No | Primary key |
| `name` | `VARCHAR(255)` | No | Organisation display name |
| `slug` | `VARCHAR(64)` | No | URL-safe unique identifier |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `agent_did_keys`

Created by migration `012_create_agent_did_keys_table.sql`. Stores the DID document key material for each agent. One agent may have multiple keys for rotation purposes.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `agent_id` | `UUID` | No | FK → `agents.agent_id` |
| `key_id` | `VARCHAR(255)` | No | DID key fragment identifier |
| `public_key_jwk` | `JSONB` | No | Public key in JWK format |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### DID columns on `agents`

Added by migration `013_add_did_columns_to_agents.sql`:

- `did` — `VARCHAR(512)` nullable — the `did:web` identifier for this agent
- `did_document` — `JSONB` nullable — full DID document

---

#### `oidc_keys`

Created by migration `014_create_oidc_keys_table.sql`. Stores RSA key pairs used for OIDC ID token signing. Supports key rotation: the active key is the most recently created row.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `kid` | `VARCHAR(128)` | No | Key ID — referenced in JWKS |
| `private_key_pem` | `TEXT` | No | Encrypted RSA private key (pgcrypto) |
| `public_key_pem` | `TEXT` | No | RSA public key |
| `algorithm` | `VARCHAR(16)` | No | Always `RS256` |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `federation_partners`

Created by migration `015_create_federation_partners_table.sql`.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | Owning organisation |
| `partner_name` | `VARCHAR(255)` | No | Display name |
| `partner_jwks_url` | `TEXT` | No | URL to partner's JWKS endpoint |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `webhook_subscriptions`

Created by migration `016_create_webhook_subscriptions_table.sql`.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | Owning organisation |
| `event_type` | `VARCHAR(128)` | No | Event type filter (e.g. `agent.created`) |
| `target_url` | `TEXT` | No | HTTPS delivery endpoint |
| `secret` | `VARCHAR(255)` | Yes | HMAC signing secret for delivery verification |
| `active` | `BOOLEAN` | No | Default: `true` |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `webhook_deliveries`

Created by migration `017_create_webhook_deliveries_table.sql`. Tracks the delivery of each webhook event to a subscription, including its retry state (`attempt_count`, `last_attempted_at`) and dead-letter queue entries.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `subscription_id` | `UUID` | No | FK → `webhook_subscriptions.id` |
| `event_type` | `VARCHAR(128)` | No | Event type delivered |
| `payload` | `JSONB` | No | Full event payload |
| `status` | `VARCHAR(32)` | No | `pending`, `delivered`, `failed`, `dead_letter` |
| `response_status` | `INTEGER` | Yes | HTTP status from delivery endpoint |
| `attempt_count` | `INTEGER` | No | Default: `0` |
| `last_attempted_at` | `TIMESTAMPTZ` | Yes | |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

**Dead-letter queue:** After 3 failed delivery attempts, the row status is set to `dead_letter` and the `agentidp_webhook_dead_letters_total` Prometheus counter is incremented with the `event_type` label.
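
For the implementer's understanding only (not part of the table definitions): the dead-letter rule can be sketched as a pure state transition on a `webhook_deliveries` row. `recordAttempt` and `MAX_DELIVERY_ATTEMPTS` are hypothetical names; treating any 2xx response as success, and using `failed` as the "will retry" status, are assumptions.

```typescript
// Illustrative sketch only; column names follow the webhook_deliveries table above.
type DeliveryStatus = "pending" | "delivered" | "failed" | "dead_letter";

const MAX_DELIVERY_ATTEMPTS = 3; // per the dead-letter rule

interface DeliveryRow {
  status: DeliveryStatus;
  attempt_count: number;
  response_status: number | null;
}

// Applies the outcome of one delivery attempt. httpStatus is null when no response
// was received (for example a timeout or connection error).
function recordAttempt(row: DeliveryRow, httpStatus: number | null): DeliveryRow {
  const attempt_count = row.attempt_count + 1;
  const succeeded = httpStatus !== null && httpStatus >= 200 && httpStatus < 300;
  if (succeeded) {
    return { status: "delivered", attempt_count, response_status: httpStatus };
  }
  // On the third failure the row moves to the dead-letter queue; the caller would then
  // increment agentidp_webhook_dead_letters_total with the row's event_type label.
  const status: DeliveryStatus = attempt_count >= MAX_DELIVERY_ATTEMPTS ? "dead_letter" : "failed";
  return { status, attempt_count, response_status: httpStatus };
}
```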
---

#### pgcrypto extension

Enabled by migration `018_enable_pgcrypto.sql`. Used for encrypting sensitive columns in `oidc_keys` and credential data.

---

#### `agent_marketplace`

Created by migration `021_add_agent_marketplace.sql`.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `agent_id` | `UUID` | No | FK → `agents.agent_id` |
| `listing_name` | `VARCHAR(255)` | No | Display name in marketplace |
| `description` | `TEXT` | Yes | Markdown description |
| `tags` | `TEXT[]` | No | Searchable tags. Default: `{}` |
| `published` | `BOOLEAN` | No | Default: `false` |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `github_oidc_trust_policies`

Created by migration `022_add_github_oidc_trust_policies.sql`. Maps GitHub Actions OIDC claims to agent identities for CI/CD token exchange.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | Owning organisation |
| `repository` | `VARCHAR(512)` | No | GitHub repository slug (`owner/repo`) |
| `branch` | `VARCHAR(255)` | Yes | Branch filter (null = any branch) |
| `agent_id` | `UUID` | No | Agent to issue a token for on match |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `billing`

Created by migration `023_add_billing.sql`. One row per organisation. Tracks the org's Stripe customer and subscription state.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | FK → `organizations.org_id` (UNIQUE) |
| `stripe_customer_id` | `VARCHAR(255)` | Yes | Stripe Customer ID |
| `stripe_subscription_id` | `VARCHAR(255)` | Yes | Stripe Subscription ID |
| `status` | `VARCHAR(64)` | No | Stripe subscription status or `none` |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `delegation_chains`

Created by migration `024_add_delegation_chains.sql`. Records A2A delegation grants created via the delegation API.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `delegator_agent_id` | `UUID` | No | Agent granting the delegation |
| `delegate_agent_id` | `UUID` | No | Agent receiving the delegation |
| `scopes` | `TEXT[]` | No | Scopes being delegated |
| `expires_at` | `TIMESTAMPTZ` | Yes | Optional expiry |
| `created_at` | `TIMESTAMPTZ` | No | Default: `NOW()` |

---

#### `analytics_events`

Created by migration `025_add_analytics_events.sql`. Append-only event store for analytics. Supports token trend, agent activity, and usage summary queries.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `organization_id` | `UUID` | No | Owning organisation |
| `date` | `DATE` | No | Calendar date of the event (UTC) |
| `metric_type` | `VARCHAR(64)` | No | e.g. `token_issued`, `agent_called` |
| `count` | `INTEGER` | No | Event count for this date+type |

**Index:** `(organization_id, date DESC)` for fast time-series queries.

---

#### `tenant_tiers`

Created by migration `026_add_tenant_tiers.sql`. One row per organisation. Stores the organisation's current tier, which the `tierEnforcement` middleware uses to select the applicable limits.

| Column | Type | Nullable | Description |
|--------|------|----------|-------------|
| `id` | `UUID` | No | Primary key |
| `org_id` | `UUID` | No | FK → `organizations.org_id` (UNIQUE) |
| `tier` | `ENUM('free','pro','enterprise')` | No | Current tier. Default: `free` |
| `updated_at` | `TIMESTAMPTZ` | No | Last tier change. Default: `NOW()` |

**Tier limits** (from `src/config/tiers.ts`):

| Tier | Max Agents | Max API Calls/Day | Max Tokens/Day |
|------|-----------|-------------------|----------------|
| free | 10 | 1,000 | 1,000 |
| pro | 100 | 50,000 | 50,000 |
| enterprise | unlimited | unlimited | unlimited |

---

### Section: Migration Runner — update "Running migrations" example output

Replace the example output block (showing 4 migrations) with:

```
Running database migrations...
✓ Applied: 001_create_agents.sql
✓ Applied: 002_create_credentials.sql
...
✓ Applied: 025_add_analytics_events.sql
✓ Applied: 026_add_tenant_tiers.sql
Migrations complete. 26 migration(s) applied.
```

Replace the "Verifying applied migrations" expected output to show 26 rows rather than 4.

---

### Section: Connection Pool — replace entirely

Replace:

> The application uses `pg.Pool` with default settings (max 10 connections). The pool is a
> singleton — one pool per process instance.
>
> To override pool size, modify `src/db/pool.ts`. In production, ensure `DATABASE_URL` includes
> connection pool parameters if using PgBouncer or a managed connection pooler.

With:

> The application uses `pg.Pool` with settings read from environment variables. The pool is a
> singleton — one pool per process instance.
>
> | Variable | Default | Description |
> |----------|---------|-------------|
> | `DB_POOL_MAX` | `20` | Maximum connections |
> | `DB_POOL_MIN` | `2` | Minimum idle connections |
> | `DB_POOL_IDLE_TIMEOUT_MS` | `30000` | Idle eviction timeout (ms) |
> | `DB_POOL_CONNECTION_TIMEOUT_MS` | `5000` | Acquisition timeout (ms) |
>
> Pool usage is exposed as Prometheus metrics: `agentidp_db_pool_active_connections` and
> `agentidp_db_pool_waiting_requests`. Monitor these in production to detect pool exhaustion.

---

## File: `docs/devops/architecture.md`

### Section: Component Overview — replace the ASCII diagram

Replace the existing ASCII diagram with:

```
┌───────────────────────────────────────────┐
│ Next.js Portal (port 3001)                │
│ portal/ (Next.js 14)                      │
│ /login /agents /credentials /audit        │
│ /analytics /settings/tier /compliance     │
│ /webhooks /marketplace                    │
└────────────────┬──────────────────────────┘
                 │ HTTP (localhost:3000)
┌────────────────▼──────────────────────────┐
│ AgentIdP Application                      │
│ Node.js / Express (port 3000)             │
│                                           │
│ TLS MW → Helmet → CORS → Morgan           │
│ Metrics MW → OrgContext MW                │
│ UsageMetering MW → TierEnforcement MW     │
│ Auth MW → OPA MW → Routes                 │
│                    ↓                      │
│ Controllers → Services → Repos            │
└──────────┬────────────────────────┬───────┘
           │                        │
┌──────────▼─────────────────┐ ┌────▼───────────────┐
│ PostgreSQL 14              │ │ Redis 7            │
│ Port 5432                  │ │ Port 6379          │
│                            │ │                    │
│ 26 migrations (001–026)    │ │ Rate limits        │
│ organizations              │ │ Token revocations  │
│ agents + DID keys          │ │ Monthly counts     │
│ credentials                │ │ Tier counters      │
│ audit_events               │ │ Compliance cache   │
│ token_revocations          │ └────────────────────┘
│ oidc_keys                  │
│ federation_partners        │ ┌────────────────────┐
│ webhook_subscriptions      │ │ HashiCorp Vault    │
│ webhook_deliveries         │ │ (optional)         │
│ agent_marketplace          │ │ KV v2: credentials │
│ github_oidc_trust_policies │ └────────────────────┘
│ billing                    │
│ delegation_chains          │ ┌────────────────────┐
│ analytics_events           │ │ Stripe             │
│ tenant_tiers               │ │ (optional)         │
└────────────────────────────┘ │ Billing/upgrades   │
                               └────────────────────┘
```

### Section: Internal layers table — update

Replace:

| Layer | Responsibility |
|-------|---------------|
| Routes | Wire HTTP methods and paths to controllers |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| Rate limit middleware | Redis sliding-window counter per `client_id` |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic — no direct DB access |
| Repositories | All SQL queries — no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |

With:

| Layer | Responsibility |
|-------|---------------|
| Routes | Wire HTTP methods and paths to controllers |
| TLS middleware | Redirect HTTP → HTTPS when `ENFORCE_TLS=true` |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| OrgContext middleware | Resolve `organization_id` from JWT and attach to `req` |
| UsageMetering middleware | Fire-and-forget analytics event recording |
| TierEnforcement middleware | Enforce daily API call and token limits via Redis (when `TIER_ENFORCEMENT=true`) |
| OPA middleware | Scope-based authorization via embedded Wasm or JSON policy |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic — no direct DB access |
| Repositories | All SQL queries — no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |

### Section: Service Map — replace entirely

Replace the existing 4-row service map table with the complete Phase 6 service map:

| Route prefix | Controller | Service(s) | Repository/ies |
|-------------|-----------|-----------|----------------|
| `/api/v1/agents` | `AgentController` | `AgentService` | `AgentRepository` |
| `/api/v1/credentials` | `CredentialController` | `CredentialService` | `CredentialRepository` |
| `/api/v1/token` | `TokenController` | `OAuth2Service` | `TokenRepository`, `CredentialRepository`, `AgentRepository` |
| `/api/v1/audit` | `AuditController` | `AuditService` | `AuditRepository` |
| `/api/v1/organizations` | `OrgController` | `OrgService` | `OrgRepository` |
| `/api/v1/compliance/*` | `ComplianceController` | `ComplianceService` | `AuditRepository` |
| `/api/v1/analytics/*` | `AnalyticsController` | `AnalyticsService` | direct pool queries |
| `/api/v1/tiers/*` | `TierController` | `TierService` | pool queries, Stripe SDK |
| `/api/v1/webhooks` | `WebhookController` | `WebhookService` | `WebhookRepository` |
| `/api/v1/federation` | `FederationController` | `FederationService` | direct pool queries |
| `/api/v1/marketplace` | `MarketplaceController` | `MarketplaceService` | direct pool queries |
| `/api/v1/billing` | `BillingController` | `BillingService` | direct pool queries |
| `/.well-known/did.json`, `/api/v1/did/*` | `DIDController` | `DIDService` | `AgentRepository` |
| `/.well-known/openid-configuration`, `/api/v1/oidc/*` | `OIDCController` | `OIDCKeyService`, `IDTokenService` | direct pool queries |
| `/api/v1/oidc/trust-policies` | `OIDCTrustPolicyController` | `OIDCTrustPolicyService` | direct pool queries |
| `/api/v1/delegation` | `DelegationController` | `DelegationService` | direct pool queries |
| `/api/v1/scaffold` | `ScaffoldController` | `ScaffoldService` | — |
| `/health` | inline | — | pool, redis |
| `/metrics` | inline | — | prom-client |

### Section: Redis — update key patterns table

Replace the existing 3-row Redis key patterns table with:

| Key pattern | Example | Purpose | TTL |
|------------|---------|---------|-----|
| `revoked:` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate::` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:::` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |

### Section: New Services — add after the existing component descriptions

Add a new `## New Services (Phases 3–6)` section:

```
## New Services (Phases 3–6)

| Service | Source file | Responsibility |
|---------|------------|----------------|
| `AnalyticsService` | `src/services/AnalyticsService.ts` | Fire-and-forget `recordEvent`, time-series `getTokenTrend`, heatmap `getAgentActivity`, per-agent `getAgentUsageSummary` |
| `TierService` | `src/services/TierService.ts` | `getStatus` (reads `tenant_tiers`), `initiateUpgrade` (creates Stripe Checkout Session), `applyUpgrade` (handles Stripe webhook), `enforceAgentLimit` |
| `ComplianceService` | `src/services/ComplianceService.ts` | `generateReport` (Redis-cached 5 min), `exportAgentCards` (AGNTCY format) |
| `DelegationService` | `src/services/DelegationService.ts` | A2A delegation chain creation and verification |
| `DIDService` | `src/services/DIDService.ts` | `did:web` identifier generation and DID document management |
| `OIDCKeyService` | `src/services/OIDCKeyService.ts` | OIDC key rotation, JWKS endpoint |
| `IDTokenService` | `src/services/IDTokenService.ts` | OIDC ID token issuance |
| `FederationService` | `src/services/FederationService.ts` | Cross-tenant agent identity federation |
| `WebhookService` | `src/services/WebhookService.ts` | Event subscriptions, delivery with retry, dead-letter queue |
| `VaultService` | `src/services/VaultService.ts` | HashiCorp Vault KV v2 read/write for credential storage |
| `BillingService` | `src/services/BillingService.ts` | Stripe customer and subscription management |
| `MarketplaceService` | `src/services/MarketplaceService.ts` | Agent listing and discovery |
| `OIDCTrustPolicyService` | `src/services/OIDCTrustPolicyService.ts` | GitHub OIDC trust policy management |
| `EventPublisher` | `src/services/EventPublisher.ts` | Routes domain events to webhook delivery and Kafka (if configured) |
```

### Section: Ports — update table

Replace:

| Service | Internal port | Exposed port (local dev) |
|---------|--------------|--------------------------|
| AgentIdP app | 3000 | 3000 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |

With:

| Service | Internal port | Exposed port (local dev) |
|---------|--------------|--------------------------|
| AgentIdP app | 3000 | 3000 |
| Next.js portal | 3001 | 3001 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |

### Section: Add new section — API Routes

Add at the end of the file:

```
## API Routes (Phase 6 complete)

Base path: `/api/v1`

| Route | Method(s) | Auth | Feature flag |
|-------|----------|------|-------------|
| `/api/v1/agents` | GET, POST, PATCH, DELETE | Bearer JWT | always on |
| `/api/v1/credentials` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/token` | POST | none (client credentials) | always on |
| `/api/v1/audit` | GET | Bearer JWT | always on |
| `/api/v1/audit/verify` | GET | Bearer JWT | always on |
| `/api/v1/organizations` | GET, POST | Bearer JWT | always on |
| `/api/v1/compliance/controls` | GET | none | always on |
| `/api/v1/compliance/report` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/compliance/agent-cards` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/analytics/token-trend` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/agent-activity` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/usage-summary` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/tiers/status` | GET | Bearer JWT | always on |
| `/api/v1/tiers/upgrade` | POST | Bearer JWT | always on |
| `/api/v1/webhooks` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/federation` | GET, POST | Bearer JWT | always on |
| `/api/v1/delegation` | GET, POST | Bearer JWT | always on |
| `/api/v1/marketplace` | GET | none | always on |
| `/api/v1/billing` | GET, POST | Bearer JWT | always on |
| `/api/v1/did/*` | GET | none | always on |
| `/api/v1/oidc/*` | GET, POST | mixed | always on |
| `/.well-known/openid-configuration` | GET | none | always on |
| `/.well-known/jwks.json` | GET | none | always on |
| `/.well-known/did.json` | GET | none | always on |
| `/health` | GET | none | always on |
| `/metrics` | GET | none | always on |
```

---

## File: `docs/devops/local-development.md`

### Section: Prerequisites table — update

Replace the existing 3-row prerequisites table with:

| Tool | Minimum version | Purpose |
|------|----------------|---------|
| Docker | 24+ | Container runtime |
| Docker Compose | 2.20+ | Multi-container orchestration |
| Node.js | 18.0.0 | Run the application, portal, and migrations |
| npm | 9+ | Package management and scripts |
| nvm | any | Recommended for managing Node.js versions |
| openssl | any | RSA key generation |

Add after the table:

> **nvm activation:** If using nvm, activate it before running any Node.js commands:
> ```bash
> export NVM_DIR="$HOME/.nvm" && source "$NVM_DIR/nvm.sh"
> ```

### Section: Step 1 — clone and install — update

After `npm install` (which installs the backend dependencies), add:

```bash
# Install portal dependencies
cd portal && npm install && cd ..
```

### Section: Step 4 — Start infrastructure services — update the note

Replace:

> The `app` service in `docker-compose.yml` requires a `Dockerfile` which has not been written
> yet. This is a **Phase 1 P1 pending item**. The commands below will work once the Dockerfile
> exists.

With:

> The full Docker Compose stack (including the `app` container) is available for field trial
> deployments — see the [field trial guide](field-trial.md). For day-to-day development, start
> only the infrastructure services and run the application directly.
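
For the implementer's understanding only (not part of the note to insert): the Node.js minimum in the prerequisites table above can also be checked at startup. A minimal sketch, assuming a hypothetical `meetsNodeMinimum` helper; nothing in this spec says the application has such a guard. Only the major version is compared, which is sufficient for an `18.0.0` floor.

```typescript
// Hypothetical startup guard for the Node.js minimum (18.0.0) in the prerequisites table.
const MIN_NODE_MAJOR = 18;

// Accepts both "18.19.0" (process.versions.node) and "v18.19.0" (process.version).
function meetsNodeMinimum(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  const [majorPart = ""] = version.replace(/^v/, "").split(".");
  const major = Number.parseInt(majorPart, 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognised Node.js version string: ${version}`);
  }
  return major >= minMajor;
}
```

A caller would pass `process.versions.node` and exit with a clear message when the check fails; taking the version as a parameter keeps the function testable without a specific runtime.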
### Section: Step 5 — Run database migrations — update expected output

Replace the expected output showing 4 migrations with:

```
Running database migrations...
✓ Applied: 001_create_agents.sql
...
✓ Applied: 026_add_tenant_tiers.sql
Migrations complete. 26 migration(s) applied.
```

### Section: Add new Step 7 — Start the Next.js portal

Add after Step 6 (Start the application):

````
## Step 7 — Start the Next.js portal (optional)

The portal is a Next.js 14 application in the `portal/` directory. It communicates with the
AgentIdP backend at `http://localhost:3000`.

Start the portal development server:

```bash
cd portal && npm run dev
```

The portal starts on port 3001 by default. Open http://localhost:3001.

Available routes:

| Route | Description |
|-------|-------------|
| `/login` | OAuth 2.0 login page |
| `/agents` | Agent registry |
| `/credentials` | Credential management |
| `/audit` | Audit log viewer |
| `/analytics` | Token trend and agent activity charts |
| `/settings/tier` | Tier status and upgrade |
| `/compliance` | AGNTCY compliance report |
| `/webhooks` | Webhook subscription management |
| `/marketplace` | Agent marketplace |

Build and serve the portal for production:

```bash
cd portal
npm run build
npm start   # serves the production build
```

Ensure `CORS_ORIGIN` in your `.env` includes `http://localhost:3001`:

```
CORS_ORIGIN=http://localhost:3001
```
````

---

## File: `docs/devops/operations.md`

### Section: Startup checklist — update

Replace the existing 4-step checklist with a 5-step checklist that reflects Docker Compose
full-stack operation and includes the portal:

```bash
# 1. Start the full stack
docker compose up --build -d

# 2. Verify all three services are healthy
docker compose ps
# app, postgres, and redis must all show "healthy"

# 3. Run migrations
docker compose exec app npm run db:migrate

# 4. Verify application health
curl http://localhost:3000/health
# Expected: {"status":"ok"}

# 5. (Optional) Start the portal for local dev
cd portal && npm run dev
```

### Section: Redis Key Patterns — update table

Replace the 3-row table with the complete 6-row table (same as the architecture.md update above):

| Key pattern | Example | Purpose | TTL |
|------------|---------|---------|-----|
| `revoked:` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate::` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:::` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |

Add the following new inspection commands at the end of the "Inspect keys" section:

```bash
# Check tier API call counter for a tenant
redis-cli GET "rate:tier:calls:"

# Check tier token counter for a tenant
redis-cli GET "rate:tier:tokens:"

# Check cached compliance report for a tenant
redis-cli GET "compliance:report:"
redis-cli TTL "compliance:report:"
```

### Section: Monitoring — update Metrics Exposed table

Replace the existing 6-row metrics table with the complete 19-metric table:

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `agentidp_tokens_issued_total` | Counter | `scope` | OAuth 2.0 tokens issued |
| `agentidp_agents_registered_total` | Counter | `deployment_env` | Agents registered |
| `agentidp_http_requests_total` | Counter | `method`, `route`, `status_code` | HTTP requests |
| `agentidp_http_request_duration_seconds` | Histogram | `method`, `route`, `status_code` | HTTP latency |
| `agentidp_db_query_duration_seconds` | Histogram | `operation` | PostgreSQL query duration |
| `agentidp_redis_command_duration_seconds` | Histogram | `command` | Redis command duration |
| `agentidp_webhook_dead_letters_total` | Counter | `event_type` | Webhook deliveries moved to dead-letter queue |
| `agentidp_credentials_expiring_soon_total` | Gauge | — | Credentials expiring within 7 days |
| `agentidp_audit_chain_integrity` | Gauge | — | `1` if audit chain is intact, `0` if broken |
| `agentidp_rate_limit_hits_total` | Counter | `client_id` | Rate limit rejections |
| `agentidp_db_pool_active_connections` | Gauge | — | Active PostgreSQL connections |
| `agentidp_db_pool_waiting_requests` | Gauge | — | Requests waiting for a pool connection |
| `agentidp_tenant_api_calls_total` | Counter | `org_id`, `tier` | API calls per tenant per tier |
| `agentidp_billing_limit_rejections_total` | Counter | `org_id`, `limit_type` | Tier limit enforcement rejections |
| `agentidp_did_documents_generated_total` | Counter | — | DID documents generated |
| `agentidp_oidc_tokens_issued_total` | Counter | — | OIDC ID tokens issued |
| `agentidp_federation_events_total` | Counter | `event_type` | Federation partner events |
| `agentidp_delegation_chains_created_total` | Counter | — | A2A delegation chains created |
| `agentidp_compliance_reports_generated_total` | Counter | — | Compliance reports generated |

### Section: Troubleshooting — add new entries

Append the following troubleshooting entries:

---

**Tier limit rejected — 429 with `tier_limit_exceeded` code**

Symptom: `429 TOO_MANY_REQUESTS` with body `{"code":"tier_limit_exceeded","message":"..."}`

Check the tenant's current tier counter:

```bash
# Check API call counter
docker compose exec redis redis-cli GET "rate:tier:calls:"

# Check the tenant's tier
psql "$DATABASE_URL" -c "SELECT org_id, tier FROM tenant_tiers WHERE org_id = '';"
```

If the org is on the `free` tier and has hit 1,000 calls/day, upgrade the tier or wait until
midnight UTC for the counter to reset.
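
The Redis table above gives both tier counters a TTL of "Until midnight UTC", which is why waiting for midnight UTC clears a `tier_limit_exceeded` rejection. A minimal sketch of how such a daily counter could be keyed and expired follows. It assumes the key layout shown in the table's examples (`rate:tier:calls:org-uuid`); `tierKey` and `secondsUntilMidnightUtc` are hypothetical helpers, not the actual `tierEnforcement` middleware.

```typescript
// Hypothetical helpers for the daily tier counters. The key layout is assumed
// from the example column of the Redis Key Patterns table.
type TierCounter = "calls" | "tokens";

function tierKey(counter: TierCounter, orgId: string): string {
  return `rate:tier:${counter}:${orgId}`;
}

// Seconds remaining until the next 00:00:00 UTC. Using this as the counter TTL
// makes every tenant's counter expire, and therefore reset, at midnight UTC.
function secondsUntilMidnightUtc(now: Date = new Date()): number {
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1, // Date.UTC normalizes day overflow into the next month/year
  );
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}
```

In Redis terms, a counter like this would typically be incremented with `INCR` and given the computed TTL with `EXPIRE` when the key is first created; whether the middleware does exactly this is not stated in this spec.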

---

**Analytics endpoints return 404**

Cause: `ANALYTICS_ENABLED` is set to `false` in `.env`.

Fix: Set `ANALYTICS_ENABLED=true` and restart the application.

---

**Compliance report or agent cards return 404**

Cause: `COMPLIANCE_ENABLED` is set to `false` in `.env`. Only `/api/v1/compliance/report` and
`/api/v1/compliance/agent-cards` are gated; `/api/v1/compliance/controls` is always on.

Fix: Set `COMPLIANCE_ENABLED=true` and restart the application.

---

**Portal CORS error**

Symptom: The browser console shows an `Access-Control-Allow-Origin` error on requests to
`http://localhost:3000`.

Fix: Ensure `CORS_ORIGIN` in `.env` includes `http://localhost:3001`:

```
CORS_ORIGIN=http://localhost:3001
```

Restart the application after changing this variable.

---

## File: `docs/devops/deployment.md`

### Section: Environment Variable Reference (Section 6) — update Quick Reference table

Add the following rows to the existing quick reference table:

| Variable | Required | Source (AWS) | Source (GCP) |
|----------|----------|--------------|--------------|
| `BILLING_ENABLED` | No | Task definition env var | Cloud Run env var |
| `STRIPE_SECRET_KEY` | Only if billing enabled | Secrets Manager: `///stripe-secret-key` | Secret Manager: `-stripe-secret-key` |
| `STRIPE_WEBHOOK_SECRET` | Only if billing enabled | Secrets Manager: `///stripe-webhook-secret` | Secret Manager: `-stripe-webhook-secret` |
| `STRIPE_PRICE_ID` | Only if billing enabled | Task definition env var | Cloud Run env var |
| `ANALYTICS_ENABLED` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `TIER_ENFORCEMENT` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `COMPLIANCE_ENABLED` | No | Task definition env var (default: `true`) | Cloud Run env var |
| `REDIS_RATE_LIMIT_ENABLED` | No | Task definition env var | Cloud Run env var |
| `RATE_LIMIT_WINDOW_MS` | No | Task definition env var (default: `60000`) | Cloud Run env var |
| `RATE_LIMIT_MAX_REQUESTS` | No | Task definition env var (default: `100`) | Cloud Run env var |
| `DB_POOL_MAX` | No | Task definition env var (default: `20`) | Cloud Run env var |
| `DB_POOL_MIN` | No | Task definition env var (default: `2`) | Cloud Run env var |
| `DB_POOL_IDLE_TIMEOUT_MS` | No | Task definition env var (default: `30000`) | Cloud Run env var |
| `DB_POOL_CONNECTION_TIMEOUT_MS` | No | Task definition env var (default: `5000`) | Cloud Run env var |
| `KAFKA_BROKERS` | No | Task definition env var | Cloud Run env var |
| `ENFORCE_TLS` | No | Task definition env var | Cloud Run env var |
| `OPA_URL` | No | Task definition env var | Cloud Run env var |
| `VAULT_KV_MOUNT` | No | Task definition env var (default: `secret`) | Cloud Run env var |

### Section: Step 2.8 / Step 3.7 — Run Database Migrations — update migration count

In the migration command output examples in sections 2.8 and 3.7, update migration count
references from "4 migration(s)" to "26 migration(s)".

---

## File: `docs/devops/security.md`

No structural changes required. Append the following note at the end of the "JWT Key Management"
section:

> **OIDC keys** are separate from the main JWT keys. OIDC signing keys are stored in the
> `oidc_keys` PostgreSQL table (created by migration `014_create_oidc_keys_table.sql`), encrypted
> at rest using pgcrypto (enabled by migration `018_enable_pgcrypto.sql`). The `OIDCKeyService`
> manages rotation. OIDC keys do not need to be set as environment variables — they are
> provisioned automatically on first startup.

---

## File: `docs/devops/vault-setup.md`

### Section: Add note on `VAULT_KV_MOUNT` alias

After the `VAULT_MOUNT` variable description, add:

> **Note:** The `.env.example` file uses `VAULT_KV_MOUNT` as the variable name. The application
> reads both `VAULT_KV_MOUNT` and `VAULT_MOUNT` — prefer `VAULT_KV_MOUNT` in new configurations
> for consistency with the current `.env.example`.

---

## File: `docs/devops/README.md`

### Section: Document index — add field-trial.md entry

Add `field-trial.md` to the document index table:

| Document | Audience | Contents |
|----------|----------|---------|
| ... existing entries ... |
| [field-trial.md](field-trial.md) | DevOps engineers, QA | In-house Docker Compose field trial execution playbook |

---

## Acceptance Criteria

- [ ] `environment-variables.md` documents all 11 new variables from Phases 3–6
- [ ] `environment-variables.md` complete `.env` example includes all Phase 6 flags
- [ ] `database.md` schema overview reflects all 26 migrations (001–026)
- [ ] `database.md` documents all 10 new tables added in Phases 3–6
- [ ] `database.md` connection pool section references `DB_POOL_*` env vars
- [ ] `architecture.md` diagram shows Next.js portal at port 3001
- [ ] `architecture.md` service map covers all 19 route prefixes
- [ ] `architecture.md` Redis table covers all 6 key patterns
- [ ] `architecture.md` new services section documents all 13 Phase 3–6 services
- [ ] `architecture.md` API routes section covers all 26 routes in the API Routes table
- [ ] `local-development.md` includes portal setup (Step 7) with all 9 portal routes
- [ ] `operations.md` startup checklist uses `docker compose` (not `docker-compose`)
- [ ] `operations.md` Redis table covers all 6 key patterns with correct TTLs
- [ ] `operations.md` metrics table covers all 19 Prometheus metrics
- [ ] `operations.md` troubleshooting covers tier limits, feature flag 404s, portal CORS
- [ ] `deployment.md` variable quick reference includes all Phase 3–6 variables
- [ ] `security.md` note on OIDC keys added
- [ ] `vault-setup.md` note on `VAULT_KV_MOUNT` alias added
- [ ] `README.md` index includes `field-trial.md`
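
Several of these criteria are row counts (19 metrics, 6 Redis key patterns, 9 portal routes). A reviewer could check them mechanically with a small script. The sketch below is a hypothetical helper, not part of the repository, and assumes each table is a GitHub-flavored Markdown pipe table: a header row, a delimiter row, then data rows on contiguous lines starting with `|`.

```typescript
// Hypothetical review helper: counts the data rows of the first pipe table
// whose header row contains `headerText`.
function isDelimiterRow(line: string): boolean {
  // Matches rows such as "|--------|------|" or "| :--- | ---: |".
  return /^\s*\|[\s:|-]*-[\s:|-]*$/.test(line);
}

function countTableRows(markdown: string, headerText: string): number {
  const lines = markdown.split(/\r?\n/);
  for (let i = 0; i + 1 < lines.length; i++) {
    const isHeader =
      lines[i].trim().startsWith("|") &&
      lines[i].includes(headerText) &&
      isDelimiterRow(lines[i + 1]);
    if (!isHeader) continue;
    let count = 0;
    // Data rows begin two lines below the header, after the delimiter row,
    // and end at the first line that does not start with "|".
    for (let j = i + 2; j < lines.length && lines[j].trim().startsWith("|"); j++) {
      count++;
    }
    return count;
  }
  return 0; // no matching table found
}
```

For example, running `countTableRows` on the updated `operations.md` text with `"| Metric |"` should return 19 if the metrics table matches this spec. Reading the file itself (for instance with Node's `fs.readFileSync`) is left to the calling script.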