# Architecture

## Component Overview

```
      ┌───────────────────────────────────────────┐
      │  Next.js Portal (port 3001)               │
      │  portal/ - Next.js 14                     │
      │  /login /agents /credentials /audit       │
      │  /analytics /settings/tier /compliance    │
      │  /webhooks /marketplace                   │
      └────────────────┬──────────────────────────┘
                       │ HTTP (localhost:3000)
      ┌────────────────▼──────────────────────────┐
      │  AgentIdP Application                     │
      │  Node.js / Express (port 3000)            │
      │                                           │
      │  TLS MW → Helmet → CORS → Morgan          │
      │  Metrics MW → OrgContext MW               │
      │  UsageMetering MW → TierEnforcement MW    │
      │  Auth MW → OPA MW → Routes                │
      │                    ↓                      │
      │  Controllers → Services → Repos           │
      └──────────┬───────────────┬────────────────┘
                 │               │
┌────────────────▼──┐   ┌────────▼─────────┐
│ PostgreSQL 14     │   │ Redis 7          │
│ Port 5432         │   │ Port 6379        │
│                   │   │                  │
│ 26 migrations     │   │ Rate limits      │
│ (001–026)         │   │ Token revoke     │
│ organizations     │   │ Monthly counts   │
│ agents + DID keys │   │ Tier counters    │
│ credentials       │   │ Compliance cache │
│ audit_events      │   │                  │
│ token_revocations │   └──────────────────┘
│ oidc_keys         │
│ federation_partne-│   ┌──────────────────┐
│ rs                │   │ HashiCorp Vault  │
│ webhook_subscript-│   │ (optional)       │
│ ions + deliveries │   │ KV v2 - creds    │
│ agent_marketplace │   └──────────────────┘
│ github_oidc_trust │
│ billing           │   ┌──────────────────┐
│ delegation_chains │   │ Stripe           │
│ analytics_events  │   │ (optional)       │
│ tenant_tiers      │   │ Billing/upgrades │
└───────────────────┘   └──────────────────┘
```

## Components

### AgentIdP Application

A stateless Express HTTP server. Each request is handled independently, with no in-process shared state, so the application can be scaled horizontally (multiple instances) as long as all instances share the same PostgreSQL and Redis.

**Internal layers:**

| Layer | Responsibility |
|-------|---------------|
| Routes | Wire HTTP methods and paths to controllers |
| TLS middleware | Redirect HTTP → HTTPS when `ENFORCE_TLS=true` |
| Auth middleware | Validate Bearer JWT (RS256 + Redis revocation check) |
| OrgContext middleware | Resolve `organization_id` from JWT and attach to `req` |
| UsageMetering middleware | Fire-and-forget analytics event recording |
| TierEnforcement middleware | Enforce daily API call and token limits via Redis (when `TIER_ENFORCEMENT=true`) |
| OPA middleware | Scope-based authorization via embedded Wasm or JSON policy |
| Controllers | Parse and validate request, call service, return response |
| Services | Business logic; no direct DB access in the layered design (several newer services query the pool directly, see Service Map) |
| Repositories | All SQL queries; no business logic |
| Utils | JWT sign/verify, bcrypt, error types, async handler |

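
The controller / service / repository split in the table above can be illustrated with a minimal sketch. Every name here (`Agent`, `InMemoryAgentRepository`, `createAgentHandler`, the `maxAgents` limit, the 409 status) is hypothetical and chosen for illustration; the repository is an in-memory, synchronous stand-in, whereas a real one would run SQL through the pool and return Promises.

```typescript
// Hypothetical names throughout: a sketch of the layering, not the project's code.
interface Agent {
  id: string;
  organizationId: string;
  name: string;
}

// Repository layer: data access only, no business rules.
interface AgentRepository {
  countByOrg(organizationId: string): number;
  insert(agent: Agent): Agent;
}

// In-memory stand-in so the sketch runs without a database.
class InMemoryAgentRepository implements AgentRepository {
  private rows: Agent[] = [];

  countByOrg(organizationId: string): number {
    return this.rows.filter((r) => r.organizationId === organizationId).length;
  }

  insert(agent: Agent): Agent {
    this.rows.push(agent);
    return agent;
  }
}

// Service layer: business rules; reaches storage only through the repository.
class AgentService {
  private repo: AgentRepository;
  private maxAgents: number;

  constructor(repo: AgentRepository, maxAgents: number) {
    this.repo = repo;
    this.maxAgents = maxAgents;
  }

  register(organizationId: string, name: string): Agent {
    const count = this.repo.countByOrg(organizationId);
    if (count >= this.maxAgents) {
      throw new Error("agent limit reached");
    }
    return this.repo.insert({ id: `agent-${count + 1}`, organizationId, name });
  }
}

// Controller layer: validate input, call the service, shape the response.
interface HttpResult {
  status: number;
  body: unknown;
}

function createAgentHandler(
  service: AgentService,
  organizationId: string,
  body: { name?: unknown },
): HttpResult {
  if (typeof body.name !== "string" || body.name.trim() === "") {
    return { status: 400, body: { error: "name is required" } };
  }
  try {
    return { status: 201, body: service.register(organizationId, body.name) };
  } catch (err) {
    // Assumed mapping of a business-rule failure to an HTTP status.
    return { status: 409, body: { error: (err as Error).message } };
  }
}
```

Because the service depends only on the `AgentRepository` interface, the same service code can be exercised in tests against the in-memory stand-in.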
### PostgreSQL 14+

Primary durable data store. All agent identities, credentials, audit events, and token revocation records live here. See [database.md](database.md) for schema details.

The application connects via a connection pool (`pg.Pool`) initialised from `DATABASE_URL`. The pool is a singleton shared across all request handlers.

### Redis 7+

Ephemeral store for the following key patterns:

| Key pattern | Example | Purpose | TTL |
|------------|---------|---------|-----|
| `revoked:<jti>` | `revoked:f1e2d3c4-...` | Revoked token JTI | Remaining token lifetime |
| `rate:<client_id>:<window>` | `rate:a1b2c3...:29086156` | Request count per window | `RATE_LIMIT_WINDOW_MS` |
| `monthly:<client_id>:<year>:<month>` | `monthly:a1b2c3...:2026:3` | Monthly token issuance count | End of month |
| `rate:tier:calls:<tenantId>` | `rate:tier:calls:org-uuid` | Daily API call counter for tier enforcement | Until midnight UTC |
| `rate:tier:tokens:<tenantId>` | `rate:tier:tokens:org-uuid` | Daily token issuance counter for tier enforcement | Until midnight UTC |
| `compliance:report:<tenantId>` | `compliance:report:org-uuid` | Cached compliance report JSON | 5 minutes |

**Redis is supplementary, not the source of truth.** Token revocations are also written to the `token_revocations` PostgreSQL table for durability across Redis restarts. After a Redis restart the revocation list is cold: previously revoked tokens will pass auth until the PostgreSQL-backed warm-up is implemented (Phase 2).

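
The key patterns and TTLs in the table above can be sketched as small pure functions. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the month is assumed to be 1-based and computed in UTC (consistent with the `2026:3` example), and the 1-second floor on the revocation TTL is a design choice of this sketch.

```typescript
// Key builders mirroring the patterns in the table (function names are hypothetical).
const revokedKey = (jti: string): string => `revoked:${jti}`;

// Fixed-window index: the number of whole windows elapsed since the Unix epoch.
const rateKey = (clientId: string, nowMs: number, windowMs: number): string =>
  `rate:${clientId}:${Math.floor(nowMs / windowMs)}`;

// Assumes a 1-based, unpadded month in UTC.
const monthlyKey = (clientId: string, now: Date): string =>
  `monthly:${clientId}:${now.getUTCFullYear()}:${now.getUTCMonth() + 1}`;

const tierCallsKey = (tenantId: string): string => `rate:tier:calls:${tenantId}`;
const tierTokensKey = (tenantId: string): string => `rate:tier:tokens:${tenantId}`;
const complianceReportKey = (tenantId: string): string => `compliance:report:${tenantId}`;

// TTL for the daily tier counters: seconds until the next midnight UTC.
function secondsUntilMidnightUtc(now: Date): number {
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.ceil((next - now.getTime()) / 1000);
}

// TTL for the monthly counter: seconds until the first instant of the next month (UTC assumed).
function secondsUntilEndOfMonthUtc(now: Date): number {
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
  return Math.ceil((next - now.getTime()) / 1000);
}

// TTL for a revocation entry: the token's remaining lifetime, floored at 1 second.
function revocationTtlSeconds(expUnixSeconds: number, now: Date): number {
  return Math.max(1, expUnixSeconds - Math.floor(now.getTime() / 1000));
}
```

With a 60-second window, `rateKey("a1b2c3", 29086156 * 60000, 60000)` yields `rate:a1b2c3:29086156`, the same shape as the example in the table.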
## Request Data Flow

```
HTTP Request
     │
     ▼
Express Router (matches path + method)
     │
     ▼
Auth Middleware
  - Extract Bearer token from Authorization header
  - Verify RS256 signature using JWT_PUBLIC_KEY
  - Check Redis for revocation (key: revoked:<jti>)
  - Attach decoded payload to req.user
     │
     ▼
Rate Limit Middleware
  - Key: rate:<client_id>:<60s-window>
  - Increment counter in Redis (INCR + EXPIRE)
  - Set X-RateLimit-* headers
  - Reject with 429 if count > 100
     │
     ▼
Controller
  - Validate request body / query params (Joi schemas)
  - Call service method
  - Return HTTP response
     │
     ▼
Service
  - Business logic and orchestration
  - Calls one or more repositories
  - Fires audit log writes (async, fire-and-forget)
     │
     ▼
Repository
  - Executes parameterised SQL queries
  - Maps DB rows to typed interfaces
  - Returns typed results to service
     │
     ▼
PostgreSQL / Redis
```

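
The rate-limit step in the flow above is a fixed-window counter. The sketch below shows that technique under stated assumptions: `InMemoryCounterStore` is a hypothetical stand-in for Redis `INCR` and `EXPIRE`, the function and type names are illustrative, and the exact header names (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`) are assumed, since the flow only specifies `X-RateLimit-*`.

```typescript
// Hypothetical in-memory stand-in for the two Redis operations the flow uses.
class InMemoryCounterStore {
  private entries = new Map<string, { value: number; expiresAtMs: number | null }>();

  // Like INCR: creates the key at 1 if it is missing or expired, otherwise increments it.
  incr(key: string, nowMs: number): number {
    const entry = this.entries.get(key);
    if (!entry || (entry.expiresAtMs !== null && entry.expiresAtMs <= nowMs)) {
      this.entries.set(key, { value: 1, expiresAtMs: null });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }

  // Like EXPIRE: sets a time-to-live on an existing key.
  expire(key: string, ttlMs: number, nowMs: number): void {
    const entry = this.entries.get(key);
    if (entry) entry.expiresAtMs = nowMs + ttlMs;
  }
}

interface RateLimitResult {
  allowed: boolean; // false means the caller should respond with 429
  headers: Record<string, string>;
}

function checkRateLimit(
  store: InMemoryCounterStore,
  clientId: string,
  nowMs: number,
  limit: number = 100,
  windowMs: number = 60000,
): RateLimitResult {
  const window = Math.floor(nowMs / windowMs);
  const key = `rate:${clientId}:${window}`;

  const count = store.incr(key, nowMs);
  // Only the first request in a window needs to set the expiry.
  if (count === 1) store.expire(key, windowMs, nowMs);

  return {
    allowed: count <= limit,
    headers: {
      "X-RateLimit-Limit": String(limit),
      "X-RateLimit-Remaining": String(Math.max(0, limit - count)),
      // Unix time (seconds) at which the next window starts.
      "X-RateLimit-Reset": String(Math.ceil(((window + 1) * windowMs) / 1000)),
    },
  };
}
```

A fixed window is simple and cheap (one counter per client per window), at the cost of allowing short bursts across a window boundary.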
## Service Map

| Route prefix | Controller | Service(s) | Repositories / dependencies |
|-------------|-----------|-----------|----------------|
| `/api/v1/agents` | `AgentController` | `AgentService` | `AgentRepository` |
| `/api/v1/credentials` | `CredentialController` | `CredentialService` | `CredentialRepository` |
| `/api/v1/token` | `TokenController` | `OAuth2Service` | `TokenRepository`, `CredentialRepository`, `AgentRepository` |
| `/api/v1/audit` | `AuditController` | `AuditService` | `AuditRepository` |
| `/api/v1/organizations` | `OrgController` | `OrgService` | `OrgRepository` |
| `/api/v1/compliance/*` | `ComplianceController` | `ComplianceService` | `AuditRepository` |
| `/api/v1/analytics/*` | `AnalyticsController` | `AnalyticsService` | direct pool queries |
| `/api/v1/tiers/*` | `TierController` | `TierService` | pool queries, Stripe SDK |
| `/api/v1/webhooks` | `WebhookController` | `WebhookService` | `WebhookRepository` |
| `/api/v1/federation` | `FederationController` | `FederationService` | direct pool queries |
| `/api/v1/marketplace` | `MarketplaceController` | `MarketplaceService` | direct pool queries |
| `/api/v1/billing` | `BillingController` | `BillingService` | direct pool queries |
| `/.well-known/did.json`, `/api/v1/did/*` | `DIDController` | `DIDService` | `AgentRepository` |
| `/.well-known/openid-configuration`, `/api/v1/oidc/*` | `OIDCController` | `OIDCKeyService`, `IDTokenService` | direct pool queries |
| `/api/v1/oidc/trust-policies` | `OIDCTrustPolicyController` | `OIDCTrustPolicyService` | direct pool queries |
| `/api/v1/delegation` | `DelegationController` | `DelegationService` | direct pool queries |
| `/api/v1/scaffold` | `ScaffoldController` | `ScaffoldService` | - |
| `/health` | inline | - | pool, redis |
| `/metrics` | inline | - | prom-client |

## New Services (Phases 3–6)

| Service | Source file | Responsibility |
|---------|------------|----------------|
| `AnalyticsService` | `src/services/AnalyticsService.ts` | Fire-and-forget `recordEvent`, time-series `getTokenTrend`, heatmap `getAgentActivity`, per-agent `getAgentUsageSummary` |
| `TierService` | `src/services/TierService.ts` | `getStatus` (reads `tenant_tiers`), `initiateUpgrade` (creates Stripe Checkout Session), `applyUpgrade` (handles Stripe webhook), `enforceAgentLimit` |
| `ComplianceService` | `src/services/ComplianceService.ts` | `generateReport` (Redis-cached 5 min), `exportAgentCards` (AGNTCY format) |
| `DelegationService` | `src/services/DelegationService.ts` | A2A delegation chain creation and verification |
| `DIDService` | `src/services/DIDService.ts` | `did:web` identifier generation and DID document management |
| `OIDCKeyService` | `src/services/OIDCKeyService.ts` | OIDC key rotation, JWKS endpoint |
| `IDTokenService` | `src/services/IDTokenService.ts` | OIDC ID token issuance |
| `FederationService` | `src/services/FederationService.ts` | Cross-tenant agent identity federation |
| `WebhookService` | `src/services/WebhookService.ts` | Event subscriptions, delivery with retry, dead-letter queue |
| `VaultService` | `src/services/VaultService.ts` | HashiCorp Vault KV v2 read/write for credential storage |
| `BillingService` | `src/services/BillingService.ts` | Stripe customer and subscription management |
| `MarketplaceService` | `src/services/MarketplaceService.ts` | Agent listing and discovery |
| `OIDCTrustPolicyService` | `src/services/OIDCTrustPolicyService.ts` | GitHub OIDC trust policy management |
| `EventPublisher` | `src/services/EventPublisher.ts` | Routes domain events to webhook delivery and Kafka (if configured) |

## Ports

| Service | Internal port | Exposed port (local dev) |
|---------|--------------|--------------------------|
| AgentIdP app | 3000 | 3000 |
| Next.js portal | 3001 | 3001 |
| PostgreSQL | 5432 | 5432 |
| Redis | 6379 | 6379 |

## API Routes (Phase 6 complete)

Base path: `/api/v1`. The `/.well-known/*`, `/health`, and `/metrics` endpoints are served from the root.

| Route | Method(s) | Auth | Feature flag |
|-------|----------|------|-------------|
| `/api/v1/agents` | GET, POST, PATCH, DELETE | Bearer JWT | always on |
| `/api/v1/credentials` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/token` | POST | none (client credentials) | always on |
| `/api/v1/audit` | GET | Bearer JWT | always on |
| `/api/v1/audit/verify` | GET | Bearer JWT | always on |
| `/api/v1/organizations` | GET, POST | Bearer JWT | always on |
| `/api/v1/compliance/controls` | GET | none | always on |
| `/api/v1/compliance/report` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/compliance/agent-cards` | GET | Bearer JWT | `COMPLIANCE_ENABLED=true` |
| `/api/v1/analytics/token-trend` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/agent-activity` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/analytics/usage-summary` | GET | Bearer JWT | `ANALYTICS_ENABLED=true` |
| `/api/v1/tiers/status` | GET | Bearer JWT | always on |
| `/api/v1/tiers/upgrade` | POST | Bearer JWT | always on |
| `/api/v1/webhooks` | GET, POST, DELETE | Bearer JWT | always on |
| `/api/v1/federation` | GET, POST | Bearer JWT | always on |
| `/api/v1/delegation` | GET, POST | Bearer JWT | always on |
| `/api/v1/marketplace` | GET | none | always on |
| `/api/v1/billing` | GET, POST | Bearer JWT | always on |
| `/api/v1/did/*` | GET | none | always on |
| `/api/v1/oidc/*` | GET, POST | mixed | always on |
| `/.well-known/openid-configuration` | GET | none | always on |
| `/.well-known/jwks.json` | GET | none | always on |
| `/.well-known/did.json` | GET | none | always on |
| `/health` | GET | none | always on |
| `/metrics` | GET | none | always on |

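
The "Feature flag" column above can be read as a simple lookup from route to environment variable. The sketch below is one hypothetical way to express it: the flag names come from the table, while `FLAGGED_PREFIXES`, `isRouteEnabled`, and the prefix-matching approach are assumptions. What a disabled route actually returns is not specified here, so the sketch only answers whether a route is enabled.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

// Routes gated by a flag, per the table; everything else is "always on".
const FLAGGED_PREFIXES: Array<{ prefix: string; flag: string }> = [
  { prefix: "/api/v1/compliance/report", flag: "COMPLIANCE_ENABLED" },
  { prefix: "/api/v1/compliance/agent-cards", flag: "COMPLIANCE_ENABLED" },
  { prefix: "/api/v1/analytics/", flag: "ANALYTICS_ENABLED" },
];

function isRouteEnabled(path: string, env: Env): boolean {
  const match = FLAGGED_PREFIXES.find((entry) => path.startsWith(entry.prefix));
  if (!match) return true; // not listed, so always on
  return env[match.flag] === "true";
}
```

Note that `/api/v1/compliance/controls` is deliberately absent from the gated list, matching its "always on" entry in the table.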
## Graceful Shutdown

The server listens for `SIGTERM` and `SIGINT`. On receipt:

1. `server.close()` is called, which stops the server from accepting new connections
2. In-flight requests complete
3. `process.exit(0)` is called

The PostgreSQL pool and Redis client are not explicitly closed in the current shutdown path. This is safe for single-instance deployments; connection cleanup is handled by the OS.
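
The shutdown sequence described above can be sketched against minimal interfaces. `ClosableServer`, `SignalSource`, and `installGracefulShutdown` are hypothetical names introduced for this example; exiting with code 1 when `close` reports an error and ignoring repeated signals are additions of the sketch, not behaviour stated in this document.

```typescript
// Minimal shapes of what the shutdown path needs (hypothetical interfaces).
interface ClosableServer {
  close(callback: (err?: Error) => void): void;
}

interface SignalSource {
  on(signal: "SIGTERM" | "SIGINT", handler: () => void): void;
}

function installGracefulShutdown(
  server: ClosableServer,
  signals: SignalSource,
  exit: (code: number) => void,
): void {
  let shuttingDown = false;

  const shutdown = (): void => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    // Step 1: stop accepting new connections.
    // Step 2: the callback fires once in-flight requests have completed.
    // Step 3: exit the process.
    server.close((err) => exit(err ? 1 : 0));
  };

  signals.on("SIGTERM", shutdown);
  signals.on("SIGINT", shutdown);
}
```

In a Node.js process this would presumably be wired with the real HTTP server, `process` as the signal source, and `process.exit` as the exit function. Passing them in as parameters keeps the sequence testable without sending real signals.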