# Phase 2: Production-Ready — Technical Design

**Date**: 2026-03-28
**Author**: Virtual Architect
**Status**: Draft — pending CEO approval of proposal

---

## 1. HashiCorp Vault Integration

### Architecture

```
AgentIdP Server
  └── CredentialService
        └── VaultClient (new)
              └── HashiCorp Vault (sidecar or external)
                    └── KV Secrets Engine v2
```

### Design Decisions

**ADR-001: Vault over AWS KMS/GCP Secret Manager**
Vault is cloud-agnostic, open-source, and already standard in enterprise environments. Using Vault keeps Phase 2 cloud-provider independent.

**ADR-002: KV Secrets Engine v2**
KV v2 provides versioned secrets and metadata. When a credential is rotated, the old version is retained in Vault history, enabling audit-grade secret lifecycle tracking.

**ADR-003: AgentIdP stores Vault path, not secret**
`credentials.vault_path` stores the Vault KV path (e.g. `secret/agentidp/agents/{agentId}/credentials/{credentialId}`). The secret itself is never written to PostgreSQL.

### New environment variables

| Variable | Description |
|----------|-------------|
| `VAULT_ADDR` | Vault server address |
| `VAULT_TOKEN` | Vault root/service token |
| `VAULT_MOUNT` | KV mount path (default: `secret`) |

### Migration

Add `vault_path` column to `credentials` table (`005_add_vault_path.sql`). Existing credentials retain bcrypt hashes; new credentials use Vault. Both code paths coexist until all credentials are rotated (migration guide provided).

---

## 2. Multi-Language SDKs

### Shared contract (all SDKs implement identically)

```
AgentIdPClient(baseUrl, clientId, clientSecret, scopes?)
  .agents       → AgentRegistryClient (5 methods)
  .credentials  → CredentialClient (4 methods)
  .tokens       → TokenClient (2 methods)
  .audit        → AuditClient (2 methods)
  .clearTokenCache()

TokenManager — auto-refresh 60s before expiry
AgentIdPError — code, message, httpStatus, details
```

### Python SDK (`sentryagent-idp`)

- Python 3.9+ (httpx for async, requests for sync)
- Both sync and async client variants
- PyPI package: `sentryagent-idp`
- Type hints throughout (`mypy --strict` clean)

### Go SDK (`github.com/sentryagent/idp-sdk-go`)

- Go 1.21+, standard library `net/http`
- Context-aware methods (`context.Context` first arg)
- Idiomatic Go error handling (`error` return, no panic)
- Go module: `github.com/sentryagent/idp-sdk-go`

### Java SDK (`ai.sentryagent:idp-sdk`)

- Java 17+, Apache HttpClient 5
- Synchronous and CompletableFuture async variants
- Maven Central: `ai.sentryagent:idp-sdk`
- Fully typed with generics

---

## 3. OPA Policy Engine

### Architecture

```
HTTP Request
  → Auth Middleware (JWT verify) — unchanged
  → OPA Middleware (new) — evaluates policy
      → OPA Wasm (embedded, no network call)
      → Rego policy files (hot-reloadable)
  → Controller
```

### Design Decisions

**ADR-004: OPA Wasm over OPA sidecar**
Embedding OPA as Wasm in the Node.js process eliminates a network hop and removes a runtime dependency. Policy files are loaded from the `policies/` directory at startup and reloaded on SIGHUP.

**ADR-005: Policy replaces, does not wrap, scope check**
The existing static scope check in `auth.ts` is replaced by an OPA policy evaluation. This keeps the policy as the single source of truth for access control.

### Policy structure (`policies/`)

```
policies/
  authz.rego        — main policy: allow/deny
  data/
    scopes.json     — scope → permission mapping
```

---

## 4. Web Dashboard UI

### Architecture

```
dashboard/                  (new — separate from sdk/)
  src/
    components/             — reusable UI components
    pages/                  — Agents, Credentials, Audit, Health
    hooks/                  — useAgents, useCredentials, useAudit
    lib/
      client.ts             — wraps @sentryagent/idp-sdk
      auth.ts               — credential entry and storage
```

### Tech Stack

- React 18 + TypeScript strict
- Vite 5 (build tool)
- TanStack Query v5 (server state)
- shadcn/ui components (Radix UI + Tailwind CSS)

### Pages

| Page | Scope Required | Features |
|------|---------------|----------|
| Agents | `agents:read` | List, search, view detail, suspend/reactivate |
| Credentials | `agents:read` | List credentials per agent, rotate, revoke |
| Audit Log | `audit:read` | Filter by agent/action/outcome/date, paginate |
| Health | None | Server uptime, Redis/PostgreSQL connectivity |

### Authentication

The dashboard accepts `clientId` + `clientSecret` via a login form. The `@sentryagent/idp-sdk` `TokenManager` handles token acquisition and caching in `sessionStorage`. There is no backend session; all state is client-side.

---

## 5. Prometheus + Grafana Monitoring

### Metrics exposed at `GET /metrics`

| Metric | Type | Description |
|--------|------|-------------|
| `agentidp_tokens_issued_total` | Counter | Tokens issued, labelled by outcome |
| `agentidp_agents_registered_total` | Counter | Agent registrations |
| `agentidp_http_requests_total` | Counter | All requests, labelled by method/path/status |
| `agentidp_http_request_duration_seconds` | Histogram | Request latency |
| `agentidp_rate_limit_rejections_total` | Counter | 429 responses |
| `agentidp_db_query_duration_seconds` | Histogram | PostgreSQL query latency |
| `agentidp_redis_command_duration_seconds` | Histogram | Redis command latency |

### Grafana dashboard

Pre-built JSON dashboard shipped in `monitoring/grafana/dashboards/agentidp.json`. Auto-provisioned via `monitoring/grafana/provisioning/`.
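
To make the counter rows in the metrics table concrete, here is a minimal sketch of a labelled counter that renders the Prometheus text exposition format. It is a dependency-free stand-in rather than the server's actual instrumentation, which would more likely use a metrics library. The class `LabelledCounter`, the helper `escapeLabel`, and the `/example` path are hypothetical; only the metric name and its labels come from the table.

```typescript
// Hypothetical, dependency-free stand-in for a labelled counter.
// Assumption: a real server would use a metrics library instead.
type Labels = Record<string, string>;

// Escape backslash, double quote and newline inside label values.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class LabelledCounter {
  private readonly name: string;
  private readonly help: string;
  private readonly labelNames: string[];
  // One entry per distinct label combination (one time series each).
  private readonly values = new Map<string, number>();

  constructor(name: string, help: string, labelNames: string[]) {
    this.name = name;
    this.help = help;
    this.labelNames = labelNames;
  }

  // Increment the series identified by the given label values.
  inc(labels: Labels, amount = 1): void {
    const key = this.labelNames
      .map((n) => `${n}="${escapeLabel(labels[n] ?? "")}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + amount);
  }

  // Render HELP and TYPE lines followed by one sample line per series.
  render(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    for (const [key, value] of this.values) {
      lines.push(`${this.name}{${key}} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}

// Usage: count two scrapes of /metrics and one rate-limited request.
const httpRequests = new LabelledCounter(
  "agentidp_http_requests_total",
  "All requests, labelled by method/path/status",
  ["method", "path", "status"],
);
httpRequests.inc({ method: "GET", path: "/metrics", status: "200" });
httpRequests.inc({ method: "GET", path: "/metrics", status: "200" });
httpRequests.inc({ method: "POST", path: "/example", status: "429" });
const metricsBody = httpRequests.render();
```

Each distinct label combination becomes its own time series, so a raw request path used as a label value would likely need to be normalised to a route template to keep cardinality bounded.
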
### Docker Compose extension

Add `prometheus` and `grafana` services in a `docker-compose.monitoring.yml` overlay, which keeps the base `docker-compose.yml` clean for developers who don't need monitoring.

---

## 6. Multi-Region Deployment (Terraform)

### Structure

```
terraform/
  modules/
    agentidp/       — reusable module: compute + networking
    rds/            — managed PostgreSQL
    redis/          — managed Redis
    lb/             — load balancer + TLS
  environments/
    aws/            — AWS-specific config (ECS + RDS + ElastiCache)
    gcp/            — GCP-specific config (Cloud Run + Cloud SQL + Memorystore)
```

### Design Decisions

**ADR-006: Two provider targets (AWS + GCP) in Phase 2**
AWS and GCP cover the majority of developer deployments. An Azure module is deferred to Phase 3. Each environment is a thin wrapper over the shared `agentidp` module.

**ADR-007: Terraform over Pulumi/CDK**
Terraform is the most widely used IaC tool and is familiar to most DevOps teams. Its HCL syntax is also simpler to present in documentation.

---

## Component Interaction Map (Phase 2)

```
           ┌────────────────────┐
           │   Web Dashboard    │
           │   (React + Vite)   │
           └────────┬───────────┘
                    │ HTTPS
   ┌────────────────▼────────────────┐
   │         AgentIdP Server         │
   │ Auth MW → OPA MW → Controllers  │
   │ /metrics (prom-client)          │
   └──┬──────────┬──────────┬────────┘
      │          │          │
┌─────▼──┐  ┌────▼───┐   ┌──▼───────┐
│Postgres│  │ Redis  │   │  Vault   │
└────────┘  └────────┘   └──────────┘
                    │
           ┌────────▼────────┐
           │   Prometheus    │
           └────────┬────────┘
                    │
           ┌────────▼────────┐
           │     Grafana     │
           └─────────────────┘
```
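
The `Auth MW → OPA MW → Controllers` ordering in the map can be illustrated with a small sketch of a middleware chain. Everything below is hypothetical: the token lookup stands in for JWT verification, the `scopePermissions` table is an assumed shape for `data/scopes.json`, and the policy check is plain TypeScript rather than the embedded OPA Wasm evaluation. The `/agents` and `/audit` paths are illustrative only.

```typescript
// Hypothetical request and response shapes for the sketch.
interface RequestContext {
  method: string;
  path: string;
  token?: string;
  scopes?: string[]; // filled in by the auth step
}

interface Result {
  status: number;
  body: string;
}

type Handler = (ctx: RequestContext) => Result;
type Middleware = (ctx: RequestContext, next: Handler) => Result;

// Stand-in for JWT verification: maps a token to its granted scopes.
const knownTokens: Record<string, string[]> = {
  "demo-token": ["agents:read"],
};

// Assumed shape for data/scopes.json: scope -> permitted "METHOD path" pairs.
const scopePermissions: Record<string, string[]> = {
  "agents:read": ["GET /agents"],
  "audit:read": ["GET /audit"],
};

// Step 1: authenticate, then pass the caller's scopes down the chain.
const authMiddleware: Middleware = (ctx, next) => {
  const scopes = ctx.token !== undefined ? knownTokens[ctx.token] : undefined;
  if (scopes === undefined) {
    return { status: 401, body: "invalid token" };
  }
  return next({ ...ctx, scopes });
};

// Step 2: policy decision (plain TypeScript here, not OPA).
const policyMiddleware: Middleware = (ctx, next) => {
  const wanted = `${ctx.method} ${ctx.path}`;
  const allowed = (ctx.scopes ?? []).some((scope) =>
    (scopePermissions[scope] ?? []).includes(wanted),
  );
  return allowed ? next(ctx) : { status: 403, body: "denied by policy" };
};

// Compose middlewares so the first in the list runs first.
function compose(middlewares: Middleware[], controller: Handler): Handler {
  return middlewares.reduceRight<Handler>(
    (next, mw) => (ctx) => mw(ctx, next),
    controller,
  );
}

// Step 3: the controller only runs if both earlier steps call next().
const handle = compose([authMiddleware, policyMiddleware], () => ({
  status: 200,
  body: "ok",
}));

const allowedResult = handle({ method: "GET", path: "/agents", token: "demo-token" });
const deniedResult = handle({ method: "GET", path: "/audit", token: "demo-token" });
const anonymousResult = handle({ method: "GET", path: "/agents" });
```

Keeping authentication and the policy decision as separate steps mirrors ADR-005: the access decision lives in one place, so swapping the stand-in check for a real policy evaluation would not touch the auth step or the controllers.
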