Phase 2: Production-Ready — Technical Design
Date: 2026-03-28
Author: Virtual Architect
Status: Draft (pending CEO approval of proposal)
1. HashiCorp Vault Integration
Architecture
AgentIdP Server
  └── CredentialService
        └── VaultClient (new)
              └── HashiCorp Vault (sidecar or external)
                    └── KV Secrets Engine v2
Design Decisions
ADR-001: Vault over AWS KMS/GCP Secret Manager
Vault is cloud-agnostic, source-available (Business Source License since 2023), and already standard in enterprise environments. Using Vault keeps Phase 2 cloud-provider independent.
ADR-002: KV Secrets Engine v2
KV v2 provides versioned secrets and metadata. When a credential is rotated, the old version is retained in Vault's version history (subject to the mount's max_versions setting), enabling audit-grade secret lifecycle tracking.
ADR-003: AgentIdP stores Vault path, not secret
credentials.vault_path stores the Vault KV path (e.g. secret/agentidp/agents/{agentId}/credentials/{credentialId}). The secret itself is never written to PostgreSQL.
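As a minimal sketch of ADR-003, the helpers below build the path stored in credentials.vault_path and map it to the KV v2 HTTP endpoint, which places a data segment after the mount. The function names and the URL-encoding of the IDs are illustrative assumptions, not part of the AgentIdP codebase.

```typescript
// Hypothetical helpers; names and ID encoding are assumptions for illustration.

/** Builds the logical KV path that would be stored in credentials.vault_path. */
function buildVaultPath(mount: string, agentId: string, credentialId: string): string {
  // Encode the IDs so an unexpected "/" cannot change the path depth.
  const agent = encodeURIComponent(agentId);
  const credential = encodeURIComponent(credentialId);
  return `${mount}/agentidp/agents/${agent}/credentials/${credential}`;
}

/** Maps a logical path to the KV v2 read/write URL: /v1/<mount>/data/<path>. */
function kvV2DataUrl(vaultAddr: string, mount: string, vaultPath: string): string {
  const prefix = `${mount}/`;
  if (vaultPath.indexOf(prefix) !== 0) {
    throw new Error(`vault path "${vaultPath}" is not under mount "${mount}"`);
  }
  const relative = vaultPath.slice(prefix.length);
  const base = vaultAddr.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/${mount}/data/${relative}`;
}

const examplePath = buildVaultPath("secret", "agent-42", "cred-7");
const exampleUrl = kvV2DataUrl("http://127.0.0.1:8200", "secret", examplePath);
console.log(examplePath);
console.log(exampleUrl);
```

Only examplePath would be persisted in PostgreSQL; the URL is derived at request time from VAULT_ADDR and VAULT_MOUNT.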
New environment variables
| Variable | Description |
|---|---|
| `VAULT_ADDR` | Vault server address |
| `VAULT_TOKEN` | Vault root/service token |
| `VAULT_MOUNT` | KV mount path (default: secret) |
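The variables above could be read once at startup. The following is a sketch only: the VaultConfig shape and the choice to treat VAULT_ADDR and VAULT_TOKEN as required are assumptions, while the default mount of secret comes from the table.

```typescript
// Hypothetical config shape; not taken from the AgentIdP source.
interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

/** Reads Vault settings from an environment-like map and fails fast on gaps. */
function loadVaultConfig(env: Record<string, string | undefined>): VaultConfig {
  const addr = env.VAULT_ADDR;
  const token = env.VAULT_TOKEN;
  if (!addr) throw new Error("VAULT_ADDR is required"); // assumption: required
  if (!token) throw new Error("VAULT_TOKEN is required"); // assumption: required
  // VAULT_MOUNT falls back to the documented default.
  return { addr, token, mount: env.VAULT_MOUNT || "secret" };
}

// In the server this would typically be called as loadVaultConfig(process.env).
const exampleConfig = loadVaultConfig({
  VAULT_ADDR: "http://127.0.0.1:8200",
  VAULT_TOKEN: "dev-only-token",
});
console.log(exampleConfig.mount);
```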
Migration
Add vault_path column to credentials table (005_add_vault_path.sql). Existing credentials retain bcrypt hashes; new credentials use Vault. Both code paths coexist until all credentials are rotated (migration guide provided).
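One way to picture the two coexisting code paths is a small selector that picks the backend per credential row. This is a sketch under assumptions: the row shape and the secretHash field name are hypothetical, and only vault_path is named in the migration.

```typescript
// Hypothetical row shape; only vault_path is named in the migration above.
interface CredentialRow {
  id: string;
  secretHash: string | null; // legacy bcrypt hash (assumed column name)
  vaultPath: string | null; // credentials.vault_path, null until rotated
}

type SecretBackend =
  | { kind: "vault"; path: string }
  | { kind: "bcrypt"; hash: string };

/** Prefers Vault when a path is present, otherwise falls back to the bcrypt hash. */
function selectBackend(row: CredentialRow): SecretBackend {
  if (row.vaultPath) return { kind: "vault", path: row.vaultPath };
  if (row.secretHash) return { kind: "bcrypt", hash: row.secretHash };
  throw new Error(`credential ${row.id} has neither a Vault path nor a bcrypt hash`);
}

const legacy = selectBackend({ id: "c1", secretHash: "$2b$10$legacyhash", vaultPath: null });
console.log(legacy.kind);
```

Once every credential has been rotated, the bcrypt branch becomes unreachable and can be removed.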
2. Multi-Language SDKs
Shared contract (all SDKs implement identically)
AgentIdPClient(baseUrl, clientId, clientSecret, scopes?)
.agents → AgentRegistryClient (5 methods)
.credentials → CredentialClient (4 methods)
.tokens → TokenClient (2 methods)
.audit → AuditClient (2 methods)
.clearTokenCache()
TokenManager — auto-refresh 60s before expiry
AgentIdPError — code, message, httpStatus, details
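The auto-refresh rule in the contract can be sketched as follows. This is an illustrative TypeScript outline, not the SDK's actual implementation: the class name, the injected fetchToken function, and the sharing of one in-flight request among concurrent callers are assumptions; only the 60-second margin and clearTokenCache come from the contract.

```typescript
interface IssuedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry in epoch milliseconds
}

const REFRESH_SKEW_MS = 60_000; // refresh 60s before expiry, per the contract

/** True when there is no token or it is within 60s of expiring. */
function needsRefresh(token: IssuedToken | null, nowMs: number): boolean {
  return token === null || nowMs >= token.expiresAtMs - REFRESH_SKEW_MS;
}

// Hypothetical sketch; fetchToken stands in for the real token endpoint call.
class TokenManagerSketch {
  private cached: IssuedToken | null = null;
  private inFlight: Promise<IssuedToken> | null = null;

  constructor(
    private readonly fetchToken: () => Promise<IssuedToken>,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async getToken(): Promise<string> {
    const cached = this.cached;
    if (cached !== null && !needsRefresh(cached, this.now())) {
      return cached.accessToken;
    }
    // Share a single request among concurrent callers (an assumption).
    if (this.inFlight === null) {
      this.inFlight = this.fetchToken()
        .then((token) => {
          this.cached = token;
          return token;
        })
        .finally(() => {
          this.inFlight = null;
        });
    }
    return (await this.inFlight).accessToken;
  }

  clearTokenCache(): void {
    this.cached = null;
  }
}
```

Injecting the clock keeps the refresh boundary testable without waiting for real time to pass.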
Python SDK (sentryagent-idp)
- Python 3.9+ (httpx for async, requests for sync)
- Both sync and async client variants
- PyPI package: `sentryagent-idp`
- Type hints throughout (`mypy --strict` clean)
Go SDK (github.com/sentryagent/idp-sdk-go)
- Go 1.21+, standard library `net/http`
- Context-aware methods (`context.Context` first arg)
- Idiomatic Go error handling (`error` return, no panic)
- Go module: `github.com/sentryagent/idp-sdk-go`
Java SDK (ai.sentryagent:idp-sdk)
- Java 17+, Apache HttpClient 5
- Synchronous and CompletableFuture async variants
- Maven Central: `ai.sentryagent:idp-sdk`
- Fully typed with generics
3. OPA Policy Engine
Architecture
HTTP Request
  → Auth Middleware (JWT verify, unchanged)
  → OPA Middleware (new): evaluates policy
      → OPA Wasm (embedded, no network call)
      → Rego policy files (hot-reloadable)
  → Controller
Design Decisions
ADR-004: OPA Wasm over OPA sidecar
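Embedding OPA as Wasm in the Node.js process eliminates a network hop and removes a runtime dependency. Policy files are loaded from the policies/ directory at startup and reloaded on SIGHUP. Because the Wasm runtime executes compiled modules rather than Rego source, the Rego files must first be compiled to Wasm (for example with opa build -t wasm).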
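The reload-on-SIGHUP behaviour might be structured as a holder that swaps in a freshly loaded policy and keeps the last good one if loading fails. This is a sketch under assumptions: the class and function names are hypothetical, the loader is injected rather than tied to a specific OPA library, and keeping the previous policy on failure is a design choice not stated in the ADR.

```typescript
/** Holds the active policy and swaps it atomically on reload. */
class ReloadablePolicy<T> {
  private current: T;

  constructor(private readonly load: () => T) {
    this.current = load(); // fail fast at startup if the policy is invalid
  }

  get(): T {
    return this.current;
  }

  /** Returns true if the new policy was loaded, false if the old one was kept. */
  reload(): boolean {
    try {
      this.current = this.load();
      return true;
    } catch (err) {
      return false; // assumption: keep serving the last good policy
    }
  }
}

// Minimal view of an object that emits signals, such as Node's process.
interface SignalSource {
  on(signal: "SIGHUP", handler: () => void): unknown;
}

function reloadOnSighup(source: SignalSource, policy: { reload(): boolean }): void {
  source.on("SIGHUP", () => {
    policy.reload();
  });
}

// In the server: reloadOnSighup(process, policyHolder);
```

Requests that are mid-evaluation keep the policy object they already obtained from get(), so a reload never changes a decision halfway through.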
ADR-005: Policy replaces, does not wrap, scope check
The existing static scope check in auth.ts is replaced by an OPA policy evaluation. This keeps the policy as the single source of truth for access control.
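A rough shape for the middleware that replaces the static scope check is sketched below. Everything here is illustrative: the request and response types are minimal stand-ins for the web framework's, the evaluator is injected so the sketch does not depend on a specific OPA Wasm binding, and the input fields, the FORBIDDEN error code, and the fail-closed handling are assumptions.

```typescript
// Hypothetical policy input; the real input document is not specified in this design.
interface PolicyInput {
  method: string;
  path: string;
  scopes: string[];
}

type PolicyEvaluator = (input: PolicyInput) => boolean;

// Minimal stand-ins for the framework's request and response objects.
interface HttpReq {
  method: string;
  path: string;
  auth?: { scopes: string[] }; // assumed to be set by the JWT auth middleware
}
interface HttpRes {
  status(code: number): HttpRes;
  json(body: unknown): void;
}

function opaMiddleware(evaluate: PolicyEvaluator) {
  return (req: HttpReq, res: HttpRes, next: () => void): void => {
    const input: PolicyInput = {
      method: req.method,
      path: req.path,
      scopes: req.auth ? req.auth.scopes : [],
    };
    let allowed = false;
    try {
      allowed = evaluate(input) === true;
    } catch (err) {
      allowed = false; // fail closed if policy evaluation itself errors
    }
    if (allowed) {
      next();
      return;
    }
    res.status(403).json({ code: "FORBIDDEN", message: "Denied by policy" });
  };
}
```

In the server, evaluate would wrap the embedded Wasm policy; in tests it can be any function, which keeps controllers testable without compiling Rego.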
Policy structure (policies/)
policies/
  authz.rego - main policy: allow/deny
  data/
    scopes.json - scope → permission mapping
4. Web Dashboard UI
Architecture
dashboard/ (new, separate from sdk/)
  src/
    components/ - reusable UI components
    pages/ - Agents, Credentials, Audit, Health
    hooks/ - useAgents, useCredentials, useAudit
    lib/
      client.ts - wraps @sentryagent/idp-sdk
      auth.ts - credential entry and storage
Tech Stack
- React 18 + TypeScript strict
- Vite 5 (build tool)
- TanStack Query v5 (server state)
- shadcn/ui components (Radix UI + Tailwind CSS)
Pages
| Page | Scope Required | Features |
|---|---|---|
| Agents | `agents:read` | List, search, view detail, suspend/reactivate |
| Credentials | `agents:read` | List credentials per agent, rotate, revoke |
| Audit Log | `audit:read` | Filter by agent/action/outcome/date, paginate |
| Health | None | Server uptime, Redis/PostgreSQL connectivity |
Authentication
The dashboard accepts clientId + clientSecret via a login form. The @sentryagent/idp-sdk TokenManager handles token acquisition and caching in sessionStorage. No backend session — all state is client-side.
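Token caching in sessionStorage could look roughly like the sketch below. The storage key, the cached token shape, and the helper names are hypothetical; the store is typed as a small interface that the browser's sessionStorage satisfies, so the same code can run against an in-memory stub.

```typescript
// Subset of the Web Storage API used here; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

const TOKEN_KEY = "agentidp.token"; // hypothetical storage key

function saveToken(store: KeyValueStore, token: CachedToken): void {
  store.setItem(TOKEN_KEY, JSON.stringify(token));
}

/** Returns the cached token if present, well-formed, and unexpired; otherwise clears it. */
function loadToken(store: KeyValueStore, nowMs: number): CachedToken | null {
  const raw = store.getItem(TOKEN_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<CachedToken>;
    if (
      typeof parsed.accessToken === "string" &&
      typeof parsed.expiresAtMs === "number" &&
      parsed.expiresAtMs > nowMs
    ) {
      return { accessToken: parsed.accessToken, expiresAtMs: parsed.expiresAtMs };
    }
  } catch (err) {
    // Corrupt entry: fall through and remove it.
  }
  store.removeItem(TOKEN_KEY);
  return null;
}
```

Because sessionStorage is scoped to the tab, closing the tab discards the token, which matches the no-backend-session design.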
5. Prometheus + Grafana Monitoring
Metrics exposed at GET /metrics
| Metric | Type | Description |
|---|---|---|
| `agentidp_tokens_issued_total` | Counter | Tokens issued, labelled by outcome |
| `agentidp_agents_registered_total` | Counter | Agent registrations |
| `agentidp_http_requests_total` | Counter | All requests, labelled by method/path/status |
| `agentidp_http_request_duration_seconds` | Histogram | Request latency |
| `agentidp_rate_limit_rejections_total` | Counter | 429 responses |
| `agentidp_db_query_duration_seconds` | Histogram | PostgreSQL query latency |
| `agentidp_redis_command_duration_seconds` | Histogram | Redis command latency |
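To show what a scrape of GET /metrics would contain for a labelled counter, here is a dependency-free stand-in that renders the Prometheus text exposition format. It is not the production approach (the component map names prom-client for that); the CounterSketch class and the help text are illustrative assumptions.

```typescript
type Labels = Record<string, string>;

/** Escapes a label value per the text exposition format. */
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Hypothetical stand-in for a metrics library counter.
class CounterSketch {
  private readonly values = new Map<string, number>();

  constructor(private readonly metricName: string, private readonly help: string) {}

  inc(labels: Labels = {}, by = 1): void {
    const key = CounterSketch.serialize(labels);
    this.values.set(key, (this.values.get(key) || 0) + by);
  }

  /** Renders HELP and TYPE lines followed by one sample line per label set. */
  render(): string {
    const lines = [
      `# HELP ${this.metricName} ${this.help}`,
      `# TYPE ${this.metricName} counter`,
    ];
    this.values.forEach((value, key) => {
      lines.push(`${this.metricName}${key} ${value}`);
    });
    return lines.join("\n") + "\n";
  }

  private static serialize(labels: Labels): string {
    const pairs = Object.keys(labels)
      .sort() // stable key order so equal label sets share one series
      .map((k) => `${k}="${escapeLabel(String(labels[k]))}"`);
    return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
  }
}

const tokensIssued = new CounterSketch("agentidp_tokens_issued_total", "Tokens issued");
tokensIssued.inc({ outcome: "success" });
tokensIssued.inc({ outcome: "failure" });
console.log(tokensIssued.render());
```

Each distinct label combination becomes its own time series, so the path label on agentidp_http_requests_total is usually the route template rather than the raw URL to keep cardinality bounded.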
Grafana dashboard
Pre-built JSON dashboard shipped in monitoring/grafana/dashboards/agentidp.json. Auto-provisioned via monitoring/grafana/provisioning/.
Docker Compose extension
Add prometheus and grafana services to a docker-compose.monitoring.yml overlay — keeps the base docker-compose.yml clean for developers who don't need monitoring.
6. Multi-Region Deployment (Terraform)
Structure
terraform/
  modules/
    agentidp/ - reusable module: compute + networking
    rds/ - managed PostgreSQL
    redis/ - managed Redis
    lb/ - load balancer + TLS
  environments/
    aws/ - AWS-specific config (ECS + RDS + ElastiCache)
    gcp/ - GCP-specific config (Cloud Run + Cloud SQL + Memorystore)
Design Decisions
ADR-006: Two provider targets (AWS + GCP) in Phase 2
AWS and GCP cover the majority of developer deployments. An Azure module is deferred to Phase 3. Each environment is a thin wrapper over the shared agentidp module.
ADR-007: Terraform over Pulumi/CDK
Terraform is the most widely used IaC tool and is familiar to most DevOps teams. Its HCL syntax is also simpler to present in documentation.
Component Interaction Map (Phase 2)
              ┌────────────────────┐
              │   Web Dashboard    │
              │   (React + Vite)   │
              └────────┬───────────┘
                       │ HTTPS
      ┌────────────────▼────────────────┐
      │         AgentIdP Server         │
      │ Auth MW → OPA MW → Controllers  │
      │     /metrics (prom-client)      │
      └──┬──────────┬──────────┬────────┘
         │          │          │
   ┌─────▼──┐  ┌────▼───┐   ┌──▼───────┐
   │Postgres│  │ Redis  │   │  Vault   │
   └────────┘  └────────┘   └──────────┘
                       │
              ┌────────▼────────┐
              │   Prometheus    │
              └────────┬────────┘
                       │
              ┌────────▼────────┐
              │     Grafana     │
              └─────────────────┘