8 workstreams scoped per OpenSpec standards: 1. HashiCorp Vault integration (secret management) 2. Python SDK (sentryagent-idp) 3. Go SDK (idp-sdk-go) 4. Java SDK (ai.sentryagent:idp-sdk) 5. OPA policy engine (dynamic ABAC, hot-reload Rego) 6. Web Dashboard UI (React 18 + TypeScript) 7. Prometheus + Grafana monitoring (7 metrics, pre-built dashboard) 8. Multi-region Terraform deployment (AWS + GCP) Status: proposed — awaiting CEO dependency approvals (A0.1–A0.5) before any implementation begins. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Phase 2: Production-Ready — Technical Design

**Date**: 2026-03-28
**Author**: Virtual Architect
**Status**: Draft, pending CEO approval of proposal

---

## 1. HashiCorp Vault Integration

### Architecture

```
AgentIdP Server
  └── CredentialService
        └── VaultClient (new)
              └── HashiCorp Vault (sidecar or external)
                    └── KV Secrets Engine v2
```

### Design Decisions

**ADR-001: Vault over AWS KMS/GCP Secret Manager**
Vault is cloud-agnostic, source-available (under the Business Source License since HashiCorp's 2023 relicensing), and already common in enterprise environments. Using Vault keeps Phase 2 independent of any single cloud provider.

**ADR-002: KV Secrets Engine v2**
KV v2 provides versioned secrets and metadata. When a credential is rotated, the old version is retained in Vault history (up to the mount's configured `max_versions`), enabling audit-grade secret lifecycle tracking.

**ADR-003: AgentIdP stores the Vault path, not the secret**
`credentials.vault_path` stores the Vault KV path (e.g. `secret/agentidp/agents/{agentId}/credentials/{credentialId}`). The secret itself is never written to PostgreSQL.

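The path template in ADR-003 can be sketched as a small helper. This is a minimal illustration, not the actual `VaultClient`: the function names and the identifier whitelist are assumptions. The only external fact relied on is that the KV v2 HTTP API places a `data/` segment after the mount.

```typescript
// Hypothetical helpers; only the path template itself comes from ADR-003.
const ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function assertSafeSegment(name: string, value: string): void {
  // Reject empty values, slashes and dots so an ID cannot escape its path.
  if (!ID_PATTERN.test(value)) {
    throw new Error(`${name} contains characters not allowed in a Vault path`);
  }
}

// Logical path, as it would be stored in credentials.vault_path.
function buildVaultPath(mount: string, agentId: string, credentialId: string): string {
  assertSafeSegment("agentId", agentId);
  assertSafeSegment("credentialId", credentialId);
  return `${mount}/agentidp/agents/${agentId}/credentials/${credentialId}`;
}

// KV v2 read/write URL: <addr>/v1/<mount>/data/<path below the mount>.
function toKvV2DataUrl(vaultAddr: string, mount: string, vaultPath: string): string {
  const prefix = `${mount}/`;
  if (!vaultPath.startsWith(prefix)) {
    throw new Error(`path is not under mount "${mount}"`);
  }
  const relative = vaultPath.slice(prefix.length);
  const base = vaultAddr.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/${mount}/data/${relative}`;
}
```

Keeping the logical path (what PostgreSQL stores) separate from the API URL means a change of Vault address does not require rewriting stored rows.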
### New environment variables

| Variable | Description |
|----------|-------------|
| `VAULT_ADDR` | Vault server address |
| `VAULT_TOKEN` | Vault root/service token |
| `VAULT_MOUNT` | KV mount path (default: `secret`) |

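A minimal sketch of how the server might read these variables at startup. The `VaultConfig` shape, the `loadVaultConfig` name, and the validation rules are assumptions for illustration; only the variable names and the `secret` default come from the table above.

```typescript
interface VaultConfig {
  addr: string;
  token: string;
  mount: string;
}

// Takes the environment as a plain object so it can be exercised without process.env.
function loadVaultConfig(env: Record<string, string | undefined>): VaultConfig {
  const addr = (env["VAULT_ADDR"] ?? "").trim();
  const token = (env["VAULT_TOKEN"] ?? "").trim();
  if (addr === "") throw new Error("VAULT_ADDR is required");
  if (!/^https?:\/\//.test(addr)) throw new Error("VAULT_ADDR must be an http(s) URL");
  if (token === "") throw new Error("VAULT_TOKEN is required");

  // Strip surrounding slashes; fall back to the documented default mount.
  const mount = (env["VAULT_MOUNT"] ?? "").trim().replace(/^\/+|\/+$/g, "") || "secret";
  return { addr: addr.replace(/\/+$/, ""), token, mount };
}
```

In the Node.js server this would presumably be called once as `loadVaultConfig(process.env)`, so a missing variable fails at boot rather than on the first credential request.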
### Migration

Add a `vault_path` column to the `credentials` table (`005_add_vault_path.sql`). Existing credentials retain their bcrypt hashes; new credentials use Vault. Both code paths coexist until all credentials have been rotated (a migration guide is provided).

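The coexistence of both code paths could look roughly like the sketch below. The row shape (`secretHash` in particular), the `SecretBackends` interface, and the direct comparison against the value read from Vault are all assumptions; the design only states that rows with a `vault_path` use Vault and older rows keep their bcrypt hash.

```typescript
interface CredentialRow {
  credentialId: string;
  secretHash: string | null; // legacy bcrypt hash (hypothetical column name)
  vaultPath: string | null;  // credentials.vault_path, set for new credentials
}

// Injected so the sketch stays independent of any Vault or bcrypt library.
interface SecretBackends {
  readVaultSecret(path: string): Promise<string | null>;
  compareBcrypt(candidate: string, hash: string): Promise<boolean>;
}

// Compares every position instead of returning on the first mismatch.
function equalsWithoutEarlyExit(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

async function verifyCredential(
  row: CredentialRow,
  candidate: string,
  backends: SecretBackends,
): Promise<boolean> {
  if (row.vaultPath !== null) {
    // New path: the secret lives only in Vault.
    const stored = await backends.readVaultSecret(row.vaultPath);
    return stored !== null && equalsWithoutEarlyExit(stored, candidate);
  }
  if (row.secretHash !== null) {
    // Legacy path: bcrypt hash kept in PostgreSQL until the credential is rotated.
    return backends.compareBcrypt(candidate, row.secretHash);
  }
  return false; // neither backend knows this credential
}
```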


---

## 2. Multi-Language SDKs

### Shared contract (all SDKs implement identically)

```
AgentIdPClient(baseUrl, clientId, clientSecret, scopes?)
  .agents       → AgentRegistryClient (5 methods)
  .credentials  → CredentialClient (4 methods)
  .tokens       → TokenClient (2 methods)
  .audit        → AuditClient (2 methods)
  .clearTokenCache()

TokenManager:  auto-refresh 60s before expiry
AgentIdPError: code, message, httpStatus, details
```

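The "auto-refresh 60s before expiry" rule in the contract can be illustrated with a small sketch. It is not the SDK's `TokenManager`: the constructor parameters, the `TokenResponse` shape, and the de-duplication of concurrent refreshes are assumptions made for the example.

```typescript
interface TokenResponse {
  accessToken: string;
  expiresInSeconds: number;
}

class TokenManager {
  private cached: { token: string; expiresAtMs: number } | null = null;
  private inFlight: Promise<string> | null = null;

  constructor(
    private readonly fetchToken: () => Promise<TokenResponse>, // hypothetical token endpoint call
    private readonly now: () => number = Date.now,              // injectable clock for testing
    private readonly refreshSkewMs: number = 60000,             // refresh 60s before expiry
  ) {}

  async getToken(): Promise<string> {
    const c = this.cached;
    if (c !== null && this.now() < c.expiresAtMs - this.refreshSkewMs) {
      return c.token; // still comfortably valid
    }
    if (this.inFlight === null) {
      // Concurrent callers share one refresh instead of each hitting the server.
      this.inFlight = this.fetchToken()
        .then((res) => {
          this.cached = {
            token: res.accessToken,
            expiresAtMs: this.now() + res.expiresInSeconds * 1000,
          };
          return res.accessToken;
        })
        .finally(() => {
          this.inFlight = null;
        });
    }
    return this.inFlight;
  }

  clearTokenCache(): void {
    this.cached = null;
  }
}
```

With a one-hour token, calls made during the first 59 minutes reuse the cached value; the first call inside the final minute triggers a refresh.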
### Python SDK (`sentryagent-idp`)
- Python 3.9+ (httpx for async, requests for sync)
- Both sync and async client variants
- PyPI package: `sentryagent-idp`
- Type hints throughout (`mypy --strict` clean)

### Go SDK (`github.com/sentryagent/idp-sdk-go`)
- Go 1.21+, standard library `net/http`
- Context-aware methods (`context.Context` first arg)
- Idiomatic Go error handling (`error` return, no panic)
- Go module: `github.com/sentryagent/idp-sdk-go`

### Java SDK (`ai.sentryagent:idp-sdk`)
- Java 17+, Apache HttpClient 5
- Synchronous and CompletableFuture async variants
- Maven Central: `ai.sentryagent:idp-sdk`
- Fully typed with generics

---

## 3. OPA Policy Engine

### Architecture

```
HTTP Request
  → Auth Middleware (JWT verify)  - unchanged
  → OPA Middleware (new)          - evaluates policy
      → OPA Wasm (embedded, no network call)
      → Rego policy files (hot-reloadable)
  → Controller
```

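### Design Decisions

**ADR-004: OPA Wasm over OPA sidecar**
Embedding OPA as Wasm in the Node.js process eliminates a network hop and removes the separate OPA process as a runtime dependency. Policy files are loaded from the `policies/` directory at startup and reloaded on SIGHUP. Because the Wasm runtime executes compiled modules rather than raw Rego, the `.rego` sources need to be compiled (for example with `opa build -t wasm`) before they can be loaded or reloaded.
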
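The reload behaviour in ADR-004 can be sketched independently of any OPA library. In this illustration the `PolicyHolder` class and its injected `load` function are hypothetical; a real loader would compile and instantiate the Wasm policy. The sketch shows one reasonable choice the ADR does not spell out: a failed reload keeps the last good policy, and requests are denied until a first policy has loaded.

```typescript
// A loaded policy reduced to a single decision function for illustration.
type Decision = (input: unknown) => boolean;

class PolicyHolder {
  private current: Decision | null = null;

  // `load` stands in for "read policies/, compile to Wasm, instantiate".
  constructor(private readonly load: () => Promise<Decision>) {}

  // Returns true when the new policy was installed.
  async reload(): Promise<boolean> {
    try {
      const next = await this.load();
      this.current = next; // single assignment: requests see old or new, never a mix
      return true;
    } catch {
      return false; // keep serving the previous policy
    }
  }

  evaluate(input: unknown): boolean {
    if (this.current === null) return false; // fail closed before the first load
    return this.current(input);
  }
}

// Wiring in the Node.js server might look like this (not executed here):
//   await holder.reload();                               // at startup
//   process.on("SIGHUP", () => { void holder.reload(); });
```

Swapping the whole decision function at once avoids a window in which a request is evaluated against a half-loaded policy set.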
**ADR-005: Policy replaces, does not wrap, scope check**
The existing static scope check in `auth.ts` is replaced by an OPA policy evaluation. This keeps the policy as the single source of truth for access control.

### Policy structure (`policies/`)
```
policies/
  authz.rego      - main policy: allow/deny
  data/
    scopes.json   - scope → permission mapping
```

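To make the data flow concrete, the sketch below shows the kind of input document the OPA middleware might build and how a scope → permission mapping could drive an allow/deny decision. It is a TypeScript stand-in for illustration only, not the Rego policy: the `PolicyInput` fields, the `"METHOD /route"` permission format, and the `:param` route syntax are all assumptions about what `scopes.json` and `authz.rego` contain.

```typescript
// Hypothetical shape of policies/data/scopes.json: scope -> "METHOD /route/pattern" entries.
type ScopeMap = Record<string, string[]>;

interface PolicyInput {
  method: string;   // HTTP method of the request
  path: string;     // request path, e.g. "/agents/42"
  scopes: string[]; // scopes taken from the verified JWT
}

// Segment-wise match; a pattern segment starting with ":" matches any value.
function routeMatches(pattern: string, path: string): boolean {
  const want = pattern.split("/").filter(Boolean);
  const got = path.split("/").filter(Boolean);
  if (want.length !== got.length) return false;
  return want.every((seg, i) => seg.startsWith(":") || seg === got[i]);
}

function isAllowed(input: PolicyInput, scopeMap: ScopeMap): boolean {
  for (const scope of input.scopes) {
    for (const permission of scopeMap[scope] ?? []) {
      const [method, pattern] = permission.split(" ");
      if (method === input.method && pattern !== undefined && routeMatches(pattern, input.path)) {
        return true;
      }
    }
  }
  return false; // default deny: nothing granted the request
}
```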


---

## 4. Web Dashboard UI

### Architecture

```
dashboard/            (new, separate from sdk/)
  src/
    components/       - reusable UI components
    pages/            - Agents, Credentials, Audit, Health
    hooks/            - useAgents, useCredentials, useAudit
    lib/
      client.ts       - wraps @sentryagent/idp-sdk
      auth.ts         - credential entry and storage
```

### Tech Stack
- React 18 + TypeScript strict
- Vite 5 (build tool)
- TanStack Query v5 (server state)
- shadcn/ui components (Radix UI + Tailwind CSS)

### Pages
| Page | Scope Required | Features |
|------|---------------|----------|
| Agents | `agents:read` | List, search, view detail, suspend/reactivate |
| Credentials | `agents:read` | List credentials per agent, rotate, revoke |
| Audit Log | `audit:read` | Filter by agent/action/outcome/date, paginate |
| Health | None | Server uptime, Redis/PostgreSQL connectivity |

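The scope column of the table above lends itself to a small lookup that the router or navigation could consult. This is a sketch under assumptions: the `PAGE_SCOPES` constant and both function names are hypothetical, and the real dashboard may gate pages differently.

```typescript
// Required scope per page, copied from the Pages table; null means no scope needed.
const PAGE_SCOPES = {
  agents: "agents:read",
  credentials: "agents:read",
  audit: "audit:read",
  health: null,
} as const;

type PageId = keyof typeof PAGE_SCOPES;

function canViewPage(page: PageId, grantedScopes: readonly string[]): boolean {
  const required = PAGE_SCOPES[page];
  return required === null || grantedScopes.includes(required);
}

// Pages to show in the navigation for the current token's scopes.
function visiblePages(grantedScopes: readonly string[]): PageId[] {
  return (Object.keys(PAGE_SCOPES) as PageId[]).filter((page) =>
    canViewPage(page, grantedScopes),
  );
}
```

Hiding a page in the UI is only a convenience; the server-side policy check remains the actual enforcement point.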
### Authentication
The dashboard accepts `clientId` + `clientSecret` via a login form. The `TokenManager` from `@sentryagent/idp-sdk` acquires tokens and caches them in `sessionStorage`. There is no backend session; all state is client-side.

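Caching a token in `sessionStorage` could be sketched as below. The storage key, the `CachedToken` shape, and the helper names are assumptions rather than the SDK's actual behaviour. The code targets a minimal storage interface that `window.sessionStorage` satisfies, so it can also be exercised outside a browser.

```typescript
// The subset of the Web Storage API this sketch needs.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

const TOKEN_KEY = "agentidp.dashboard.token"; // hypothetical key name

function saveToken(store: KeyValueStore, token: CachedToken): void {
  store.setItem(TOKEN_KEY, JSON.stringify(token));
}

// Returns the cached token, or null (and clears the entry) if it is missing,
// malformed, or already expired at `nowMs`.
function loadToken(store: KeyValueStore, nowMs: number): CachedToken | null {
  const raw = store.getItem(TOKEN_KEY);
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { accessToken, expiresAtMs } = parsed as Record<string, unknown>;
      if (
        typeof accessToken === "string" &&
        typeof expiresAtMs === "number" &&
        nowMs < expiresAtMs
      ) {
        return { accessToken, expiresAtMs };
      }
    }
  } catch {
    // Malformed JSON: fall through and clear the entry.
  }
  store.removeItem(TOKEN_KEY);
  return null;
}
```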


---

## 5. Prometheus + Grafana Monitoring

### Metrics exposed at `GET /metrics`

| Metric | Type | Description |
|--------|------|-------------|
| `agentidp_tokens_issued_total` | Counter | Tokens issued, labelled by outcome |
| `agentidp_agents_registered_total` | Counter | Agent registrations |
| `agentidp_http_requests_total` | Counter | All requests, labelled by method/path/status |
| `agentidp_http_request_duration_seconds` | Histogram | Request latency |
| `agentidp_rate_limit_rejections_total` | Counter | 429 responses |
| `agentidp_db_query_duration_seconds` | Histogram | PostgreSQL query latency |
| `agentidp_redis_command_duration_seconds` | Histogram | Redis command latency |

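To show what a labelled counter from the table looks like on the wire, here is a self-contained sketch that renders the Prometheus text exposition format. It is a stand-in for illustration: the server would presumably use `prom-client`, as the component map suggests, and the `LabelledCounter` class here is hypothetical.

```typescript
// Escape label values per the text exposition format: backslash, quote, newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class LabelledCounter {
  private readonly values = new Map<string, number>();

  constructor(
    private readonly name: string,
    private readonly help: string,
    private readonly labelNames: readonly string[],
  ) {}

  inc(labels: Record<string, string>, by: number = 1): void {
    // One time series per distinct combination of label values.
    const key = this.labelNames
      .map((n) => `${n}="${escapeLabelValue(labels[n] ?? "")}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, key) => {
      lines.push(key === "" ? `${this.name} ${value}` : `${this.name}{${key}} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

Since every distinct label combination becomes its own series, a `path` label is usually populated with the route template (for example `/agents/:id`) rather than the raw URL, to keep the series count bounded.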
### Grafana dashboard
A pre-built JSON dashboard ships in `monitoring/grafana/dashboards/agentidp.json` and is auto-provisioned via `monitoring/grafana/provisioning/`.

### Docker Compose extension
Add `prometheus` and `grafana` services in a `docker-compose.monitoring.yml` overlay. This keeps the base `docker-compose.yml` clean for developers who don't need monitoring.

---

## 6. Multi-Region Deployment (Terraform)

### Structure

```
terraform/
  modules/
    agentidp/     - reusable module: compute + networking
    rds/          - managed PostgreSQL
    redis/        - managed Redis
    lb/           - load balancer + TLS
  environments/
    aws/          - AWS-specific config (ECS + RDS + ElastiCache)
    gcp/          - GCP-specific config (Cloud Run + Cloud SQL + Memorystore)
```

### Design Decisions

**ADR-006: Two provider targets (AWS + GCP) in Phase 2**
AWS and GCP cover the majority of developer deployments. An Azure module is deferred to Phase 3. Each environment is a thin wrapper over the shared `agentidp` module.

**ADR-007: Terraform over Pulumi/CDK**
Terraform is one of the most widely used IaC tools and is familiar to most DevOps teams. Its HCL syntax is also simpler to present in documentation.

---

## Component Interaction Map (Phase 2)

```
           ┌────────────────────┐
           │   Web Dashboard    │
           │   (React + Vite)   │
           └────────┬───────────┘
                    │ HTTPS
   ┌────────────────▼────────────────┐
   │         AgentIdP Server         │
   │  Auth MW → OPA MW → Controllers │
   │  /metrics (prom-client)         │
   └──┬──────────┬──────────┬────────┘
      │          │          │
┌─────▼──┐  ┌────▼───┐   ┌──▼───────┐
│Postgres│  │ Redis  │   │  Vault   │
└────────┘  └────────┘   └──────────┘
                    │
           ┌────────▼────────┐
           │   Prometheus    │
           └────────┬────────┘
                    │
           ┌────────▼────────┐
           │     Grafana     │
           └─────────────────┘
```